Developing digital biomarkers: An interview with Evidation’s Co-founder and Chief Data Scientist, Luca Foschini, Ph.D.

The original question posed in our 2016 digital biomarkers report asked whether digital biomarkers—consumer-generated physiological and behavioral measures collected through connected digital tools—would provide additional value beyond traditional biomarkers in helping to understand health and disease. Today, it seems clear the answer is a resounding “yes”—but just how much progress have we made, and how exactly do we construct these digital biomarkers from the data science perspective?

Rock Health portfolio company Evidation Health is a new kind of health and measurement company that provides technology and guidance to understand how everyday behavior and health interact and measure how behaviors outside the doctor’s office or hospital impact health outcomes. Evidation’s Chief Data Scientist, Luca Foschini, sat down with us to provide insight into the current state of digital biomarkers and how their team leverages both deep data science expertise and healthcare knowledge. (PS—they’re hiring!)

How would you characterize the progress we’ve seen in the digital biomarker space?

We’ve seen significant growth in the research focused on digital biomarkers, but the vast majority of digital biomarkers have yet to find practical applications. Much of the initial work in this area was started by researchers in computer science and electrical engineering rather than those in medicine. As a result, the technical aspects of the research are strong, but the experimental design can still be improved for clinical use cases.

As the value of digital biomarkers has become more evident to the healthcare industry, however, we’ve seen more collaboration between healthcare and tech in this research. It’s important to note that the use of digital biomarkers as part of drug development or healthcare delivery requires careful research and validation that can take several years. While no digital biomarkers have been approved as drug development tools yet, biopharma companies are conducting studies to develop novel endpoints based on digital biomarkers.

What are some therapeutic areas where we’ve seen notable development of digital biomarkers in the past couple of years?

There’s been a broad range of activity across therapeutic areas, including but not limited to cardiovascular disease, sleep, and neuropsychiatric diseases. In each of these areas, there is signal in the data collected by devices many people are already wearing, including accelerometer, heart rate, step, sleep, social network activity, voice, and even blood glucose data, and the connection between those signals and the condition is relatively intuitive. For example, it makes sense that cardiovascular disease would be related to a person’s activity or heart rate data, and that someone’s social network activity might be linked to their mental health. This isn’t to say we haven’t seen progress in other areas as well. It’s exciting to see new research every day demonstrating the depth of health information captured by consumer wearables, including in areas like oncology.

Within cardiovascular disease, several companies have focused on detecting atrial fibrillation (afib). The gold standard for afib diagnosis is based on EKG data, and while the FDA has not yet approved any algorithms based solely on smartwatch data, a digital biomarker for afib detection is available in conjunction with an approved EKG. AliveCor’s SmartRhythm app uses machine learning to detect abnormal patterns in heart rate and activity data from the Apple Watch, and then prompts the user to take an EKG via AliveCor’s KardiaBand.

Meanwhile, the path to commercial use of digital biomarkers looks very different in sleep, because sleep spans the wellness space as well. Much less validation is required there, so several digital biomarkers, such as Fitbit’s sleep stages, are already available commercially, though we rarely refer to them as digital biomarkers in day-to-day conversation.

Finally, in neuropsychiatric diseases, where traditional measures have been largely subjective, there is a lot of research currently in progress. For example, we’ve done proof of concept work with DARPA to examine voice data as a potential biomarker for Alzheimer’s disease.

From a data science perspective, what have been the greatest challenges in constructing digital biomarkers?

The two biggest challenges we’ve faced have been (1) the lack of enough signal to construct digital biomarkers in some situations and (2) the individual variability of the conditions we’re working on.

The lack of enough signal occurs when the inferences we’re making are limited by the signal present in the data rather than by the inference methods (the specific data science techniques) used. That may happen because we’re basing our inferences on data collected from consumer-grade wearable devices in the wild (through everyday behavior), where conditions are highly variable. There’s a tradeoff between how invasive a device is in real life and the strength of the signal it collects. Since we’re looking to gather continuous data, we need something that places a low burden on users. As consumer wearables and sensors become more sophisticated, however, we’ll have more signal to work with for the same level of convenience.

We’ve also found that many of the conditions we’re trying to better understand via digital biomarkers have significant person-to-person variability. This is due in part to a sort of survivor bias in medical research: these conditions have remained the hardest to understand with traditional health measures because they differ so much from one individual to the next, which is why we are now looking to digital biomarkers to better characterize them.

As an example of the individual variability found within a condition, migraine patients respond to pain in very different ways. In their step data, one person might stop moving almost immediately, while another might keep moving in almost the same way they typically do. Hence, for certain conditions we need to train the model individually on each person. This requires more longitudinal data, which is challenging simply because it takes longer to collect.
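To make the idea of per-person modeling concrete, here is a minimal sketch that scores each new day against that individual's own baseline rather than a population average. It is only an illustration, not Evidation's method: the function name `personal_z_scores`, the two hypothetical people, and all step counts are invented for the example.

```python
from statistics import mean, stdev

def personal_z_scores(baseline_days, new_days):
    """Express new daily step counts as z-scores against one person's own baseline."""
    mu = mean(baseline_days)
    sigma = stdev(baseline_days)  # sample standard deviation of the baseline period
    return [(x - mu) / sigma for x in new_days]

# Hypothetical step counts for two people: a baseline week and one symptom day.
person_a_baseline = [9000, 9500, 8800, 9200, 9100]
person_b_baseline = [4000, 4300, 3700, 4100, 3900]

z_a = personal_z_scores(person_a_baseline, [1500])[0]  # person A nearly stops moving
z_b = personal_z_scores(person_b_baseline, [3800])[0]  # person B barely changes

print(f"person A: {z_a:.1f}, person B: {z_b:.1f}")
```

In this toy data the same kind of event produces a large deviation for one person and almost none for the other, which is why a single population-level threshold on steps would likely miss the second case.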

Can you walk me through the data science methods used to develop a digital biomarker once data has been collected?

Developing and validating digital biomarkers requires a broad spectrum of data analysis tools drawn from diverse scientific fields, including signal processing, biostatistics, epidemiology, time series analysis, psychometric research and, of course, machine learning.

The development starts with data cleaning. This may involve signal processing de-noising techniques for high-frequency data streams, and includes outlier removal or winsorization, imputation (see our work on this), and quality assessment (e.g., per-individual data coverage). Such tasks are fairly similar across use cases, and our team at Evidation has built libraries to automate much of this pipeline.
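The cleaning steps named above can be sketched in a few lines. This is a simplified illustration using only the Python standard library, not the Evidation pipeline: the function names, the percentile cutoffs, the forward-fill imputation strategy, and the heart rate values are all assumptions made for the example.

```python
from statistics import quantiles

def winsorize(values, lower_pct=5, upper_pct=95):
    """Clamp observed values to the given percentiles; missing values (None) are kept."""
    observed = sorted(v for v in values if v is not None)
    cuts = quantiles(observed, n=100, method="inclusive")  # 99 percentile cut points
    lo, hi = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [None if v is None else min(max(v, lo), hi) for v in values]

def forward_fill(values, max_gap=2):
    """Impute with the last observation, filling at most max_gap consecutive gaps."""
    filled, last, gap = [], None, 0
    for v in values:
        if v is not None:
            last, gap = v, 0
            filled.append(v)
        else:
            gap += 1
            filled.append(last if last is not None and gap <= max_gap else None)
    return filled

def coverage(values):
    """Fraction of time slots that contain an observed value."""
    return sum(v is not None for v in values) / len(values)

# Hypothetical resting heart rate samples with gaps and one implausible spike.
heart_rate = [62, 64, None, 63, 250, None, None, None, 61, 65]

print("coverage:", coverage(heart_rate))
cleaned = forward_fill(winsorize(heart_rate), max_gap=2)
print("cleaned:", cleaned)
```

Coverage is computed on the raw stream, before imputation, so that it reflects how much data the person actually contributed.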

Then, the modeling starts. From a methodological perspective, in most cases, biomarker development reduces to an exercise in supervised machine learning, where the goal is to develop an index computed from patient data (e.g., passively tracked physiological signals) that reflects, detects, or forecasts a measured outcome considered to be the ground truth (e.g., risk of having a seizure in the next 30 minutes, likelihood of currently experiencing a COPD exacerbation, probability of having a type 2 diabetes diagnosis). Therefore, developing a digital biomarker usually entails training a model on examples of data snippets (e.g., raw time series of the days preceding the event of interest) paired with the corresponding ground truth value. The trained model is expected to generalize, predicting what the ground truth would be for new data snippets it has never seen before.
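The pairing of data snippets with ground truth described above can be sketched as a simple windowing step. This is an illustrative sketch only: `build_examples`, the window and horizon sizes, and the step counts and event day are hypothetical, and a real pipeline would add many more checks.

```python
def build_examples(series, event_indices, window=3, horizon=1):
    """Pair each trailing window of `series` with a binary label.

    The label is 1 if an event occurs within `horizon` steps after the
    window ends, and 0 otherwise.
    """
    events = set(event_indices)
    examples = []
    for end in range(window, len(series) - horizon + 1):
        snippet = series[end - window:end]  # the `window` values before index `end`
        label = int(any((end + h) in events for h in range(horizon)))
        examples.append((snippet, label))
    return examples

# Hypothetical daily step counts, with a symptom event reported on day index 4.
daily_steps = [8000, 7500, 8200, 3000, 2500, 7900, 8100]
examples = build_examples(daily_steps, event_indices=[4], window=3, horizon=1)

for snippet, label in examples:
    print(snippet, "->", label)
```

Each `(snippet, label)` pair is one training example; any supervised model can then be fit on the snippets (or on features derived from them) to predict the label.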

How do you evaluate which method (ranging from regressions to deep learning) is optimal for a specific digital biomarker or type of domain problem (e.g., identifying a condition in an asymptomatic individual, predicting a serious complication before it occurs)? Are certain types of biomarkers or data more suited to one type of analysis versus another?

The data science methods used vary depending on the prediction task, the features of interest, and the labels available. Starting with the prediction task, the choice depends heavily on whether it is between-subjects (e.g., classifying COPD patients by severity) or longitudinal (within-subject and geared towards creating individualized models, e.g., detecting a specific COPD exacerbation event). For longitudinal predictions, hierarchical models such as mixed-effects models can be used.
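One way to build intuition for hierarchical models is partial pooling: in a random-intercept model, each person's estimate is pulled toward the population average, and people with fewer observations are pulled more. The sketch below is a deliberately simplified stand-in for a fitted mixed-effects model. It assumes the ratio of within-subject to between-subject variance is already known (a real model would estimate it) and uses the plain average of all observations as the population mean; the names and numbers are hypothetical.

```python
from statistics import mean

def partial_pool(groups, variance_ratio):
    """Shrink each subject's mean toward the grand mean.

    groups: dict mapping subject id -> list of observations
    variance_ratio: within-subject variance / between-subject variance
                    (assumed known here; a fitted model would estimate it)
    """
    grand = mean(x for obs in groups.values() for x in obs)
    pooled = {}
    for subject, obs in groups.items():
        weight = len(obs) / (len(obs) + variance_ratio)  # more data -> less shrinkage
        pooled[subject] = grand + weight * (mean(obs) - grand)
    return pooled

# Hypothetical resting heart rates: p1 has six days of data, p2 only one.
groups = {"p1": [70, 72, 71, 69, 73, 71], "p2": [90]}
pooled = partial_pool(groups, variance_ratio=2.0)
print(pooled)
```

The subject with a single observation is shrunk much more strongly than the subject with six, which is the behavior that makes hierarchical models attractive when longitudinal data per person is limited.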

In cases when the features of interest (variables derived from the raw data) are well understood (e.g., HRV measures based on RR intervals), simple models (e.g., logistic regression and random forests for discrete outcomes, lasso/elastic nets for continuous ones) are preferable, as issues are easier to diagnose and results are explainable. Feature engineering is usually performed to some extent and is driven by exploratory data analysis (EDA). In cases when the features to be computed from raw data are less well understood, deep learning models may provide an advantage, given their ability to learn representations that map raw data directly to outcomes, thereby saving feature engineering work. We’ve presented some of our work with deep learning models at KDD and NIPS.
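As an example of well-understood features, two standard time-domain HRV measures can be computed directly from RR intervals: SDNN (the standard deviation of the intervals) and RMSSD (the root mean square of successive differences). The sketch below is illustrative; the function name and the sample RR values are hypothetical, and SDNN is computed here as a sample standard deviation.

```python
from math import sqrt
from statistics import mean, stdev

def hrv_features(rr_ms):
    """Compute simple time-domain HRV features from RR intervals in milliseconds."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]  # successive differences
    return {
        "mean_rr": mean(rr_ms),
        "sdnn": stdev(rr_ms),  # sample standard deviation of the intervals
        "rmssd": sqrt(sum(d * d for d in diffs) / len(diffs)),
    }

# Hypothetical RR intervals (ms) from a short recording.
rr_intervals = [800, 810, 790, 805, 795]
features = hrv_features(rr_intervals)
print(features)
```

Features like these could then be fed to one of the simple models mentioned above, where each coefficient or split remains easy to inspect.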

Finally, methods considered can vary significantly depending on the quality and presence of ground truth, or labels. Usually, ground truth is captured using the currently accepted gold standard in measuring the underlying quantity of interest (e.g., PHQ-9 for depression symptoms). If ground truth is expensive or burdensome to collect (e.g., blood test), then semi-supervised learning methods may be considered. Semi-supervised learning and multi-task learning (learning to predict multiple outcomes at the same time) can also be useful when labels are noisy, i.e., display significant variability in capturing the same underlying quantity.
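Self-training is one of the simplest semi-supervised approaches and shows the general idea: fit a model on the few labeled examples, pseudo-label the unlabeled examples the model is confident about, and refit. The sketch below uses a one-dimensional nearest-centroid classifier purely for illustration. The feature values, the `margin` confidence rule, and the function names are all hypothetical, and the sketch assumes at least two classes appear in the labeled data.

```python
def centroids(labeled):
    """Mean feature value per class from (feature, label) pairs."""
    sums, counts = {}, {}
    for x, y in labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def self_train(labeled, unlabeled, margin=1.0, max_rounds=10):
    """Iteratively pseudo-label unlabeled points that are clearly closer to one class."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        cents = centroids(labeled)
        confident, remaining = [], []
        for x in pool:
            ranked = sorted(cents, key=lambda y: abs(x - cents[y]))
            best, runner_up = ranked[0], ranked[1]
            gap = abs(x - cents[runner_up]) - abs(x - cents[best])
            if gap >= margin:
                confident.append((x, best))  # accept the pseudo-label
            else:
                remaining.append(x)          # too ambiguous, leave unlabeled
        if not confident:
            break
        labeled += confident
        pool = remaining
    return centroids(labeled), pool

# Two expensive labeled measurements and five cheap unlabeled ones (hypothetical).
labeled = [(1.0, 0), (9.0, 1)]
unlabeled = [1.5, 2.0, 8.0, 8.5, 5.1]
final_centroids, still_unlabeled = self_train(labeled, unlabeled)
print(final_centroids, still_unlabeled)
```

The ambiguous point near the midpoint is never pseudo-labeled, which is the safeguard that keeps self-training from reinforcing its own mistakes too aggressively.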

There’s still so much we have yet to explore in terms of using continuous data collected from sensors to help us understand health and disease. These new sources of data could lead to novel insights linking a measurement to a condition that it has not previously been associated with, like blood pressure and depression. When you select sensors and data types, how do you balance choosing specific data sources based on what we know already in medicine versus collecting more types of data to see what new insights we can discover?

Some might expect that we try to search for signal in the most clinically efficient manner possible, with an exact idea of where we’ll find the most signal. That’s actually not how we do it. I believe it’s best to leave the door open to discovering insights from data sources we don’t necessarily expect to provide a lot of signal.

At Evidation, our process is to understand the physiological underpinning of the condition and map symptoms to potential signals, such as heart rate and sleep. Next, we map these signals to devices and the types of data each device can capture. Typically, we have several options for devices that cover the signals we want to collect, and the different options also cover other signals we didn’t originally consider during the signal mapping process. At this point, we look at the marginal gain of new information we might learn from these additional signals in order to choose the devices. Those additional signals are how we build exploratory research into the process, covering the space where we don’t yet know what we don’t know.
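The selection process described above can be sketched as a two-step filter and rank: keep the devices that cover every required signal, then order them by how many additional signals they bring. Counting extra signals is only a crude proxy for the marginal gain of new information, and the device names and signal sets below are hypothetical.

```python
def rank_devices(device_signals, required):
    """Rank devices that cover all required signals by number of extra signals."""
    candidates = {d: s for d, s in device_signals.items() if required <= s}
    return sorted(
        candidates,
        key=lambda d: len(candidates[d] - required),  # extra signals beyond the required set
        reverse=True,
    )

# Hypothetical devices and the signals each one can capture.
device_signals = {
    "wrist_band_a": {"heart_rate", "sleep", "steps"},
    "ring_b": {"heart_rate", "sleep", "skin_temperature", "spo2"},
    "clip_c": {"steps"},
}
required = {"heart_rate", "sleep"}

ranked = rank_devices(device_signals, required)
print(ranked)
```

In practice the ranking would also weigh user burden, cost, and data quality, none of which this sketch attempts to model.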

What is your dream in terms of a generally available data capture device?

My dream is a device that can provide a continuous measure of what happens inside your body (e.g., a continuous measure of cortisol, proinflammatory cytokines, and, more generally, stress, inflammatory, or other metabolic markers). This would allow even deeper connections to be drawn between everyday behavior and health.

Your background is in computer science. How important is deep healthcare knowledge for the work you do to develop digital biomarkers?

At Evidation, we view deep healthcare knowledge and data science expertise as equally important to our work in developing digital biomarkers. Our first step is to understand the physiological basis for the condition in order to map symptoms to potential signals and sensors. That’s why we built our team with people from both healthcare and tech. We don’t expect every member of our data science team to have deep expertise in healthcare when they join Evidation, since we have other team members with complementary knowledge.

Evidation is hiring!

Join them to discover how everyday behavior and health interact and transform the way health is measured and diseases are identified, treated, and monitored.

Check out a few of their open positions, including Infrastructure Engineer, Principal Frontend Engineer, and Principal Software Engineer.

Special thanks to Evidation’s Michelle Xie for her tremendous help in facilitating this Q&A.