Bad Data: A Chronic Condition of Healthcare?

By: Andy Oram

A lot of health-related data has been released recently; datasets of note include what health providers charge for services and Medicare prescribing data (1:34 into the keynote video). Application developers as well as health care reformers, payers, and patient advocates benefit from such open data. Untold terabytes more are pawed over secretly by insurers, large providers, and marketing firms. So have we achieved data nirvana? Not quite.

Dive in with me, as I survey the field of health care data.

Can we improve health care even using bad data? Certainly. After all, we have determined the age of the universe pretty closely with only a few vibrations from unimaginably large distances for evidence. Few choices are as idiosyncratic as how people vote, but Nate Silver combined rough data from many polls and accurately called the 2012 presidential election in every state. Modern statistical tools can do wonders for health care too, even with imperfect (to say the least) data.
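To make the idea of pooling rough measurements concrete, here is a minimal sketch of inverse-variance weighting, one standard way to combine several noisy estimates of the same quantity. It is not a description of Nate Silver's actual model, which is considerably more elaborate; the function name `pool_estimates` and the poll numbers are made up for illustration.

```python
import math

def pool_estimates(estimates):
    """Combine (value, standard_error) pairs by inverse-variance weighting.

    More precise estimates (smaller standard error) get more weight.
    Assumes the estimates are independent and measure the same quantity.
    """
    weights = [1.0 / (se ** 2) for _, se in estimates]
    total = sum(weights)
    pooled = sum(w * value for w, (value, _) in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled_se

# Hypothetical polls: (percent support, standard error in points).
polls = [(52.0, 3.0), (49.0, 4.0), (51.0, 2.5)]
pooled, pooled_se = pool_estimates(polls)
print(f"pooled estimate: {pooled:.1f} +/- {pooled_se:.1f}")
```

The pooled standard error comes out smaller than that of any single poll, which is the sense in which several imperfect sources can add up to a usable answer. The same arithmetic applies to repeated clinical measurements, provided the independence assumption roughly holds.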

What did the doctor say?

Let’s start with the supposed gold standard, the medical record. At Strata Rx, Beth Israel CIO John Halamka stated that 3% of health records list the patient’s gender incorrectly (13:41 into the keynote video). If they can’t always get that right, what about details such as the onset of disease and the course of treatment? Sit with your provider while she records your visit and it’s easy to see why errors creep in. The data is just too complicated (every patient is unique) and there’s too much to record. Meanwhile, the waiting room is overflowing and it’s long past lunch.

Electronic records are both an aid and a hindrance: they provide structured fields to help standardize input, but there are so many fields that they’re hard to use consistently. The same goes for medical coding standards. The new ICD-10 standard for classifying diseases (new to US providers, I should say–it has actually been around since at least 1994 and will soon be made obsolete by ICD-11) has so many codes that there are numerous ways to improve reimbursement by choosing the highest-risk classification.

It also doesn’t help things that patients fuzz what they tell the doctor, out of embarrassment or simple confusion. After all, I did intend to start that statin right away and take it faithfully every day. And that swelling in my groin is so minor I won’t even notice it till the next visit to the clinic.

We also hold back from gathering all the data we could use because tests are intrusive, dangerous, or costly. As one doctor told me at the conference, “There’s always another test you could take.”

Your pulse is in the 70s, sort of

Health care developer Shahid Shah points out we can get more reliable data, and more of it, by attaching devices to people in everyday living. But their output is not perfect. Rachel Kalmar, who tests a dozen devices at a time from her company Misfit and its competitors, said in her recent Strata Rx presentation that their readings differ significantly from one another.

Everybody who follows the trends in self-tracking says it’s on the increase, whether it’s instigated by an individual on her own or by her doctor. So devices can be the next big source of big data, but can we combine and use all those sensor results?

Not too easily, according to a talk by Rachel Kalmar. Basically, devices don’t talk to one another. So you can upload the data from each device to its company’s web site, but you can’t combine them on the fly, or write a program to make one device respond to input from another. In Kalmar’s vision of the future, a device could notice when your mood is low and tell your apartment to turn up the lights. Some devices are good at one particular measurement but rather sloppy at reporting others. We have to give the manufacturers a break here. They’re juggling several incompatible goals, such as keeping the device cheap and simple to use, trying to make it unbreakable, and reducing battery use (which in turn determines how often the device can collect data and transmit it over a wireless network).
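What remains possible today is combining the data after the fact, once each vendor's export has been downloaded. The sketch below shows one way that might look, assuming each export can be reduced to `(ISO timestamp, metric, value)` rows. The function `merge_by_minute`, the device names, and the readings are all hypothetical; real exports come in vendor-specific formats and would need their own parsing.

```python
from collections import defaultdict
from datetime import datetime

def merge_by_minute(*exports):
    """Align readings from several device exports on a per-minute timeline.

    Each export is a list of (iso_timestamp, metric, value) rows.
    If two rows report the same metric in the same minute, the later
    row in the input overwrites the earlier one.
    """
    merged = defaultdict(dict)
    for export in exports:
        for timestamp, metric, value in export:
            minute = datetime.fromisoformat(timestamp).replace(second=0, microsecond=0)
            merged[minute][metric] = value
    return dict(sorted(merged.items()))

# Hypothetical exports from two unrelated devices.
wristband = [("2013-10-01T08:00:12", "steps", 40),
             ("2013-10-01T08:01:05", "steps", 55)]
chest_strap = [("2013-10-01T08:00:48", "pulse", 72),
               ("2013-10-01T08:01:30", "pulse", 75)]

timeline = merge_by_minute(wristband, chest_strap)
for minute, readings in timeline.items():
    print(minute.isoformat(), readings)
```

This is batch processing of downloaded files, not the on-the-fly interaction between devices that Kalmar has in mind, and it does nothing to reconcile devices that disagree about the same measurement.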

But if a lot of us collected personal data in the field regularly, we’d still be a quantum leap above what we get from routine office visits. After all, occasional blood pressure measurements have served for decades as good indicators of our life expectancy, even though the results depend on extremely subjective decisions based on the hearing of the clinician taking the measurement. Once again, imperfect data can still be good data.

Although some technical barriers stand in the way of device intercommunication, as Kalmar adroitly explained, the real problem is business models. Device manufacturers can’t possibly sell devices if they pass the full costs of design and manufacturing on to the consumers, so–like health app developers–they collect user data and repurpose it in many ways, some of which might prove troubling to users.

Kalmar regaled us with a long list of ways manufacturers might be more open about their data while deriving revenue. These include exposing an API but charging for it in various ways, finding trusted partners who can get free access, and charging for data downloads. I suggested the dual licensing model that has worked for MySQL and some other projects. But Kalmar pointed out that this model depends on confidence that a certain business model will make money, and we don’t yet know how devices can make money.