Top 10 Datasets for Health Hackers

We’ve seen glimpses of what big data can do for healthcare, and we’re excited. Access to patient and cost data is key to generating relevant, actionable insights, but today’s healthcare system doesn’t make this information readily available. Hopefully, we will soon live in a world with established data standards that let valuable data move freely between systems and interested stakeholders. Until then, here are our favorite datasets for health hackers.

Real-time data:

  • Insight by Practice Fusion is a real-time healthcare database built on the records of over 250K patients per day. You can see information such as disease trends over time and by patient, which diseases are being diagnosed, and real-time prescription drug market share.

  • Validic is a technology platform for accessing data from mobile health devices, in-home devices, and patient healthcare apps. Similarly, Human API provides data infrastructure that simplifies the integration of health data.

  • For developers who want to connect to the healthcare system and leverage publicly available datasets, check out AT&T’s mHealth initiative.

Historical data:

  • Medicare, Medicare, and more Medicare! Medicare offers an incredible wealth of downloadable databases, ranging from nursing home comparisons to Medigap data. CMS is also a great resource, whether you want to run analytics on provider utilization and payment data, claims data, or NPI files.

  • The Department of Health and Human Services has collected over 1K public health datasets. Thankfully, its filter and search system isn’t half bad! Bonus: you can filter by topic, source, state, and more to discover details such as patients’ hospital experiences by provider, or state averages on quality measures, staffing, fine amounts, and number of deficiencies. Also, they have a convenient API for you to access data available on

  • Interested in examining cancer stage at diagnosis by race/ethnicity? Survival rate by stage? Or trends and incidence rates of cancers at various sites over time? Check out the National Cancer Institute’s SEER data.

  • The FDA launched openFDA, which gives developers access to public FDA data through open APIs, raw data downloads, and documentation with examples. The first available dataset includes reports of adverse drug events, such as adverse reactions and medication errors, submitted from 2004 through 2013. And don’t forget the FDA drug database, which is a great resource as well!

  • And coming in early 2015… the Health Care Cost Institute (HCCI) will launch an information portal with data on health care costs. HCCI is currently working with large insurers, such as Aetna, Humana, and UnitedHealthcare, to develop and provide healthcare price transparency tools for consumers.
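If you want to poke at openFDA’s adverse event data from code, here’s a minimal sketch using only Python’s standard library. It assumes the drug adverse event endpoint lives at https://api.fda.gov/drug/event.json, that it accepts `search`, `count`, and `limit` query parameters, and that `count` queries return a `results` list of `{"term": ..., "count": ...}` objects. The field names in the usage comment (`patient.drug.openfda.brand_name`, `patient.reaction.reactionmeddrapt.exact`) are assumptions too, so check them against openFDA’s current field reference before relying on them. The helper names are our own, not part of any openFDA client library.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Optional, Tuple

# Assumed openFDA drug adverse event endpoint (verify against openFDA docs).
BASE_URL = "https://api.fda.gov/drug/event.json"


def build_event_query(search: str, count_field: Optional[str] = None, limit: int = 10) -> str:
    """Build a query URL for the (assumed) openFDA drug event endpoint.

    `search` filters reports; if `count_field` is given, openFDA is asked to
    tally the values of that field instead of returning raw reports.
    """
    params = {"search": search, "limit": str(limit)}
    if count_field:
        params["count"] = count_field
    return BASE_URL + "?" + urllib.parse.urlencode(params)


def fetch_json(url: str) -> dict:
    """Fetch a URL and decode the JSON body (performs a network request)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def top_terms(payload: dict, n: int = 5) -> List[Tuple[str, int]]:
    """Return the first n (term, count) pairs from a count-query response."""
    return [(r["term"], r["count"]) for r in payload.get("results", [])[:n]]


# Example (hits the live API, so it is left commented out):
# url = build_event_query(
#     'patient.drug.openfda.brand_name:"aspirin"',
#     count_field="patient.reaction.reactionmeddrapt.exact",
#     limit=5,
# )
# print(top_terms(fetch_json(url)))
```

Keeping URL building and response parsing separate from the network call makes it easy to test your queries offline and swap in a caching layer later, which is handy given that openFDA rate-limits requests made without an API key.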

What other datasets do you know of? Comment below!