
Open Data opportunities along the Data Value Chain

on Sun, 02/22/2015 - 08:31

It's a great weekend for Open Data enthusiasts. Activities around Open Data Day span the globe, and the discussion on social media like Twitter is insightful and stimulating. Anyone working with data has an interest in sharing data with others. Most of the creativity and knowledge that can be applied to your data sits outside your organization. Releasing data for others to use will dramatically increase your data's impact. In the case of health, sharing data will help save lives by informing research, policies and decisions to improve disease prevention and healthcare delivery.

There are many opportunities for releasing data as open data. If you are running research projects, you probably hit many or all steps along the Data Value Chain. The table below shows opportunities to release data at each of the 10 steps of a generic Data Value Chain (click on the image below to download the high-resolution version). There are certainly many more opportunities. Send suggestions via Twitter or the contact form, and I'll update the table.

Knowledge Café on Communicating Data for Impact

on Wed, 02/11/2015 - 16:27

What do you learn when 110 people discuss data and communications in a knowledge café? Quite a lot. On Tuesday, IHME and Forum One hosted an event on Communicating Data for Impact, centered around the White Paper we published last year with guidance for getting the right data in the right format to the right audiences. The format made for fascinating discussions and a tremendous learning experience (see key insights and videos below).

The presenters

We had three presenters who approached the topic from different angles: Deep Dhillon, CTO at Socrata, provided the "Meta Context" from Socrata's perspective as a platform for publishing large numbers of datasets. Noah Iliinsky, User Experience Expert at Amazon Web Services, provided specific advice on visualizing data. And yours truly focused on the different kinds of audiences and used examples from the Global Burden of Disease study to show how IHME is addressing the needs of different audiences (see the slides here).

The format

The knowledge café format (or rather our version of it) turned out to be very useful for spirited discussions and a learning experience for everyone involved. We started with a plenary session of about 110 participants, where co-author Nam-ho Park introduced the Communicating Data paper and the concept of a knowledge café. Each presenter then provided a 3-minute cliffhanger for their session. We split into three groups, and each presenter provided more detail on their topic (5-10 minutes), followed by an interactive group discussion (10-15 minutes). After each session, the speakers rotated, so every participant had a chance to see all three presenters and discuss each topic. So while I gave the same short presentation three times, the ensuing discussions were remarkably different. At the end, we reconvened the whole group for a final panel discussion and Q&A. All in all, the event ran for a little over two hours. Just check out the Twitter stream to see how much everyone enjoyed it.

10 key insights

  1. Understanding your users' question(s) is crucial to identifying and designing the proper data communication tool. 
  2. Work with different formats to educate your audiences. An infographic may draw attention to your data with very few data points; interactive visualizations can encourage and enable people to explore underlying or contextual data to broaden their understanding.
  3. You can categorize your users into four key audiences: researchers, data analysts, data actors, and casual users (more on that in the White Paper). Any individual can fall into different groups, depending on the data: a Minister of Health is a data actor for health data, but probably a casual user for sports results.
  4. An interactive data visualization can tell a story or enable users to explore and find stories. To drive change, you may want to use exploratory visualizations to find the relevant stories, then create your own visuals to drive home your points.
  5. Use all available channels to drive audiences to your data, including social media, media outreach, infographics, policy reports, conference attendance, etc. It is important to point users to the tools that fit the specific needs of their audience group.
  6. Data visualizations should rest on four pillars: purpose (why?), content (what?), structure (how?), formatting (the icing on the cake). Don't start at the end. More on that on Noah's blog.
  7. To improve data visualizations and other tools over time, analyze web metrics, conduct focus groups, encourage feedback from users, and engage in conversations. Following best practices for visualizations is necessary but not sufficient to strike a nerve with the audience and make your visualization go viral.
  8. When have you done enough analysis to present your data? It depends: in academia, you will always need to get to 99.9%. If 80:20 is enough, you can stop much earlier. In most cases: whenever your deadline is reached.
  9. Analyzing and communicating data also provides valuable insights into data gaps. As part of communicating data, you should point out what additional data should be collected to provide better answers.
  10. Quote of the day: stay away from "decision-based data making."


My talk


Noah Iliinsky's talk


Deep Dhillon's talk

More data sources for health research: health record and claims data

on Mon, 01/26/2015 - 17:03
How much detailed health outcomes data at the individual patient level is currently accessible for research in the US? In a post earlier today, Bill Heisel provided useful guidance to journalists about being "Well Sourced: Strengthen your health reporting by using discharge data", pointing to the state discharge database in California. This reminded me of a list of patient-level data sources available for research that I have been meaning to post. The list is below; not all of these sources are free, but many of them have public use datasets that can be accessed at no cost. Of course, public use datasets will have all direct identifiers removed, along with indirect identifiers as required by HIPAA. However, in many cases, limited use datasets will be available on application, providing more detailed information but also clear obligations as to what a researcher can and cannot do with the data.

Do you know of other relevant data sources? Leave a comment or tweet me at @peterspeyer.

Governmental sources

Other sources

Digital Health Days presentation: Measuring population health

on Wed, 08/27/2014 - 21:53

Earlier this week, I gave a presentation at the Digital Health Days in Stockholm on using data and visualizations to inform decisions on population health. The talk is now online (below), and the slides are posted on Slideshare.

I was also interviewed by Mina Makar about the conference and Stockholm for the Dina & Mina Show; the interview is here.

Here is the talk:



If you want to learn more about the Global Burden of Disease and its implications, you can watch my TEDx talk at TEDxRainier last fall.

10 key takeaways from Health Datapalooza 2014

on Tue, 06/03/2014 - 15:08

Early June is Health Datapalooza time, and this year's event was again a whirlwind of energy, insights, and over 2,000 very motivated and passionate health data enthusiasts (health datapaloozers doesn't sound right). The hallway conversations were pragmatic, productive and engaging, inspired by keynotes and presentations from Todd Park, Bryan Sivak, Jeremy Hunt, Atul Gawande, Steve Case, Jerry Levin, Vinod Khosla, Kathleen Sebelius, Dwayne Spradlin, Francis Collins, Fred Trotter, and many more (if you haven't heard of some of them, look them up; it's worth your time). Here are the key takeaways from the conversations:

  1. Data liberation! US CTO Todd Park's famous battle cry still holds true. We need to turn more data from "passive into active data" (US Secretary of Health and Human Services Sebelius) and make them more broadly available and used. There was some useful advice for asking for or opening up data: be clear on what you want to use the data for, indicate who will benefit from this work, work with the data owner/hoarder in a collaborative fashion, and be transparent via open source. Sometimes you can also skip all that and just use FOIA (the Freedom of Information Act).

  2. Silos: I knew silos were a problem in US healthcare, but health data are spread across more places than I had imagined. As an example, Athena Health had to create connections to 110,000 (!) other systems for the 50M patients for whom they provide EHRs.

  3. New data sources: more and more data are collected outside the health system, contributing to #2: quantified self, social media, purchasing, location data, etc. There are smaller sensors, cheaper devices, and a plethora of apps to help anyone track their health and lifestyle. There are few efforts to let individuals bring these data together in central platforms, and healthcare professionals typically don't (want to?) use this information.

  4. Patient in charge: patients can increasingly be in charge if they want to. They have better access to their data (150M Americans now have access to their health data via Blue Button), technology supports tracking of vital health information (see #3), and there are platforms that let patients bring (some of) those data into one place, enabling them to analyze and evaluate what works and doesn't work for them. UK Health Secretary Jeremy Hunt put it best: the transparency of data reverses the relationship with the physician and puts the patient in charge.

  5. Services in the background: we are now in phase 3 of the internet, says Steve Case. After building the internet (phase 1) and building apps and services on top of it (phase 2), the internet is now being integrated into everyday life, enabling much more seamless experiences that incorporate data from apps, devices, health information, etc. But this is still very much in its infancy. See #8 for a potential approach to a seamless experience.

  6. 20% doctor included: machine learning is key for the future of medicine. Cognitive limitations and an overload of information mean that physicians' diagnoses and treatment recommendations are not particularly consistent. However, machine + doc will be the winning combination, just as a decent chess computer paired with a decent (human) chess player will outperform the best chess computer. Make sure to read Vinod Khosla's paper "20% doctor included."

  7. We heard of fabulous utopias where all data on an individual are in one place, enabling smart machine-learning algorithms to monitor vital signs, predict health risks, and enable and encourage prevention. There are different options to get there (more on that in a future post), but none of them is a shoo-in.

  8. Walled gardens: the announcement of Apple's HealthKit created additional fodder for discussion about walled (health data) gardens; Apple's collaboration with (walled) electronic health record provider Epic makes that garden potentially bigger but not more open. However, there will be more opportunities to have health data from iOS apps contribute to the same central system, likely in a very user-friendly way.

  9. It's the sickest patients who are driving cost. Our health and data collection systems are currently not designed for them, and new services like PatientsLikeMe are not integrated into healthcare delivery.

  10. Getting to the roots: the app demos and start-up showcase showed the amazing progress we have made. While the flurry of apps at past events sometimes looked a bit gimmicky, this time there were several efforts that address issues at the health system level (have a look at some of the start-ups here).

A huge thank you to the Health Data Consortium and CEO Dwayne Spradlin for a fabulous conference. Looking forward to Health Datapalooza 2015, 5/31-6/3 in DC.

And if you missed it, we launched our new white paper on Communicating Data for Impact at the conference and got quite a bit of nice feedback. Enjoy the read!