Data Revolution in Africa .. some light background reading

on Thu, 03/26/2015 - 12:37

Over the next 5 days, the 2015 Conference of Ministers, hosted by the African Union and the UN Economic Commission for Africa, will include some in-depth discussions in a side event on how to make the Data Revolution work for African countries. Whether you are participating in person, following on Twitter, or are interested in the topic in general, here is some "light" background reading.

Follow updates on Twitter via #DataRev, #DataRevAfrica, and #CoMAfrica2015, or follow yours truly at @peterspeyer.

Open Data opportunities along the Data Value Chain

on Sun, 02/22/2015 - 08:31

It's a great weekend for Open Data enthusiasts. Activities around Open Data Day span the globe, and the discussion on social media like Twitter is insightful and stimulating. Anyone working with data has an interest in sharing data with others. Most of the creativity and knowledge that can be applied to your data sits outside your organization. Releasing data for others to use will dramatically increase your data's impact. In the case of health, sharing data will help save lives by informing research, policies and decisions to improve prevention of diseases and delivery of healthcare.

There are many opportunities for releasing data as open data. If you are running research projects, you probably hit many or all steps along the Data Value Chain. The table below shows opportunities to release data at each of the 10 steps of a generic Data Value Chain (download the high-res version by clicking on the image below). There are certainly many more opportunities. Send suggestions via Twitter or the contact form, and I'll update the table.

Knowledge Café on Communicating Data for Impact

on Wed, 02/11/2015 - 16:27

What do you learn when 110 people discuss data and communications in a knowledge café? Quite a lot. On Tuesday, IHME and Forum One hosted an event on Communicating Data for Impact, centered around the White Paper we published last year with guidance for getting the right data in the right format to the right audiences. The format made for fascinating discussions and a tremendous learning experience (see key insights and videos below).

The presenters

We had three presenters who approached the topic from different sides: Deep Dhillon, CTO at Socrata, provided the "Meta Context" from Socrata's perspective of providing a platform for publishing large numbers of datasets. Noah Iliinsky, User Experience Expert at Amazon Web Services, provided specific advice on visualizing data. And yours truly focused on the different kinds of audiences and used specific examples from the Global Burden of Disease study to show how IHME is addressing the needs of different audiences (see the slides here).

The format

The knowledge café format (or rather our version of it) turned out to be very useful for spirited discussions and a learning experience for everyone involved. We started with a plenary session with about 110 participants where co-author Nam-ho Park introduced the Communicating Data paper and the concept of a knowledge café. Each presenter then provided a 3-minute cliffhanger for their session. We split into 3 groups, and each presenter provided more detail on their topic (5-10 minutes), followed by an interactive group discussion (10-15 minutes). After each session, the speakers rotated, so every participant had a chance to see all three presenters and discuss each topic. So while I gave the same short presentation three times, the ensuing discussions were remarkably different. At the end, we reconvened the whole group for a final panel discussion and Q&A. All in all, the event ran for a little over 2 hours. It was a great experience for everyone involved; just check out the Twitter stream.

10 key insights

  1. Understanding your users' question(s) is crucial to identifying and designing the proper data communication tool. 
  2. Work with different formats to educate your audiences. An infographic may draw attention to your data with very few data points, while interactive visualizations can encourage and enable people to explore underlying or contextual data to broaden understanding.
  3. You can categorize your users into 4 key audiences: researchers, data analysts, data actors, casual users. More on that in the White Paper. Any individual can fall into different groups, depending on the data. A Minister of Health is a data actor for health data, but probably a casual user for sports results.
  4. An interactive data visualization can tell a story or enable users to explore and find stories. To drive change, you may want to use exploratory visualizations to find the relevant stories, then create your own visuals to drive home your points.
  5. Use all available channels to drive audiences to your data, including social media, media outreach, infographics, policy reports, conference attendance, etc. It is important that the users are pointed to relevant tools that fit the specific needs of their audience group.
  6. Data visualizations should rest on four pillars: purpose (why?), content (what?), structure (how?), formatting (the icing on the cake). Don't start at the end. More on that on Noah's blog.
  7. To improve data visualizations and other tools over time, analyze web metrics, conduct focus groups, encourage feedback from users, engage in conversations. Following best practices for visualizations is required but not necessarily sufficient to hit the nerve of the audience and make your visualization go viral. 
  8. When have you done enough analysis to present your data? Depends: In academia, you will always need to get to 99.9%. If 80:20 is enough, you can stop way earlier. In most cases: whenever your deadline is reached.
  9. Analyzing and communicating data also provides valuable insights into data gaps. As part of communicating data, you should point out what additional data should be collected to provide better answers.
  10. Quote of the day: stay away from "decision-based data making".


My talk


Noah Iliinsky's talk


Deep Dhillon's talk

More data sources for health research: health record and claims data

on Mon, 01/26/2015 - 17:03
How much detailed, patient-level health outcomes data is currently accessible for research in the US? In a post earlier today, Bill Heisel provided useful guidance to journalists about being "Well Sourced: Strengthen your health reporting by using discharge data", pointing to the state discharge database in California. This reminded me of a list of data sources I have been meaning to post about patient-level data available for research. Below is a list of data sources; not all of them are free, but many of them have public use datasets that can be accessed without cost. Of course, public use datasets will have any direct identifiers removed, along with indirect identifiers as required by HIPAA. However, in many cases, limited use datasets will be available on application, providing more detailed information but also clear obligations as to what a researcher can and cannot do with the data.

Do you know of other relevant data sources? Leave a comment or tweet me at @peterspeyer.

Governmental sources

Other sources

Digital Health Days presentation: Measuring population health

on Wed, 08/27/2014 - 21:53

Earlier this week, I gave a presentation at the Digital Health Days in Stockholm on using data and visualizations to inform decisions on population health. The talk is now online (below). The slides are posted on Slideshare.

I was also interviewed by Mina Makar about the conference and Stockholm for the Dina & Mina Show; the interview is here.

Here is the talk:



If you want to learn more about the Global Burden of Disease study and its implications, you can watch my TEDx talk at TEDxRainier last fall.