
Managing the Data Revolution

on Fri, 02/28/2014 - 14:40

Can you manage a revolution? In a 2013 report, a high-level panel at the UN called for a data revolution to "fully integrate statistics into decision making, promote open access to, and use of, data and ensure increased support for statistical systems." Today's workshop "Managing the Data Revolution" at UN Headquarters in New York City assembled speakers from the UN, central statistical organizations (CSOs), non-governmental organizations (NGOs), academia, civil society groups and the private sector. The discussion highlighted some key challenges and opportunities inherent in this data revolution, with a focus on the changing role of CSOs and government statisticians:

  1. A rapidly growing number of data sources and volume of data (sensor data, social media, transaction data, quantified-self data, etc.) compete with official statistics.
  2. This data deluge, sometimes available in (near) real time, creates a competitive marketplace for data, providing opportunities for new data products and prompting innovation.
  3. In this environment, CSOs should focus on what they do best, create partnerships with the private sector, academia, civil society, and NGOs for other tasks (although few examples of such partnerships came up today), and shift from being owners of data to being promoters of data.
  4. To maximize the value and impact of data, and to meet increasing demand for data, every CSO should have an open data platform and (a point raised multiple times) share microdata as open data.
  5. There is political commitment to open government and open data in governments around the world, which should empower CSOs to drive their open data agenda.

An interesting quote during the day about working with all these other data sources was "if Google can do it, if the NSA can do it, why can't statistical offices do it?" I suspect that limited resources and ethical considerations have something to do with that, but there is certainly ample opportunity for CSOs to innovate in terms of collecting and providing data, engaging with partners, and making data actionable.

If I have piqued your interest, here are some links to other write-ups and talks on the Data Revolution by UN Global Pulse Director Robert Kirkpatrick, PARIS21 Director Johannes Juetting, and the UN Global Pulse blogging team.

Encouraging the use of data from a brand-new tobacco study

on Wed, 01/08/2014 - 22:30

Yesterday, our team at the Institute for Health Metrics and Evaluation (IHME) released the results of a comprehensive analysis of data on smoking prevalence and cigarette consumption around the world. The results are quite staggering. While smoking prevalence decreased among men and women between 1980 and 2012, population growth meant that the number of daily smokers increased by 41% in men and 7% in women.
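The apparent paradox of falling prevalence alongside a rising number of smokers follows from the fact that the number of smokers is prevalence multiplied by population size. A short worked sketch with hypothetical round numbers (not IHME estimates) makes this concrete:

```python
def smoker_count(prevalence, population):
    """Number of smokers = share of the population that smokes x population size."""
    return prevalence * population


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return 100 * (new - old) / old


# Hypothetical illustration only: prevalence falls from 40% to 30%
# while the adult population grows from 100 million to 180 million.
before = smoker_count(0.40, 100_000_000)
after = smoker_count(0.30, 180_000_000)
print(f"{before:,.0f} -> {after:,.0f} smokers ({percent_change(before, after):+.1f}%)")
```

Even though a smaller share of people smoke, the larger population yields more smokers in absolute terms.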

In 2012, people around the world smoked a total of over 6 trillion (!) cigarettes. The results are available by age, sex, country, and year with several metrics (smoking prevalence, cigarettes consumed, number of smokers). Quite a comprehensive dataset.

To encourage the use of these numbers, we published the data in different formats for different purposes and audiences:

  1. Peer-reviewed, scientific paper in the Journal of the American Medical Association (JAMA)
    The paper provides deep insight into the data and analytic methods used, and is mainly targeted at academic researchers and other analysts. In addition, the peer review assures all audiences that the results are scientifically sound and based on valid data.
  2. Comprehensive dataset for download on IHME's Global Health Data Exchange (GHDx)
    Provided as a comprehensive CSV file with all metrics and all dimensions, this dataset is useful for anyone who wants to use the results for further analysis or modeling, mostly academic researchers, modelers and analysts.
  3. Interactive data visualization on the IHME website
    The visualization explores all aspects of the paper. It provides a global perspective with international comparisons on a map, in a sunburst diagram, and via line charts. It also enables country deep-dives and a closer look at all the input datasets used for the analysis. Click through to the visualization and see for yourself. Hopefully, policy and decision makers, funders, and many others will find it useful for exploring trends and patterns in smoking and cigarette consumption around the world, and for devising ways to further reduce smoking prevalence and the loss of health and lives to smoking.
  4. Infographic providing key insights
    A tobacco infographic focuses on global trends, rankings of the countries with the largest increases and decreases in smokers, and some specific examples. This graphic should capture the attention of just about anyone. After all, who knew that 6 trillion cigarettes could be smoked in one year?
  5. Encouraging the use of data
    The Roux Prize, a $100,000 award that I recently announced at TEDx Rainier, was created to encourage the use of burden of disease data to improve the health of populations. You can read about how Australia used disease burden evidence to try to control tobacco there. Using these data to curb smoking around the world would certainly be a worthy cause.

In sum, I hope that everyone interested in smoking trends, reducing cigarette consumption, and its impact on health will find the data in a form that is useful to them.

What about you? Did you find a format of the tobacco data that appealed to you? Do you have ideas for other ways to share the data? Or suggestions for improvement? Please leave a comment or send me a note via email or Twitter. I'd be happy to hear your ideas.

Global Burden of Disease data by country for download

on Wed, 09/04/2013 - 07:37

Our team at the Institute for Health Metrics and Evaluation (IHME) is now able to share the data from the Global Burden of Disease (GBD) 2010 study for download at the country level. First presented in a dedicated triple issue of the Lancet last December and made available in innovative data visualizations on the IHME website, the data can now be downloaded freely in three easily accessible places:

  1. GBD Compare - the flagship visualization for GBD results now has a "download" button that provides CSVs for any chart that is being viewed in the tool
  2. IHME's Global Health Data Exchange (GHDx) provides datasets for the cause and risk factor results for each of the 187 countries covered by GBD
  3. A new query-based data tool allows you to type a disease, injury, risk factor, country, age group, year, metric or other keyword to create a table and simple visualizations with the results
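The query tool's basic behaviour (type a keyword, get a table of matching results) can be approximated in a few lines of Python over a downloaded CSV. This is a minimal sketch, not how the IHME tool is actually implemented; the column names (`cause`, `location`, `year`, `metric`, `value`) and the sample rows are hypothetical placeholders with made-up numbers.

```python
import csv
import io

# Hypothetical sample standing in for a downloaded CSV; the column names
# and values are placeholders, not real GBD results.
SAMPLE_CSV = """cause,location,year,metric,value
Ischemic heart disease,Kenya,2010,DALY,100
Malaria,Kenya,2010,DALY,200
Ischemic heart disease,Peru,2010,DALY,300
"""


def query(rows, *keywords):
    """Keep rows in which every keyword appears in some field (case-insensitive)."""
    needles = [keyword.lower() for keyword in keywords]
    return [
        row
        for row in rows
        if all(any(needle in str(value).lower() for value in row.values())
               for needle in needles)
    ]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for row in query(rows, "heart", "kenya"):
    print(row["cause"], row["location"], row["year"], row["metric"], row["value"])
```

Requiring every keyword to match narrows the table as more terms are typed, which is one plausible design for a free-text query box.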

Please let us know if these tools are useful for you, and how we can further improve them. We are always looking for feedback and ideas for improvements.

Tell your global health data story in videos

on Fri, 03/08/2013 - 06:51

Data visualizations provide fabulous opportunities to make large amounts of data accessible. Sophisticated controls can allow users to work their way from high-level views down to great levels of detail. I wrote a few days ago about the role of visualizations in making the country-level results of the Global Burden of Disease (GBD) 2010 study accessible and useful. The GBD visualization tools make over one billion results accessible: users can choose any combination of cause of disease or injury, risk factor, country, age, gender, or year, and explore various metrics.

Visualization tools can also be used very effectively to tell stories. I would guess that the GBD data contain millions of stories worth telling, and videos can be a very effective way to share them. A recent article by Robert Kosara (@eagereyes) provides several examples of great data storytelling in video. Videos of public presentations can be very powerful, but simple screen recordings made with tools like SnagIt let you tell your stories from the privacy of your own home (or office) and provide a much clearer view of the visualizations themselves.

Below, I'm adding four videos. The first shows Christopher Murray, Director of IHME and inventor of the concept of Global Burden of Disease, use the new visualization tools to explain GBD and show key findings from GBD 2010. In the second video, Bill Gates talks about the value of visualizations and provides great feedback on the GBD visualizations. The third is my first attempt to explain the functionality of the GBD flagship visualization, GBD Compare, in a video tutorial. And the fourth is a quick video that Tom Paulson from Humanosphere took when we talked about GBD and visualizations (see the resulting article here).

Have a look. And if you have been playing with visualizations, why don't you record your own stories, e.g. with the GBD visualization tools? Let me know about them, and I'll feature the best ones on this blog.

Christopher Murray using the GBD visualization tools to share findings


Bill Gates on GBD and the visualization tools


Tutorial for GBD Compare, the GBD flagship visualization


Quick intro to GBD Arrow Diagram

Visualizing Global Burden of Disease: behind the scenes

on Mon, 03/04/2013 - 16:50

Today, the Institute for Health Metrics and Evaluation (IHME, my employer) is launching 8 new interactive data visualizations that bring to life the results of the 5-year Global Burden of Disease (GBD) study at the country level. The GBD study compiled all available data on health outcomes for 187 countries for 1990 and 2010, and provides estimates of the burden caused by different diseases and risk factors that are comparable across countries and over time. Regional results were published in a dedicated triple issue of the Lancet in December 2012 (see my related post here). As manager of the Data Team at IHME, I have been lucky enough to support the project by finding and managing data over the past 4 years, as well as by overseeing the creation of these visualizations.

The data visualizations play a key role in the GBD project for several reasons. It started with IHME’s need to review the results of GBD. Tables and static graphs just don’t provide the flexibility to properly assess results and identify patterns and trends.

GBD uses four key metrics: number of deaths, years of life lost (YLL), years lived with disability (YLD), and disability-adjusted life years (DALY). The results datasets are massive, broken down by several dimensions:

  • 291 causes of disease and injuries at the most granular end of a 5-level cause hierarchy
  • 66 risk factors
  • 1100 cause-risk factor attributions (i.e. burden caused by a given risk factor via a particular disease or injury)
  • 187 countries, 21 GBD regions, global
  • 27 age groups: early neonatal, late neonatal, post neonatal, 1-4 years, 5-9, 10-14 and so on until 75-79, 80+, as well as under 5, 5-14, 15-49, 50-69, 70+, all ages, and age-standardized
  • Male, female, both
  • 3 years: 1990, 2005, 2010
  • Estimates expressed as total number, rate, and %, as well as ranked by country
  • 95% uncertainty intervals: lower bound, mean, and upper bound (not strictly a dimension, but it adds to the size of the database)
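Because these dimensions are fully crossed, the number of results is the product of the dimension sizes, which is why the totals grow so quickly. The sketch below multiplies the counts from the list above; counting only the 291 most granular causes and leaving out risk-factor attributions, ranks, and cause-hierarchy aggregates are simplifying assumptions, so the total illustrates the scale rather than reproducing the study's own count.

```python
from math import prod

# Dimension sizes taken from the list above. Only cause-level results are
# counted; risk attributions, ranks and hierarchy aggregates are left out.
DIMENSIONS = {
    "metric (deaths, YLL, YLD, DALY)": 4,
    "most granular causes": 291,
    "geographies (187 countries + 21 regions + global)": 187 + 21 + 1,
    "age groups": 27,
    "sex (male, female, both)": 3,
    "years (1990, 2005, 2010)": 3,
    "unit (number, rate, %)": 3,
    "uncertainty (lower, mean, upper)": 3,
}


def grid_size(dims):
    """Number of cells in a fully crossed table with these dimension sizes."""
    return prod(dims.values())


for name, size in DIMENSIONS.items():
    print(f"{name:>50}: {size}")
print(f"total cells: {grid_size(DIMENSIONS):,}")
```

Adding the cause-risk attributions and the aggregate cause levels multiplies further on top of this cause-only subset.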

In total, about 1 billion (!) results were calculated for the project, and on top of that there are aggregations by cause, age, and geography. A nightmare to review, but a gold mine for visualizations. The results datasets are fully imputed for all dimensions, i.e. there are no gaps in the datasets, and the consistent use of methods ensures comparability of results across all dimensions.
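"Fully imputed" means that every combination of dimension levels has a value. A minimal sketch of how such a completeness check might look, assuming results arrive as rows of dimension values; the dimension names and toy rows are hypothetical, and this is not IHME's actual validation code.

```python
from itertools import product


def missing_cells(rows, dims):
    """List the combinations of dimension levels that have no row.

    rows: iterable of dicts mapping dimension name -> level.
    dims: dict mapping dimension name -> list of expected levels.
    """
    names = list(dims)
    present = {tuple(row[name] for name in names) for row in rows}
    expected = product(*(dims[name] for name in names))
    return [combo for combo in expected if combo not in present]


# Toy example (hypothetical): one sex-year combination has no estimate.
dims = {"sex": ["male", "female"], "year": [1990, 2010]}
rows = [
    {"sex": "male", "year": 1990},
    {"sex": "male", "year": 2010},
    {"sex": "female", "year": 1990},
]
print(missing_cells(rows, dims))  # [('female', 2010)]
```

An empty list means the dataset has no gaps across the listed dimensions.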

Initially, we tried off-the-shelf visualization tools, but they didn’t give us the flexibility to dive into all the dimensions and properly explore patterns and trends in the data. Then we discovered D3.js (Data-Driven Documents). D3 is a JavaScript library for manipulating documents based on data; it allows developers to build powerful visualizations very efficiently (but you can judge for yourself how powerful our resulting visualizations really are). And we followed the advice of a blog post published today: iterate early, iterate often.

We improved the tools as we reviewed our results, then started using them to show the results to collaborators and country experts in order to obtain feedback, review our estimates, and discuss what data were used for the analysis (and what data may be available to further inform and improve the estimates). Realizing how powerful these tools are for different audiences exploring the results of GBD, we decided to make them publicly available. In December 2012, we launched 5 visualization tools with the regional results of GBD (available here) alongside the publication of the GBD papers in The Lancet.

Updates for these tools are now available with country-level results. In addition, we created three new tools that allow users to review and explore the data from completely new angles. Here is a quick overview of the country-level visualizations:

  • GBD Compare is a powerful platform that visualizes the data in treemaps, maps, time plots, age plots and stacked bar charts. The most powerful feature is the 2-panel view that allows users to review any two of these charts simultaneously to compare and review trends across causes, risks, countries, ages etc. The panels are interactive, e.g. the map can be used to select countries in the other panel and quickly explore countries around the world. It’s a powerful tool, but requires a bit of commitment to make use of all the features. My video tutorial for GBD Compare can be found here.
  • GBD Cause Patterns provides results for 21 cause groups in stacked column charts. It allows quick exploration of trends across geographies, ages, gender and time (see options at the bottom of the screen).
  • GBD Arrow Diagram shows very concisely the rank of causes and risks for a given country or region in 1990 and 2010, along with the related growth trend. The connecting arrows quickly show how fast causes and risks have grown or decreased between 1990 and 2010. A version of the GBD Arrow Diagram is embedded below.
  • GBD Heatmap ranks causes and risks by burden within a country, but then allows comparisons of those ranks across countries and/or regions (you can compare the ranks within a country with the ranks for a given region or the world).
  • GBD Uncertainty Visualization allows users to compare uncertainty bounds across causes and risks for all dimensions. Countries or causes/risks where the data were more sparse or inconsistent will have wide uncertainty intervals.
  • HALE/LE Visualizations show the relationship between total life expectancy and healthy life expectancy, i.e. the number of years people can expect to spend in good health over their lifetime.
  • Mortality Visualization provides an interesting addition to the results: users can look at all-cause mortality estimates and uncertainty bounds in the context of the underlying input data points. Hovering over a data point shows detailed metadata about its source.
  • COD Visualization shows the input data points for cause of death data by country, cause, and sex, also with detailed metadata.
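Both the Arrow Diagram and the Heatmap rest on the same basic step: rank causes by burden within a year (or place), then compare those ranks. A minimal sketch of the Arrow Diagram's underlying table, assuming burden values are available per cause for both years; the cause names and numbers are hypothetical, and this is not the tool's actual code.

```python
def rank(burden):
    """Map each cause to its 1-based rank, largest burden first.

    Ties keep the order in which causes appear in the dict (sorted() is stable).
    """
    ordered = sorted(burden, key=burden.get, reverse=True)
    return {cause: position for position, cause in enumerate(ordered, start=1)}


def arrow_rows(burden_1990, burden_2010):
    """Pair each cause's 1990 and 2010 ranks with its percent change in burden.

    Assumes every cause appears in both years.
    """
    r1990, r2010 = rank(burden_1990), rank(burden_2010)
    rows = [
        (cause, r1990[cause], r2010[cause],
         100 * (burden_2010[cause] - burden_1990[cause]) / burden_1990[cause])
        for cause in burden_1990
    ]
    return sorted(rows, key=lambda row: row[1])  # order by 1990 rank


# Hypothetical burden values (e.g. DALYs) for three placeholder causes.
burden_1990 = {"cause A": 100, "cause B": 80, "cause C": 50}
burden_2010 = {"cause A": 70, "cause B": 90, "cause C": 60}
for cause, before, after, change in arrow_rows(burden_1990, burden_2010):
    print(f"{cause}: rank {before} -> {after} ({change:+.1f}%)")
```

Each row corresponds to one arrow: its start and end positions are the two ranks, and the percent change gives the growth trend.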

All visualizations also feature “share” functionality that creates a unique URL for the chosen settings that can be shared via email, Twitter, Facebook or other social media. This should be useful to bring up the tools in online conversations about the health situation in different countries, disease patterns and international comparisons.
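One common way to implement shareable views is to serialize the chosen settings into the URL's query string and parse them back when the page loads. The sketch below shows that round trip with Python's standard `urllib.parse`; the base URL and setting names are hypothetical placeholders, and this is not necessarily how the IHME tools encode their state.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://example.org/gbd-compare"  # hypothetical placeholder URL


def share_url(settings):
    """Encode view settings as a query string; sorting keys keeps the URL stable."""
    return f"{BASE_URL}?{urlencode(sorted(settings.items()))}"


def settings_from_url(url):
    """Recover the settings dict from a shared URL (one value per key)."""
    return {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}


settings = {"chart": "treemap", "country": "Kenya", "metric": "DALY", "year": "2010"}
url = share_url(settings)
print(url)
print(settings_from_url(url) == settings)
```

Sorting the keys means the same settings always produce the same URL, which makes shared links easier to compare and cache.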

These tools will be used extensively in policy and country consultations, and many of these conversations will be conducted in locations that have less than reliable internet connections. To facilitate use, we created offline versions of these tools as well. The sheer size of the data provided a substantial challenge, but the tools are now performing well offline.

If you are interested in building additional visualizations with the GBD results, you should start with the regional results of GBD, all available for download on the GHDx here. The country-level results will be made available via the GHDx in September 2013.

I would love to get your feedback on your experience using the visualizations. Are they intuitive? Are there features that you like or don’t like? Are there things you would like to see or do with the data that aren’t possible yet? Leave suggestions in the comments, and I will make sure to include them in our discussions about future development.


Example: GBD Arrow Diagram