Central Statistical Organizations - still emerging from the pre-Internet data publishing world
Health Data Innovation is back from a long holiday hiatus, which among other adventures led me to the 3rd International Arab Statisticians Conference in Amman, Jordan. A recurring topic at the conference was that many Central Statistical Organizations (CSOs) are focusing too much on collecting data while neglecting to enable and encourage broad use of those data. This is true for CSOs around the world, not just in Arab countries. The most common forms of online data sharing are PDF or Excel files with tabulated data, roughly equivalent to the publication of printed reports and yearbooks. Reports and tables are very useful, but can't replace access to detailed data or microdata for researchers and other serious numbers geeks.
A 2010 study by the US Federal Reserve Board, published in the OECD Statistics Newsletter, explored data sharing practices of statistical organizations. It evaluated the online offerings of 193 CSOs with some very interesting findings:
- HTML (71%) and PDF (64%) are the most prevalent forms of data distribution (which confirms my own experience; too many organizations simply put reports online that are traditionally published on paper)
- Use of Excel (55%) far outweighs the use of csv (17%) or txt (2%), even though Excel is a proprietary format
- Only 9 CSOs (5%) use interactive graphics; 58% don't provide graphics at all (except as part of reports)
- Only 21% of CSOs enable users to customize downloads, while the vast majority offers predefined documents for download. 21% don't provide any download functionality at all.
There is a growing number of web tools with useful functionality to make data available, engaging and even fun. Free tools like Tableau Public, ArcGIS.com, Google Motion Charts, and others offer great possibilities to share and visualize data, and most of them require little coding or developer knowledge. More comprehensive solutions like Socrata, Space-Time Research, or Tableau Server offer far more sophisticated possibilities for publishing, visualizing and exploring data. For a more comprehensive list of tools, check the Resources & Tools section. CSOs need to make use of these opportunities.
The study didn't explore the granularity of the data offered, i.e. whether the CSO is sharing microdata, detailed tabulations, simple tabulations, or only estimates. I spend a fair amount of time looking at CSO websites and talking to people at CSOs while searching for health-related data. Many organizations are hesitant to provide access to detailed data or even microdata, mostly citing confidentiality reasons (for more background reading on motivations to share or not to share data, check here). This is another lost opportunity. Academic and other researchers need microdata to unleash the full power of statistical tools on the data and maximize the insight gained from them. There are three ways to deal with the problem. CSOs can ask data users to sign data use agreements that ensure proper and secure storage and use of the data. They can ask researchers to come to their offices or dedicated research data centers. Or they can use software tools to ensure confidentiality. Software from Space-Time Research uses a combination of techniques to enable work with microdata while ensuring that any viewable or downloadable results are fully de-identified.
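To make the third option a bit more concrete, here is a minimal sketch of one common disclosure-control technique, small-cell suppression: the microdata stay behind the scenes, and users only see tabulated counts, with any cell below a threshold hidden. This is not a description of how Space-Time Research's software works. The records, the function name `tabulate_with_suppression`, and the threshold of 5 are all assumptions made up for illustration.

```python
from collections import Counter

# Hypothetical microdata: one record per person (values invented for illustration).
RECORDS = (
    [{"region": "North", "condition": "diabetes"}] * 7
    + [{"region": "North", "condition": "asthma"}] * 2
    + [{"region": "South", "condition": "diabetes"}] * 6
    + [{"region": "South", "condition": "asthma"}] * 5
)


def tabulate_with_suppression(records, row_key, col_key, min_count=5):
    """Count records per (row, col) cell and hide cells below min_count.

    Suppressed cells are returned as None so that a small, potentially
    identifying count never leaves the system. The threshold of 5 is an
    assumed default, not a standard taken from the post.
    """
    counts = Counter((r[row_key], r[col_key]) for r in records)
    return {cell: (n if n >= min_count else None) for cell, n in counts.items()}


table = tabulate_with_suppression(RECORDS, "region", "condition")
for (region, condition), n in sorted(table.items()):
    print(region, condition, "suppressed" if n is None else n)
```

Suppressing small cells on their own is not enough in practice: if row or column totals are published alongside the table, a hidden cell can often be recovered by subtraction, so production tools combine this with further safeguards.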
The discussions in Amman showed that CSOs across the Arab world are aware of the problem. Now it's time to work on the solution.