Last week, Sen. Patty Murray (D-WA) and Rep. Paul Ryan (R-WI) introduced sensible, bipartisan legislation that could improve federal policy by expanding the availability of data that can fuel actionable research.
This is great news. As policy researchers, we believe in the power of evidence to strengthen public policies. But, too often, our work is stymied when we can't gain access to essential data—data collected and controlled by the federal government.
So we look forward to the day when researchers can deploy federal data resources to produce new evidence about the tough challenges facing our country and what works to tackle them. Here we offer eight recommendations to help the Ryan-Murray commission with its charge to design a clearinghouse of federal data:
- Build on the work by Data.gov, but move it forward by hosting data that researchers (both inside and outside of government) can access. Data.gov provides a useful warehouse of metadata (detailed descriptions of government data sets), but it doesn't provide direct access to the actual data researchers need.
- Work closely with federal agencies to help them inventory their existing data and understand researchers’ data needs. A central organizing group, agency, or data clearinghouse should provide ongoing expertise and advice to agencies as they create, process, and analyze data. This central group or clearinghouse should encourage agencies to open up their data holdings to outside researchers, who can bring a diversity of perspectives and analytic tools to bear.
- Provide tools for matching data from multiple sources. This is critical, because no single data set contains all the information necessary to fully explore policy questions or evaluate policy impacts. For example, the Longitudinal Employer-Household Dynamics data set from the Census Bureau combines state-level administrative Unemployment Insurance records into one primary file. Imagine merging that file with state-level administrative records on participation in the Supplemental Nutrition Assistance Program or federal-level information on the Disability Insurance program. With this kind of linked data, researchers would be better able to understand people’s behaviors, incentives, and needs across these different programs.
- Apply state-of-the-art strategies for protecting privacy and security. These protections are critical and challenging to ensure, but new techniques are emerging to overcome these challenges. For example, researchers at Cornell University have worked with the Census Bureau to create a “synthetic” data file that pulls together administrative earnings records from the Social Security Administration and information from the Survey of Income and Program Participation. In a synthetic file, values are generated from statistical models of the real records, so researchers can study realistic patterns while the risk of exposing any individual's actual information is reduced.
- Offer state-of-the-art data hosting, data platform, and data visualization tools. It is not enough to simply pull data sets together; they should be processed and published in a way that people can actually use. Multiple access methods and output formats, such as APIs and downloadable JSON and CSV files, should all be considered.
- Provide high-quality data documentation about sources, survey fundamentals, variable definitions, and response options. This documentation should be explicit, thorough, and harmonized in order to help researchers understand and effectively use the data.
- Establish clear application and usage protocols, so that legitimate researchers from many different types of institutions can gain access and use the data responsibly. In addition, invite users to request or recommend additional features, new data files, and alternative survey methods. A community where data users share their experiences and expertise will result in better and more useful data sources.
- Consider incorporating state- and local-level data as part of the clearinghouse, especially where these data describe participation in federal programs and are collected according to standardized, national protocols.
Building a federal data clearinghouse that adds real value while also protecting individual privacy and data security will be a challenging undertaking. But it's worth it. We have so much to learn about what works and how to make public policies better for people and communities.
Photo: Representative Paul Ryan (R-WI) and Senator Patty Murray (D-WA). (AP Photo/J. Scott Applewhite)