To Advance Racial Equity, Releasing Disaggregated Data while Protecting Privacy Will Be Key
Although the Centers for Disease Control and Prevention has declared that COVID-19 hits communities of color the hardest—and extensive evidence backs this up—those communities are not receiving the aid they need. Health officials say the lack of data disaggregated by race and ethnicity has hampered timely vaccine distribution and the messaging needed to correct misinformation in those communities.
This situation highlights one of many challenges President Biden’s day-one executive order on racial equity aims to resolve. If federal policymakers want to address racial disparities, they should collect and release detailed, disaggregated data. But they must also carefully consider the unintended harms they could cause to the people they are trying to help.
There has been a longstanding debate over protecting personal data versus supporting the common good. In particular, people of color with low incomes are more susceptible to privacy attacks because they rely more heavily on smartphones for internet access and give up more personal information in exchange for free cell phone app services. This data trail makes them more easily identifiable, especially if they are outliers in small geographies.
Urban Institute experts offer data tools and rigorous analysis strategies that can help advance the goals of this executive order. In addition to our tools, federal policymakers can consider new methods for understanding disclosure risk and safely releasing data for research.
Why should federal agencies be concerned?
Data usefulness often conflicts with data privacy and confidentiality. The more data an agency releases, the harder it becomes to guarantee confidentiality. Traditionally, federal agencies, such as the Internal Revenue Service (IRS), have limited this risk by restricting data access to a small set of “trusted” researchers; suppressing, aggregating, or swapping values; or simply not releasing data at all. In these cases, stringent data privacy restrictions can significantly reduce data usefulness and accuracy, especially for Black, Indigenous, and people of color in rural areas.
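To make the suppression trade-off concrete, here is a minimal sketch of primary cell suppression: any cell whose count falls below a disclosure threshold is withheld. The data and the threshold are hypothetical, not any agency's actual rule.

```python
# Minimal sketch of primary cell suppression. Counts below a disclosure
# threshold are withheld (replaced with None). The threshold and the toy
# county-by-industry table below are illustrative assumptions.
THRESHOLD = 5

def suppress_small_cells(table, threshold=THRESHOLD):
    """Withhold any cell whose count is below `threshold`."""
    return {cell: (count if count >= threshold else None)
            for cell, count in table.items()}

# Hypothetical employment counts by county and industry
counts = {
    ("County A", "Manufacturing"): 120,
    ("County B", "Manufacturing"): 3,   # small rural cell
    ("County B", "Retail"): 47,
}

released = suppress_small_cells(counts)
```

Note how the only suppressed cell is the small rural one: protection falls hardest exactly where populations are sparse, which is the pattern the BLS example below describes.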
The challenge is how to “hide” individuals in the data when they are unique within a small population. To protect these people, data privacy and confidentiality methods either heavily distort how those individuals are represented or remove them from the data entirely. This can lead to a cultural disconnect between the people using the data and the populations they’re trying to reach.
For instance, the initial pandemic response included messaging to wash hands for 20 seconds. But in the Navajo Nation, where more than one-third of the population has no running water, that advice was impossible to follow. In May 2020, the Navajo Nation suffered more cases of COVID-19 per capita than New York City.
How can federal agencies determine the balance?
When easing confidentiality protections, federal agencies should consider the following to ensure communities most vulnerable to privacy attacks don’t become more at risk.
- Build on past and current technological innovations on expanding access to highly confidential data. Urban is developing a privacy-preserving methodology in collaboration with the IRS Statistics of Income Division that generates synthetic data (fake data that are statistically representative of the confidential data) and a validation server (an interactive system where researchers can submit statistical analyses without seeing the data). These methods are expanding access to confidential administrative tax data while guaranteeing data privacy protection and could be extended to other confidential administrative data sources in the future.
- Demonstrate these technological innovations to federal agencies and educate their staff on how to use them. Many federal agencies lack the educational materials and computational resources to implement modern data privacy and confidentiality methods on their own data.
For example, the Bureau of Labor Statistics (BLS) continues to use older methods that suppress more than 60 percent of cells in the Quarterly Census of Employment and Wages because of confidentiality concerns about the small counts of businesses and employees in certain communities and industries. The suppression is concentrated in rural communities.
As part of a pilot study, Urban created documentation and open code while applying modern data privacy methods to rural counties’ economic data for the BLS. Urban demonstrated that data quality could be improved while maintaining rigorous confidentiality protection, but further testing and expanded educational resources would be needed to broaden the methodology.
- Engage with underrepresented communities through culturally relevant outreach. As we learned with the Navajo Nation, underrepresented communities must be part of the conversation to assess whether the societal benefits outweigh the privacy risks. To address this, the Racial Equity Analytics Lab is exploring the ability to impute and add race and ethnicity information to credit bureau data. Through this process, the Lab has engaged with community partners, such as the MetroLab Network and the New York City Department of Consumer and Worker Protection, and hosted a workshop on the ethics of imputing race and ethnicity to ensure the potential ethical risks of imputation are fully considered.
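The synthetic-data idea mentioned above can be sketched very simply: fit a model to confidential records and release draws from the model instead of the records themselves. This toy version (a lognormal fit to made-up income values) is illustrative only; it is not the Urban/IRS methodology, which is far more sophisticated.

```python
# Toy sketch of synthetic data generation: fit a simple lognormal-style
# model to confidential records, then release draws from the model rather
# than the records. The income values below are invented for illustration.
import math
import random
import statistics

random.seed(0)  # for reproducibility of this sketch

confidential_incomes = [32_000, 41_500, 55_000, 61_200, 75_000, 88_000]

# Fit a normal distribution to log incomes
logs = [math.log(x) for x in confidential_incomes]
mu, sigma = statistics.mean(logs), statistics.stdev(logs)

# Draw synthetic records from the fitted model
synthetic = [round(math.exp(random.gauss(mu, sigma))) for _ in range(6)]
```

The synthetic records preserve the overall shape of the distribution without reproducing any real person's record, which is what lets researchers work with the data while confidentiality is maintained.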
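As a rough illustration of the imputation work described above, here is a toy probabilistic race/ethnicity imputation in the spirit of surname-plus-geography methods such as Bayesian Improved Surname Geocoding (BISG). The source does not specify the Lab's actual method, and every probability below is made up for illustration.

```python
# Toy probabilistic imputation combining two signals, naive-Bayes style.
# All probabilities are hypothetical; this is not the Racial Equity
# Analytics Lab's actual methodology.

# Hypothetical P(group | surname) and P(group | census tract)
P_SURNAME = {"GARCIA": {"Hispanic": 0.90, "White": 0.07, "Black": 0.03}}
P_TRACT = {"tract_1": {"Hispanic": 0.30, "White": 0.50, "Black": 0.20}}

def impute(surname, tract):
    """Multiply the two signals per group and normalize to a distribution."""
    combined = {g: P_SURNAME[surname][g] * P_TRACT[tract][g]
                for g in P_SURNAME[surname]}
    total = sum(combined.values())
    return {g: p / total for g, p in combined.items()}

probs = impute("GARCIA", "tract_1")
```

The output is a probability distribution, not a hard label; carrying that uncertainty forward, rather than assigning a single race to each record, is part of what the ethics workshop the post mentions would weigh.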
This post was revised to correct the name of the New York City Department of Consumer and Worker Protection (corrected 3/3/2021).