Data stewards intentionally add errors to published microdata and statistics to protect confidentiality. These errors have a disproportionate impact on the accuracy of information about small groups, including racial and ethnic minorities, and rural communities.
This study presents an opt-in privacy framework, the ethical and legal considerations such a framework raises, and early evidence of its impact.
Why This Matters
People should be empowered to decide how their data are accessed and used, whether they are browsing the web, interacting with government services, or responding to questionnaires. Current disclosure protection policies at the US Census Bureau allow no such choice: every respondent to questionnaires like the decennial census and the American Community Survey is subjected to disclosure protections. By default, all respondents' data are masked with data confidentiality methods, so those who wish to be accurately reflected in the data cannot be. This one-size-fits-all approach limits the quality of public data.
What We Found
- Simulations of opt-in disclosure protection show reduced errors for groups that opt in at lower rates. Under an opt-in framework, the Census Bureau and community organizations could target outreach to these groups to improve their representation in the data, enabling them to be accurately reflected if they so choose.
- Our first case study applied local differential privacy to the decennial census. Although lower opt-in rates yield smaller error than higher ones, the overall level of error is still much higher than under the centralized differential privacy alternative (a minimal sketch of the opt-in mechanism follows this list).
- Major methodological breakthroughs for local differential privacy would be needed for opt-in disclosure protections to improve the utility-disclosure risk trade-off for the decennial census.
- Our second case study created a fully synthetic version of the American Community Survey. In all cases, low levels of opt in improved the utility of the synthetic data without worsening disclosure risk metrics. Opt-in data synthesis was particularly helpful to the small groups most affected by synthesis errors, mitigating the consequences of errors introduced by the synthesis methodology.
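To make the first case study's mechanism concrete, the following minimal Python sketch simulates opt-in randomized response, a basic local differential privacy technique, for a binary attribute. The population setup, epsilon value, and function names are illustrative assumptions rather than the study's actual design; the sketch only shows how estimation error grows with the opt-in rate and shrinks with group size.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def randomized_response(bits, epsilon, rng):
    """Report each true bit with probability p = e^eps / (e^eps + 1); flip otherwise."""
    p = np.exp(epsilon) / (np.exp(epsilon) + 1.0)
    keep = rng.random(bits.size) < p
    return np.where(keep, bits, 1 - bits), p

def estimate_proportion(bits, opt_in_rate, epsilon, rng):
    """Estimate a proportion when only an opted-in fraction receives LDP noise."""
    n = bits.size
    opted_in = rng.random(n) < opt_in_rate
    reports, p = randomized_response(bits[opted_in], epsilon, rng)
    # Debias the noisy subset: E[report] = x * (2p - 1) + (1 - p).
    noisy_mean = ((reports.mean() - (1 - p)) / (2 * p - 1)
                  if opted_in.any() else 0.0)
    clear_mean = bits[~opted_in].mean() if (~opted_in).any() else 0.0
    k = opted_in.sum()
    # Combine the debiased opt-in subset with the unprotected remainder.
    return (k * noisy_mean + (n - k) * clear_mean) / n

true_prop = 0.3  # hypothetical true share of the attribute
for n in (200, 20_000):  # a small group vs. a large group
    bits = (rng.random(n) < true_prop).astype(int)
    for rate in (0.1, 0.5, 0.9):
        errors = [abs(estimate_proportion(bits, rate, 1.0, rng) - bits.mean())
                  for _ in range(200)]
        print(f"n={n:6d}  opt-in rate={rate:.1f}  mean abs error={np.mean(errors):.4f}")
```

Running the sketch shows error rising with the opt-in rate and falling with group size, which is the pattern the findings above describe: small groups that opt in heavily bear the largest accuracy cost.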
How We Did It
We present two demonstration studies using two different statistical disclosure control approaches: an opt-in local differential privacy approach for the decennial census and an opt-in synthetic data approach for the American Community Survey. In both cases, we explore how varying the rate of opting into disclosure protections affects data quality and the associated privacy consequences. We pay particular attention to small racial and ethnic groups, examining the consequences for both quality and privacy when some groups opt in at higher rates than others.
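As a hedged sketch of what such a simulation can look like, the Python snippet below mimics the second case study's setup: records from respondents who opt in are replaced with draws from a deliberately crude synthesizer, while opted-out records are released unchanged. The group sizes, income distributions, opt-in rates, and the single-normal synthesis model are all illustrative assumptions, not study parameters; the point is only that groups opting in at lower rates inherit less of the synthesizer's error.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Two groups with different sizes, income levels, and opt-in rates
# (all values are illustrative assumptions, not study parameters).
groups = {
    "A": dict(n=10_000, mean=60_000, sd=15_000, opt_in=0.8),
    "B": dict(n=500, mean=40_000, sd=10_000, opt_in=0.2),
}

data = {g: rng.normal(spec["mean"], spec["sd"], spec["n"])
        for g, spec in groups.items()}

# Deliberately crude synthesizer: a single normal fitted to the pooled data,
# ignoring group structure, so synthesis error falls hardest on groups whose
# distribution differs most from the pooled one.
pooled = np.concatenate(list(data.values()))
fit_mean, fit_sd = pooled.mean(), pooled.std()

for g, income in data.items():
    opted_in = rng.random(income.size) < groups[g]["opt_in"]
    synthetic = rng.normal(fit_mean, fit_sd, income.size)
    released = np.where(opted_in, synthetic, income)  # opt-in records replaced
    err = abs(released.mean() - income.mean())
    print(f"group {g}: opt-in rate {groups[g]['opt_in']:.0%}, "
          f"error in mean income = {err:,.0f}")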