Urban Wire
Improving Official Statistics with Opt-In Disclosure Protection
Aaron R. Williams

Every year, after months of survey distribution, data collection, cleaning, and formatting, statistical agencies like the US Census Bureau and Bureau of Labor Statistics take one more step before releasing official statistics: they intentionally add errors. Although counterintuitive at first glance, this process, called disclosure protection, preserves data confidentiality for all respondents. It also degrades the data used to redistrict political power, report the official unemployment rate, and more.

Statistical agencies tend to use a one-size-fits-all approach for disclosure protection, assuming all respondents want maximum confidentiality. However, this approach comes with a steep cost for small groups. In particular, smaller race and ethnicity groups and rural communities tend to see their data accuracy diminish, because official data either don’t represent or completely misrepresent them. As a result, a rural community may not receive the funding it truly needs, or a Black community may end up with less political representation in its statehouse.

In our recent report, Opt-In Disclosure Protection, we explored how statistical agencies could shift to a framework where official statistics would blend unaltered responses for respondents who want the most accurate data and altered responses for respondents who want more disclosure protection. For example, leaders of an Indigenous community could encourage their neighbors to respond without disclosure protection to ensure the data accurately reflect the community. This framework could empower respondents to make their own disclosure protection decisions and improve the accuracy of official statistics.

Intentional errors and the cost of confidentiality

Although statistical agencies are tasked with releasing high-quality public data, they must legally and morally protect the confidentiality of the people and firms underlying the data. Consider the US Census Bureau, which publishes two of the most important sets of official statistics in the US: the decennial census and American Community Survey (ACS).

According to US Code (Section 9 of Title 13), the Census Bureau may not publish any information that could identify individuals or allow any unauthorized people to review the collected data reports. In response to this law, the Census Bureau has adopted a number of strategies to protect confidentiality, including not releasing data at all, suppressing certain data, swapping attributes in the data, and truncating extreme data. The Census Bureau is also in the process of modernizing its disclosure protection methodologies with the adoption of differential privacy, which directly adds errors to statistics, and synthetic data, which directly adds noise to microdata with statistical models.

Disclosure protection isn’t costless, and evidence-based decisionmaking is only as good as its inputs. Intentional errors limit the effectiveness of governments, businesses, and nonprofits that use the data. Because of how noise is added to the decennial census, these errors can distort statistics like segregation indices and analyses of health disparities and of health care access and use among different subpopulations in the US.
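
To make the cost for small groups concrete, here is a minimal sketch of the Laplace mechanism, one textbook way to add differentially private noise to a count. It is an illustration only, not the Census Bureau's production algorithm; the function names, the group sizes (50 and 50,000), and the privacy-loss parameter `epsilon = 0.5` are all assumptions chosen for the example.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) value as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, rng):
    """Laplace mechanism for a count query (sensitivity 1, so scale = 1 / epsilon)."""
    return true_count + laplace_noise(1.0 / epsilon, rng)


def mean_relative_error(true_count, epsilon, trials, rng):
    """Average |noisy - true| / true over repeated noisy releases."""
    total = 0.0
    for _ in range(trials):
        total += abs(noisy_count(true_count, epsilon, rng) - true_count) / true_count
    return total / trials


rng = random.Random(0)   # fixed seed so the sketch is reproducible
epsilon = 0.5            # hypothetical privacy-loss parameter

small_group_error = mean_relative_error(50, epsilon, 2000, rng)
large_group_error = mean_relative_error(50_000, epsilon, 2000, rng)

print(f"relative error, group of 50:     {small_group_error:.5f}")
print(f"relative error, group of 50,000: {large_group_error:.5f}")
```

The noise has the same expected size for both groups (about `1 / epsilon` people), so it is a far larger share of the small group's count than of the large group's. That asymmetry is the one-size-fits-all cost described above.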

Opt-in disclosure protection could reduce disclosure protection costs with little effect on respondents who still desire confidentiality. By abandoning a one-size-fits-all approach, statistical agencies can subject fewer people, households, and establishments to disclosure protection techniques, reducing the errors added to data and resulting in better evidence for decisionmaking.

Balancing data accuracy and disclosure protections

Statistical agencies and policymakers can consider the following strategies to promote data accuracy and align disclosure protection to the wants and needs of respondents.

  1. Pilot empowering respondents to make their own disclosure protection decisions. Statistical agencies can explore the legal and implementation challenges of opt-in disclosure protection by piloting the process on smaller projects. Our simulations suggest that blending unaltered records without personally identifiable information and pseudo-records for respondents who opt into disclosure protection improves usability without significantly increasing disclosure risks.
  2. Invest in developing disclosure protection methods. Current implementations of differential privacy and synthetic data can’t meet the data and disclosure protection needs of all official statistics. Further research is needed on the techniques and implementation of disclosure protection methods. Our simulations indicate that opt-in disclosure protection reduces racial differences in the accuracy of counts, but the overall noise is still too large for the counts to be fit for use.
  3. Use privacy-enhancing technologies to expand access to administrative data. Many questionnaire-based data collections like the decennial census and ACS can be augmented and improved with administrative data. Releasing disaggregated data with privacy protections will be key to augmenting these data collections and advancing racial equity.
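
The blending idea in the first strategy can be sketched with a toy simulation. This is not the method or the simulation from the report: it swaps in randomized response, a classic record-level technique, as a simple stand-in for whatever protection an agency would apply to respondents who ask for it. The function names, the 10 percent prevalence of the attribute, the truth probability of 0.75, and the sample size are all hypothetical.

```python
import random


def randomized_response(value, p_truth, rng):
    """Report the true 0/1 value with probability p_truth, otherwise the flipped value."""
    return value if rng.random() < p_truth else 1 - value


def blended_count(records, wants_protection, p_truth, rng):
    """Exact count over unprotected records plus a debiased count over protected ones."""
    exact = sum(v for v, w in zip(records, wants_protection) if not w)
    protected = [v for v, w in zip(records, wants_protection) if w]
    reported = sum(randomized_response(v, p_truth, rng) for v in protected)
    # E[reported] = true * p + (m - true) * (1 - p); solve for the true count.
    m = len(protected)
    debiased = (reported - m * (1 - p_truth)) / (2 * p_truth - 1)
    return exact + debiased


def mean_abs_error(protect_share, n=2000, trials=200, p_truth=0.75, seed=0):
    """Average absolute error of the blended count when a share of respondents is protected."""
    rng = random.Random(seed)
    records = [1 if rng.random() < 0.1 else 0 for _ in range(n)]  # hypothetical attribute
    true_count = sum(records)
    total = 0.0
    for _ in range(trials):
        wants = [rng.random() < protect_share for _ in range(n)]
        total += abs(blended_count(records, wants, p_truth, rng) - true_count)
    return total / trials


everyone_protected = mean_abs_error(1.0)  # one-size-fits-all
some_protected = mean_abs_error(0.2)      # most respondents decline protection
no_one_protected = mean_abs_error(0.0)    # every response used unaltered

print(everyone_protected, some_protected, no_one_protected)
```

In this toy setup, the error in the published count shrinks as fewer respondents ask for protection, because noise is only introduced through protected records. It says nothing about disclosure risk, which a real pilot would need to measure separately.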

By investigating the potential of opt-in disclosure protection, statistical agencies can continue to meet their obligation for confidentiality while more accurately identifying the needs of smaller demographic groups.
