Although collecting more and better data can greatly benefit society, such as by furthering medical research or targeting investments to those most in need, those charged with protecting data worry that the information can be de-anonymized and used maliciously.
For example, the US Census Bureau conducted a simulated attack on the 2010 Decennial Census and discovered that it could reidentify about one-sixth of the US population by linking census responses with publicly available data (such as name, sex, and age) from external sources, like public social media profiles (Leclerc 2019). A similar attack on the 2020 Decennial Census could be even more disclosive because that census collected more detailed information, such as additional race and ethnicity categories, which could allow more individuals to be identified with greater specificity. The reconstruction attack results and the more detailed information available in the 2020 census motivated the Census Bureau to update its Disclosure Avoidance System (DAS), replacing traditional statistical disclosure control methods with a formally private method, the TopDown Algorithm, for the 2020 Decennial Census.
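To make the linkage idea concrete, here is a toy sketch of how such a reidentification attack works in principle. This is not the Census Bureau's actual reconstruction attack, and all names, values, and the quasi-identifier fields (age, sex, ZIP code) are fabricated for illustration: an "anonymized" table that retains quasi-identifiers is joined against a public source that maps the same quasi-identifiers back to names.

```python
# Toy linkage attack (illustrative only; all data fabricated).
# The "anonymized" table drops names but keeps quasi-identifiers;
# an external public source maps those quasi-identifiers to names.

anonymized = [
    {"age": 34, "sex": "F", "zip": "20001", "income": 72000},
    {"age": 34, "sex": "M", "zip": "20001", "income": 51000},
    {"age": 58, "sex": "F", "zip": "10027", "income": 88000},
]

public_profiles = [
    {"name": "Alice", "age": 34, "sex": "F", "zip": "20001"},
    {"name": "Carol", "age": 58, "sex": "F", "zip": "10027"},
]

def link(anon_rows, public_rows, keys=("age", "sex", "zip")):
    """Join public names onto anonymized rows whose quasi-identifiers
    match exactly one public profile."""
    reidentified = []
    for row in anon_rows:
        matches = [p for p in public_rows
                   if all(p[k] == row[k] for k in keys)]
        if len(matches) == 1:  # a unique match pins down the identity
            reidentified.append({**row, "name": matches[0]["name"]})
    return reidentified

for rec in link(anonymized, public_profiles):
    print(rec["name"], "earns", rec["income"])
```

Here two of the three "anonymized" records are reidentified because their quasi-identifier combinations are unique in the public source, which is exactly why seemingly innocuous attributes like age, sex, and location can be disclosive in combination.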
However, this drastic change in how data privacy and confidentiality were defined for the 2020 DAS caused significant friction between the US Census Bureau and census data users. For instance, leaders from states, counties, cities, and towns rely on census data for school planning, budgeting, social program provisions, redistricting, revenue sharing, and a multitude of other statutory requirements. These data users want more accurate data for granular geographic areas and fear that the updated DAS will lead to incorrect public policy decisions.
This explainer aims to help readers better understand what formal privacy is and how the TopDown Algorithm works. It is also a continuation of “Personal Privacy and the Public Good: Balancing Data Privacy and Data Utility” (Bowen 2021), and we encourage readers to read that report first.