A New York Times Opinion article (sent to me by multiple colleagues) follows Steven Dillingham, the director of the US Census Bureau, as he meets with villagers in Toksook Bay, Alaska, for the 2020 Decennial Census.
The article discusses the Census Bureau’s efforts to count Native Americans and how its new data privacy method, the top-down differential privacy algorithm, could drastically alter its count of small populations.
This change could affect voting district lines, natural disaster plans, and how the federal government will allocate its $1.5 trillion budget.
How serious is this issue?
The article’s bleak depiction of small towns “disappearing” hits home for me.
I grew up in Salmon, a very remote town in Idaho with a population of roughly 3,100 people. It is the largest town in a county roughly the size of Connecticut. While living in Salmon, I experienced firsthand how precious federal funding is for small communities. During my high school years, the Salmon School District changed from five-day to four-day scheduling to save money on utilities.
I am also a data privacy researcher. I have dedicated my career to developing methods that create statistically representative data while preserving people’s privacy in the data for public release. That’s why my colleagues texted, messaged, emailed, and tweeted me that New York Times article, asking me…
How do we balance conflicting needs for privacy and accuracy?
The answer is complicated. I’ll sketch out an example to help illustrate the issues that must be navigated.
Say I participated in a census survey. I want to keep my identity private, but my accurate representation in the data is also important. I am a millennial Asian woman currently residing in Washington, DC.
Because of the size and racial diversity of Washington, DC, the Census Bureau can easily “hide” me in the data while still preserving certain statistical qualities, such as the fact that many millennial Asian American women live in Washington, DC.
Issues of privacy quickly arise in areas with much smaller populations. “Hiding” me in Salmon, where I was the only Asian high schooler, is much harder. To keep my identity hidden, a data privacy method could erase me from the picture of Salmon, Idaho.
This shows how population counts in small towns can easily be distorted by data privacy methods. Data cannot have perfect privacy and perfect utility.
Privacy protection must be lowered to have higher utility and vice versa. In the case of hiding me in Salmon, if I didn’t care that people knew I came from this small town, the Census Bureau could lower privacy barriers to provide more accurate statistics.
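To make the trade-off concrete, here is a minimal sketch in Python. It is not the Census Bureau’s top-down algorithm, which is considerably more involved. It uses the basic Laplace mechanism, a standard building block of differential privacy, as a stand-in. The counts, the `epsilon` values, and all function names are assumptions chosen for illustration. A smaller `epsilon` means stronger privacy and more noise.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def noisy_count(true_count, epsilon, rng):
    # Adding or removing one person changes a count by at most 1,
    # so the sensitivity is 1 and the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1 / epsilon, rng)

def mean_relative_error(true_count, epsilon, trials=10_000, seed=0):
    # Average absolute error of the noisy count, as a share of the true count.
    rng = random.Random(seed)
    total = sum(
        abs(noisy_count(true_count, epsilon, rng) - true_count)
        for _ in range(trials)
    )
    return total / trials / true_count

# Hypothetical counts for illustration only, not real census figures.
small_town_count = 1      # one person in a demographic group in a small town
big_city_count = 10_000   # a well-populated group in a large city

for epsilon in (0.1, 1.0, 10.0):
    print(
        f"epsilon={epsilon}: "
        f"small town {mean_relative_error(small_town_count, epsilon):.2%}, "
        f"big city {mean_relative_error(big_city_count, epsilon):.4%}"
    )
```

The average size of the noise depends only on `epsilon`, not on the population. The same noise that is negligible next to a count of 10,000 can swamp a count of 1, which is why small towns bear most of the accuracy cost.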
Why doesn’t the Census Bureau lower privacy protection?
There are still situations when knowing there are people with certain demographics in an area can violate privacy, both ethically and legally.
On the ethical side, many Asian Americans (specifically those of Japanese descent) might not be comfortable with people knowing many Asian Americans live in a particular town, considering the lingering legacy of internment camps during World War II.
Legally, the US Census Bureau is bound under Title 13 of the US Code to “provide strong protection for the information [that the Census] collect[s] from individuals and businesses.”
Why doesn’t the Census Bureau use the same methods it used in the 2010 Census?
The significant evolution of our technological landscape has made the already difficult problem of balancing privacy and utility even more challenging. The tiny computers that fit in our pockets (smartphones) have more computational power than the average desktop had in 2010.
This change in data infrastructure and computational power has made it possible to reidentify individuals in the 2010 Decennial Census, using publicly available data. However, many researchers, public policymakers, and other data practitioners who use Census data are not prepared for the new data privacy method applied to the 2020 Census and, potentially, other census data products.
Some researchers are advocating for the Census Bureau to revert to the 2010 data privacy method, data swapping, whereas those at the Census Bureau and other data privacy researchers are committed to deploying the top-down algorithm and other differentially private publication methods.
How are people addressing this issue?
At the Urban Institute, we are actively working and conversing with both sides of the debate to create accessible communication materials on the trade-offs between data utility and privacy. With these materials, we hope the data user and data privacy communities will have a more constructive public dialogue on how best to set this balance.
Our ultimate goal is for this open communication to increase the quality and quantity of publicly available data while minimizing the risk to people’s privacy. That, in turn, should elevate the debate by empowering policymakers to make more data-informed and evidence-based public policy decisions.
In the meantime, we encourage data users to learn more about these new data privacy methods and engage in constructive conversations with the US Census Bureau and other data privacy researchers about their data privacy needs.