Researchers and organizations can increase privacy in datasets through methods such as aggregating, suppressing, or substituting random values. But these means of protecting individuals’ information do not always affect the groups of people represented in the data equally. A published dataset might ensure the privacy of people who make up the majority of the dataset but fail to ensure the privacy of those in smaller groups. Or, after undergoing alterations, the data may be more useful for learning about some groups than others. Ultimately, how entities collect and share data can have varying effects on marginalized and underrepresented groups of people.
To understand the current state of ideas, we completed a literature review of equity-focused work in statistical data privacy (SDP) and conducted interviews with nine experts on privacy-preserving methods and data sharing. These experts include researchers and practitioners from academia, government, and industry sectors with diverse technical backgrounds. We asked about their experience implementing data privacy and confidentiality methods and how they define equity in the context of privacy, among other topics. We also created an illustrative example to highlight potential disparities that can result from applying SDP methods without an equitable workflow.
We view this guide as starting a conversation about developing a framework for advancing equity in SDP. We hope this guide helps readers ask the right questions as they pursue work in this field. Here, we summarize the ideas that can help advance equity in SDP.
Do Not Treat Equity as a Separate Field of Study
We see equity as integral to current SDP practice, and we believe the field should work out how to balance equity alongside privacy loss and utility. Equity considerations should be part of any work on SDP, particularly real-world implementations. Trying to add equity to SDP methods after they have been implemented will not serve the community well, because SDP methods come with practical constraints. Pursuing equity throughout the data life cycle (i.e., from data collection to data publishing) will help advance ideas and communicate limitations.
Consider Literature and Perspectives from Fields outside Your Own
Data privacy is a wide and diverse field, and researchers and practitioners should draw on experiences outside their own training. The field can get better at communicating across and respecting different backgrounds. By familiarizing themselves with work and perspectives outside their discipline, researchers can increase the equity in their own field. They can go even further by engaging in collaborative efforts with researchers outside their discipline.
Estimate Separate Privacy Loss–Utility Curves for Groups
Practically speaking, SDP implementations can account for equity by explicitly defining the demographic groups in the data and publishing estimated privacy loss–utility curves for those groups. By making these curves public, data curators can inform the choice of a tradeoff point between privacy loss and utility and help determine whether different groups in the data need different tradeoffs.
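As a rough illustration of what group-level curves might look like, the sketch below estimates one curve per group for a simple count query. It rests on several assumptions that do not come from our interviews or literature review: privacy loss is measured by a differential privacy parameter (epsilon), noise comes from the Laplace mechanism with sensitivity 1, utility is measured as mean relative error, and the group names and counts are hypothetical.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential draws follows a
    # Laplace distribution centered at 0 with the given scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def privacy_utility_curve(true_count, epsilons, trials=2000, seed=0):
    """Estimate mean relative error of a noisy count at each epsilon.

    Assumes a count query with sensitivity 1, so the Laplace scale
    is 1 / epsilon. Returns a list of (epsilon, relative_error) pairs.
    """
    rng = random.Random(seed)
    curve = []
    for eps in epsilons:
        scale = 1.0 / eps
        total_abs_error = sum(abs(laplace_noise(scale, rng)) for _ in range(trials))
        curve.append((eps, total_abs_error / trials / true_count))
    return curve

# Hypothetical group sizes, chosen only to contrast a large and a small group.
group_counts = {"larger_group": 10_000, "smaller_group": 50}
epsilons = [0.1, 0.5, 1.0, 2.0]

# The same seed is reused for every group, so differences between the
# curves reflect group size alone rather than different noise draws.
curves = {
    group: privacy_utility_curve(count, epsilons)
    for group, count in group_counts.items()
}

for group, curve in curves.items():
    for eps, rel_error in curve:
        print(f"{group:>14}  epsilon={eps:<4}  relative error={rel_error:.5f}")
```

Under these assumptions, the same privacy loss setting produces a much larger relative error for the smaller group, which is the kind of disparity that separate curves are meant to surface. A real implementation would substitute the curator's own SDP method, queries, and utility metrics.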
Work with Groups Represented in Your Data
All the subject matter experts we interviewed agreed that understanding the privacy loss and statistical utility preferences for the groups represented in the data is essential for making the SDP process more equitable. Implementers of these methods should work to identify and partner with representatives of the groups who can help inform these decisions. This work involves communicating limitations and tradeoffs and working to find a solution that is acceptable to both decisionmakers and representatives.
There Is No Methodological Silver Bullet
Rather than assuming that certain methodologies will always result in equitable solutions, researchers and practitioners should seek to build equity thinking into all phases of their work. We interviewed researchers who work across the full range of SDP methodology, and none claimed that a single approach would solve these issues. Most solutions will involve implementing multiple data access approaches, such as using secure data enclaves and public data. Even with a clearly defined approach to addressing equity in data privacy and utility, however, there will always be nuance and variation based on the context. This process requires working with those outside the SDP field to ensure the methodological approach is appropriate.
We hope the concepts we introduce in this guide will provide a starting point for practitioners and policymakers trying to understand the current privacy technology landscape through an equity lens.