Considering Ethics and Empathy in Imputing Race and Ethnicity to Expand the Availability of Disaggregated Data
Disaggregating data by race and ethnicity is a critical tool for exposing racialized systems of privilege and oppression. But, despite strong ethical and practical reasons for disaggregating data this way, many high-value datasets lack sufficient information on race and ethnicity. Credit bureau data, for example, lack race and ethnicity information, which has inhibited efforts to examine how credit scores affect the racial homeownership gap.
In response, data scientists and researchers have developed and are expanding creative methods, such as imputation, which uses probabilities derived from a secondary dataset to append race and ethnicity data. With these expanded datasets, policymakers can disaggregate data and track racial disparities to inform policymaking.
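As a rough illustration of the general idea, the sketch below appends group probabilities from a secondary lookup table to records that lack race and ethnicity information, then uses those probabilities as weights. It is a minimal sketch under stated assumptions, not the method used in any particular study: the area codes, group labels, shares, and field names are all hypothetical, and real imputation methods typically combine several inputs.

```python
# Hypothetical secondary dataset: share of residents in each group, by area.
# All area codes, group labels, and shares are made up for illustration.
AREA_SHARES = {
    "area_a": {"group_1": 0.60, "group_2": 0.30, "group_3": 0.10},
    "area_b": {"group_1": 0.20, "group_2": 0.50, "group_3": 0.30},
}


def impute_probabilities(records, area_shares):
    """Return copies of the records with group probabilities appended."""
    imputed = []
    for record in records:
        shares = area_shares.get(record["area"])
        # When the secondary data have no match, keep None instead of guessing.
        probs = dict(shares) if shares is not None else None
        imputed.append({**record, "group_probs": probs})
    return imputed


def weighted_mean(imputed, value_key, group):
    """Probability-weighted mean of a value for one group."""
    numerator = 0.0
    denominator = 0.0
    for record in imputed:
        probs = record["group_probs"]
        if probs is None:
            continue  # unmatched records contribute nothing
        weight = probs.get(group, 0.0)
        numerator += weight * record[value_key]
        denominator += weight
    return numerator / denominator if denominator else None


# Hypothetical primary records with no race or ethnicity field.
records = [
    {"id": 1, "area": "area_a", "score": 700},
    {"id": 2, "area": "area_b", "score": 640},
    {"id": 3, "area": "area_c", "score": 680},  # no match in AREA_SHARES
]

imputed = impute_probabilities(records, AREA_SHARES)
group_1_mean = weighted_mean(imputed, "score", "group_1")
```

Keeping the full probability distribution, rather than assigning each record its single most likely group, is one design choice among several; it preserves the uncertainty in the estimate but does not remove any bias already present in the secondary data.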
These methods are compelling alternatives to collecting original, self-reported data on people’s race and ethnicity when those original data are not feasible to collect. But these alternative methods do not typically require the input of the people whose data are being combined or augmented. Leaving their perspectives out of data improvement efforts creates ethical risks, shows a lack of empathy for the people whose data are used, and leaves the imputed data less accurate.
We recently released a series of documents that highlight the risks inherent in this work, which extend beyond methodological errors to concerns around privacy, consent, and the excluded input of the people and communities who would be affected by imputed analyses. Through a scan of the literature, engagement with experts, and a case study where we appended race and ethnicity information to a dataset, we articulate some initial recommendations for imputing race and ethnicity data with more empathy and greater regard for ethical risks.
Findings from the field
We scanned relevant literature and interviewed eight key practitioners to arrive at the following findings:
- Imputing race and ethnicity has important benefits, especially when the risks are considered and addressed. Many researchers, policymakers, advocates, and philanthropists are excited to invest in analytic tools like imputation to bolster and expand racial equity analyses. But to harness the many benefits of disaggregated data, researchers must consider concerns about how these methods could harm people and communities, especially if employed without accountability.
- Ethical best practices are underdeveloped amid focus on technical application. Current literature on imputation focuses largely on technical questions and techniques, often failing to position those techniques within a broader ethical framework. This project aims to fill critical gaps in ethical guidance for those seeking to use imputation to disaggregate data.
- Imprecision produces disparate benefit and risk across subgroups. We found concerns about how accurately imputation represents racial distributions for less populous or more widely dispersed subgroups, such as many of those in the Asian American and Pacific Islander communities. Because imputation must draw on other disaggregated data to develop estimates, some policy questions may be more challenging to pursue if data are insufficient.
- Empathy is a critical but often missing element. Engaging and collaborating with people with lived experience is a core element of equity and empathy. If approached robustly, authentically, and with respect for history and culture, it can help return ownership and control of data and findings to those whose lives are reflected in the data, improve the accuracy of the analysis, and hone the effectiveness of evidence-driven policy recommendations.
Lessons learned from our own imputation
To better understand and mitigate potential risks from not incorporating equity into the imputation process, we imputed race and ethnicity onto a nationally representative sample of credit bureau data, which do not contain information on race and ethnicity. From this case study, we distilled three “ethical checkpoints” where we examine our source datasets, our imputation methodology, and the resulting race and ethnicity imputations for potential racial bias and inaccuracy.
These three checkpoints are intended to identify potential bias, mitigate the bias when possible, and transparently communicate how any remaining bias limits ethical uses of the resulting data. At each checkpoint, users evaluate whether to proceed with or terminate the imputation process, weighing the potential harms of unmitigated bias against those of not having imputed disaggregated data.
- Checkpoint 1: Before imputation, audit data for bias.
- Checkpoint 2: During each step of imputation, examine where bias could be introduced.
- Checkpoint 3: After imputation, assess whether imputed race and ethnicity data are accurate enough to be used ethically for your analytic purpose.
By following our checkpoints, we learned that equity must be considered in every decision, differential outcomes must be examined by race and ethnicity, analytic decisions and data limitations must be transparently communicated, and the fitness for purpose of the imputed data must be examined in light of their intended use.
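One way to picture the kind of check Checkpoint 3 calls for is to compare imputed labels against self-reported ones and look at the match rate separately for each group. The sketch below assumes a validation sample with self-reported race and ethnicity is available, which may not hold in practice; the group labels, sample records, and the 0.7 threshold are hypothetical placeholders, and the right threshold depends on the analytic purpose.

```python
from collections import defaultdict


def accuracy_by_group(pairs):
    """Share of each self-reported group whose imputed label matches.

    pairs: iterable of (self_reported_label, imputed_label) tuples.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for self_reported, imputed_label in pairs:
        totals[self_reported] += 1
        if self_reported == imputed_label:
            correct[self_reported] += 1
    return {group: correct[group] / totals[group] for group in totals}


def groups_below_threshold(rates, threshold):
    """Groups whose match rate falls below a chosen threshold."""
    return sorted(group for group, rate in rates.items() if rate < threshold)


# Hypothetical validation sample: (self-reported, imputed).
validation = [
    ("group_1", "group_1"), ("group_1", "group_1"),
    ("group_1", "group_1"), ("group_1", "group_2"),
    ("group_2", "group_2"), ("group_2", "group_1"),
    ("group_3", "group_1"), ("group_3", "group_3"), ("group_3", "group_2"),
]

rates = accuracy_by_group(validation)
# The 0.7 cutoff is an illustrative assumption, not a recommended standard.
flagged = groups_below_threshold(rates, threshold=0.7)
```

Reporting match rates by group, rather than a single overall rate, keeps weaker performance for smaller groups from being hidden by stronger performance for larger ones.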
Recommended standards for imputing with more empathy and ethical concern
Using lessons learned through our landscape scan, interviews with experts, and our own imputation case study, we found that to effectively impute race and ethnicity in ways that lead to greater benefits and diminished harm for people and communities of color, analyses must meet the following criteria:
- relevant to the expressed needs of the community
- interpretable to key stakeholders
- accurately representative of the communities whose information is being imputed
- privacy preserving
- undertaken with clear accountability, concerning both methodological quality control and community harm and benefit
Stakeholders must incorporate community-engaged methods into any imputation project, drawing on insights from affected communities to inform the use, design, and application of imputation to create disaggregated data for their community. Integrating community engagement strengthens each of these standards to ensure imputed data are developed and deployed in ways that advance racial equity.
To address racial disparities, changemakers must be able to identify them, but across many important policy questions, data disaggregated by race and ethnicity are not widely or consistently available. If researchers and data analysts ask themselves the questions posed here and apply these standards to their own work, we believe imputation can be an effective, ethical, and empathetic tool to address critical gaps in race and ethnicity data.