A growing set of methods from data science and statistics could fill critical gaps in race and ethnicity data by matching, imputing, or otherwise adding demographic and locational characteristics to existing datasets. As the potential for appending race and ethnicity variables grows, however, so does the risk of ethical violations and harm to Black, Indigenous, and other people of color. In November 2020, the Urban Institute’s Racial Equity Analytics Lab and Office of Technology and Data Science convened experts from the data science, government, racial justice, and data privacy fields to discuss the ethics of using advanced statistical methods to fill gaps in race and ethnicity data. This brief summarizes the five ethical risk areas that surfaced during the workshop: (1) excluding people and communities of color from ownership of their data and from decisions on research process and methods; (2) violating individual informed consent; (3) compromising individual privacy or confidentiality; (4) producing inaccurate estimates and misleading conclusions; and (5) generating data for purposes that harm people or communities of color.
Workshop Findings on the Ethics of Data Imputation and Related Methods