A Synthetic Supplemental Public-Use File of Low-Income Information Return Data: Methodology, Utility, and Privacy Implications

Research Report

Abstract

The Statistics of Income division of the Internal Revenue Service releases an annual public-use file of individual income tax returns that is invaluable to tax analysts in government agencies, nonprofit research organizations, and the private sector. However, the Statistics of Income division has had to take increasingly aggressive measures to protect the data against growing disclosure risks, such as a data intruder matching the anonymized public data with other public information available in nontax databases. This project develops an alternative privacy protection method: a fully synthetic representation of the income tax data that is statistically representative of the original data. The method generates the synthetic data from a smoothed version of the empirical distribution of income tax returns, so the resulting synthetic file includes no actual tax return records. In this report, we describe the methods used in the first part of this project: the creation of a synthetic public-use file of nonfilers. We show how the methodology protects the underlying data from disclosure, and we evaluate the quality of the synthetic data.