Safely Expanding Research Access to Administrative Tax Data: Creating a Synthetic Public Use File and a Validation Server

Research Report

Abstract

Administrative tax data contain a wealth of information that is potentially valuable for research and analysis. However, the legal and ethical imperative to protect taxpayer privacy has restricted access to these data to a small number of government analysts and select researchers. We propose to develop, in consultation with experts at the Statistics of Income Division of the Internal Revenue Service (IRS), a fully synthetic tax database: a file that preserves many of the statistical characteristics of the restricted data without containing any identifiable tax return information. Working with the IRS, we also hope to develop a procedure that lets researchers test their statistical programs on the synthetic data and then submit them to run on IRS computers, subject to a review to ensure that the output satisfies disclosure avoidance protocols. This paper discusses the current methodology used to produce public use datasets, surveys the literature on synthetic data and privacy protection, outlines our proposed plan to produce a synthetic file, and discusses special challenges.
