Differentially Private Methods for Validation Servers

Research Report

A Feasibility Study on Administrative Tax Data

Abstract

Federal tax data, derived from individuals' and businesses' tax and information returns, are invaluable resources for research on a range of topics. That research improves our understanding of individuals' and firms' responses to economic incentives. However, full access to these data is available only to select government agencies, to a very limited number of researchers working in collaboration with analysts in those agencies, or through highly selective programs within the Internal Revenue Service Statistics of Income Division. In addition, the existing process of manually vetting each statistical release for disclosure risks is labor intensive and imperfect because it relies on subjective human review.

As part of a larger project to implement an automated validation server, we conduct an extensive feasibility study of several differentially private methods for releasing tabular statistics, mean and quantile statistics, and regression analyses with cross-sectional data. We discuss which methods we tested and which could not be implemented in practice. We then evaluate the selected differentially private methods based on their impact on tax policy decisions and on several other utility metrics. Based on our findings, we outline the outstanding challenges and future work.
