Fact Sheet: Understanding Synthetic Data
Using pseudo-records to maintain privacy in publicly released data
Madeline Pickens, Jennifer Andre, Gabriel Morrison

Researchers, service providers, and other stakeholders can benefit from access to individual-level data safeguarded by governments or organizations. However, the public release of granular (disaggregated) data can violate the privacy of the people represented in that data. An alternative is to use synthetic data, which replace actual records in a confidential dataset with statistically representative pseudo-records, enabling data curators to release data that would otherwise be too sensitive for public release. This fact sheet provides an overview of use cases for synthetic data and the broad process for creating synthetic datasets, including definitions of applicable terminology. It also discusses how to evaluate the quality and privacy of synthetic output.
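As a rough illustration of what "statistically representative pseudo-records" can mean, the following minimal Python sketch draws synthetic records from distributions observed in a tiny confidential table. Everything in it is an assumption made for illustration: the records are invented, the column names (age_group, employed) and the synthesize function are hypothetical, and the sequential sampling shown is only one simple way to generate synthetic data, not necessarily the process the fact sheet describes. A sketch this simple carries no formal privacy guarantee.

```python
import random
from collections import Counter, defaultdict

# Invented example records standing in for a confidential dataset.
confidential = [
    {"age_group": "18-34", "employed": "yes"},
    {"age_group": "18-34", "employed": "yes"},
    {"age_group": "18-34", "employed": "no"},
    {"age_group": "35-64", "employed": "yes"},
    {"age_group": "35-64", "employed": "yes"},
    {"age_group": "35-64", "employed": "yes"},
    {"age_group": "65+", "employed": "no"},
    {"age_group": "65+", "employed": "no"},
]


def synthesize(records, n, seed=0):
    """Draw n pseudo-records (hypothetical helper).

    Samples age_group from its observed distribution, then samples
    employed from its observed distribution within that age group.
    """
    rng = random.Random(seed)
    age_counts = Counter(r["age_group"] for r in records)
    employed_by_age = defaultdict(Counter)
    for r in records:
        employed_by_age[r["age_group"]][r["employed"]] += 1

    synthetic = []
    for _ in range(n):
        age = rng.choices(list(age_counts), weights=list(age_counts.values()))[0]
        conditional = employed_by_age[age]
        employed = rng.choices(list(conditional), weights=list(conditional.values()))[0]
        synthetic.append({"age_group": age, "employed": employed})
    return synthetic


def employed_share(records):
    """Share of records with employed == 'yes', as a simple quality check."""
    return sum(r["employed"] == "yes" for r in records) / len(records)


synthetic = synthesize(confidential, n=1000)
print("confidential employed share:", employed_share(confidential))
print("synthetic employed share:   ", round(employed_share(synthetic), 3))
```

Comparing a summary statistic between the confidential and synthetic tables, as the last two lines do, is one basic way to check quality. Evaluating privacy requires separate analysis that this sketch does not attempt.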

Policy Centers: Office of Race and Equity Research
Research Methods: Research methods and data analytics; Data Governance and Privacy