Understanding Synthetic Data: Using pseudo-records to maintain privacy in publicly released data

Madeline Pickens; Jennifer Andre; Gabriel Morrison

Fact Sheet

January 31, 2023

Researchers, service providers, and other stakeholders can benefit from access to individual-level data safeguarded by governments or organizations. However, the public release of granular (disaggregated) data can violate the privacy of the people represented in that data. An alternative is to use synthetic data, which replace actual records in a confidential dataset with statistically representative pseudo-records, enabling data curators to release data that would otherwise be too sensitive for public release. This fact sheet provides an overview of use cases for synthetic data and the broad process for creating synthetic datasets, including definitions of applicable terminology. It also discusses how to evaluate the quality and privacy of synthetic output.
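As a rough illustration of the idea of replacing actual records with statistically representative pseudo-records, the sketch below fits a simple normal model to a made-up confidential variable and draws new values from that model. Everything here is an assumption for illustration: the income figures are invented, `synthesize` is a hypothetical helper, and a single normal distribution is far simpler than the models used in practice. It is not the method described in the fact sheet, and it provides no formal privacy guarantee.

```python
import random
import statistics


def synthesize(confidential, n, seed=0):
    """Draw n pseudo-records from a normal model fitted to the confidential values.

    Hypothetical helper for illustration only; practical synthesizers model
    many variables jointly and are evaluated for both quality and privacy.
    """
    rng = random.Random(seed)                # seeded for reproducibility
    mu = statistics.mean(confidential)       # fitted mean
    sigma = statistics.stdev(confidential)   # fitted sample standard deviation
    return [rng.gauss(mu, sigma) for _ in range(n)]


# Made-up "confidential" incomes; not real data.
confidential_incomes = [31_000, 42_500, 38_200, 55_000, 47_300, 29_800, 61_400, 44_100]

# One pseudo-record per confidential record.
synthetic_incomes = synthesize(confidential_incomes, n=len(confidential_incomes))

# A very crude quality check: how far apart are the two means?
mean_gap = abs(statistics.mean(synthetic_incomes) - statistics.mean(confidential_incomes))
print(f"Difference in means: {mean_gap:,.0f}")
```

Comparing a summary statistic such as the mean is only one simple way to look at quality. Evaluating privacy, for example whether a pseudo-record sits too close to a real one, requires a separate assessment.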