Researchers, service providers, and other stakeholders can benefit from access to individual-level data safeguarded by governments or organizations. However, the public release of granular (disaggregated) data can violate the privacy of the people represented in that data. An alternative is to use synthetic data, which replace actual records in a confidential dataset with statistically representative pseudo-records, enabling data curators to release data that would otherwise be too sensitive for public release. This fact sheet provides an overview of use cases for synthetic data and the broad process for creating synthetic datasets, including definitions of applicable terminology. It also discusses how to evaluate the quality and privacy of synthetic output.
Using pseudo-records to maintain privacy in publicly released data