Researchers, service providers, and other stakeholders can benefit from access to individual-level data safeguarded by governments or organizations. However, the public release of granular (disaggregated) data can violate the privacy of the people represented in that data. An alternative is to use synthetic data, which replace actual records in a confidential dataset with statistically representative pseudo-records, enabling data curators to release data that would otherwise be too sensitive for public release. This fact sheet provides an overview of use cases for synthetic data and the broad process for creating synthetic datasets, including definitions of applicable terminology. It also discusses how to evaluate the quality and privacy of synthetic output.