Synthetic Data User Guide and Infographic
Madeline Pickens, Jennifer Andre, Gabriel Morrison

The Department of Human Services (DHS) in Allegheny County, Pennsylvania, serves one in five residents of the county every year through child welfare services, behavioral health services, aging services, developmental support services, homeless and housing supports, and family strengthening and youth supports. In the process, data are collected about these services and the population using them. These data are integrated at the individual level to allow for better care coordination, operational improvements, and program evaluation. Because of the dataset’s sensitive nature, it cannot be widely shared at an individual level, so synthetic data are used in the real dataset’s place. This allows the data to be publicly shared and helps stakeholders, including researchers, service providers, and members of the public, better understand these populations.

This user guide and infographic accompany a fully synthetic version of the 2021 Integrated Services dataset, which replaces the underlying records tracking the use of these services with statistically representative pseudo-records. Each record in the synthetic dataset represents a simulated individual who received at least one service from the Allegheny County DHS in 2021. The synthetic data were designed so that records aggregated by service are representative of the original data.

To create this synthetic data product, staff at the Urban Institute partnered with the Allegheny County DHS and the Western Pennsylvania Regional Data Center (WPRDC). The Urban Institute has a body of work dedicated to data privacy and has previously created synthetic datasets at the federal level. This partnership is intended to function as a pilot for synthetic data generation at the local level, to help understand the unique challenges that state and local governments might face in generating synthetic data. Because this is the first synthetic data product released by Allegheny County and the WPRDC, the user guide is intended to provide an overview of the motivation behind the creation of a synthetic version of this dataset, a high-level summary of the data synthesis process, and information that will allow users to make informed decisions while using this dataset. The user guide is accompanied by an infographic that explains the key features of synthetic data and illustrates its usefulness in the Allegheny County context.

Policy Centers: Office of Race and Equity Research
Research Methods: Research methods and data analytics; Data Governance and Privacy