Research shows landlords overwhelmingly use tenant screening reports to make leasing decisions that affect housing access. Yet the proprietary nature of these datasets obscures the accuracy and quality of the data provided. This brief opens the “black box” of tenant screening reports by exploring two components of tenant screening databases (eviction filings and criminal records) to highlight the challenges of accurately matching records without unique identifiers. We analyze the limits of data matching and suggest avenues for improved practices to benefit renters and landlords. We find that the overall risk profiles within a tenant screening dataset are highly sensitive to subjective decisions about which cases belong to which individual, especially for profiles that contain eviction filings.
WHY THIS MATTERS
Research suggests tenant screening reports can lead to repeated denials in the private market or even homelessness for people with any negative marks on their records, notably in tight rental markets. Despite the tremendous impact these reports have on applicants, companies continue to use court records that do not have unique identifiers. These records are often low quality, inaccurate, or misleading, which produces erroneous tenant screening reports. This brief aims to develop evidence-based guidance to ensure that tenant screening reports include the highest-quality information.
WHAT WE FOUND
- We identified between 1.26 and 1.61 million individuals in the dataset we examined, equivalent to roughly 10–12 percent of the entire Pennsylvania population. Of these individuals, 40–52 percent had eviction histories and 56–69 percent had criminal histories.
- We found many examples of individuals who could be matched based solely on name, even if their zip code and date of birth differed (i.e., false positives); examples where small differences in first or middle names could cause a true match to be rejected (i.e., false negatives); and still more examples that were ambiguous because of insufficient data. Some of these individuals could have dozens of cases wrongly attributed to them under our most lenient matching assumptions.
- Both data quality and the way we assigned records to unique individuals within each dataset significantly affected the resulting tenant screening data. Particularly for landlord-tenant records, which often lack key information needed for matching, the number of people with an eviction filing or disposition varied by hundreds of thousands, depending on our assumptions.
- People of color were overrepresented in criminal and landlord-tenant records compared with the state population. However, we did not find that our data assumptions had an additional disparate impact on these groups. We suspect that limitations in how race and ethnicity are reported may explain this inconsistency.
- The average time since a previous criminal offense ranged from 4.5 to 4.6 years, regardless of the data assumptions made. Because this average falls just under five years, screening data may be highly sensitive to renter protection policies that set a five-year threshold for record sealing.
HOW WE DID IT
We used case- and docket-level data on criminal and landlord-tenant case records for the period of 2014–24 from the Administrative Office of Pennsylvania Courts. We cleaned, de-duplicated, and matched records using a range of probabilistic matching techniques on name, date of birth, and geography. We created 24 data scenarios that varied based on how we assigned records to a unique individual within each dataset, the matching algorithm we used to link individuals across datasets, and the match similarity threshold. Finally, we composed a full criminal and eviction history for each matched individual based on charges in the past seven years, severity and frequency of charges, and their disposition.
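To illustrate why the number of unique individuals depends on matching assumptions, the following is a minimal sketch, not the code used for this brief. It assumes a simple string-similarity score from Python's standard `difflib` module in place of the probabilistic matching techniques described above. The records, the `CourtRecord` type, the `strict` flag, and both thresholds are hypothetical and chosen only for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import NamedTuple, Optional


class CourtRecord(NamedTuple):
    name: str
    dob: Optional[str]       # None when the record omits date of birth
    zip_code: Optional[str]  # None when the record omits zip code


# Made-up records for illustration; not drawn from any real court data.
RECORDS = [
    CourtRecord("john a smith", "1980-01-02", "19104"),
    CourtRecord("john smith", "1980-01-02", "19104"),
    CourtRecord("jon smith", None, "19104"),
    CourtRecord("john smith", "1975-06-30", "15213"),
    CourtRecord("maria garcia", "1990-03-15", "17101"),
]


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; a stand-in for a real probabilistic score."""
    return SequenceMatcher(None, a, b).ratio()


def same_person(r1: CourtRecord, r2: CourtRecord,
                threshold: float, strict: bool) -> bool:
    """Decide whether two records are treated as the same individual."""
    if name_similarity(r1.name, r2.name) < threshold:
        return False
    if not strict:
        # Lenient assumption: a similar name alone is enough.
        return True
    # Strict assumption: date of birth and zip code must be present and equal.
    return (r1.dob is not None and r1.dob == r2.dob
            and r1.zip_code is not None and r1.zip_code == r2.zip_code)


def count_individuals(records, threshold: float, strict: bool) -> int:
    """Group linked records (transitively) and count the resulting groups."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if same_person(records[i], records[j], threshold, strict):
            parent[find(i)] = find(j)

    return len({find(i) for i in range(len(records))})


if __name__ == "__main__":
    for strict in (False, True):
        for threshold in (0.80, 0.97):
            n = count_individuals(RECORDS, threshold, strict)
            print(f"strict={strict}, threshold={threshold:.2f}: {n} individuals")
```

On these five made-up records, the count of distinct individuals ranges from 2 under the lenient rule with a 0.80 threshold to 5 under the strict rule with a 0.97 threshold. The lenient rule merges two people named John Smith who have different dates of birth and zip codes (a false positive), while the strict rule leaves the record with a missing date of birth unmatched (a possible false negative).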