Data Processing Is a Crucial and Misunderstood Stage in the 2020 Census
Data processing is in the spotlight after the US Census Bureau director’s recent announcement that routine “processing anomalies” will likely delay delivery of state counts to the president until well after the December 31 deadline. Experts have been predicting this for months (PDF), in part because the pandemic delayed many key aspects of the 2020 Census. Those delays cut the data processing period short by roughly 60 days and have raised concerns about data quality.
Some errors are expected, but the pandemic introduces uncertainty
Though “processing anomalies” sound dire, errors are to be expected in any decennial census. Counting the entire US population is a messy endeavor. People make mistakes filling out the census, sometimes predictably and sometimes not, leaving questions blank or answering them inconsistently. For example, a 5-year-old grandchild might be listed as a parent-in-law because the wrong box was checked on the “relationship” question. Such errors are flagged and resolved during the data processing period, often using protocols developed in advance.
Yet the pandemic has made this anything but a typical census. The declaration of a COVID-19 national emergency coincided with the start of census fieldwork, delaying various census operations. People relocating to new residences put the accuracy of the census’s address-based enumeration at risk. Though the Census Bureau has shared scant information about the nature of these processing anomalies, warning signs in recent months offer some clues.
The Census Bureau has been especially concerned about the enumeration of college students during the pandemic and produced materials (PDF) clarifying that students living in off-campus housing or dormitories should be counted at those addresses, even if they had moved home because of the pandemic. The Census Bureau sought additional help from universities to improve the accuracy of the count but had mixed success getting their cooperation (PDF). As a result, some college students may be counted twice, and others may be missed entirely.
The pandemic affected the count in other ways. Staffing and field operations became more logistically challenging, stymying efforts to reach hard-to-count populations. Scheduling delays also meant that enumeration overlapped with wildfires in the West and hurricane season on the Gulf Coast, adding to the uncertainty. These examples underscore how anomalous 2020 was for conducting a census.
Political pressures on this census further complicate the count
Data processing at the Census Bureau is reportedly being further hampered by political pressure to meet the December 31 deadline for delivering apportionment data to the president. Reports indicate that census data processing is running around the clock, seven days a week. The latest estimate indicates data processing will be completed January 26 at the earliest, a date that reached the press indirectly through career staff rather than through the director’s official statement. This has heightened concern from the House of Representatives Committee on Oversight and Reform (PDF) about the census schedule and what the committee sees as a lack of transparency from Census Bureau leadership. Given efforts to politicize the 2020 Census, including the administration’s past and ongoing Supreme Court challenges, it is unclear whether career staff will face additional pressure.
The Census Bureau has the power to mitigate public concerns
Right now, a few small actions could instill confidence in the accuracy of the 2020 Census. First, the Census Bureau could make data quality indicators available to outside researchers (PDF), who could serve as an additional line of defense in monitoring efforts. External experts bring a diverse range of expertise that career staff may not have, which could strengthen quality assessment and help identify anomalies.
Second, career staff at the Census Bureau need as much time as they deem necessary to address data processing anomalies. The past offers a guide: the 2010 Census had 10 million erroneous enumerations (PDF), 8.5 million of which were duplicates. Layer on the unpredictable residency patterns caused by the pandemic and the correspondingly higher likelihood that people will be counted twice, and we can only assume the number of duplicates will be larger in 2020. Yet in 2010, the processing period began in early July, giving the Census Bureau a full six months to complete quality checks. In 2020, data processing was originally slated for 153 days, but pandemic-related delays cut this to just 92 days (PDF). With data processing now running behind even that compressed schedule, it appears the 92-day timeline was never feasible for the 2020 Census.
Third, the Census Bureau could share more information about the “anomalies” in the data in a way that does not compromise the results and that could enhance the public’s understanding of its work. This could be an opportunity to educate the public about the enormous effort required to produce a once-in-a-decade count of the US population and the valuable work the Census Bureau does. If the public understood how much an accurate count matters for their communities in the decade ahead, there might be an outcry against the census’s politicization and the rush to deliver apportionment numbers. More time will allow Census Bureau staff to finish their work according to the agency’s high standards, and that benefits us all.
Reading, Pennsylvania, Mayor Eddie Moran talks to Tianna Vargas, holding her son Omillio Vargas, 6 months, about the importance of filling out her 2020 census form. She has a family of five and had not yet filled out the form, but after their conversation she said she would do so online. (Photo by Ben Hasty / MediaNews Group / Reading Eagle via Getty Images)