In the wake of the recent deaths at police hands of Michael Brown, Kajieme Powell, and so many others, people have rightly called for a thorough empirical analysis of how often and under what circumstances the police shoot civilians.
Unfortunately, to our knowledge, the data don't exist for that analysis. This is likely not the result of some grand police conspiracy; the problem is we don’t know much about most non-fatal shootings. It’s extremely time-consuming for police to record the facts of every incident, and police departments simply lack those resources.
We therefore applaud Deadspin.com's effort to crowdsource the assembly of an incident-level dataset of police shootings of civilians. Having those data is important for public policy and, in our view, for addressing the important social questions related to these events and dominating headlines.
But to be useful, the data must be valid and reliable. As social scientists, we spend a lot of time thinking about this in our own research. Valid police shooting data must measure exactly what we want them to measure. That is, they must report all incidents of police officers shooting civilians and only incidents of officers shooting civilians. And the data must be reliable, meaning that someone could use the same data collection process again to produce the exact same dataset.
So, what can Deadspin do to ensure validity and reliability?
1) Ensure that the data are unbiased.
Data collectors and the Deadspin quality controllers should cull reported shooting incidents from every valid news source and not just media outlets from major cities. Clear reports of police shootings in the The New York Times or Chicago Tribune count, but so do reports in the Northern Wyoming Daily News, which might be the only source on a police-shooting incident in Washakie County, Wyoming (population 8,400). As long as the details of that incident are clear and not in dispute, it should be counted.
2) Set rules for judging reliability and validity.
An incident is valid for this dataset if one or more such news sources unambiguously reports it as an officer shooting, and no other reputable source contradicts that report. When the details are unclear or in dispute, the case should be included in the dataset but flagged as having an ambiguous status*.
The data are reliable if the collection process is reliable. Each collector must receive the same set of precise instructions on how to collect data and how to troubleshoot unclear news reports. Collectors should also have a forum for reporting questions or problems, and Deadspin staff or social scientists should document all decisions and judgment calls for future reference.
3) Quality check the data.
The beauty of this project is that it relies on cheap and abundant labor. Deadspin should take advantage of that again after the data are fully assembled. They should randomly select a sample of the days for which incidents were gathered—say, 10 or 15 percent of all days—and crowdsource them to different data collectors. If a second crowd can use the exact same process and generate the exact same results as the original collectors, we can feel comfortable that the full dataset is (close to) valid.
At the end, the whole process should be written up in clear and non-scientific terms, including documentation of questions, troubleshooting, and judgment calls. Deadspin should then invite social scientists to review that process. It's possible that staff will have to go back and make some improvements or adjustments, which is a frustrating but necessary part of every data collection process.
4) Gather as much data as possible.
Once the data exist, people will naturally want to use them to answer big questions: Was the killing justified? Was it racially motivated? Did the officer act out of line or simply make a tough judgment call? Were drugs involved? Were lives at stake?
Quality news reports will provide a lot of the context we want, and collectors should take care to gather as many of the facts as systematically as possible. How many officers were present? How many discharged their weapons? What were their races, genders, and ages? What was the race, gender, and age of the victim? Was the incident outside on the street or in a house? Did the police suspect drugs were present? Did the civilian demonstrate a threat of force? What other details were unique to that case, but are still relevant?
Deadspin's effort is innovative and exciting. With care and diligence, this dataset could help us answer some tough questions about tragic events.
*This post has been updated. It originally suggested leaving out cases that are unclear or whose facts are in dispute. Better to include those cases but clearly note their unambiguous status.
Photo: Police in Ferguson, MO. (AP Photo/Jeff Roberson)