Quantitative Data Collection

Urban Institute researchers take advantage of dozens of existing quantitative data sets to study the world. These data come from many federal, state, and local government agencies, as well as from dozens of private and proprietary sources.

But for many research questions, the proper data do not exist, and Urban researchers must undertake their own data collection. Though surveys are a primary method for generating new data, Urban researchers have also employed a range of creative data-generating methods.

Surveys

The Urban Institute has extensive experience designing and administering rigorous, reliable sample surveys, using a wide range of methodologies. Surveys are typically designed by Urban experts and, depending on their complexity and size, are executed by either Urban staff or subcontractors.

Urban researchers develop and test survey instruments to capture the data needed for effectively addressing a study’s research questions. Urban surveys are often multimode ventures, collecting data face to face, over the phone, through the mail, online, or through any combination of these. Each method carries its own benefits and drawbacks in accuracy, cost, and respondent burden. For example, face-to-face surveys on nonsensitive topics tend to yield the highest response rates and accuracy. But they are also the most expensive to administer, involving substantial labor to secure participation and schedule an interview, as well as travel time to the interview location. In contrast, telephone surveys are generally less costly than face-to-face surveys but tend to have lower response rates and still require significant field staff labor. Mail and web surveys are the least expensive to administer but rely on respondents’ literacy and diligence to complete. For many applications, a mixture of data-collection methods tends to provide the best balance between accuracy and cost.

Survey design and sampling issues

Effective surveys must consider several sampling design issues. Population coverage refers to the share of the target population covered by the survey. Unit nonresponse is the share of intended respondents who do not respond; well-designed surveys take steps to assess whether nonresponders differ from responders, because such differences introduce bias into the results. Some surveys intentionally oversample a specific subpopulation for separate or expanded analyses. Screening of subjects is sometimes required to identify the members of the population that are the focus of a study, such as low-income families with children.
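As a rough illustration of the nonresponse check described above, the sketch below computes a unit nonresponse rate and compares responders with nonresponders on a variable known for every sampled unit. The records, the field names, and the choice of household size as the comparison variable are all hypothetical; they are not drawn from any Urban survey.

```python
from statistics import mean

# Hypothetical sampled units. "household_size" stands in for any variable
# that is known from the sampling frame for responders and nonresponders alike.
sample = [
    {"responded": True, "household_size": 2},
    {"responded": True, "household_size": 3},
    {"responded": True, "household_size": 1},
    {"responded": False, "household_size": 4},
    {"responded": False, "household_size": 6},
]


def unit_nonresponse_rate(units):
    """Share of sampled units that did not respond."""
    return sum(not u["responded"] for u in units) / len(units)


def frame_mean_gap(units, key):
    """Responder mean minus nonresponder mean for a frame variable.

    A gap far from zero suggests the two groups differ, which is a
    warning sign for nonresponse bias on related survey measures.
    """
    responders = [u[key] for u in units if u["responded"]]
    nonresponders = [u[key] for u in units if not u["responded"]]
    return mean(responders) - mean(nonresponders)


rate = unit_nonresponse_rate(sample)
gap = frame_mean_gap(sample, "household_size")
print(f"nonresponse rate: {rate:.0%}, responder minus nonresponder mean: {gap}")
```

A gap like this one would not prove bias in the survey estimates, but it would flag the need for weighting adjustments or follow-up with nonresponders.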

Two other sampling issues are frames and complexity. Sampling frames are lists or files of population members from which a sample can be selected. Such lists can help reduce sampling costs, but they come with potential drawbacks such as false listings, duplicates, incomplete or false information, and population noncoverage (i.e., not including everyone in the population of interest). Another issue is sampling complexity, which can involve cluster sampling, multistage sampling, sampling with unequal probabilities, multiphase sampling, and oversampling certain groups. Such methods are sometimes necessary to answer the research question, but they can reduce the statistical efficiency of the sample (relative to, say, a similarly sized simple random sample of the population). Such inefficiency needs to be estimated and incorporated into the development of power analyses, maximum detectable differences, confidence intervals, and margins of error.
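One common way to quantify this loss of efficiency is the design effect, the ratio of an estimate's variance under the complex design to its variance under a simple random sample of the same size. The sketch below uses the standard approximation for cluster samples, deff = 1 + (m - 1) * rho, where m is the average cluster size and rho is the intraclass correlation, and carries the result into an effective sample size and a margin of error for a proportion. The function names and example numbers are illustrative assumptions, not values from an Urban study.

```python
import math


def design_effect_cluster(avg_cluster_size, icc):
    """Approximate design effect for a cluster sample: 1 + (m - 1) * rho."""
    return 1.0 + (avg_cluster_size - 1.0) * icc


def effective_sample_size(n, deff):
    """Size of the simple random sample with the same precision."""
    return n / deff


def margin_of_error(p, n, deff=1.0, z=1.96):
    """Half-width of a normal-approximation confidence interval for a
    proportion, with the variance inflated by the design effect."""
    return z * math.sqrt(deff * p * (1.0 - p) / n)


# Illustrative inputs: 1,500 interviews in clusters of about 11 people,
# with an assumed intraclass correlation of 0.05.
deff = design_effect_cluster(avg_cluster_size=11, icc=0.05)
n_eff = effective_sample_size(1500, deff)
moe = margin_of_error(p=0.5, n=1500, deff=deff)
print(f"deff={deff:.2f}, effective n={n_eff:.0f}, margin of error={moe:.3f}")
```

Under these assumptions, the clustered sample of 1,500 carries roughly the precision of a simple random sample of 1,000, which is the sample size a power analysis should use.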

Other data-gathering methods

Though well-designed surveys can produce rigorous and reliable data, they are often too costly or simply inappropriate. Urban researchers regularly make creative use of other techniques, including the examples listed below.

Google Street View

For a recent evaluation, Urban researchers needed baseline data on the built environment of dozens of neighborhoods across the country: the quality of the streets, sidewalks, intersections, and lighting; the presence of vacant and abandoned buildings; and access to public transportation.

To gather these data, researchers used theory and established protocols to develop a “block front” survey questionnaire, each item assessing one aspect of the quality of the block. To “question” the block fronts, researchers pulled random samples of street segments in each neighborhood. An observer plugged the latitude and longitude coordinates of each segment into Google Street View, then “walked” along a street segment, recording answers to each question in the survey guide.
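The segment-sampling step might look something like the sketch below, which draws a fixed number of street segments at random from each neighborhood so that an observer can look up each one by its coordinates. The neighborhood names, segment IDs, coordinates, and sample size are hypothetical placeholders; the source does not describe how the actual samples were drawn.

```python
import random


def sample_segments(segments_by_neighborhood, k, seed=0):
    """Draw up to k street segments per neighborhood without replacement.

    A fixed seed makes the draw reproducible, so the same segments can be
    revisited in a later round of observation.
    """
    rng = random.Random(seed)
    picks = {}
    # Sort neighborhood names so the result does not depend on dict order.
    for name in sorted(segments_by_neighborhood):
        segments = segments_by_neighborhood[name]
        picks[name] = rng.sample(segments, min(k, len(segments)))
    return picks


# Hypothetical input: each segment has an ID and a midpoint coordinate.
segments_by_neighborhood = {
    "Neighborhood A": [
        {"id": f"A-{i}", "lat": 38.90 + i * 0.001, "lon": -77.03}
        for i in range(20)
    ],
    "Neighborhood B": [
        {"id": f"B-{i}", "lat": 38.95 + i * 0.001, "lon": -77.00}
        for i in range(3)
    ],
}

for name, segs in sample_segments(segments_by_neighborhood, k=5).items():
    print(name, [s["id"] for s in segs])
```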

Each street segment’s data were entered into a database, generating a novel empirical baseline from which to measure the effect of future interventions.

Fire modeling

In partnership with a dozen fire departments around the DC metropolitan area, Urban researchers worked with the National Institute of Standards and Technology, the International Association of Fire Fighters, and others to create real-life simulations of three-alarm high-rise fire responses. Firefighters practiced responding to a simulated fire, including getting water on the fire and rescuing victims. Over 20 tasks associated with the fire response were timed and measured using several configurations of firefighter team sizes to examine the victim, property, and firefighter safety risks associated with each team size. The data were fed into a Monte Carlo simulation of fire spread and toxicity in the building to establish victim survival.

Google Trends search data

When Urban researchers recently assessed data generated by Google searches for keywords like “buy a gun” and “buy gun,” they found that those searches correlate highly with other, official measures of gun purchase interest. While not groundbreaking, these preliminary correlations suggest that Google search data may provide low-cost, nearly real-time predictors of activities like gun purchases. These data could prove useful because it is nearly impossible to find comprehensive, official measures of actual gun sales.
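A correlation of this kind can be checked with a few lines of code. The sketch below computes a Pearson correlation between a search-interest series and an official series measured over the same periods. Both series are made-up numbers for illustration only; they are not Google Trends values or official figures, and the source does not say which correlation measure the researchers used.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length, at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(ss_x * ss_y)


# Made-up monthly values: a search-interest index and an official
# measure of purchase interest for the same six months.
search_index = [40, 55, 48, 70, 65, 90]
official_measure = [100, 130, 120, 170, 150, 210]

r = pearson_r(search_index, official_measure)
print(f"Pearson r = {r:.3f}")
```

A high correlation on historical data is only a starting point; using search data as a predictor would also require checking that the relationship holds in periods not used to estimate it.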