Strong evidence on which social programs work is crucial for effective policymaking, as it ensures that limited government resources go to programs that deliver results. Impact evaluations, which use randomized controlled trials or quasi-experimental designs, contribute to this evidence base. But what if an untested social policy or program appears so intuitive, and has a theory of change so compelling, that government officials are highly confident in its ability to deliver the promised outcomes? Is there still a benefit in conducting a rigorous evaluation?
The answer is yes. Many social policies and programs were assumed to be effective but were eventually shown not to be. Impact evaluations tell us which programs to invest in—and which to avoid.
Analyses reveal a shocking finding
The Cambridge-Somerville Youth Study is one example of this unexpected result. Initiated in the late 1930s, this longitudinal study has been recognized as the first randomized evaluation of a social program. The program it studied—mentoring for at-risk boys and young men from low-income backgrounds—appeared to have obvious and intuitive merit. Mentees remembered their experiences fondly, and many believed the program improved their lives.
But repeated analyses comparing outcomes for those who were mentored with those who were not, published in 1948, 1959, and 1978, suggested the program was not effective. In fact, the 1978 study found that on several measures—including alcoholism, multiple criminal offenses, and stress-related diseases—the treatment group fared worse than the control group. Compounding these results, the study found that the longer someone was in the intervention program, the worse the outcomes they experienced. It was stunning but compelling evidence that this social program, which had appeared promising, was actually detrimental.
More recent evaluation examples also reveal surprising and counterintuitive results. For example, rigorous policy research on the effect of school vouchers in Louisiana and Indiana suggests that students receiving such vouchers actually underperformed their peers in public school.
What does this mean for existing programs?
These stories underscore the importance of evaluating programs that are new or are being applied in new ways. But what if a program already has a strong evidence base? Is another impact evaluation necessary?
It depends. A recent article in the Stanford Social Innovation Review (SSIR) suggests governments and researchers are not always required to conduct evaluations of programs in each local context. The authors assert that evaluations can uncover “general behaviors that are found across settings and time” that can inform policies in multiple contexts. If a lot of evidence already exists on a policy, it may be more advantageous to learn about the mechanisms behind the program or policy—as the article put it, “why people responded the way they did”—and then determine whether these mechanisms would occur in a new setting.
A process evaluation, which details how the intervention was implemented in practice, could help identify such information. A process evaluation of a successful mentoring program might find, for example, that the key ingredients of the program’s success were the social incentives tied to peer accountability. Once there is reliable evidence of a social program’s efficacy, assessing its generalizability may not require additional evaluations. We can draw upon the evidence base that already exists to make informed decisions about what to fund and what not to fund.
But there are caveats. To determine whether the mechanisms of a social program will match the local behaviors of a different context, researchers must first identify the mechanism behind the program’s success. Impact evaluations help us learn whether the program as a whole was effective, but not necessarily which elements or mechanisms within the program were most critical to its success. A process evaluation can provide valuable insight but is not always conducted or shared publicly. There must also be knowledge of local behaviors and a sound basis for estimating whether the mechanism will carry over. This process requires a significant degree of judgment. Ideally, the new context would be similar to the one where the program was first evaluated. But, as the SSIR authors point out, “similar” can mean a lot of things, and differences that may be critical to outcomes can be hard to predict.
Although rigorous evaluations can give researchers a high degree of confidence in their results, they should not close the book on any program or policy’s evidence-building. Future studies could confirm the findings, find different degrees of benefit or detriment to participants, or contradict the evidence of impact. The challenge of replicability—reproducing a study and obtaining the same results—is a common concern and a reason not to treat any single study as conclusive.
The Cambridge-Somerville Youth Study suggested the program was a bad investment, but it could just as easily have suggested the opposite. Once an impact evaluation provides reliable evidence of a program’s effectiveness, researchers can consider how that evidence can be interpreted for a different context—location, population, or time—without having to start the evidence-building process from scratch.