Measuring the impact of spending on social programs is critical to making sure we get the most bang for our scarce government bucks. To discern impact, it is important to use evaluations that are as strong as possible—or risk making policy decisions based on faulty information.
Back in the 1990s, a basic evaluation of the federal literacy program Even Start found that participants made demonstrable improvements in school readiness and literacy measures. The evaluators deemed the program effective. However, a more rigorous evaluation of the program conducted later using a randomized controlled trial (RCT) design found no net difference between the educational outcomes of Even Start participants in the treatment and control groups.
Without the RCT, policymakers might have assumed that Even Start was really improving literacy for the families in the program. After the RCT, policymakers understood that the program's true impact may have been negligible, and the program was eventually eliminated.
Even 20 years later, Even Start remains an instructive example of why selecting a rigorous methodology to measure program success matters. And as more governments explore alternative financing mechanisms like pay for success (PFS), the importance of high-quality evaluation only grows.
RCTs are often considered the gold standard for evaluating a program’s impact. In an RCT, independent evaluators randomly assign program participants to either a treatment group or a control group. Proper randomization ensures that the only systematic difference between the two groups is the program services. As such, an RCT can answer whether program services cause better outcomes for the treatment group compared with the control group.
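To make the mechanics concrete, here is a minimal Python sketch (purely illustrative, not drawn from any actual PFS evaluation) of random assignment followed by the resulting impact estimate; the participant pool and outcome scores are simulated for demonstration only:

```python
import random
import statistics

random.seed(42)  # fixed seed so the illustration is reproducible

# Hypothetical pool of 200 eligible participants, identified only by ID
participants = list(range(200))

# Random assignment: shuffle the pool, then split it evenly
random.shuffle(participants)
treatment = participants[:100]  # receive program services
control = participants[100:]    # do not receive program services

# Simulated post-program outcome scores (e.g., a literacy measure);
# in a real RCT these would be collected, not generated
outcomes = {pid: random.gauss(50, 10) for pid in participants}

# Because assignment was random, the impact estimate is simply the
# difference in mean outcomes between the two groups
impact = (statistics.mean(outcomes[p] for p in treatment)
          - statistics.mean(outcomes[p] for p in control))
print(f"Estimated impact: {impact:.2f} points")
```

Because the simulated program has no real effect, the estimate here should hover near zero, differing only by chance.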
What are the benefits to incorporating RCTs into PFS projects?
While an RCT is not always the most appropriate evaluation method, it is particularly well suited to the structure of a PFS project. The PFS model shifts the risk of funding social programs from a traditional source (usually government) to private investors. Unlike in the widely used “fee-for-service” model, the government is liable for payments only if the program achieves its intended outcomes. Given the stakes, PFS stakeholders must consider a fundamental question: How can we best measure the outcomes of a PFS project?
To this end, RCTs offer the following:
- Confidence in the results. RCTs randomly assign people to two conditions: those who receive the service and those who do not. This design avoids the risk that results are biased by background variables, such as demographic differences (e.g., age, race, income). It minimizes statistical bias and increases confidence in a causal link between program services and outcomes.
- Clear results. The structure of RCTs allows for simple analytical techniques to calculate program impact.
- Fair service delivery. Demand for programs often outstrips the resources of even the most well-established programs. This means that only a fraction of the eligible population will receive services. Through randomization, RCTs distribute services fairly among eligible participants.
- Value for cost. All evaluations have up-front costs, such as those associated with designing the evaluation, collaborating with stakeholders, and analyzing collected data. Elevated confidence and clarity in the results of an RCT evaluation allow it to provide high value for cost.
- Evidence-building capabilities. RCTs can expand the evidence base of an intervention. The rigorous design of RCTs enables causal conclusions, which adds to knowledge of the contexts in which the program works.
Of the 11 PFS projects in the United States, six have incorporated an RCT, and one uses an RCT companion study. All evaluations should be conducted by external, independent evaluators to minimize the risk of introducing bias into the evaluation process.
What are potential challenges of using an RCT?
RCTs can come with challenges. First, evaluators conducting an RCT must gain the buy-in of local stakeholders, such as service providers and community leaders, by addressing concerns, potential challenges, and procedural questions with the community.
Second, projects implemented in rural areas or those that address a niche issue may have too few participants to reap the benefits of an RCT. A small sample size reduces the ability of the evaluation to detect statistically significant program effects, which decreases confidence in the validity of the RCT’s conclusions.
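The sample-size concern can be quantified with a standard power calculation. The sketch below (an illustration added here, not part of the original post) uses the common normal-approximation formula for a two-arm comparison, n = 2 × ((z₁₋α/₂ + z_power) / d)², where d is the effect size in standard-deviation units:

```python
import math
from statistics import NormalDist

def required_sample_per_arm(effect_size, alpha=0.05, power=0.80):
    """Approximate participants needed per arm to detect a given
    standardized effect size (Cohen's d) in a two-arm RCT."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)          # z for desired power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

# A small effect requires a far larger sample than a large one
print(required_sample_per_arm(0.2))  # small effect: 393 per arm
print(required_sample_per_arm(0.8))  # large effect: 25 per arm
```

Detecting a small effect takes several hundred participants per arm, a bar that rural projects or programs addressing a niche issue may simply be unable to clear.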
But among the many types of evaluations out there, RCTs are often the best option to achieve the objectivity and rigor required of PFS projects.