Urban Wire
Alpha Testing and Randomized Control Trials: Improving on the Gold Standard
Gregory B. Mills
In the world of policy research, experimental evaluations, or randomized control trials (RCTs), are the gold standard. Indeed, they are the most rigorous way to estimate a program’s effects on participants. That’s because the results for the participants (the “treatment group”) are measured against a randomly assigned “control group” that doesn’t enter the program.

Increasingly, RCTs are being used to find out if programs providing financial services to low-income people work. One recent large-scale example—evaluated by the Urban Institute—is a Treasury Department study of the effects of offering (through the Green Dot Corporation) prepaid debit cards to low-income tax filers so they can direct-deposit their federal tax refunds. In the 2011 tax filing season, 950,000 filers nationwide were randomly assigned to either a control group or one of eight other groups that received differing card offers.

A key element of any experimental study is the take-up rate of whatever is being offered to the treatment group. The closer the treatment group’s take-up rate is to the rate expected in an eventual operational roll-out, the more reliable the study’s assessment of program impacts.
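
To see why take-up matters, consider a rough illustration. It assumes, for simplicity, that the offer changes outcomes only for people who accept it, so the average effect per person offered is the take-up rate times the effect on those who accept. The function name and every figure below are hypothetical and are not drawn from the Treasury study.

```python
def effect_per_person_offered(take_up_rate, effect_on_takers):
    """Average effect across everyone offered the program.

    Simplifying assumption (ours, not the study's): people who decline
    the offer are unaffected by it.
    """
    if not 0.0 <= take_up_rate <= 1.0:
        raise ValueError("take_up_rate must be between 0 and 1")
    return take_up_rate * effect_on_takers


# Hypothetical figures, chosen only for illustration.
effect_on_takers = 100.0   # e.g., dollars saved per person who accepts
trial_take_up = 0.05       # take-up observed in the experiment
rollout_take_up = 0.20     # take-up expected in an operational roll-out

trial_effect = effect_per_person_offered(trial_take_up, effect_on_takers)
rollout_effect = effect_per_person_offered(rollout_take_up, effect_on_takers)

print(f"Effect per person offered in the trial:    {trial_effect:.2f}")
print(f"Effect per person offered in the roll-out: {rollout_effect:.2f}")
```

Under these made-up numbers, the trial would measure only a quarter of the per-offer impact that the roll-out would deliver, simply because fewer people accepted the offer during the experiment.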

Behavioral economics has taught us that our financial decision-making is deeply influenced by subtle contextual factors that frame our choices. Given the importance of this “choice architecture” and the pivotal role of the treatment group’s take-up, it’s surprising that so few program offers are pre-tested before they are evaluated. Ironically, government routinely requires pre-testing of the program evaluation questionnaires even though the program’s offer is rarely vetted in advance.

Social science should take a lesson from the computer industry. It uses “alpha tests”—small-scale short-term acceptance testing in an operational setting. Such testing goes beyond customer focus groups (which is what the Treasury study used), and a few funders are trying it. The StabilityFirst pilot test, conducted in 2010 by Harvard’s “ideas42” center on applied behavioral economics, enrolled 20 students at Central New Mexico Community College in Albuquerque into a prepaid debit card program. The students were interviewed at length both before and after to gauge their reactions to the program. A range of issues surfaced, including difficulty resolving customer service matters. Participants were reluctant to make calls to the customer service line, not wanting to commit scarce cellphone minutes for a possibly lengthy call with time spent being transferred or on hold.

Alpha tests like the one in Albuquerque can help researchers identify design features that inhibit take-up. And once these “blocking factors” are known, they can be corrected before an experimental evaluation is launched, making randomized trials far more useful and the outcomes sought more likely to come about.

