The gold standard for drawing causal inferences about program effects is a randomized trial. Most randomization designs in education randomize classrooms or schools rather than individual students. Such "clustered randomization" designs have one principal drawback: they tend to have limited statistical power or precision. This study aims to provide the empirical information needed to design adequately powered studies that randomize schools, using data from Florida and North Carolina. The authors assess how different covariates contribute to improving the statistical power of a randomization design, and they examine differences between math and reading tests, between test types (curriculum-referenced versus norm-referenced tests), and between elementary and secondary school, to see whether test subject, test type, or grade level makes a large difference in the crucial design parameters. Finally, they assess bias in two-level models that ignore the clustering of students in classrooms.
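The design question at issue, how intraclass correlations and covariate explanatory power govern the precision of a school-randomized trial, can be sketched with a standard Bloom-style minimum detectable effect size (MDES) formula for a two-level design. This is a generic illustration, not the authors' analysis: the parameter values below are hypothetical, and a normal approximation stands in for the t distribution used in exact calculations.

```python
from math import sqrt
from statistics import NormalDist


def mdes(icc, n_schools, n_students, r2_school=0.0, r2_student=0.0,
         p_treat=0.5, alpha=0.05, power=0.80):
    """Minimum detectable effect size (in SD units) for a two-level
    cluster-randomized design, using a normal approximation.

    icc         intraclass correlation: share of outcome variance between schools
    n_schools   total number of randomized schools
    n_students  students sampled per school
    r2_school   outcome variance explained by school-level covariates
                (e.g., school-mean pretest scores)
    r2_student  outcome variance explained by student-level covariates
    p_treat     proportion of schools assigned to treatment
    """
    z = NormalDist()
    # Sum of critical values for a two-tailed test at the given alpha and power.
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    # Variance of the estimated effect size: a between-school term shrunk by
    # school-level covariates plus a within-school term shrunk by student-level
    # covariates.
    pq = p_treat * (1 - p_treat)
    var = (icc * (1 - r2_school) / (pq * n_schools)
           + (1 - icc) * (1 - r2_student) / (pq * n_schools * n_students))
    return multiplier * sqrt(var)


# Illustrative values only: 40 schools, 60 students each, ICC of 0.22.
no_cov = mdes(icc=0.22, n_schools=40, n_students=60)
with_cov = mdes(icc=0.22, n_schools=40, n_students=60, r2_school=0.7)
print(f"MDES without covariates: {no_cov:.2f}")
print(f"MDES with school-level covariates: {with_cov:.2f}")
```

The comparison illustrates the study's central point: because the between-school variance term dominates, a covariate (such as a pretest) that explains much of the school-level variance can substantially reduce the detectable effect size, or equivalently the number of schools a study must randomize.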