Inference

To judge whether an observed pattern in data reflects something real about the world or is merely chance variation, we rely on statistical inference. Statistical inference is primarily used to draw conclusions about a parameter from data. Quantitative analysis sets up estimation so that these conclusions are useful in the real world, which requires that the parameter measures a useful concept and that the variability of the estimate is itself well estimated.

The variability of estimates is captured by the standard error, the standard deviation of an estimate across repeated samples, and good inference requires a good estimate of that standard error. If the estimated standard errors are too small, we are too likely to reject the hypothesis that there is no impact, leading us to find impacts where there are none. If they are too large, we too often fail to reject the hypothesis that there is no impact, leading us to conclude there are no impacts where real impacts exist.
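The definition of the standard error as the standard deviation of estimates can be illustrated with a small simulation. The sketch below is only an illustration under assumed values: the normal population, its standard deviation of 2, the sample size of 50, and the number of repetitions are arbitrary choices and do not come from any study.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 50, 5000          # assumed sample size and number of repeated samples
population_sd = 2.0         # assumed population standard deviation

# Draw many independent samples and record the estimate (the sample mean) of each.
samples = rng.normal(loc=0.0, scale=population_sd, size=(reps, n))
estimates = samples.mean(axis=1)

# The standard error is the standard deviation of the estimates across samples.
sd_of_estimates = estimates.std(ddof=1)

# In practice only one sample is observed, so the standard error is estimated from it.
one_sample = samples[0]
estimated_se = one_sample.std(ddof=1) / np.sqrt(n)

# With independent draws, theory gives population_sd / sqrt(n).
theoretical_se = population_sd / np.sqrt(n)

print(sd_of_estimates, estimated_se, theoretical_se)
```

With independent observations the three quantities should be close to one another. The single-sample formula used here relies on that independence, which is the assumption that fails when errors are correlated.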

Often, correlations in individual-level errors are the source of problems in estimating standard errors. For example, if a survey samples neighborhood blocks and interviews everyone on each block, it is plausible that the values in one block are all abnormally high relative to the conditional mean, while those in another block are all abnormally low, implying positive correlation of errors within blocks. This is a cluster sample and, because we do not know exactly how errors are correlated within a cluster, we want a standard error estimator that is robust to arbitrary within-cluster correlations: the cluster-robust standard error (Rogers 1993).
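A minimal sketch of the idea for a linear regression follows, using only NumPy. It is one common "sandwich" form of the cluster-robust estimator, not necessarily the exact formula in Rogers (1993). The function name ols_with_cluster_se, the finite-sample adjustment, and the simulated block data (200 blocks of 10 people, a block-level regressor with no true effect, and a shared block-level shock) are all assumptions made for illustration.

```python
import numpy as np

def ols_with_cluster_se(X, y, clusters):
    """OLS coefficients with conventional and cluster-robust standard errors.

    Hypothetical helper written for this illustration.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    clusters = np.asarray(clusters)
    n, k = X.shape

    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta

    # Conventional standard errors: assume independent, equal-variance errors.
    sigma2 = resid @ resid / (n - k)
    se_naive = np.sqrt(np.diag(sigma2 * XtX_inv))

    # Cluster-robust "meat": sum X_g' u_g within each cluster before squaring,
    # so any pattern of within-cluster correlation is allowed.
    labels = np.unique(clusters)
    n_clusters = len(labels)
    meat = np.zeros((k, k))
    for g in labels:
        in_g = clusters == g
        score = X[in_g].T @ resid[in_g]
        meat += np.outer(score, score)

    # Assumed finite-sample adjustment; software packages differ in this detail.
    adjust = (n_clusters / (n_clusters - 1)) * ((n - 1) / (n - k))
    V = adjust * (XtX_inv @ meat @ XtX_inv)
    se_cluster = np.sqrt(np.diag(V))
    return beta, se_naive, se_cluster

# Simulated block survey (all values are illustrative assumptions).
rng = np.random.default_rng(1)
n_blocks, block_size = 200, 10
block = np.repeat(np.arange(n_blocks), block_size)
x = np.repeat(rng.normal(size=n_blocks), block_size)            # varies by block only
block_shock = np.repeat(rng.normal(size=n_blocks), block_size)  # shared within a block
y = 1.0 + 0.0 * x + block_shock + rng.normal(size=n_blocks * block_size)

X = np.column_stack([np.ones_like(x), x])
beta, se_naive, se_cluster = ols_with_cluster_se(X, y, block)
print("slope:", beta[1])
print("conventional SE:", se_naive[1], "cluster-robust SE:", se_cluster[1])
```

In this setup the conventional standard error of the slope should be noticeably smaller than the cluster-robust one, because it treats the ten people on a block as ten independent observations. That is the "too small" case described above, in which an analyst finds impacts where there are none.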

A similar clustering of errors can also exist in a simple random sample, if multiple sampled individuals are in larger units with intrinsically correlated errors, such as students within schools, members of a family, or multiple observations of individuals over time. The cluster-robust standard error works for each of these cases, as long as there are many clusters, the clusters are of comparable size, and errors are not correlated across clusters (Nichols and Schaffer 2007).

References

Rogers, William H. 1993. “sg17: Regression Standard Errors in Clustered Samples.” Stata Technical Bulletin 13: 19–23. [www.stata.com/support/faqs/stat/stb13_rogers.pdf]

Nichols, Austin, and Mark Schaffer. 2007. “Clustered Errors in Stata.” Presented at UK Stata Users’ Group, September 10. [www.stata.com/meeting/13uk/nichols_crse.pdf]