Inference

To judge whether an observed pattern in data reflects something real about the world or is merely chance variation, we rely on statistical inference. Statistical inference is primarily used to draw conclusions about some parameter from data. Quantitative analysis sets up estimation so that those conclusions are useful in the real world, which requires that the parameter measure a useful concept and that the variability of its estimate be well estimated.

The variability of an estimate is captured by its standard error, the standard deviation of the estimate across repeated samples, and good inference requires a good estimate of that standard error. If the estimated standard errors are too small, we are too likely to reject the hypothesis that there is no impact, leading us to find impacts where there are none. If they are too large, we too often fail to reject that hypothesis, leading us to conclude there are no impacts where real impacts exist.
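The idea that a standard error is the standard deviation of an estimate can be illustrated with a small simulation. The sketch below is only an illustration under assumed values: the sample size, number of repetitions, and population standard deviation are hypothetical, and the estimate is a simple sample mean of independent draws.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

N_OBS = 100    # hypothetical sample size
N_REPS = 2000  # hypothetical number of repeated samples
SIGMA = 2.0    # hypothetical population standard deviation


def draw_sample(n):
    """Draw n independent observations from a normal population."""
    return [random.gauss(0.0, SIGMA) for _ in range(n)]


# Standard deviation of the estimate (the sample mean) across repeated samples.
means = [statistics.fmean(draw_sample(N_OBS)) for _ in range(N_REPS)]
sd_of_estimates = statistics.stdev(means)

# Standard error estimated from a single sample: s / sqrt(n).
one_sample = draw_sample(N_OBS)
estimated_se = statistics.stdev(one_sample) / N_OBS ** 0.5

print(f"SD of estimates across samples: {sd_of_estimates:.3f}")
print(f"SE estimated from one sample:   {estimated_se:.3f}")
```

With independent observations, the two numbers should both be close to SIGMA / sqrt(N_OBS) = 0.2. The single-sample formula works here only because the draws are independent.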

Often, correlations in individual-level errors are the source of problems in estimating standard errors. For example, if a survey samples neighborhood blocks and interviews everyone on the block, it is plausible that all values in one block are abnormally high compared with the conditional mean, and abnormally low in another block, implying positive correlation of errors. This is a cluster sample and, because we do not know exactly how errors are correlated within a cluster, we want a standard error estimator that is robust to arbitrary within-cluster correlations: the cluster-robust standard error (Rogers 1993).
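The block-sampling example can be sketched in code. The following is a minimal illustration, not the estimator as implemented in any particular package: it estimates a simple mean (a regression on a constant only), the data are simulated with a hypothetical shared shock per block, and the G/(G-1) finite-sample adjustment is an assumption chosen for the sketch.

```python
import math
import random

random.seed(1)  # fixed seed so the sketch is reproducible

N_BLOCKS = 50    # hypothetical number of sampled blocks (clusters)
BLOCK_SIZE = 20  # hypothetical number of respondents per block

# Each respondent's value = shock shared by the whole block + individual noise,
# so errors are positively correlated within a block.
blocks = []
for _ in range(N_BLOCKS):
    block_shock = random.gauss(0.0, 1.0)
    blocks.append(
        [block_shock + random.gauss(0.0, 1.0) for _ in range(BLOCK_SIZE)]
    )

values = [y for block in blocks for y in block]
n = len(values)
mean = sum(values) / n

# Naive standard error: treats all n observations as independent.
sample_var = sum((y - mean) ** 2 for y in values) / (n - 1)
naive_se = math.sqrt(sample_var / n)

# Cluster-robust standard error: sum residuals within each block first, so any
# within-block correlation is retained, then combine the squared block sums.
cluster_sums = [sum(y - mean for y in block) for block in blocks]
correction = N_BLOCKS / (N_BLOCKS - 1)  # assumed finite-sample adjustment
cluster_se = math.sqrt(correction * sum(s ** 2 for s in cluster_sums)) / n

print(f"naive SE:          {naive_se:.3f}")
print(f"cluster-robust SE: {cluster_se:.3f}")
```

Because respondents on the same block share a shock, the naive standard error is too small in this setup, and the cluster-robust version is noticeably larger.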

A similar clustering of errors can also exist in a simple random sample, if multiple sampled individuals are in larger units with intrinsically correlated errors, such as students within schools, members of a family, or multiple observations of individuals over time. The cluster-robust standard error works for each of these cases, as long as there are many clusters, the clusters are of comparable size, and errors are not correlated across clusters (Nichols and Schaffer 2007).
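For a linear regression, the cluster-robust variance estimator is commonly written in the "sandwich" form below. The notation is an assumption introduced here for illustration: $X_g$ is the matrix of regressors for cluster $g$, $\hat{u}_g$ is the vector of residuals for cluster $g$, and $G$ is the number of clusters. Finite-sample adjustments, which vary by software, are omitted.

```latex
\hat{V}_{\text{cluster}}
  = (X'X)^{-1}
    \left( \sum_{g=1}^{G} X_g' \hat{u}_g \hat{u}_g' X_g \right)
    (X'X)^{-1}
```

The middle term keeps all products of residuals within a cluster but none across clusters, which is why the estimator relies on errors being uncorrelated across clusters, and it is an average over $G$ cluster-level terms, which is why it needs many clusters to be reliable.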

References

Rogers, William H. 1993. “sg17: Regression Standard Errors in Clustered Samples.” Stata Technical Bulletin 13: 19–23. [www.stata.com/support/faqs/stat/stb13_rogers.pdf]

Nichols, Austin, and Mark Schaffer. 2007. “Clustered Errors in Stata.” Presented at UK Stata Users’ Group, September 10. [www.stata.com/meeting/13uk/nichols_crse.pdf]