by Duncan Chaplin

**What is it used to measure?**

Instrumental Variables (IV) methods are quasi-experimental methods used to estimate impacts of a treatment or policy variable when an experimental design is not feasible.

**How does it work?**

Estimating causal impacts is difficult. Even randomized trials are imperfect, because real-world trials seldom, if ever, match the ideal of a true experiment, although experimental design remains the gold standard of statistical research. IV is one of the more compelling quasi-experimental methods of estimating impacts, largely because the assumptions needed to justify it are often more plausible than those needed to justify other methods, such as ordinary regression.

To estimate an IV model, identify one or more variables (the *excluded instrumental variables*, or *excluded instruments*) that affect the key independent variable but affect the outcome only through that key independent variable. For example, assume that month and state of birth affect earnings later in life only through their effect on schooling, because state laws tie the ages at which one can start and leave K–12 education to date of birth. (Joshua Angrist and Alan Krueger used a closely related approach, based on quarter of birth and compulsory schooling laws, in their 1991 paper "Does Compulsory School Attendance Affect Schooling and Earnings?" in the *Quarterly Journal of Economics*.)

IV models are often conceptualized as two separate equations: the first specifies the relationship between the key independent variable and the outcome, and the second specifies the relationship between the instrumental variables and the key independent variable. Thus,

1. $Y = X_1' B_1 + K' B_2 + e_1$

2. $K = X_1' B_3 + \mathit{IV}' B_4 + e_2$

where *Y* is the outcome; *K* is the key independent variable measuring the policy; *IV* is the instrumental variable; *X1* is a vector of other control variables; *B1*, *B2*, *B3*, and *B4* are parameters to be estimated; and *e1* and *e2* are error terms.
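A short derivation can show why the second equation matters for recovering *B2*. As a sketch, take the simplest case: a single instrument, and *X1* reduced to a constant term. A constant has zero covariance with anything, so taking the covariance of both sides of equation 1 with *IV* gives the result below, where the last step assumes *IV* is uncorrelated with *e1* (the exclusion restriction described above):

$$
\operatorname{Cov}(\mathit{IV}, Y) = B_2\,\operatorname{Cov}(\mathit{IV}, K) + \operatorname{Cov}(\mathit{IV}, e_1) = B_2\,\operatorname{Cov}(\mathit{IV}, K)
\quad\Longrightarrow\quad
B_2 = \frac{\operatorname{Cov}(\mathit{IV}, Y)}{\operatorname{Cov}(\mathit{IV}, K)}
$$

The simple IV estimator replaces these population covariances with sample covariances. The division only works if Cov(*IV*, *K*) is not zero, which in this simple setting means *B4* in equation 2 is not zero.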

To justify estimating *B2* (the policy impact) by applying ordinary regression methods to the first equation alone, we must assume that *K* is exogenous, that is, uncorrelated with the error term *e1*. This assumption will often fail if important variables are left out of the regression model (a problem called *omitted variable bias*) or if changes in the outcome *Y* cause changes in *K* (a problem called *simultaneity*). If the exogeneity assumption does not hold, standard regression estimates of *B2* will be biased and inconsistent: they will not equal the true value on average, and collecting more data will not fix the problem. However, by using the instrument *IV* and estimating both equations together, we can obtain a consistent estimate of *B2* (the impact of the policy *K*), provided the instrument is correlated with *K* (*B4* is not zero) and uncorrelated with *e1*.
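The contrast between ordinary regression and IV can be illustrated with a small simulation. The sketch below assumes a hypothetical data-generating process that follows equations 1 and 2 with *X1* reduced to a constant; the unobserved variable `ability`, all coefficient values, and the helper names `simulate`, `ols`, and `two_stage_least_squares` are illustrative assumptions, not part of the original method. It uses two-stage least squares (2SLS), one common way of estimating the two equations together: regress *K* on the instrument, then regress *Y* on the fitted values of *K*.

```python
import numpy as np


def simulate(n, rng):
    """Hypothetical data following equations 1 and 2, with X1 reduced to a constant.

    An unobserved variable (called ability here, an assumption for illustration)
    raises both K and Y, so K is correlated with e1: omitted variable bias.
    """
    ability = rng.normal(size=n)                              # omitted variable, part of e1 and e2
    iv = rng.binomial(1, 0.5, size=n)                         # excluded instrument
    k = 1.0 + 0.5 * iv + ability + rng.normal(size=n)         # equation 2 (B4 = 0.5)
    y = 2.0 + 3.0 * k + 2.0 * ability + rng.normal(size=n)    # equation 1 (true B2 = 3)
    return y, k, iv


def ols(y, X):
    """Least-squares coefficients from regressing y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def two_stage_least_squares(y, k, iv):
    """2SLS point estimate of B2 (standard errors are not computed here)."""
    const = np.ones_like(y)
    Z = np.column_stack([const, iv])
    k_hat = Z @ ols(k, Z)                                 # first stage: fitted K from equation 2
    return ols(y, np.column_stack([const, k_hat]))[1]     # second stage: Y on fitted K


rng = np.random.default_rng(0)
y, k, iv = simulate(200_000, rng)
ols_b2 = ols(y, np.column_stack([np.ones_like(y), k]))[1]
iv_b2 = two_stage_least_squares(y, k, iv)
print(f"OLS estimate: {ols_b2:.2f}, 2SLS estimate: {iv_b2:.2f}, true B2: 3")
```

In this setup, ordinary regression should land well above the true value (about 3 + 2 / 2.0625, or roughly 3.97, in large samples), while the 2SLS estimate should be close to 3. Note that the second-stage regression gives the correct 2SLS point estimate but not correct standard errors; dedicated IV routines adjust those.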

Suppose that we have a data set in which everyone is required to complete either 11 or 12 years of education, depending on the state they live in and when they were born. Let *IV* be a dummy variable indicating whether one is required to complete 12 years of education rather than 11, and assume that there are no control variables. The *IV* estimate of *B2* can then be written as

$$B_2 = \frac{Y_{12} - Y_{11}}{E_{12} - E_{11}}$$

where *Y* is average earnings, *E* is average years of education, and the subscripts (11 and 12) indicate the groups of people who were required to complete 11 and 12 years of education, respectively.

In general we would expect *E*12 - *E*11 to be much smaller than 1, because many people would have completed 12 years of education regardless of the state they lived in, and some people would not complete 12 years regardless of where they lived. However, at least some people would be affected. Suppose, for example, that *E*12 - *E*11 = 0.1, meaning that the requirement to attend an additional year increased average years of education by 0.1. If we observe *Y*12 - *Y*11 = $100, this suggests that a 0.1-year increase in education raised average earnings by $100. Scaling this to a full year of education (dividing $100 by 0.1) suggests a value of $1,000 per additional year. Strictly speaking, this estimate describes the people whose schooling was changed by the requirement.
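The group-means ratio used in this example (often called the Wald estimator) can be computed directly from individual records. The sketch below assumes a single binary instrument and no controls; the function name `wald_estimate` and the ten-person-per-group data set are hypothetical, constructed only so that the schooling gap is 0.1 years and the earnings gap is $100, matching the numbers above.

```python
import numpy as np


def wald_estimate(y, e, z):
    """IV (Wald) estimate of B2 with one binary instrument and no controls.

    y: earnings; e: years of education; z: 1 if the person was required to
    complete 12 years of education, 0 if only 11 years were required.
    """
    y, e, z = (np.asarray(a, dtype=float) for a in (y, e, z))
    required_12, required_11 = z == 1, z == 0
    earnings_gap = y[required_12].mean() - y[required_11].mean()    # Y12 - Y11
    schooling_gap = e[required_12].mean() - e[required_11].mean()   # E12 - E11
    return earnings_gap / schooling_gap


# Hypothetical data: ten people per group. In the 11-year group, one person
# stops at 11 years and earns $1,000 less; everyone else completes 12 years.
z = [0] * 10 + [1] * 10
e = [12] * 9 + [11] + [12] * 10
y = [30_100] * 9 + [29_100] + [30_100] * 10

b2 = wald_estimate(y, e, z)
print(f"{b2:,.0f}")  # prints 1,000: a $100 earnings gap divided by a 0.1-year schooling gap
```

Because only one person's schooling differs across the two groups, the estimate is driven entirely by that person, which mirrors the point that the IV estimate reflects the people whose education the requirement actually changed.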

IV methods can also be used to correct for measurement error. Suppose, for example, that we do not observe *K* directly, but instead observe *K*\* = *K* + *u*, where *u* is measurement error. Once again, standard regression estimates of *B2* will be biased (toward zero when *u* is purely random noise). However, *IV* estimates will be consistent, provided the instrument is correlated with *K* but uncorrelated with both *u* and *e1*.
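A small simulation can illustrate the measurement-error case. This sketch assumes classical measurement error: *u* is random noise independent of *K*, *e1*, and the instrument, and *K* is otherwise exogenous, so mismeasurement is the only problem. The coefficient values and the helper names `ols_slope` and `iv_slope` are hypothetical choices for illustration.

```python
import numpy as np


def ols_slope(y, x):
    """Slope from regressing y on x with an intercept: Cov(x, y) / Var(x)."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)


def iv_slope(y, x, z):
    """Simple IV slope with one instrument and no controls: Cov(z, y) / Cov(z, x)."""
    return np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]


rng = np.random.default_rng(1)
n = 200_000
z = rng.binomial(1, 0.5, size=n)          # instrument: shifts K, unrelated to u and e1
k = 1.0 + 0.5 * z + rng.normal(size=n)    # true K, not observed by the analyst
y = 2.0 + 3.0 * k + rng.normal(size=n)    # equation 1 with true B2 = 3
k_star = k + rng.normal(size=n)           # observed K* = K + u, with u ~ N(0, 1)

b2_ols = ols_slope(y, k_star)   # attenuated toward zero
b2_iv = iv_slope(y, k_star, z)  # consistent for B2
print(f"OLS with K*: {b2_ols:.2f}, IV with K*: {b2_iv:.2f}, true B2: 3")
```

Under these assumptions, Var(*K*) = 1.0625 and Var(*u*) = 1, so the regression slope should settle near 3 × 1.0625 / 2.0625, or about 1.55, while the IV estimate should be close to 3. The instrument works here because the noise in *K*\* is unrelated to it, so dividing by Cov(*z*, *K*\*) recovers the same quantity as Cov(*z*, *K*).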

**Research examples**

"High School Employment: Meaningful Connections for At-Risk Youth"

"What 'Extras' Do We Get with Extracurriculars? Technical Research Considerations"