1 Introduction

This document is a short guide to implementing some of the methods discussed in Mogstad and Torgovitsky (“Instrumental Variables with Heterogeneous Treatment Effects,” 2024, Handbook of Labor Economics). It describes how to:

  • Conduct a RESET test of the null hypothesis that a linear IV specification is weakly causal.
  • Implement double/debiased machine learning (DDML) estimators that are ensured to be weakly causal.
  • Use an instrument propensity score weighting estimator to estimate an unconditional LATE.
  • Estimate and aggregate the marginal treatment effect (MTE) curve for a binary treatment.

The document contains code samples for both R and Stata. In each section, I’ll briefly describe the theoretical background in the context of the R code. Then I’ll go back and show the Stata equivalent.

Any questions or problems, feel free to post an issue.

Code for reproducing the results in the handbook chapter is available in the ivhandbookReplication repository.

2 Packages and data

2.1 R

I’m going to use some tidyverse tools in the following. Install them from CRAN with install.packages(c("tidyverse", "broom", "purrr")).

library("tidyverse")
library("broom")
library("purrr")

You’ll see where I load other packages in the code below. These can be installed in the same way.

2.2 Stata

You’ll need to install some modules from SSC. I’ll show you which ones when we get there.

2.3 Card data

The first three topics will be illustrated with the well-known Card (1993) (link) extract of the NLSY79. Card used distance to a four-year college (nearc4) as an instrument for educational attainment (educ) to estimate the effect of education on wages (lwage, measured in logs), also commonly described as the returns to education. The treatment is multivalued and the instrument is binary.

card <- haven::read_dta("card.dta")

3 Does linear IV estimate a causal effect?

3.1 Background

Conditioning on covariates is important for Card’s argument that distance to college is exogenous. For example, urban areas might be more likely to be close to a four-year college while also being relatively high-wage labor markets. Not conditioning on covariates leads to substantially different point estimates.

card <- card %>%
  mutate(
    Y = lwage,
    D = educ,
    Z = nearc4
  )
covs <- c(
  "south", "south66", "smsa", "smsa66",
  "reg661", "reg662", "reg663", "reg664", "reg665", "reg666", "reg668",
  "black", "exper", "expersq"
)
library("ivreg")
library("lmtest")
library("sandwich")

# This is Card Table 5, column (3)
covs_flist <- paste(covs, collapse = " + ")
ivreg(
  data = card,
  formula = as.formula(paste("Y ~", covs_flist, "| D | Z"))
) %>%
  coeftest(vcov = vcovHC, type = "HC1") %>%
  tidy() %>%
  filter(term == "D")
## # A tibble: 1 × 5
##   term  estimate std.error statistic p.value
##   <chr>    <dbl>     <dbl>     <dbl>   <dbl>
## 1 D        0.132    0.0541      2.43  0.0152
# Not controlling for any covariates leads to considerably larger estimates
ivreg(
  data = card,
  formula = "Y ~ D | Z"
) %>%
  coeftest(vcov = vcovHC, type = "HC1") %>%
  tidy() %>%
  filter(term == "D")
## # A tibble: 1 × 5
##   term  estimate std.error statistic  p.value
##   <chr>    <dbl>     <dbl>     <dbl>    <dbl>
## 1 D        0.188    0.0261      7.19 7.92e-13

If there were no covariates, and given the usual monotonicity and full exogeneity conditions satisfied, then Card’s estimate could be interpreted as estimating an average causal response (ACR), which is a generalization of the concept of a LATE from a binary treatment to a multivalued treatment. However, Blandhol et al. (2022) show that this interpretation generally breaks down when there are covariates. In particular, they show that unless the conditional mean of the instrument given the covariates is linear, the estimand cannot be interpreted as a “weakly causal” quantity that gives non-negative weight to all treatment effects. They describe the property that the conditional mean of the instrument is linear by saying that the linear IV specification has “rich covariates.”

Rich covariates can be tested using a test of functional form for linear regression. The most well-known is the Ramsey (1969) RESET test, which is easy to implement. The lmtest package contains a built-in routine called resettest, which can be used as follows.

library("lmtest")
library("car")
# Regress instrument Z on linear specification of covariates X
lr_zx <- lm(data = card, formula = as.formula(paste("Z ~ ", covs_flist)))
# Heteroskedasticity-robust RESET test
resettest(lr_zx, type = "fitted", vcov = function(x) vcovHC(x, type = "HC1"))
## 
##  RESET test
## 
## data:  lr_zx
## RESET = 9.8868, df1 = 2, df2 = 2995, p-value = 5.252e-05

3.2 RESET with multicollinearity in R

The resettest function doesn’t seem to be coded in a way that handles multicollinearity. For example, it throws an error about “aliased coefficients” if we include a superfluous region indicator when using a heteroskedasticity-robust variance-covariance matrix. (“Aliased coefficients” being another phrase for what economists would typically call perfect multicollinearity.)

covs_flist_mc <- paste(covs_flist, "+ reg667")
lr_zx <- lm(data = card, formula = as.formula(paste("Z ~ ", covs_flist_mc)))
lr_zx %>%
  tidy() %>%
  filter(is.na(estimate))
## # A tibble: 1 × 5
##   term   estimate std.error statistic p.value
##   <chr>     <dbl>     <dbl>     <dbl>   <dbl>
## 1 reg667       NA        NA        NA      NA
resettest(lr_zx, type = "fitted", vcov = function(x) vcovHC(x, type = "HC1"))
## Error in waldtest.lm(lm(y ~ 0 + X + Z), "Z", vcov = vcov, ...): there are aliased coefficients in the model

Having some multicollinearity is common in economic applications, which frequently include a variety of fixed effects as additional controls. Removing the collinear terms by hand can be tedious. So this drawback of resettest is a bit annoying. However, it’s also easy to implement a RESET test yourself as a standard F-test using tools that are coded defensively enough to handle multicollinearity:

card %>%
  mutate(
    yhat = lr_zx$fitted.values,
    yhat2 = yhat^2,
    yhat3 = yhat^3
  ) -> card
lrram <- lm(
  data = card,
  formula = as.formula(paste("Z ~ yhat2 + yhat3 + ", covs_flist_mc))
)
linearHypothesis(lrram, c("yhat2", "yhat3"), white.adjust = "hc1", singular.ok = TRUE)
## Linear hypothesis test
## 
## Hypothesis:
## yhat2 = 0
## yhat3 = 0
## 
## Model 1: restricted model
## Model 2: Z ~ yhat2 + yhat3 + south + south66 + smsa + smsa66 + reg661 + 
##     reg662 + reg663 + reg664 + reg665 + reg666 + reg668 + black + 
##     exper + expersq + reg667
## 
## Note: Coefficient covariance matrix supplied.
## 
##   Res.Df Df      F    Pr(>F)    
## 1   2995                        
## 2   2993  2 9.8868 5.252e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Notice that both the test statistic and p-value match what we had above using resettest in the specification without the extra collinear indicator. The inclusion of the option singular.ok = TRUE is necessary or else linearHypothesis will also complain about multicollinearity. (I don’t think there is a similar option that one can pass to resettest, which ultimately calls waldtest.lm.) Alternatively, it’s not that hard to do all of this by hand either.

yhats <- c("yhat2", "yhat3")
bhat <- lrram$coef[yhats]
vc <- vcovHC(lrram, type = "HC1")[yhats, yhats]
ts <- t(bhat) %*% solve(vc) %*% bhat # Wald statistic
pchisq(ts, 2, lower.tail = FALSE) # Matches p-value above
##             [,1]
## [1,] 5.08394e-05
ts / 2 # F-statistic, matches above
##          [,1]
## [1,] 9.886839

However you do it, the conclusion from the RESET test in this case is an overwhelming rejection of the null hypothesis. The null hypothesis is equivalent to the statement that the specification has rich covariates, which in turn is necessary (and in many cases sufficient) for the IV estimate to have a weakly causal interpretation. So rejecting the null here means that there is strong statistical evidence that Card’s estimate is not weakly causal; it is not a non-negatively weighted average of treatment effects.

3.3 Stata

A RESET test can be implemented in Stata with the postestimation command estat ovtest. The documentation on ovtest is pretty sparse, as are the options. As far as I can tell, the Stata routine cannot be made robust to heteroskedasticity, nor can the powers of the fitted values used in the test be changed. Here’s a demonstration of that, as well as code that allows you to implement the RESET test directly and get around these limitations.

## 
## . use card.dta, replace
## 
## . 
## . local Y lwage
## 
## . local D educ
## 
## . local Z nearc4
## 
## . local X south smsa smsa66 ///
## >   reg661 reg662 reg663 reg664 reg665 reg666 reg667 reg668 reg669 ///
## >   black exper expersq
## 
## . 
## . // Use Stata's built-in
## . qui reg `Z' `X'
## 
## . estat ovtest
## 
## Ramsey RESET test for omitted variables
## Omitted: Powers of fitted values of nearc4
## 
## H0: Model has no omitted variables
## 
## F(3, 2992) =   6.97
##   Prob > F = 0.0001
## 
## . display "`r(F)'"
## 6.971449756006162
## 
## . 
## . // One might expect this to be different, but it's not
## . qui reg `Z' `X', vce(robust)
## 
## . estat ovtest
## 
## Ramsey RESET test for omitted variables
## Omitted: Powers of fitted values of nearc4
## 
## H0: Model has no omitted variables
## 
## F(3, 2992) =   6.97
##   Prob > F = 0.0001
## 
## . display "`r(F)'" // Same, even though we used robust standard errors
## 6.971449756006162
## 
## . 
## . // So to account for heteroskedasticity we can implement the test this way
## . qui reg `Z' `X', vce(robust)
## 
## . predict yhat
## (option xb assumed; fitted values)
## 
## . gen yhat2 = yhat^2
## 
## . gen yhat3 = yhat^3
## 
## . qui reg `Z' `X' yhat2 yhat3, vce(robust)
## 
## . test yhat2 yhat3 // This matches the results from R
## 
##  ( 1)  yhat2 = 0
##  ( 2)  yhat3 = 0
## 
##        F(  2,  2993) =    9.89
##             Prob > F =    0.0001
## 
## . 
## . // Here's what Stata is doing by default
## . gen yhat4 = yhat^4
## 
## . qui reg `Z' `X' yhat2 yhat3 yhat4
## 
## . test yhat2 yhat3 yhat4
## 
##  ( 1)  yhat2 = 0
##  ( 2)  yhat3 = 0
##  ( 3)  yhat4 = 0
## 
##        F(  3,  2992) =    6.97
##             Prob > F =    0.0001
## 
## . return list // Matches the homoskedastic test
## 
## scalars:
##                r(drop) =  0
##                r(df_r) =  2992
##                   r(F) =  6.971424720166404
##                  r(df) =  3
##                   r(p) =  .0001133744330715

4 Double/debiased machine learning

4.1 Motivation

Using a nonparametric functional form for covariates is one way to address the problem of not having rich covariates. A low-tech nonparametric approach is to construct a saturated regression, in which there is one bin for each covariate value. These are the types of specifications that are assumed in Angrist and Imbens (1995) and Angrist and Pischke (2009) to justify linear IV estimates as representing something that is weakly causal. But using such a specification in Card’s application—as in many/most others—is not going to be feasible:

card %>%
  select(all_of(covs)) %>%
  summarize(across(everything(), ~ n_distinct(.))) %>%
  as.matrix() %>%
  prod()
## [1] 2359296
nrow(card)
## [1] 3010

A saturated regression in Card’s case would have over 2.3 million regressors for only about 3,000 observations!

More sophisticated nonparametric methods can get around this limitation by engaging in some data-driven model selection. A particularly attractive approach for the current problem are the double/debiased machine learning (DDML) estimators discussed by Chernozhukov et al. (2018). These estimators make use of two techniques to estimate a version of the linear IV model that controls for covariates in an unknown way. The first is a reformulation of the implicit moment condition in linear IV into a new “orthogonalized” moment condition that incorporates the idea from the “doubly-robust” literature that the product of two small quantities is an even smaller quantity. This allows for nonparametric estimators that are more flexible, and hence converge at slower rates, because the product of two slowly-converging quantities can converge at the usual parametric rate. The second element of the DDML estimator is cross-fitting, which is used to defend against overfitting similar to the way that sample splitting would, but without the attendant loss of data.

In the following, I’ll demonstrate the use of the ddml package for R Ahrens et al. (2024b). An alternative is the DoubleML package Bach et al. (2024). The attractions of ddml are that the learning curve is a bit flatter, there is a closely-related Stata module, which I’ll demonstrate below, and that it allows for a technique called short-stacking (Ahrens et al. 2024a). Stacking is a technique that combines an ensemble of different ML algorithms together and then chooses a convex-weighted average of the result based on which ones perform the best in out-of-sample prediction. Short-stacking is a computationally attractive modification, so called because it takes a computational short-cut when determining the weights.

I mention these theoretical details to motivate three aspects in the implementation:

  1. The ML algorithms that will be used are flexible ones like random forests, gradient-boosted trees, or neural networks, all of which can reasonably be viewed as nonparametric. This is why the orthogonalization and cross-fitting implemented by DDML are important.
  2. Cross-fitting introduces an additional layer of randomness because the folds are chosen randomly. This will be addressed by running the procedure several times and accounting for the variation across runs when computing standard errors.
  3. Short-stacking means we can combine multiple ML algorithms together. As Ahrens et al. (2023) demonstrate, individual ML algorithms can perform poorly. Combining multiple ML algorithms together through stacking limits the risk that a poor algorithm exerts much influence on the final estimate.

4.2 R

The main step in using DDML is defining the ML algorithms. This is done with a list of lists. Each list in the list of lists has two components: (i) the name of the function used to call the algorithm and (ii) the additional arguments passed to the function. For the former, ddml includes wrappers to ranger and xgboost, which are popular packages for random forests and gradient-boosted trees, respectively. One can also define additional wrappers; see this bit of code in the replication repository for an example using neural networks from the nnet package. The second argument to each list is a set of tuning parameters. It can be difficult to know how to set these a priori, so it’s reasonable to view the same algorithm with different tuning parameters as a separate ML algorithm.

Here’s an example with six algorithms consisting of three random forests and three gradient-boosted trees:

library("ddml")

learners <- list(
  list(
    fun = ddml::mdl_xgboost,
    args = list(max_depth = 2)
  ),
  list(
    fun = ddml::mdl_xgboost,
    args = list(eta = .01)
  ),
  list(
    fun = ddml::mdl_xgboost,
    args = list(eta = .05, max_depth = 3)
  ),
  list(
    fun = ddml::mdl_ranger,
    args = list()
  ),
  list(
    fun = ddml::mdl_ranger,
    args = list(max.depth = 4)
  ),
  list(
    fun = ddml::mdl_ranger,
    args = list(max.depth = 3, mtry = 5)
  )
)

How did I choose these tuning parameters? Almost completely at random! I have a sense of which ones are thought to be important based on my pedestrian understanding of how the algorithms work (at the level of say Hastie, Tibshirani, and Friedman (2009)). Then I just tried a few values different than the default. This is one of the drawbacks of using an ML approach: even though model selection is completely data-driven, the analyst still has a large number of choices to make in how the model selection is implemented and the computational cost of exploring them thoroughly is high. Short-stacking will hopefully help save us from making bad choices.

Now that we have these learners defined, we can call ddml with short-stacking. The specific DDML estimator we want in for this case is ddml_pliv for “partially linear IV.”

Y <- as.matrix(card$Y)
D <- as.matrix(card$D)
Z <- as.matrix(card$Z)
X <- as.matrix(card[, covs])

# I'll define a list of arguments so I can reuse it below
args <- list(
  y = Y, D = D, Z = Z, X = X,
  learners = learners,
  shortstack = TRUE,
  ensemble_type = "nnls1", # Use non-negative least squares to produce weights
  sample_folds = 5,
  silent = TRUE,
  custom_ensemble_weights = diag(length(learners)) # Useful for diagnostics
)
set.seed(52)
result <- do.call(ddml_pliv, args)

The result object contains quite a bit of information useful for diagnostics. If you just want the point estimate and standard error, they can be obtained like this:

s <- summary(result)
s["D_r", "Estimate", "nnls1"] # point estimate
## [1] 0.1364128
s["D_r", "Std. Error", "nnls1"] # standard error
## [1] 0.05371138

Because we set custom_ensemble_weights equal to an identity matrix, we can also recover the point estimates and standard errors associated with each individual learner:

t(s["D_r", c("Estimate", "Std. Error"), ])
##           Estimate Std. Error
## nnls1    0.1364128 0.05371138
## custom_1 0.1432309 0.05352455
## custom_2 0.1414772 0.05545172
## custom_3 0.1368551 0.05367814
## custom_4 0.1408755 0.05447024
## custom_5 0.1292378 0.05546726
## custom_6 0.1217142 0.05125895

Not all of the learners get weighted equally in the nnls1 estimate. You can get the weights that go into each function that is estimated nonparametrically by looking at result$weights:

result$weights %>%
  map_dfr(~ .x[, "nnls1"], .id = "component") %>%
  round(digits = 3)
## # A tibble: 6 × 3
##     y_X  D1_X  Z1_X
##   <dbl> <dbl> <dbl>
## 1 0     0.095 0.187
## 2 0.147 0.256 0.087
## 3 0.355 0.555 0    
## 4 0     0     0.491
## 5 0.005 0     0    
## 6 0.493 0.094 0.235

So all of the algorithms are doing a bit of work here except for the fifth one.

Finally, remember that there is some randomness in these estimates due to the cross-fitting. So it’s good practice to replicate the procedure multiple times and account for the variance across replications when reporting standard errors. The ddml package doesn’t have built-in functionality for this, but it’s easy enough to make your own loop. Here I’ll loop ten times, but in practice you probably want to be a bit more thorough.

pliv_list <- lapply(seq_len(10), function(i) do.call(ddml_pliv, args))

Chernozhukov et al. (2018) recommend reporting the median with the standard errors adjusted as follows to account for randomness across replications.

lapply(pliv_list, function(r) {
  s <- summary(r)
  tibble(
    est = s["D_r", "Estimate", "nnls1"],
    se = s["D_r", "Std. Error", "nnls1"]
  )
}) %>%
  bind_rows() -> pliv
median(pliv$est) # Recommended point estimate
## [1] 0.1283299
sqrt(median(pliv$se^2 + (pliv$est - median(pliv$est))^2)) # Recommended SE
## [1] 0.05201638

In terms of the theory for heterogeneous treatment effects, this final estimate of 0.128 is one that can be said to be weakly causal with only nonparametric justification. Notice that it’s not much different than the original linear IV estimate in this case, but that doesn’t mean that the original linear IV estimate was somehow “correct.” Whether an estimand is weakly causal is about it’s underlying interpretation in terms of treatment effects. A non-negatively weighted estimand can be equal to a negatively-weighted one, but that doesn’t mean that they should be equally preferred.

4.3 Stata

The Stata package is also called ddml. It can be installed with ssc install ddml, but as of this writing that version is outdated, and it seems that some of the syntax has changed relative to what I show below. Install the more recent version this way instead:

net install ddml, ///
  from(https://raw.githubusercontent.com/aahrens1/ddml/master) ///
  replace

You also need to install the pystacked module with ssc install pystacked. This in turns needs to be able to find the Python module scikit-learn (e.g. pip install scikit-learn). Details are given here.

Once you’ve got everything running, the process of calling ddml is similar to R. Here’s sample code that tries to implement similar estimates to the ones for R above. The ML algorithms being called out to in the background are from different implementations (scikit-learn for Python instead of ranger and xgboost for R), which likely explains the small differences in results. I’ve tried to set the primary tuning parameters to be the same, but there are quite a few default ones, and some are probably slightly different. There are likely also small computational differences in how the algorithms are implemented.

## 
## . use card.dta, replace
## 
## . local Y lwage
## 
## . local D educ
## 
## . local Z nearc4
## 
## . local X south smsa smsa66 ///
## >   reg661 reg662 reg663 reg664 reg665 reg666 reg667 reg668 reg669 ///
## >   black exper expersq
## 
## . 
## . // Define six different learners
## . local l_xg1 method(gradboost) opt(max_depth(2) learning_rate(.3))
## 
## . local l_xg2 method(gradboost) opt(max_depth(6) learning_rate(.01))
## 
## . local l_xg3 method(gradboost) opt(max_depth(3) learning_rate(.05))
## 
## . local l_rf1 method(rf) opt(n_estimators(500) max_features(3))
## 
## . local l_rf2 method(rf) opt(n_estimators(500) max_features(3) max_depth(4))
## 
## . local l_rf3 method(rf) opt(n_estimators(500) max_features(5) max_depth(3))
## 
## . 
## . set seed 534023
## 
## . ddml init iv, reps(5) // You should set reps higher in practice
## 
## . ddml E[Y|X]: pystacked `Y' `X' || ///
## >              `l_xg1' || `l_xg2' || `l_xg3' || `l_rf1' || `l_rf2' || `l_rf3', 
## > ///
## >              type(reg) njobs(-1)
## Learner Y1_pystacked added successfully.
## 
## . ddml E[D|X]: pystacked `D' `X' || ///
## >              `l_xg1' || `l_xg2' || `l_xg3' || `l_rf1' || `l_rf2' || `l_rf3', 
## > ///
## >              type(reg) njobs(-1)
## Learner D1_pystacked added successfully.
## 
## . ddml E[Z|X]: pystacked `Z' `X' || ///
## >              `l_xg1' || `l_xg2' || `l_xg3' || `l_rf1' || `l_rf2' || `l_rf3', 
## > ///
## >              type(reg) njobs(-1)
## Learner Z1_pystacked added successfully.
## 
## . // Short-stack and do not use standard stacking for faster computation
## . ddml crossfit, shortstack nostdstack finalest(nnls1)
## Cross-fitting E[y|X] equation: lwage
## Resample 1Cross-fitting fold 1 2 3 4 5 ...completed cross-fitting...completed short-stack
## > ing
## Resample 2...
## Cross-fitting fold 1 2 3 4 5 ...completed cross-fitting...completed short-stack
## > ing
## Resample 3...
## Cross-fitting fold 1 2 3 4 5 ...completed cross-fitting...completed short-stack
## > ing
## Resample 4...
## Cross-fitting fold 1 2 3 4 5 ...completed cross-fitting...completed short-stack
## > ing
## Resample 5...
## Cross-fitting fold 1 2 3 4 5 ...completed cross-fitting...completed short-stack
## > ing
## Cross-fitting E[D|X] equation: educ
## Resample 1...
## Cross-fitting fold 1 2 3 4 5 ...completed cross-fitting...completed short-stack
## > ing
## Resample 2...
## Cross-fitting fold 1 2 3 4 5 ...completed cross-fitting...completed short-stack
## > ing
## Resample 3...
## Cross-fitting fold 1 2 3 4 5 ...completed cross-fitting...completed short-stack
## > ing
## Resample 4...
## Cross-fitting fold 1 2 3 4 5 ...completed cross-fitting...completed short-stack
## > ing
## Resample 5...
## Cross-fitting fold 1 2 3 4 5 ...completed cross-fitting...completed short-stack
## > ing
## Cross-fitting E[Z|X]: nearc4
## Resample 1...
## Cross-fitting fold 1 2 3 4 5 ...completed cross-fitting...completed short-stack
## > ing
## Resample 2...
## Cross-fitting fold 1 2 3 4 5 ...completed cross-fitting...completed short-stack
## > ing
## Resample 3...
## Cross-fitting fold 1 2 3 4 5 ...completed cross-fitting...completed short-stack
## > ing
## Resample 4...
## Cross-fitting fold 1 2 3 4 5 ...completed cross-fitting...completed short-stack
## > ing
## Resample 5...
## Cross-fitting fold 1 2 3 4 5 ...completed cross-fitting...completed short-stack
## > ing
## 
## . ddml estimate, robust
## 
## 
## Model:                  iv, crossfit folds k=5, resamples r=5
## Mata global (mname):    m0
## Dependent variable (Y): lwage
##  lwage learners:        Y1_pystacked
## D equations (1):        educ
##  educ learners:         D1_pystacked
## Z equations (1):        nearc4
##  nearc4 learners:       Z1_pystacked
## 
## DDML estimation results:
## spec  r     Y learner     D learner         b        SE      Z learner
##   ss  1  [shortstack]          [ss]     0.121    (0.049)          [ss]
##   ss  2  [shortstack]          [ss]     0.120    (0.049)          [ss]
##   ss  3  [shortstack]          [ss]     0.118    (0.048)          [ss]
##   ss  4  [shortstack]          [ss]     0.116    (0.047)          [ss]
##   ss  5  [shortstack]          [ss]     0.118    (0.050)          [ss]
## 
## Mean/med    Y learner     D learner         b        SE      Z learner
##   ss mn  [shortstack]          [ss]     0.118    (0.049)          [ss]
##   ss md  [shortstack]          [ss]     0.118    (0.049)          [ss]
## 
## Shortstack DDML model (median over 5 resamples)
## y-E[y|X]  = y-Y_lwage_ss                           Number of obs   =      3010
## D-E[D|X]  = D-D_educ_ss 
## Z-E[Z|X]  = Z-Z_nearc4_ss 
## ------------------------------------------------------------------------------
##              |               Robust
##        lwage | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
## -------------+----------------------------------------------------------------
##         educ |   .1178917   .0487495     2.42   0.016     .0223443     .213439
## ------------------------------------------------------------------------------
## Stacking final estimator: nnls1
## 
## Summary over 5 resamples:
##        D eqn      mean       min       p25       p50       p75       max
##         educ      0.1184    0.1156    0.1176    0.1179    0.1200    0.1208

The result ends up being similar to that found in R despite the small differences in implementation.

5 Estimating unconditional LATEs and ACRs

5.1 Motivation

The DDML PLIV estimate is weakly causal, which means that it is a non-negatively weighted average of ACRs for each covariate bin. The weights depend on the amount of treatment variation that the instrument induces in each covariate bin. The overall weighted average is difficult to interpret because it represents treatment effects for many different subpopulations all averaged together in ways that reflect more than just their proportion of the population. An easier quantity to interpret is the unconditional ACR, which is what one would be estimating if covariates were not needed to justify the instrument.

5.2 R

There are a few ways to estimate an unconditional ACR. I’ll focus on two that I think are relatively easy to implement.

The first is a different DDML estimator. The details are basically the same as in the previous section, except that the estimand is different and there are more unknown functions that need to be estimated nonparametrically. The syntax changes in only minor ways from the previous section.

acr_list <- lapply(seq_len(10), function(i) do.call(ddml_late, args))

Note that you’ll get a fair number of warnings about trimming here that can be ignored unless there are a very large number. (I’ve suppressed them in the output).

lapply(acr_list, function(r) {
  s <- summary(r)
  tibble(
    est = s["nnls1", "Estimate"],
    se = s["nnls1", "Std. Error"]
  )
}) %>%
  bind_rows() -> acr
median(acr$est) # Point estimate
## [1] 0.07659872
sqrt(median(acr$se^2 + (acr$est - median(acr$est))^2)) # SE
## [1] 0.05541654

Notice that this estimate is quite a bit different from both the linear IV and PLIV estimates, while having a similar standard error. The way in which the treatment effects are weighted across covariate groups evidently matters quite a bit here.

An alternative way to estimate the ACR is with an instrument propensity score weighting (IPSW) approach due to Tan (2006) and Frölich (2007). Słoczyński, Uysal, and Wooldridge (2024) propose a weight-normalized version of this estimator, arguing that the weight normalization is important for giving the estimator sensible invariance properties and consequently also for its statistical stability. The derivation of the estimator is exposited in Section 4.2.1 of our handbook chapter, but the short story is that it can be viewed as a ratio of two propensity score weighting estimators of the average treatment effect. In the numerator is the average treatment effect of the instrument on the outcome, which can be seen as a nonparametric counterpart to the usual reduced form. In the denominator is the average treatment effect of the instrument on the treatment, which can be seen as a nonparametric counterpart to the usual first stage. Both averages are taken across covariates. The ratio can be shown to be equal to the unconditional LATE when the treatment is binary or the unconditional ACR when the treatment is ordered. Note that a binary instrument remains important here.

I’m not aware of packages that implement this procedure in R, but it’s easy to implement by hand. First, we estimate the propensity score, which in context means the instrument propensity score, that is, the probability that the binary instrument is one given the covariates. I’ll do this with a logit model. Then, we use the fitted values from the logit model to constructed two weighted averages of the outcome and two weighted averages of the treatment. The ratio of the differences in these two weighted outcomes becomes the estimate of the LATE. Here’s the procedure as a function:

# IPSW = instrument propensity score weighting
ipsw <- function(data, f) {
  logit <- glm(data = data, f = f, family = binomial)
  data %>%
    mutate(
      Q = predict(logit, type = "response"), # instrument propensity score
      W1 = Z / Q,
      W0 = (1 - Z) / (1 - Q),
      num1 = Y * W1,
      num0 = Y * W0,
      den1 = D * W1,
      den0 = D * W0
    ) %>%
    summarize(across(c(W0, W1, num0, num1, den0, den1), sum)) -> x
  num <- (x$num1 / x$W1 - x$num0 / x$W0)
  den <- (x$den1 / x$W1 - x$den0 / x$W0)
  num / den
}
ipsw(card, paste("Z ~ ", covs_flist))
## [1] 0.07293566

Słoczyński, Uysal, and Wooldridge (2024) provide analytic standard error formulas, but they are quite complicated. Bootstrapping the standard errors is a reasonable alternative.

sapply(seq_len(500), function(i) {
  idx <- sample(nrow(card), nrow(card), replace = TRUE)
  cardbs <- card[idx, ]
  ipsw(cardbs, paste("Z ~ ", covs_flist))
}) %>%
  sd()
## [1] 0.04412163

This estimate is similar to the DDML ACR estimate with a similar standard error. It relies on the parametric assumption that we’ve correctly specified the probability of the instrument given the covariates as a logit. The DDML estimate by contrast is arguably fully nonparametric (given sufficiently expressive ML algorithms), but requires estimating a number of functions. Depending on your faith in nonparametrics, you might like this more or less. Regardless, the fact that both estimates are considerably different from the PLIV estimate should give pause to the idea that all non-negatively weighted averages are equally valuable.

5.3 Stata

Using ddml in Stata to estimate an unconditional ACR is quite similar to PLIV.

## 
## . use card.dta, replace
## 
## . local Y lwage
## 
## . local D educ
## 
## . local Z nearc4
## 
## . local X south smsa smsa66 ///
## >   reg661 reg662 reg663 reg664 reg665 reg666 reg667 reg668 reg669 ///
## >   black exper expersq
## 
## . 
## . local l_xg1 method(gradboost) opt(max_depth(2) learning_rate(.3))
## 
## . local l_xg2 method(gradboost) opt(max_depth(6) learning_rate(.01))
## 
## . local l_xg3 method(gradboost) opt(max_depth(3) learning_rate(.05))
## 
## . local l_rf1 method(rf) opt(n_estimators(500) max_features(3))
## 
## . local l_rf2 method(rf) opt(n_estimators(500) max_features(3) max_depth(4))
## 
## . local l_rf3 method(rf) opt(n_estimators(500) max_features(5) max_depth(3))
## 
## . 
## . set seed 534023
## 
## . ddml init interactiveiv, reps(5) // Set higher in practice
## 
## . ddml E[Y|X,Z]: pystacked `Y' `X' || ///
## >              `l_xg1' || `l_xg2' || `l_xg3' || `l_rf1' || `l_rf2' || `l_rf3', 
## > ///
## >              type(reg) njobs(-1)
## Learner Y1_pystacked added successfully.
## 
## . ddml E[D|X,Z]: pystacked `D' `X' || ///
## >              `l_xg1' || `l_xg2' || `l_xg3' || `l_rf1' || `l_rf2' || `l_rf3', 
## > ///
## >              type(reg) njobs(-1)
## Learner D1_pystacked added successfully.
## 
## . ddml E[Z|X]: pystacked `Z' `X' || ///
## >              `l_xg1' || `l_xg2' || `l_xg3' || `l_rf1' || `l_rf2' || `l_rf3', 
## > ///
## >              type(reg) njobs(-1)
## Learner Z1_pystacked added successfully.
## 
## . ddml crossfit, shortstack nostdstack finalest(nnls1)
## note: treatment (educ) = 1 in 1 cases when assignment (nearc4) = 0
## Cross-fitting E[y|X,Z] equation: lwage
## Resample 1Cross-fitting fold 1 2 3 4 5 ...completed cross-fitting...completed short-stack
## > ing
## Resample 2...
## Cross-fitting fold 1 2 3 4 5 ...completed cross-fitting...completed short-stack
## > ing
## Resample 3...
## Cross-fitting fold 1 2 3 4 5 ...completed cross-fitting...completed short-stack
## > ing
## Resample 4...
## Cross-fitting fold 1 2 3 4 5 ...completed cross-fitting...completed short-stack
## > ing
## Resample 5...
## Cross-fitting fold 1 2 3 4 5 ...completed cross-fitting...completed short-stack
## > ing
## Cross-fitting E[D|X,Z] equation: educ
## Resample 1...
## Cross-fitting fold 1 2 3 4 5 ...completed cross-fitting...completed short-stack
## > ing
## Resample 2...
## Cross-fitting fold 1 2 3 4 5 ...completed cross-fitting...completed short-stack
## > ing
## Resample 3...
## Cross-fitting fold 1 2 3 4 5 ...completed cross-fitting...completed short-stack
## > ing
## Resample 4...
## Cross-fitting fold 1 2 3 4 5 ...completed cross-fitting...completed short-stack
## > ing
## Resample 5...
## Cross-fitting fold 1 2 3 4 5 ...completed cross-fitting...completed short-stack
## > ing
## Cross-fitting E[Z|X]: nearc4
## Resample 1...
## Cross-fitting fold 1 2 3 4 5 ...completed cross-fitting...completed short-stack
## > ing
## Resample 2...
## Cross-fitting fold 1 2 3 4 5 ...completed cross-fitting...completed short-stack
## > ing
## Resample 3...
## Cross-fitting fold 1 2 3 4 5 ...completed cross-fitting...completed short-stack
## > ing
## Resample 4...
## Cross-fitting fold 1 2 3 4 5 ...completed cross-fitting...completed short-stack
## > ing
## Resample 5...
## Cross-fitting fold 1 2 3 4 5 ...completed cross-fitting...completed short-stack
## > ing
## 
## . ddml estimate, robust
## 
## 
## Model:                  interactiveiv, crossfit folds k=5, resamples r=5
## Mata global (mname):    m0
## Dependent variable (Y): lwage
##  lwage learners:        Y1_pystacked
## D equations (1):        educ
##  educ learners:         D1_pystacked
## Z equations (1):        nearc4
##  nearc4 learners:       Z1_pystacked
## 
## DDML estimation results (LATE):
## spec  r  Y(0) learner  Y(1) learner  D(0) learner  D(1) learner         b      
## >   SE      Z learner
##   ss  1  [shortstack]          [ss]          [ss]          [ss]     0.069    (0
## > .043)          [ss]
##   ss  2  [shortstack]          [ss]          [ss]          [ss]     0.062    (0
## > .042)          [ss]
##   ss  3  [shortstack]          [ss]          [ss]          [ss]     0.080    (0
## > .043)          [ss]
##   ss  4  [shortstack]          [ss]          [ss]          [ss]     0.079    (0
## > .042)          [ss]
##   ss  5  [shortstack]          [ss]          [ss]          [ss]     0.084    (0
## > .045)          [ss]
## 
## Mean/med Y(0) learner  Y(1) learner  D(0) learner  D(1) learner         b      
## >   SE      Z learner
##   ss mn  [shortstack]          [ss]          [ss]          [ss]     0.075    (0
## > .044)          [ss]
##   ss md  [shortstack]          [ss]          [ss]          [ss]     0.079    (0
## > .044)          [ss]
## 
## Shortstack DDML model (median over 5 resamples) (LATE)
## E[y|X,Z=0]   = Y_lwage_ss0                         Number of obs   =      3010
## E[y|X,Z=1]   = Y_lwage_ss1
## E[D|X,Z=0]   = D_educ_ss0
## E[D|X,Z=1]   = D_educ_ss1
## E[Z|X]       = Z_nearc4_ss
## ------------------------------------------------------------------------------
##              |               Robust
##        lwage | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
## -------------+----------------------------------------------------------------
##         educ |    .078724   .0441381     1.78   0.074    -.0077851    .1652331
## ------------------------------------------------------------------------------
## Stacking final estimator: nnls1
## Warning: 1 resamples had propensity scores trimmed to lower limit .01.
## 
## Summary over 5 resamples:
##        D eqn      mean       min       p25       p50       p75       max
##         educ      0.0749    0.0622    0.0689    0.0787    0.0805    0.0840

Implementing the IPSW estimator is easier in Stata than R thanks to Słoczyński, Uysal, and Wooldridge (2024), who provided a companion module with their paper. The module can be installed with ssc install kappalate. (The name kappalate is a reference to Abadie (2003)’s \(\kappa\), to which the argument has some relationship, although really it originates from a simpler argument due to Tan (2006) and Frölich (2007).) The module is quite easy to use and also provides analytic standard error estimates.

## 
## . use card.dta, replace
## 
## . local Y lwage
## 
## . local D educ
## 
## . local Z nearc4
## 
## . local X south smsa smsa66 ///
## >   reg661 reg662 reg663 reg664 reg665 reg666 reg667 reg668 reg669 ///
## >   black exper expersq
## 
## . 
## . kappalate `Y' (`D' = `Z') `X', zmodel(logit)
## 
## Weighting estimation of the LATE
## 
## Outcome     :   lwage
## Treatment   :   educ
## Instrument  :   nearc4
## IPS         :   logit ML
## 
## Number of obs   =   3010
## 
## ------------------------------------------------------------------------------
##        lwage | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
## -------------+----------------------------------------------------------------
##        tau_u |   .0729357   .0430959     1.69   0.091    -.0115308    .1574022
## ------------------------------------------------------------------------------

Notice that the estimate is identical to the one I computed in R above. The analytic standard errors are similar to the bootstrapped standard errors.

6 Marginal treatment effects with a binary treatment

6.1 Motivation

Marginal treatment effect (MTE) methods use extrapolation to build estimates of quantities like the average treatment effect or average treatment on the treated that are not nonparametrically point identified under just exogeneity and monotonicity. See Section 4 of the handbook chapter for a thorough discussion. The most common implementation of the MTE idea is with a binary treatment, which is what we’ll explore here, since this is the only case for which software packages have been developed so far. Methods for multivalued treatments currently need to be coded up by hand. Section 4.5 of the handbook chapter provides some details for a linear regression based approach that extends the binary treatment case to ordered treatments.

6.2 Gelbach data

We illustrate MTE methods for a binary treatment using an extract of the 1980 Census constructed by Gelbach (2002) (link) that consists of single mothers whose youngest child was five years old in 1980. Gelbach used quarter of birth (quarter) as an instrument for public school enrollment (public) to estimate the effect of public school attendance on whether a mother worked (work79 in the following). The treatment is binary and the instrument is multivalued.

gelbach <- haven::read_dta("gelbach.dta")

6.3 R

The ivmte package developed by Shea and Torgovitsky (2023) can be used to implement MTE methods for a binary treatment. The package can be installed from CRAN, or directly from GitHub for the most up-to-date version by devtools::install_github("jkcshea/ivmte"). The following code shows how to use ivmte to estimate the average treatment effect through MTE extrapolation. The marginal treatment response (MTR) curves are specified as linear and additive in covariates.

library(ivmte)
covlist <- c(
  "white",
  "num612",
  "num1317",
  "numge18",
  "othrlt18",
  "othrge18",
  "grade",
  "centcity",
  "age",
  "age2"
)
gelbach$quarter <- as_factor(gelbach$quarter)

set.seed(393813)
f_covlist <- paste(covlist, collapse = " + ")
f_pscore <- as.formula(paste("public ~ quarter + ", f_covlist))
f_mtr <- as.formula(paste("~ u + ", f_covlist))
nbs <- 100
args <- list(
  data = gelbach,
  outcome = "work79",
  propensity = f_pscore,
  target = "ate",
  m0 = f_mtr,
  m1 = f_mtr,
  point = TRUE,
  bootstraps = nbs
)
r <- do.call(ivmte::ivmte, args)
r$point.estimate
## [1] 0.02864251
r$point.estimate.se
## [1] 0.02198205

The ivmte package contains much more functionality; for full details, see the repository or the accompanying paper. Here’s what it is doing under the hood in this simple example:

# This function estimate the MTE and the implied ATE
mte_ate <- function(data) {
  pscore <- glm(data = data, f = f_pscore, family = binomial)
  f_mtr <- as.formula(paste("work79 ~ fp + ", f_covlist))

  sapply(0:1, function(d) { # E[Y(d)] for d = 0,1
    # "fp" is function of p in the implied E[Y | D,X,Z]
    if (d == 1) {
      data$fp <- predict(pscore, type = "response") / 2
    } else {
      data$fp <- (1 + predict(pscore, type = "response")) / 2
    }
    # Regress Y on 1, fp, x stratified by treatment status
    lr <- lm(
      data = data %>% filter(public == d),
      formula = f_mtr
    )
    # Now use coefficient estimates to estimate E[Y(d)]
    data %>%
      select(all_of(covlist)) %>%
      mutate(fp = .5) %>%
      summarize(across(everything(), mean)) -> df_mean
    predict(lr, newdata = df_mean)
  }) -> eyd
  return(eyd[2] - eyd[1]) # Estimate of the ATE
}

mte_ate(gelbach) # point estimate matches that from ivmte
##          1 
## 0.02864251
sapply(seq_len(nbs), function(i) { # bootstrap by hand
  idx <- sample(nrow(gelbach), nrow(gelbach), replace = TRUE)
  dfbs <- gelbach[idx, ]
  mte_ate(dfbs)
}) %>%
  sd() # standard error is similar
## [1] 0.01952731

6.4 Stata

The mtefe module developed by Andresen (2018) is an excellent implementation of MTE for Stata. It also has a large number of options and functionality that is well-explained in Andresen’s Stata Journal paper. It can be installed with ssc install mtefe. Here’s how to reproduce the above estimates for the R code.

## 
## . use gelbach.dta, replace
## 
## . local X ///
## >   white num612 num1317 numge18 othrlt18 othrge18 grade centcity age age2
## 
## . 
## . mtefe work79 (public = i.quarter) `X', ///
## >   polynomial(1) /// Linear MTE/MTR curves
## >   separate /// We call this "stratified" in the handbook chapter
## >   link(logit) /// Propensity score estimates
## >   noplot /// Don't automatically make some (nice, but annoying) plots
## >   bootreps(10) // Make this larger in practice
## (running mtefe_secondstage on estimation sample)
## 
## Bootstrap replications (10): .........10 done
## 
## Parametric polynomial MTE model                         Number of obs = 10,932
##                                                         Replications  =     10
## 
## 
## Treatment model: Logit
## Estimation method: Separate approach
## ------------------------------------------------------------------------------
##              |   Observed   Bootstrap                         Normal-based
##       work79 | coefficient  std. err.      z    P>|z|     [95% conf. interval]
## -------------+----------------------------------------------------------------
## beta0        |
##        white |   .0344509   .0153112     2.25   0.024     .0044414    .0644603
##       num612 |  -.0786369   .0084532    -9.30   0.000    -.0952048    -.062069
##      num1317 |  -.0348928   .0085452    -4.08   0.000    -.0516411   -.0181445
##      numge18 |  -.0401763   .0306649    -1.31   0.190    -.1002783    .0199257
##     othrlt18 |  -.0505141   .0213663    -2.36   0.018    -.0923912    -.008637
##     othrge18 |   .0415301   .0177038     2.35   0.019     .0068314    .0762289
##        grade |     .03732   .0022416    16.65   0.000     .0329265    .0417135
##     centcity |  -.1157907   .0174442    -6.64   0.000    -.1499806   -.0816008
##          age |   .0453154   .0138904     3.26   0.001     .0180907    .0725401
##         age2 |  -.0007179   .0002182    -3.29   0.001    -.0011456   -.0002903
##        _cons |  -.3506762   .2142322    -1.64   0.102    -.7705636    .0692112
## -------------+----------------------------------------------------------------
## k0           |
##          p01 |   .1020681   .0306876     3.33   0.001     .0419216    .1622147
## -------------+----------------------------------------------------------------
## beta1-beta0  |
##        white |   .0077776   .0191604     0.41   0.685     -.029776    .0453312
##       num612 |   .0069878   .0099224     0.70   0.481    -.0124598    .0264354
##      num1317 |   .0039677   .0164701     0.24   0.810    -.0283131    .0362486
##      numge18 |   .0560184    .040358     1.39   0.165    -.0230819    .1351186
##     othrlt18 |   .0391709   .0258884     1.51   0.130    -.0115694    .0899113
##     othrge18 |  -.0057665   .0173465    -0.33   0.740     -.039765     .028232
##        grade |   .0053748   .0036421     1.48   0.140    -.0017636    .0125131
##     centcity |   -.021134   .0241298    -0.88   0.381    -.0684275    .0261595
##          age |  -.0068213   .0159143    -0.43   0.668    -.0380128    .0243701
##         age2 |   .0000893   .0002469     0.36   0.717    -.0003945    .0005732
##        _cons |    .078049   .2471243     0.32   0.752    -.4063057    .5624037
## -------------+----------------------------------------------------------------
## k            |
##           p1 |   .1148366   .0713048     1.61   0.107    -.0249182    .2545914
## -------------+----------------------------------------------------------------
## effects      |
##          ate |   .0286425    .014022     2.04   0.041     .0011599    .0561251
##          att |   .0120965   .0117039     1.03   0.301    -.0108427    .0350357
##         atut |   .0571107   .0293711     1.94   0.052    -.0004556    .1146771
##         late |    .038809   .0180068     2.16   0.031     .0035164    .0741017
##       mprte1 |   .0536539   .0265397     2.02   0.043     .0016371    .1056708
##       mprte2 |   .0438562   .0211552     2.07   0.038     .0023927    .0853196
##       mprte3 |   .0529657   .0262578     2.02   0.044     .0015013    .1044301
## ------------------------------------------------------------------------------
## Test of observable heterogeneity, p-value                               0.0000
## Test of essential heterogeneity, p-value                                0.1073
## ------------------------------------------------------------------------------

References

Abadie, Alberto. 2003. “Semiparametric Instrumental Variable Estimation of Treatment Response Models.” Journal of Econometrics 113 (2): 231–63.
Ahrens, Achim, Christian B. Hansen, Mark E. Schaffer, and Thomas Wiemann. 2023. “Ddml: Double/Debiased Machine Learning in Stata.” arXiv. https://doi.org/10.48550/arXiv.2301.09397.
———. 2024a. “Model Averaging and Double Machine Learning.” arXiv. https://doi.org/10.48550/arXiv.2401.01645.
Ahrens, Achim, Christian B Hansen, Mark E Schaffer, and Thomas Wiemann. 2024b. Ddml: Double/Debiased Machine Learning. Manual.
Andresen, Martin Eckhoff. 2018. “Exploring Marginal Treatment Effects: Flexible Estimation Using Stata.” The Stata Journal: Promoting Communications on Statistics and Stata 18 (1): 118–58. https://doi.org/10.1177/1536867X1801800108.
Angrist, Joshua D., and Guido W. Imbens. 1995. “Two-Stage Least Squares Estimation of Average Causal Effects in Models with Variable Treatment Intensity.” Journal of the American Statistical Association 90 (430): 431–42. http://www.jstor.org/stable/2291054.
Angrist, Joshua D., and Jörn-Steffen Pischke. 2009. Mostly Harmless Econometrics: An Empiricist’s Companion. Princeton University Press.
Bach, Philipp, Malte S. Kurz, Victor Chernozhukov, Martin Spindler, and Sven Klaassen. 2024. DoubleML: An Object-Oriented Implementation of Double Machine Learning in R.” Journal of Statistical Software 108 (3): 1–56. https://doi.org/10.18637/jss.v108.i03.
Blandhol, Christine, John Bonney, Magne Mogstad, and Alexander Torgovitsky. 2022. “When Is TSLS Actually LATE?” w29709. Cambridge, MA: National Bureau of Economic Research. https://doi.org/10.3386/w29709.
Card, David. 1993. “Using Geographic Variation in College Proximity to Estimate the Return to Schooling.” National Bureau of Economic Research. https://doi.org/10.3386/w4483.
Chernozhukov, Victor, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney Newey, and James Robins. 2018. “Double/Debiased Machine Learning for Treatment and Structural Parameters.” The Econometrics Journal 21 (1): C1–68. https://doi.org/10.1111/ectj.12097.
Frölich, Markus. 2007. “Nonparametric IV Estimation of Local Average Treatment Effects with Covariates.” Journal of Econometrics, Endogeneity, instruments and identification, 139 (1): 35–75. https://doi.org/10.1016/j.jeconom.2006.06.004.
Gelbach, Jonah B. 2002. “Public Schooling for Young Children and Maternal Labor Supply.” The American Economic Review 92 (1): 307–22. http://www.jstor.org/stable/3083335.
Hastie, Trevor, Robert Tibshirani, and J. H. Friedman. 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. Springer Series in Statistics. New York, NY: Springer.
Ramsey, J. B. 1969. “Tests for Specification Errors in Classical Linear Least-Squares Regression Analysis.” Journal of the Royal Statistical Society: Series B (Methodological) 31 (2): 350–71. https://doi.org/10.1111/j.2517-6161.1969.tb00796.x.
Shea, Joshua, and Alexander Torgovitsky. 2023. “Ivmte: An R Package for Extrapolating Instrumental Variable Estimates Away From Compliers*.” Observational Studies 9 (2): 1–42.
Słoczyński, Tymon, S. Derya Uysal, and Jeffrey M. Wooldridge. 2024. “Abadie’s Kappa and Weighting Estimators of the Local Average Treatment Effect.” Journal of Business & Economic Statistics, April, 1–14. https://doi.org/10.1080/07350015.2024.2332763.
Tan, Zhiqiang. 2006. “A Distributional Approach for Causal Inference Using Propensity Scores.” Journal of the American Statistical Association, December. https://doi.org/10.1198/016214506000000023.