Estimator Performance: general heterogenous treatment effect model

Let us compare the performance of the semi-IV estimator in a generalized Roy model with heterogenous treatment effects. The simulation specification comes from the Appendix of Bruneel-Zupanc (2024) (Monte Carlo with Heterogenous treatment effect). It is close to the counterpart standard IV simulated Roy models used in James J. Heckman, Urzua, and Vytlacil (2006) or James J. Heckman and Vytlacil (2007).

Simulate data

We simulate generalized Roy models using the simul_data()function. See the documentation of the function for details about the model. Depending on the chosen parameters, we can simulate a model with homogenous/heterogenous treatment effects, as well as with valid IVs eventually. That’s what we will do here. In every simulation we do not include covariates (set all their effect to 0), but these can be easily included.

# Model
library(semiIVreg)
#> KernSmooth 2.23 loaded
#> Copyright M. P. Wand 1997-2009
#> Loading required package: zoo
#> 
#> Attaching package: 'zoo'
#> The following objects are masked from 'package:data.table':
#> 
#>     yearmon, yearqtr
#> The following objects are masked from 'package:base':
#> 
#>     as.Date, as.Date.numeric
N = 50000; set.seed(1234)
# Specification
model_type = "heterogenous"
param_error = c(1, 1.5, 0.5, 1.5) # var_u0, var_u1, cov_u0u1, var_cost (the mean cost = constant in D*) # if heterogenous
param_Z = c(0, 0, 0, 0, 1, 0.8, 0.3) # meanW0 state0, meanW1 state0, meanW0 state1, meanW1 state1, varW0, varW1, covW0W1
param_p = c(-0.2, -1.2, 1, 0, 0, 0) # constant, alphaW0, alphaW1, alphaW0W1, effect of state, effect of parent educ
param_y0 = c(3.2, 1, 0, 0) # intercept, effect of Wd, effect of state, effect of parent educ;
param_y1 = c(3.2+0.4, 1.3, 0, 0) # the +0.2 = Average treatment effect; effect of W1, effect of state, effect of parent educ;
param_genX = c(0.4, 0, 2)

data = simul_data(N, model_type, param_y0, param_y1, param_p, param_Z, param_genX, param_error)

semi-IV regression

Let us apply directly the semiivreg()function. Compute the MTE and MTR for a reference individuals with average value of the semi-IVs, i.e., $(W_0, W_1) = (0, 0)$ here. Remark: the $MTE(u, w_0, w_1, x)$ depend on $X$ and $W_0, W_1$ , so always need to pick a reference individual. By default, semiivreg computes the average individuals (for the continuous/binary covariates and semi-IVs), and takes the ‘reference level’ for factor variables.

In terms of estimation method, by default, semivreg() estimates the second-stage with local polynomial regression, in the spirit of the double residual regression for partially linear models of Robinson (1988). This is specified by using the default est_method="locpoly". This estimation as the advantage of being robust to misspecification of the control function $\kappa_d(u)$ functional form.

semiiv = semiivreg(y~d|w0|w1, data, ref_indiv = data.frame(w0=0, w1=0))
#> Caution: the standard errors around the plot are not correct (underestimated) because of the multiple stages. For proper standard errors, run the bootstrap in semiivreg_boot().

Let us report also the marginal treatment responses (MTR):

mte_plot = semiiv$plot$mte; 
mtr_plot = semiiv$plot$mtr;
mtr_plot

Attention: all the standard errors reported are wrong (too narrow)! They do not take into account the fact that the propensity score is estimated. For locpoly estimation (the default), these are simply the standard errors around $k_d(v)$ , they do not take into account the error in the effect of the covariates. To get proper standard errors, one should use the bootstrap.

Other options: to speed things up (especially useful in the first residual regression), can use fast_robinson1 = TRUE (for the 1st residual regression of $Y$ , $X$ and $W_d$ on $P$ ) and fast_robinson2 = TRUE (for the second residual regression). The fast_robinson uses the locpoly() function from KernSmooth package, which is much faster than the default routine we implemented. It has several drawbacks though: (i) only implemented for kernel="gaussian", (ii) does not compute standard errors around $k_d(v)$ estimates (but anyway, these are not completely correct), (iii) cannot use external weights for the data.

Direct effect of the semi-IVs.

Also estimates the effect of the semi-IV on their respective potential outcomes. To see these:

summary(semiiv$estimate$est0)
#> 
#> Call:
#> lm(formula = formuladx, data = residd, weights = weightsd)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -3.7603 -0.6498 -0.0060  0.6563  3.6457 
#> 
#> Coefficients:
#>    Estimate Std. Error t value Pr(>|t|)    
#> w0 0.996692   0.008595     116   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 0.9762 on 26778 degrees of freedom
#> Multiple R-squared:  0.3343, Adjusted R-squared:  0.3343 
#> F-statistic: 1.345e+04 on 1 and 26778 DF,  p-value: < 2.2e-16
summary(semiiv$estimate$est1);
#> 
#> Call:
#> lm(formula = formuladx, data = residd, weights = weightsd)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -4.5990 -0.7789 -0.0022  0.7614  4.3267 
#> 
#> Coefficients:
#>    Estimate Std. Error t value Pr(>|t|)    
#> w1 1.287222   0.009057   142.1   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 1.14 on 23220 degrees of freedom
#> Multiple R-squared:  0.4652, Adjusted R-squared:  0.4652 
#> F-statistic: 2.02e+04 on 1 and 23220 DF,  p-value: < 2.2e-16
# To be compared with:
param_y0[2]; param_y1[2]
#> [1] 1
#> [1] 1.3

Notice that these standard errors are biased because they do not take into account that the propensity score is estimated.

Standard errors.

To get proper standard errors of the MTE/MTR and effects of the semi-IVs with the default "locpoly"estimation method, use the bootstrap with the function semiivreg_boot(). This takes longer to estimate though.

Bandwidth Specification.

By default, with est_method="locpoly", if no bandwidth is provided, the bandwidth are computed using the bw_method. The default bw_method is simplistic: it picks the bandwidth as the specified fraction (1/5th) of the range of the support (rounded to the 3rd digit). You can set bw_method to any number (between 0 and 1) to pick the bandwidth as a function of the support automatically.

One can also implement optimal bandwidth selection methods, from the package nprobust (see Calonico, Cattaneo, and Farrell (2019)). In particular, bw_method = "mse-dpi" and ="mse-rot" implement the optimal (constant) bandwidth which minimizes the (integrated) mean squared error of the $k_1(v) - k_0(v)$ function, using either direct plug-in (dpi) or rule-of-thumb (rot) formula. For more details, see Calonico, Cattaneo, and Farrell (2019) or Fan and Gijbels (1996), or Wand and Jones (1994). Later updates of the package will allow for variable bandwidth (as already implemented in nprobust). Notice that in the second residual regression of Robinson, we want to estimate the derivative of the local polynomial function, so we find the optimal bandwidth for this derivative, hence the use of the plug-in and rule-of-thumb methods, which are well suited for this (see Fan and Gijbels (1996)). The estimation of the optimal bandwidth on large sample can take a long time (exponential increase with sample size). So, by default we specify bw_subsamp_size = 10000 such that the optimal bandwidth is computed on a “small” subsample of size 10,000. Requires to set.seed() before running semiivreg for replicability. Set it to NULL (or to some very large values) to compute the bandwidth on the full sample.

Alternatively, one can pre-specify some of the bandwidth directly, as shown below. The parameters bwd are the bandwidth for the first residual regression of $Y_d$ and $W_d$ (and $X$ ) on $\widehat{P}$ , that estimates respectively $E[Y_d|P]$ , $E[W_d|P]$ and $E[X|P]$ , in order to get the effects of the semi-IVs on their potential outcomes. For $W_d$ and $X$ , the order of the variable depends on the order specified in the original formula. Be sure to match the variables in the correct order (be careful with factor for example). One way to check is to first run without specifying the bandwidth and then checking the order of the variables in semiiv$estimate$est0 and semiiv$estimate$est1.
If one specifies only one value in bw0 and bw1, it will be applied to all the covariates.

bw_y0, bw_y1 are the bandwidth for the second residual regression, for $k_0(v)$ (MTR0) and $k_1(v)$ (MTR1) respectively. They are important since they govern the smoothness of the MTR, and thus of the MTE function. bw_y0 also serves as the bw for the local-IV estimation of the MTE directly.

Let us check the bandwidth from the previous computation (default was 1/5th of the support rule).

semiiv$bw
#> $bw0
#> [1] 0.199 0.199
#> 
#> $bw1
#> [1] 0.199 0.199
#> 
#> $bw_y0
#> [1] 0.199
#> 
#> $bw_y1
#> [1] 0.199
#> 
#> $bw_mte
#> [1] 0.199
#> 
#> $bw_method
#> [1] 0.2

Re-estimate the model with optimal bandwidth selection (mse-dpi rule) (print_progress=TRUE is a reporting option to see the progress of the function).

set.seed(1234)
semiiv = semiivreg(y~d|w0|w1, data, ref_indiv = data.frame(w0=0, w1=0),
                   bw_method = "mse-dpi", bw_subsamp_size=10000, 
                   print_progress=TRUE)
#> Estimating first stage... First stage estimated.       
#> 2nd stage: Estimating MTR and MTE... 
#> D= 0, Robinson 1st residual regression of X on P...               
#> Progress: 1/2                              Progress: 2/2                              D= 1, Robinson 1st residual regression of X on P...               
#> Progress: 1/2                              Progress: 2/2                              Robinson 2nd stage: Bandwidth Selection...                           Robinson 2nd stage: Estimation of k0(v) and k1(v)... 
#> Estimation complete.
#> Caution: the standard errors around the plot are not correct (underestimated) because of the multiple stages. For proper standard errors, run the bootstrap in semiivreg_boot().

semiiv$bw
#> $bw0
#> [1] 0.08202580 0.04809258
#> 
#> $bw1
#> [1] 0.1326332 0.1089366
#> 
#> $bw_y0
#> [1] 0.1261527
#> 
#> $bw_y1
#> [1] 0.1261527
#> 
#> $bw_mte
#> [1] 0.1261527
#> 
#> $bw_method
#> [1] "mse-dpi"

Now, illustrate how we can directly specify the bandwidth (which adjusts the smoothness of the estimation). Remark: the bw_method does not matter if we specify all the bandwidth.

semiiv = semiivreg(y~d|w0|w1, data, ref_indiv = data.frame(w0=0, w1=0),
                   bw0=0.05, bw1=0.05, bw_y0 = 0.126, bw_y1 = 0.126)
#> Caution: the standard errors around the plot are not correct (underestimated) because of the multiple stages. For proper standard errors, run the bootstrap in semiivreg_boot().

# Update the mtr and mte plots with the "optimal" bw 
mte_plot = semiiv$plot$mte; 
mtr_plot = semiiv$plot$mtr;

Polynomial degree.

One can also specify the degree of each local polynomial estimation with pol_degree_locpoly1 and pol_degree_locpoly2. Following Fan and Gijbels (1996) of setting the degree equal to the order of the derivative function we want to estimate $+1$ , by default we set pol_degree_locpoly1 = 1 because there we want to estimate a function directly, and pol_degree_locpoly2 = 2 because in the MTE/MTR stage we want to estimate derivatives of the control function $\kappa_d(u)$ .

Propensity Score estimation.

One can also extract the propensity score estimation. With a large number of observation, the fit is almost perfect and the bias due to the fact that $P$ is estimated will be very small.

firststage = semiiv$estimate$propensity
# Cannot be compared with param_p directly if V gets rescaled -> but can compare the predicted P with the truth
Phat = predict(firststage, newdata=data, type="response")

summary(Phat - data$P) # almost perfect; 
#>        Min.     1st Qu.      Median        Mean     3rd Qu.        Max. 
#> -0.00145960 -0.00086220  0.00010782  0.00002379  0.00092685  0.00132015

semi-IV sieve regression

Another approach is to simply specify flexibly $\kappa_d(u)$ , with polynomials for example, in the spirit of sieve estimation. This is potentially less flexible (even though it still is), but as the advantage of being faster and giving analytical confidence intervals (biased because they do not take into account the fact that $P$ is estimated). To use this estimation method, specify est_method="sieve".

semiiv2 = semiivreg(y~d|w0|w1, data, ref_indiv = data.frame(w0=0, w1=0), 
                    est_method="sieve", pol_degree_sieve=3, 
                    plotting=FALSE)

mte_plot2 = semiiv2$plot$mte; mtr_plot2 = semiiv2$plot$mtr
grid.arrange(mte_plot2, mtr_plot2, ncol=2)

pol_degree_sieve controls the flexibility of the control function that is used, by controlling the degree of the polynomial used. By default we set it to $5$ .

Let us compare the two estimation methods results.

# If want to plot on the same plot, need some manipulation of the data

dat = semiiv$data$RES # take the original data
dat$V = dat$Phat

# for MTE:
mte_plot2 = mte_plot2 + geom_line(aes(x=V, y=mte), linetype="dashed", col="#db93a4", na.rm=TRUE, data=dat)


# for MTR need some manipulation
dat_plot = dat;
dat1 = dat_plot; dat1$mtr = dat_plot$mtr1; dat1$Treatment = 1
dat0 = dat_plot; dat0$mtr = dat_plot$mtr0; dat0$Treatment = 0;
dat2 = rbind(dat1, dat0)
dat2$Treatment = as.factor(dat2$Treatment)

mtr_plot2 = mtr_plot2 + geom_line(aes(x=V, y=mtr, col=Treatment, group=Treatment), linetype="dashed", na.rm=TRUE, data=dat2)

grid.arrange(mte_plot2, mtr_plot2, ncol=2)

Comparison with the truth

Let us compute the ‘true’ underlying MTE. Given the model specification, with $\begin{pmatrix} U_0 \\ U_1 \end{pmatrix} \sim N \left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_{U_0}^2 & \sigma_{U_0U_1} \\ \sigma_{U_0U_1} & \sigma_{U_1}^2 \end{pmatrix} \right)$ and $V = - (U_1 - U_0 - C) \text{ where } C \sim N(0, \sigma_C^2) \perp (U_0, U_1).$ Simple computation gives that $V \sim N(0, \sigma_{U_0}^2 + \sigma_{U_1}^2 - 2 \sigma_{U_0U_1} + \sigma_C^2)$ . Let us introduce $U_D = F_V(V)$ the uniform normalized $V$ shock. Now, $k_d(u) = \mathbb{E}[U_d | U_D=u]$ . So, we have that $k_d(u) = \mathbb{E}[U_d | V=F_V^{-1}(u)]$ . Given the specification above, $V$ and $U_d$ are bivariate normal and we have that: $\begin{aligned} k_0(u) &= \frac{\sigma_{U_0}^2 - \sigma_{U_0U_1}}{\sigma_V^2} \big(F_{V}^{-1}(u) - \mu_V \big), \\ k_1(u) &= \frac{-\sigma_{U_1}^2 + \sigma_{U_0U_1}}{\sigma_V^2} \big(F_{V}^{-1}(u) - \mu_V \big) \end{aligned}$ Then, the true MTR and MTE are given by $\begin{aligned} MTR_0(w_0, u) &= \delta_0 + w_0 \beta_0 + k_0(u), \\ MTR_1(w_1, u) &= \delta_1 + w_1 \beta_1 + k_1(u), \\ MTE(w_0, w_1, u) &= MTR_1(w_0, u) - MTR_0(w_1, u) \end{aligned}$

Evaluate the MTR and MTE at $(w_0, w_1) = (0, 0)$ .


# Underlying true MTE and MTR:
seq_p = seq(0, 1, by=0.001);
w0 = 0; w1 = 0;
sigma_V2 = param_error[1] + param_error[2] - 2*param_error[3] + param_error[4] # = var(V); var(data$V)
covU0V = param_error[1] - param_error[3] # = cov(data$U0, data$V)
covU1V = -param_error[2] + param_error[3] # = cov(data$U1, data$V)
ku0 = covU0V/sigma_V2*(qnorm(seq_p, mean=0, sd=sqrt(sigma_V2)) - 0) # 0 = mean(V)
ku1 = covU1V/sigma_V2*(qnorm(seq_p, mean=0, sd=sqrt(sigma_V2)) - 0) # 0 = mean(V)
true_mtr0 = param_y0[1] + param_y0[2]*w0 + ku0
true_mtr1 = param_y1[1] + param_y1[2]*w1 + ku1
true_mte = true_mtr1 - true_mtr0

newdata = data.frame(Ud=seq_p, w0=0, w1=0)
newdata$true_mte = true_mte; newdata$true_mtr1 = true_mtr1; newdata$true_mtr0 = true_mtr0


## # Remark: alternative estimation method if the truth does not have a simple closed form formula:
## data$diff = data$y1 - data$y0; pol_degree=5
## true_model_mte = lm(diff~w1 + w0 + poly(Ud, pol_degree, raw=TRUE), data); # MTE
## true_model_mtr1 = lm(y1 ~w1+poly(Ud, pol_degree, raw=TRUE), data) # MTR1
## true_model_mtr0 = lm(y0 ~w0+poly(Ud, pol_degree, raw=TRUE), data) # MTR0
## newdata$true_mte = predict(true_model_mte, newdata); 
## newdata$true_mtr1 = predict(true_model_mtr1, newdata); newdata$true_mtr0 = predict(true_model_mtr0, newdata)


# Comparison:
mte_plot2 = mte_plot2 + geom_line(data=newdata, aes(x=Ud, y=true_mte), linetype="dashed", col="red")
mtr_plot2 = mtr_plot2 + geom_line(data=newdata, aes(x=Ud, y=true_mtr1), linetype="dashed", col="blue") +
  geom_line(data=newdata, aes(x=Ud, y=true_mtr0), linetype="dashed", col="orange")

grid.arrange(mte_plot2, mtr_plot2, ncol=2)

Overall, we see that on the common support, the MTE are very precisely estimated here. What is remarkable is that we do not exploit the parametric knowledge of the underlying distribution of the shocks at all.

Compared to the homogenous treatment effect models, the estimation with heterogenous treatment effects requires more observations. Otherwise, the MTE are not well estimated at the tails of the common support. A solution is just to trim the estimation on the set on which the parameters are well identified then.

Other options

Trimming the Support.

These seems relatively well estimated, except at the tails where the common support is not entirely satisfied, while the MTE is only identified on the common support. If one wants to restrict the estimation to a given common support, it is very easy to do in semiivreg().

semiiv1 = semiivreg(y~d|w0|w1, data, 
                    ref_indiv = data.frame(w0=0, w1=0), 
                    common_supp_trim = c(0.10, 0.90),
                    plotting=FALSE)
#> Caution: the standard errors around the plot are not correct (underestimated) because of the multiple stages. For proper standard errors, run the bootstrap in semiivreg_boot().
mte_plot = semiiv1$plot$mte; mtr_plot = semiiv1$plot$mtr;
grid.arrange(mtr_plot, mte_plot, ncol=2)

Post-estimation prediction.

Imagine after estimating one wants to estimate the model for several individuals. One way is to directly specify several ref_indiv when running the initial regression. But if it’s already estimated, one can simply use the semiiv_predict function. It also allows to predict only at a subset of specific values of $V$ .

newdata = data.frame(w0=seq(-1, 1, by=0.5), w1=0) # Predict the outcome

pred = semiiv_predict(semiiv1, newdata=newdata)
head(pred$est) # the predicted values; -> can then be used to redo plots for example. 
#>     w0 w1 id  Phat     mtr0     mtr1      mte     mte2 mtr0_lwr mtr0_upr
#> 1   -1  0  1 0.100       NA       NA       NA       NA       NA       NA
#> 1.1 -1  0  1 0.101 1.912715       NA       NA 2.213390 1.494762 2.330669
#> 1.2 -1  0  1 0.102 1.913002 4.137639 2.224637 2.211793 1.498057 2.327946
#> 1.3 -1  0  1 0.103 1.913288 4.136595 2.223306 2.210195 1.501353 2.325224
#> 1.4 -1  0  1 0.104 1.913575 4.135551 2.221976 2.208598 1.504648 2.322502
#> 1.5 -1  0  1 0.105 1.913901 4.134507 2.220606 2.206981 1.507910 2.319893
#>     mtr1_lwr mtr1_upr mte2_lwr mte2_upr       k0       k1   deltak   delta0X
#> 1         NA       NA       NA       NA       NA       NA       NA -1.001563
#> 1.1       NA       NA 1.789696 2.637084 2.914278       NA 1.211827 -1.001563
#> 1.2 3.963550 4.311727 1.791001 2.632585 2.914565 4.137639 1.210230 -1.001563
#> 1.3 3.963395 4.309794 1.792306 2.628085 2.914852 4.136595 1.208632 -1.001563
#> 1.4 3.963240 4.307861 1.793610 2.623586 2.915138 4.135551 1.207035 -1.001563
#> 1.5 3.963085 4.305928 1.794824 2.619139 2.915465 4.134507 1.205418 -1.001563
#>     delta1X     se_k0      se_k1 se_deltak  se_delta0X se_delta1X
#> 1         0        NA         NA        NA 0.009050053          0
#> 1.1       0 0.2132349         NA 0.2161686 0.009050053          0
#> 1.2       0 0.2116998 0.08881748 0.2146880 0.009050053          0
#> 1.3       0 0.2101647 0.08836390 0.2132074 0.009050053          0
#> 1.4       0 0.2086296 0.08791032 0.2117268 0.009050053          0
#> 1.5       0 0.2071320 0.08745674 0.2102828 0.009050053          0
pred$deltaX # provides the shift in effect of X and Wd -> may be the only thing we care about, know that shifts the curve
#>     w0 w1 id    delta0X delta1X  se_delta0X se_delta1X
#> 1 -1.0  0  1 -1.0015633       0 0.009050053          0
#> 2 -0.5  0  2 -0.5007816       0 0.004525027          0
#> 3  0.0  0  3  0.0000000       0 0.000000000          0
#> 4  0.5  0  4  0.5007816       0 0.004525027          0
#> 5  1.0  0  5  1.0015633       0 0.009050053          0

References

Bruneel-Zupanc, Christophe. 2024. “Don’t (Fully) Exclude Me, It’s Not Necessary! Identification with Semi-IVs.” https://arxiv.org/abs/2303.12667.

Calonico, Sebastian, Matias D. Cattaneo, and Max H. Farrell. 2019. “Nprobust: Nonparametric Kernel-Based Estimation and Robust Bias-Corrected Inference.” Journal of Statistical Software 91 (8): 1–33. https://doi.org/10.18637/jss.v091.i08.

Fan, Jianqing., and Irène. Gijbels. 1996. Local Polynomial Modelling and Its Applications. Monographs on Statistics and Applied Probability 66. Boca Raton: Chapman; Hall/CRC.

Heckman, James J, Sergio Urzua, and Edward Vytlacil. 2006. “Understanding Instrumental Variables in Models with Essential Heterogeneity.” The Review of Economics and Statistics 88 (3): 389–432.

Heckman, James J., and Edward J. Vytlacil. 2007. “Chapter 71 Econometric Evaluation of Social Programs, Part II: Using the Marginal Treatment Effect to Organize Alternative Econometric Estimators to Evaluate Social Programs, and to Forecast Their Effects in New Environments.” In, edited by James J. Heckman and Edward E. Leamer, 6:4875–5143. Handbook of Econometrics. Elsevier. https://doi.org/https://doi.org/10.1016/S1573-4412(07)06071-0.

Robinson, Peter M. 1988. “Root-n-Consistent Semiparametric Regression.” Econometrica: Journal of the Econometric Society, 931–54.

Wand, Matt P, and M Chris Jones. 1994. Kernel Smoothing. CRC press.