Estimator Performance: homogenous treatment effect model
Christophe Bruneel-Zupanc
Last modified: 2024-07-22
semiIVreg_homogenousTE.Rmd
Let us compare the performance of the semi-IV estimator under different violations of the standard IV assumptions. This note mimics the Homogenous Treatment Effect Appendix of Bruneel-Zupanc (2024).
Simulating data
We simulate generalized Roy models using the
simul_data()
function. See the documentation of the function
for details about the model. Depending on the chosen parameters, we can
simulate a model with homogenous or heterogenous treatment effects, as
well as a model where the semi-IVs are in fact valid IVs. That is what
we do here. In every simulation we do not include covariates (all their
effects are set to 0), but these can easily be included.
Model with homogenous treatment effects
Simulate data
# Specification 1
library(semiIVreg)
#> KernSmooth 2.23 loaded
#> Copyright M. P. Wand 1997-2009
#> Loading required package: zoo
#>
#> Attaching package: 'zoo'
#> The following objects are masked from 'package:data.table':
#>
#> yearmon, yearqtr
#> The following objects are masked from 'package:base':
#>
#> as.Date, as.Date.numeric
N = 10000; set.seed(1234)
model_type = "homogenous"
param_error = c(1, 1, 0.6) # var_u, var_v, cov_uv # if homogenous
param_Z = c(0, 0, 0, 0, 1, 1, 0.5) # meanW0 state0, meanW1 state0, meanW0 state1, meanW1 state1, varW0, varW1, covW0W1
param_p = c(0, -0.7, 0.7, 0, 0, 0) # constant, alphaW0, alphaW1, alphaW0W1, effect of state, effect of parent educ
param_y0 = c(3.2, 0.8, 0, 0) # intercept, effect of Wd, effect of state, effect of parent educ;
param_y1 = c(3.2+0.4, 0.5, 0, 0) # the +0.4 = average treatment effect; effect of W1, effect of state, effect of parent educ;
param_genX = c(0.4, 0, 2) # probability state=1 (instead of 0), mean_parenteduc, sd_parenteduc (parenteduc drawn as continuous)
# Remark: mean_V and the constant in the probit are playing the same role; normalize one to zero.
param = list(param_error, param_Z, param_p, param_y0, param_y1, param_genX, model_type)
data = simul_data(N, model_type, param_y0, param_y1, param_p, param_Z, param_genX, param_error)
This is a model with homogenous treatment effects (conditional on the semi-IVs w0 and w1).
True unobserved homogenous treatment effects
Here, at w0 = w1 = 0, the effect is always 0.4 (see the parameters). To see this, let us just use the true unobserved potential outcomes (only observed thanks to the simulation).
data$true_TE = data$y1 - data$y0
summary(lm(true_TE ~ w0 + w1, data))
#>
#> Call:
#> lm(formula = true_TE ~ w0 + w1, data = data)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -1.065e-13 -2.090e-16 -2.000e-18 1.980e-16 1.602e-13
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 4.000e-01 1.956e-17 20453045425901668 <2e-16 ***
#> w0 -8.000e-01 2.248e-17 -35595035443326064 <2e-16 ***
#> w1 5.000e-01 2.259e-17 22130078856903196 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 1.956e-15 on 9997 degrees of freedom
#> Multiple R-squared: 1, Adjusted R-squared: 1
#> F-statistic: 6.463e+32 on 2 and 9997 DF, p-value: < 2.2e-16
true_param = c(param_y0[1], param_y1[1] - param_y0[1], param_y0[2], param_y1[2]); true_param
#> [1] 3.2 0.4 0.8 0.5
# constant y0, constant y1 - constant y0, effects of w0 on y0, effects of w1 on y1.
This is the true model that we would like to recover. Note that the fit is perfect because exactly the same error term enters both potential outcomes. If we had an additional error term, uncorrelated with the rest, we would estimate the same coefficients (but with more noise).
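To illustrate this last claim, here is a small self-contained sketch (using the true coefficient values from the parameters above, generated directly rather than via simul_data()): adding independent noise to the treatment effect leaves the estimated coefficients unchanged in expectation; only the fit degrades.

```r
# Sketch: regressing a noisy version of the treatment effect on (w0, w1)
# still recovers the coefficients (0.4, -0.8, 0.5), just with noise.
set.seed(1)
n <- 10000
w0 <- rnorm(n); w1 <- rnorm(n)
true_TE  <- 0.4 - 0.8 * w0 + 0.5 * w1       # exact treatment effect
noisy_TE <- true_TE + rnorm(n, sd = 0.5)    # extra independent error term
round(coef(lm(noisy_TE ~ w0 + w1)), 2)      # close to (0.4, -0.8, 0.5)
```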
Naive OLS
naive_ols = lm(y ~ d + w0 + w1, data); summary(naive_ols)
#>
#> Call:
#> lm(formula = y ~ d + w0 + w1, data = data)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -3.0172 -0.6411 -0.0147 0.6318 4.1403
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 3.83027 0.01410 271.58 <2e-16 ***
#> d -0.54983 0.02117 -25.98 <2e-16 ***
#> w0 0.20181 0.01185 17.04 <2e-16 ***
#> w1 0.45476 0.01189 38.24 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.9406 on 9996 degrees of freedom
#> Multiple R-squared: 0.3026, Adjusted R-squared: 0.3024
#> F-statistic: 1446 on 3 and 9996 DF, p-value: < 2.2e-16
Obviously, the naive OLS estimator is biased, because there is
endogeneity: d is correlated with the unobservables of the potential
outcomes. It estimates the effect of the treatment to be around -0.55,
far from the true 0.4 (at w0 = w1 = 0). This is because individuals with
a high outcome unobservable U both have a higher y and select themselves
more into d = 0. Indeed, U is positively correlated with the selection
unobservable V (cf. the covariance parameter in param_error), so the
higher U, the higher V, and the higher V, the less likely one is to
select d = 1.
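This mechanism can be checked with a small self-contained simulation. It is a stylized version of the selection problem, not simul_data()'s exact DGP: the selection rule D = 1{Z > V} and the exogenous shifter Z are assumptions made purely for illustration.

```r
# With Cov(U, V) > 0 and selection D = 1{Z > V}, treated individuals have
# low V, hence low U, so OLS is biased downward relative to the true 0.4.
set.seed(1)
n <- 100000
S <- matrix(c(1, 0.6, 0.6, 1), 2)          # Var(U) = Var(V) = 1, Cov = 0.6
E <- matrix(rnorm(2 * n), n) %*% chol(S)   # draw correlated (U, V)
U <- E[, 1]; V <- E[, 2]
Z <- rnorm(n)                              # exogenous selection shifter
D <- as.numeric(Z > V)                     # negative selection on U
Y <- 0.4 * D + U                           # true homogenous effect = 0.4
coef(lm(Y ~ D))["D"]                       # biased downward (here even negative)
```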
Wrongly assuming that the semi-IVs are valid IVs
What if we assume that w0 and w1 are valid IVs (i.e., that they have no direct effect on their respective potential outcomes)?
library(ivreg) # remark: ivreg is not otherwise required by semiIVreg, only in this vignette.
valid_iv = ivreg(y ~ d | w0 + w1 + w0:w1, data=data); summary(valid_iv)
#>
#> Call:
#> ivreg(formula = y ~ d | w0 + w1 + w0:w1, data = data)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -3.909987 -0.756706 -0.003831 0.746416 4.813215
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 3.56922 0.02676 133.387 <2e-16 ***
#> d -0.02044 0.04905 -0.417 0.677
#>
#> Diagnostic tests:
#> df1 df2 statistic p-value
#> Weak instruments 3 9996 886.4 <2e-16 ***
#> Wu-Hausman 1 9997 96.0 <2e-16 ***
#> Sargan 2 NA 2604.4 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 1.124 on 9998 degrees of freedom
#> Multiple R-Squared: 0.003449, Adjusted R-squared: 0.003349
#> Wald test: 0.1736 on 1 and 9998 DF, p-value: 0.6769
The estimated coefficient is also completely biased here, because this specification assumes that the semi-IVs have no direct effect on the outcomes, while they do.
semi-IV estimation with semiivreg
Let us now see how the semi-IV estimator performs. To specify the
estimation with homogenous treatment effects, we set
est_method="homogenous"
. This performs a sieve-like
estimation with an additional constraint which restricts the MTE to
be constant, by imposing restrictions on the control functions of the
two potential outcomes.
semiiv = semiivreg(y~d|w0|w1, data, est_method="homogenous", plotting=FALSE)
summary_coeff = semiiv$estimate$est # extract the coefficients from the homogenous TE specification
summary_coeff[1:4,] # only print the first 4 coefficients; the others correspond to the control function of P
#> Variable Estimate Std_Error t_value p_value
#> 1 (Intercept) 3.2453691 0.02408485 134.747305 0.000000e+00
#> 2 I(d) 0.3255329 0.04497677 7.237801 4.895871e-13
#> 3 I(1 - d):w0 0.7834206 0.01431262 54.736343 0.000000e+00
#> 4 I(d):w1 0.5194180 0.01432695 36.254620 1.689759e-270
true_param
#> [1] 3.2 0.4 0.8 0.5
The estimated coefficients are very close to the truth, with relatively small standard errors here, even though the sample size is modest (10000 observations).
Be cautious though: the standard errors are computed without
taking into account the fact that the propensity score is estimated in a
first stage. We can correct for this with
semiivreg_boot()
. But typically, if the first stage is well
estimated, the bias in the standard errors is small, as visible
below.
semiivboot = semiivreg_boot(y~d|w0|w1, data, Nboot=200, est_method="homogenous", plotting=FALSE) # reduce the number of bootstrap simulations for speed;
#> Bandwidth and MTR/MTE estimation on main sample...
#> Estimating first stage... First stage estimated.
#> 2nd stage: Estimating MTR and MTE...
#> Estimation complete.
#>
#> Bandwidth and MTR/MTE estimation on main sample: Done.
#> Bootstrap Progress: 1/200 Bootstrap Progress: 2/200 ... Bootstrap Progress: 200/200
boot_se = semiivboot$estimate$coeff$std_error[1:4]
res = as.data.frame(cbind(summary_coeff[1:4,1:3], boot_se)); colnames(res) = c("Variable", "Estimate", "wrong analytic SE", "Bootstrapped SE")
res
#> Variable Estimate wrong analytic SE Bootstrapped SE
#> 1 (Intercept) 3.2453691 0.02408485 0.02632599
#> 2 I(d) 0.3255329 0.04497677 0.04930933
#> 3 I(1 - d):w0 0.7834206 0.01431262 0.01379810
#> 4 I(d):w1 0.5194180 0.01432695 0.01631556
Alternative semi-IV strategy: based on IV-quantile regression
As described in Bruneel-Zupanc (2024),
there is another general nonparametric identification strategy with
semi-IVs, building on the IV-quantile regression (IVQR) framework of
Chernozhukov and Hansen (2005). In the
model with homogenous treatment effects, this strategy requires that the
interaction w0:w1
is relevant for the selection into treatment. The intuition is that this
interaction then serves as an IV for the treatment d
within this framework. The nice feature of this strategy is that it can
be implemented with standard IV estimation, e.g., ivreg()
in R.
semiivqr = ivreg(y~d+I(1-d):w0 + I(d):w1|w0+w1+w0:w1, data=data)
summary(semiivqr)
#> Warning in printHypothesis(L, rhs, names(b)): one or more coefficients in the hypothesis include
#> arithmetic operators in their names;
#> the printed representation of the hypothesis will be omitted
#>
#> Call:
#> ivreg(formula = y ~ d + I(1 - d):w0 + I(d):w1 | w0 + w1 + w0:w1,
#> data = data)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -21.7424 -4.4342 -0.0106 4.5084 23.1987
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 8.663 18.635 0.465 0.642
#> d -10.561 37.440 -0.282 0.778
#> I(1 - d):w0 -4.091 16.779 -0.244 0.807
#> I(d):w1 5.428 16.869 0.322 0.748
#>
#> Diagnostic tests:
#> df1 df2 statistic p-value
#> Weak instruments (d) 3 9996 886.4 <2e-16 ***
#> Weak instruments (I(1 - d):w0) 3 9996 3691.7 <2e-16 ***
#> Weak instruments (I(d):w1) 3 9996 3560.3 <2e-16 ***
#> Wu-Hausman 3 9993 107.9 <2e-16 ***
#> Sargan 0 NA NA NA
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 6.212 on 9996 degrees of freedom
#> Multiple R-Squared: -29.42, Adjusted R-squared: -29.43
#> Wald test: 28.43 on 3 and 9996 DF, p-value: < 2.2e-16
In this baseline model, the interaction w0:w1 has no effect on the selection into treatment (param_p[4] = 0), so the implied IV is irrelevant and we face a weak-IV problem: the estimation blows up.
However, if we specify an alternative model where the interaction is indeed relevant, this strategy also estimates the homogenous treatment effect correctly. This is what we do now.
# Model 2
N = 10000; set.seed(1234)
model_type = "homogenous"
param_error = c(1, 1.5, -0.6) # var_u, var_v, cov_uv # if homogenous
param_Z = c(0, 0, 0, 0, 1.5, 1.5, 0.9) # meanW0 state0, meanW1 state0, meanW0 state1, meanW1 state1, varW0, varW1, covW0W1
param_p = c(0, -0.3, 0.4, 0.3, 0, 0) # constant, alphaW0, alphaW1, alphaW0W1, effect of state, effect of parent educ
param_y0 = c(3.2, 0.8, 0, 0) # intercept, effect of Wd, effect of state, effect of parent educ;
param_y1 = c(3.2+0.4, 0.5, 0, 0) # the +0.4 = average treatment effect; effect of W1, effect of state, effect of parent educ;
param_genX = c(0.4, 0, 2) # probability state=1 (instead of 0), mean_parenteduc, sd_parenteduc (parenteduc drawn as continuous)
data2 = simul_data(N, model_type, param_y0, param_y1, param_p, param_Z, param_genX, param_error)
In this model, param_p[4] = 0.3
gives the effect of the interaction w0:w1
on the selection into treatment d.
It is different from 0 here, so the IVQR strategy should work.
semiivqr2 = ivreg(y~d+I(1-d):w0 + I(d):w1|w0+w1+w0:w1, data=data2); summary(semiivqr2)
#> Warning in printHypothesis(L, rhs, names(b)): one or more coefficients in the hypothesis include
#> arithmetic operators in their names;
#> the printed representation of the hypothesis will be omitted
#>
#> Call:
#> ivreg(formula = y ~ d + I(1 - d):w0 + I(d):w1 | w0 + w1 + w0:w1,
#> data = data2)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -3.590688 -0.667083 -0.006262 0.664900 3.860807
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 3.18552 0.05274 60.405 < 2e-16 ***
#> d 0.42566 0.09107 4.674 0.00000299 ***
#> I(1 - d):w0 0.80693 0.04120 19.586 < 2e-16 ***
#> I(d):w1 0.49942 0.01849 27.004 < 2e-16 ***
#>
#> Diagnostic tests:
#> df1 df2 statistic p-value
#> Weak instruments (d) 3 9996 411.88 <2e-16 ***
#> Weak instruments (I(1 - d):w0) 3 9996 1645.20 <2e-16 ***
#> Weak instruments (I(d):w1) 3 9996 7846.17 <2e-16 ***
#> Wu-Hausman 3 9993 42.32 <2e-16 ***
#> Sargan 0 NA NA NA
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.9964 on 9996 degrees of freedom
#> Multiple R-Squared: 0.4283, Adjusted R-squared: 0.4281
#> Wald test: 1385 on 3 and 9996 DF, p-value: < 2.2e-16
Indeed, it estimates the coefficients well. Notice that the
semiivreg()
function also works here:
semiiv2 = semiivreg(y~d|w0|w1, data=data2, est_method="homogenous", plotting=FALSE)
summary_coeff2 = semiiv2$estimate$est # extract the coefficients from the homogenous TE specification
summary_coeff2[1:4,] # only print the first 4 coefficients; the others correspond to the control function of P
#> Variable Estimate Std_Error t_value p_value
#> 1 (Intercept) 3.1671559 0.05868908 53.964995 0.000000000000
#> 2 I(d) 0.4551946 0.10268723 4.432826 0.000009399856
#> 3 I(1 - d):w0 0.8244073 0.01517866 54.313584 0.000000000000
#> 4 I(d):w1 0.4952332 0.01192557 41.527014 0.000000000000
true_param
#> [1] 3.2 0.4 0.8 0.5
Thus semiivreg()
is more flexible and should be preferred in general. Notice also that
the IVQR estimation will typically have a higher variance, because it
uses the interaction as an IV: the interaction must have a large,
significant effect on the treatment to avoid weak-IV concerns. By
contrast, semiivreg()
only relies on each semi-IV being relevant by itself, which
is easier to satisfy. In particular, if the semi-IVs have a
strong impact on their respective potential outcome, they should be
relevant as soon as there is selection into treatment based on gains.
Remark: since we know that the first stage includes the interaction, we might as well include it in semiivreg() via propensity_formula, as follows. (The previous results, which did not include it, were still fine, despite the bias in the predicted propensity score caused by the misspecified first-stage formula.)
semiiv3 = semiivreg(y~d|w0|w1, propensity_formula = d~w0+w1+w0:w1,
data=data2, est_method="homogenous", plotting=FALSE)
fstage = semiiv3$estimate$propensity; summary(fstage) # returns the first stage estimation
#>
#> Call:
#> glm(formula = propensity_formula, family = binomial(link = "probit"),
#> data = propensity_data, weights = WEIGHTS)
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -0.01376 0.01483 -0.927 0.354
#> w0 -0.23785 0.01392 -17.089 <2e-16 ***
#> w1 0.32054 0.01424 22.517 <2e-16 ***
#> w0:w1 0.25106 0.01025 24.482 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 13672 on 9999 degrees of freedom
#> Residual deviance: 12385 on 9996 degrees of freedom
#> AIC: 12393
#>
#> Number of Fisher Scoring iterations: 4
summary_coeff3 = semiiv3$estimate$est # extract the coefficients from the homogenous TE specification
summary_coeff3[1:4,] # only print the first 4 coefficients; the others correspond to the control function of P
#> Variable Estimate Std_Error t_value p_value
#> 1 (Intercept) 3.1768312 0.03566453 89.075380 0.000000e+00
#> 2 I(d) 0.4403100 0.06103333 7.214254 5.816986e-13
#> 3 I(1 - d):w0 0.8227038 0.01466641 56.094412 0.000000e+00
#> 4 I(d):w1 0.4985490 0.01020502 48.853314 0.000000e+00
true_param
#> [1] 3.2 0.4 0.8 0.5
Specification 2: semi-IVs are in fact valid IVs
To run Specification 2 of Bruneel-Zupanc (2024), where the semi-IVs are in fact valid IVs, we need to change the parameters of the model. We only need to remove (set to zero) the direct effects of the semi-IVs on their potential outcomes.
# Specification 2
param_y0 = c(3.2, 0, 0, 0) # intercept, effect of Wd, effect of state, effect of parent educ;
param_y1 = c(3.2+0.4, 0, 0, 0) # the +0.4 = average treatment effect; effect of W1, effect of state, effect of parent educ;
param = list(param_error, param_Z, param_p, param_y0, param_y1, param_genX, model_type)
Then, simply run the previous code. To replicate the Monte Carlo simulations of the paper, run it several times.
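A minimal Monte Carlo sketch is given below. It assumes the parameter objects and the simul_data() / semiivreg() functions defined above are in scope, and that the estimate table has the Estimate column shown in the earlier output; the number of replications is an arbitrary choice.

```r
# Monte Carlo sketch: repeat simulation + estimation, then summarize the
# distribution of the first four coefficients across replications.
n_rep <- 100
est <- t(replicate(n_rep, {
  data_s <- simul_data(N, model_type, param_y0, param_y1,
                       param_p, param_Z, param_genX, param_error)
  fit <- semiivreg(y ~ d | w0 | w1, data_s,
                   est_method = "homogenous", plotting = FALSE)
  fit$estimate$est$Estimate[1:4]
}))
colnames(est) <- c("(Intercept)", "d", "w0", "w1")
round(colMeans(est), 2)        # compare to the true parameter vector (bias)
round(apply(est, 2, sd), 2)    # Monte Carlo standard deviations
```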