Simulate data from the Generalized Roy Model with semi-IVs
This function simulates data from the Generalized Roy Model with semi-IVs, following the simulation specified in Bruneel-Zupanc (2024).
For more details about the exact specification, see the package vignette, available by running vignette("simul_data", package = "semiIVreg").
Usage
simul_data(N, model_type="heterogenous",
param_y0, param_y1, param_p, param_Z, param_genX, param_error)
Arguments
- N
Number of observations
- model_type
Type of model: "heterogenous" or "homogenous"
- param_y0
Parameters for Y0: (delta0, beta0, beta0X1, beta0X2), i.e., the intercept and the effects of w0, X1 (Xbinary), and X2 (Xcontinuous) on Y0.
- param_y1
Parameters for Y1: (delta1, beta1, beta1X1, beta1X2), i.e., the intercept and the effects of w1, X1, and X2 on Y1.
- param_p
Parameters for the selection equation: (alpha, alpha0, alpha1, alpha2, alphaX1, alphaX2), i.e., the intercept and the effects of w0, w1, the w0*w1 interaction, Xbinary, and Xcontinuous on the latent utility.
- param_Z
Parameters for the simulation of the semi-IVs, in this order: mean of W0 when X1=0, mean of W1 when X1=0, mean of W0 when X1=1, mean of W1 when X1=1, variance of W0, variance of W1, and covariance of W0 and W1.
- param_genX
Parameters for the covariates: p_X1, mu_X2, sigma_X2.
- param_error
Parameters for the error terms, whose layout depends on model_type:
if "heterogenous": variance of U0, variance of U1, covariance of U0 and U1, and variance of the cost (which has mean 0);
if "homogenous": variance of U, variance of V, and covariance of U and V.
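Because every argument is a positional numeric vector, a vector of the wrong length is an easy mistake to make. The sketch below checks lengths against the layouts listed above. `check_simul_params` is a hypothetical helper written for illustration, not a function of semiIVreg, and the expected lengths are only inferred from the argument descriptions.

```r
# Hypothetical helper (not part of semiIVreg): compares each parameter vector's
# length with the layout described in the Arguments section.
check_simul_params <- function(model_type, param_y0, param_y1, param_p,
                               param_Z, param_genX, param_error) {
  # Spelling of the two model types follows the package's own argument values.
  model_type <- match.arg(model_type, c("heterogenous", "homogenous"))
  expected <- c(
    param_y0    = 4L,  # intercept, w0, X1, X2
    param_y1    = 4L,  # intercept, w1, X1, X2
    param_p     = 6L,  # intercept, w0, w1, w0*w1, Xbinary, Xcontinuous
    param_Z     = 7L,  # four conditional means, two variances, one covariance
    param_genX  = 3L,  # p_X1, mu_X2, sigma_X2
    param_error = if (model_type == "heterogenous") 4L else 3L
  )
  actual <- c(
    param_y0    = length(param_y0),
    param_y1    = length(param_y1),
    param_p     = length(param_p),
    param_Z     = length(param_Z),
    param_genX  = length(param_genX),
    param_error = length(param_error)
  )
  bad <- names(expected)[actual[names(expected)] != expected]
  if (length(bad) > 0) {
    stop("Unexpected length for: ", paste(bad, collapse = ", "))
  }
  invisible(TRUE)
}

# Passes silently for a heterogenous specification with four error parameters.
check_simul_params("heterogenous",
                   param_y0 = c(3.2, 0.8, 0, 0), param_y1 = c(3.6, 0.5, 0, 0),
                   param_p = c(0, -0.7, 0.7, 0, 0, 0),
                   param_Z = c(0, 0, 0, 0, 1.5, 1.5, 0.9),
                   param_genX = c(0.4, 0, 2),
                   param_error = c(1, 1, 0.6, 0.5))
```

Running such a check before calling simul_data turns a silent misalignment (for example, passing the four-element heterogenous param_error with model_type = "homogenous") into an explicit error.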
Value
A data frame with the following columns:
- y
The observed outcome.
- d
The treatment.
- w0, w1
The semi-IVs: w0 enters only the D=0 outcome and w1 enters only the D=1 outcome.
- Xbinary, Xcontinuous
Two covariates, one binary and one continuous.
- y0, y1
The unobserved potential outcomes.
- P
The unobserved true treatment probability.
- latent, V, Ud, U0, U1
The unobserved terms: latent is the latent utility term in the selection equation, V is the unobserved selection shock, Ud is the normalized rank of V, and U0 and U1 are the outcome shocks.
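The returned data frame keeps both potential outcomes alongside the observed one, which is what makes simulated data useful for checking estimators. The sketch below uses a small toy data frame with hypothetical values and the documented column names. It assumes the standard switching rule of the Roy model, y = d*y1 + (1-d)*y0, which this page does not state explicitly.

```r
# Toy data with hypothetical values; column names mirror the documented output.
toy <- data.frame(
  d  = c(0, 1, 1, 0),
  y0 = c(2.9, 3.4, 3.1, 3.3),
  y1 = c(3.5, 3.6, 3.9, 3.2)
)

# Observed outcome under the (assumed) switching rule: y1 if treated, else y0.
toy$y <- toy$d * toy$y1 + (1 - toy$d) * toy$y0

# Individual treatment effects are computable only because y0 and y1 are both kept.
toy$te <- toy$y1 - toy$y0

# Sample average treatment effect in the toy data.
ate_hat <- mean(toy$te)
ate_hat
```

With real data only y and d would be available, so a quantity like ate_hat can serve as a benchmark for estimates obtained from the observed columns alone.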
Details
simul_data can simulate the general model with heterogeneous treatment effects as well as a restricted model with homogeneous treatment effects. It was used to generate the datasets shipped with this package: data(roydata) contains the simulated model with heterogeneous treatment effects, and data(roydata2) contains the simulated model with homogeneous treatment effects.
References
Bruneel-Zupanc, C. (2023). Don't (fully) exclude me, it's not necessary! Identification with semi-IVs. arXiv preprint arXiv:2303.12667.
Andresen, M. E. (2018). Exploring marginal treatment effects: Flexible estimation using Stata. The Stata Journal, 18(1), 118-158.
Heckman, J. J., Urzua, S., & Vytlacil, E. (2006). Understanding instrumental variables in models with essential heterogeneity. The Review of Economics and Statistics, 88(3), 389-432.
Heckman, J. J., & Vytlacil, E. J. (2007). Econometric evaluation of social programs, part II: Using the marginal treatment effect to organize alternative econometric estimators to evaluate social programs, and to forecast their effects in new environments. Handbook of econometrics, 6, 4875-5143.
Examples
N = 10000; set.seed(12345)
# Example 1: Heterogenous Treatment Effects.
model_type = "heterogenous"
param_error = c(1, 1, 0.6, 0.5) # var_u0, var_u1, cov_u0u1, var_cost
param_Z = c(0, 0, 0, 0, 1.5, 1.5, 0.9)
# meanW0 Xbinary0, meanW1 Xbinary0, meanW0 Xbinary1, meanW1 Xbinary1, varW0, varW1, covW0W1
param_p = c(0, -0.7, 0.7, 0, 0, 0) # constant, W0, W1, W0xW1, Xbinary, Xcontinuous
param_y0 = c(3.2, 0.8, 0, 0) # intercept, W0, Xbinary, Xcontinuous
param_y1 = c(3.2+0.4, 0.5, 0, 0) # intercept (the +0.4 is the ATE), W1, Xbinary, Xcontinuous
param_genX = c(0.4, 0, 2)
data = simul_data(N, model_type, param_y0, param_y1, param_p, param_Z, param_genX, param_error)
# Example 2: Homogenous Treatment Effects (constant MTE)
model_type = "homogenous"
param_error = c(1, 1.5, -0.6) # var_u, var_v, cov_uv
param_Z = c(0, 0, 0, 0, 1.5, 1.5, 0.9)
param_p = c(0, -0.5, 0.5, 0, 0, 0) # the constant <=> mean_V
param_y0 = c(3.2, 0.8, 0, 0)
param_y1 = c(3.2+0.4, 0.5, 0, 0)
param_genX = c(0.4, 0, 2)
data1 = simul_data(N, model_type, param_y0, param_y1, param_p, param_Z, param_genX, param_error)
# Set the effect of w1 or w0 on its outcome to zero if you want a valid IV, e.g.,
# param_y1 = c(3.2+0.4, 0, 0, 0) # w1 is a valid IV
# or: param_y0 = c(3.2, 0, 0, 0) # w0 is a valid IV
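To make the param_Z layout used in the examples above concrete, the sketch below draws a pair (w0, w1) with the requested conditional means, variances, and covariance in base R. It assumes a bivariate normal distribution and treats p_X1 as the probability that Xbinary equals 1. Neither is stated on this page, and the code is an illustration rather than the package's implementation.

```r
set.seed(1)
n <- 5000

# Same layout as in the examples: meanW0|X1=0, meanW1|X1=0, meanW0|X1=1,
# meanW1|X1=1, varW0, varW1, covW0W1.
param_Z <- c(0, 0, 0, 0, 1.5, 1.5, 0.9)

# Binary covariate; 0.4 plays the role of p_X1 (assumed to be P(Xbinary = 1)).
Xbinary <- rbinom(n, size = 1, prob = 0.4)

# Covariance matrix of (W0, W1) built from the last three entries.
Sigma <- matrix(c(param_Z[5], param_Z[7],
                  param_Z[7], param_Z[6]), nrow = 2)

# chol() returns the upper-triangular R with t(R) %*% R equal to Sigma, so
# rows of (standard normal draws %*% R) have covariance Sigma.
R <- chol(Sigma)
shocks <- matrix(rnorm(2 * n), ncol = 2) %*% R

# Shift by the conditional means, which depend on Xbinary.
w0 <- shocks[, 1] + ifelse(Xbinary == 1, param_Z[3], param_Z[1])
w1 <- shocks[, 2] + ifelse(Xbinary == 1, param_Z[4], param_Z[2])

# Sample moments, to compare with the requested 1.5, 1.5, and 0.9.
c(var_w0 = var(w0), var_w1 = var(w1), cov_w0w1 = cov(w0, w1))
```

The covariance matrix must be positive definite for chol() to succeed, which holds here because 1.5 * 1.5 exceeds 0.9^2.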