Simulate data from the Generalized Roy Model with semi-IVs

Usage

simul_data(N, model_type="heterogenous",
           param_y0, param_y1, param_p, param_Z, param_genX, param_error)

Arguments

N: Number of observations
model_type: Type of model: "heterogenous" (the default) or "homogenous" treatment effects.
param_y0: Parameters for Y0: (delta0, beta0, beta0X1, beta0X2),
i.e., the intercept and the effects of w0, X1, and X2 on Y0.
param_y1: Parameters for Y1: (delta1, beta1, beta1X1, beta1X2),
i.e., the intercept and the effects of w1, X1, and X2 on Y1.
param_p: Parameters for the selection equation: (alpha, alpha0, alpha1, alpha2, alphaX1, alphaX2), i.e., the intercept and the effects of w0, w1, w0 x w1, Xbinary (X1), and Xcontinuous (X2) on the latent utility.
param_Z: Parameters for the simulation of the semi-IVs, in this order:
the mean of W0 when X1=0, of W1 when X1=0, of W0 when X1=1, and of W1 when X1=1; then the variance of W0, the variance of W1, and the covariance of W0 and W1.
param_genX: Parameters for the covariates: p_X1, mu_X2, sigma_X2.
param_error: Parameters for the error terms; the content depends on model_type:
if "heterogenous": variance of U0, variance of U1, covariance of U0 and U1, and variance of the cost (which has mean 0).
if "homogenous": variance of U, variance of V, and covariance of U and V.
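
To make the ordering of param_genX and param_Z concrete, here is a minimal base-R sketch of one plausible way such vectors could be turned into covariates and semi-IVs. It is not the package's internal code: joint normality of (W0, W1), reading sigma_X2 as a standard deviation, and reading p_X1 as Pr(X1 = 1) are all assumptions made for illustration.

```r
# Sketch only: an assumed reading of param_genX and param_Z,
# not the code used inside simul_data().
set.seed(1)
N <- 5000

param_genX <- c(0.4, 0, 2)                  # p_X1, mu_X2, sigma_X2
param_Z    <- c(0, 0, 0, 0, 1.5, 1.5, 0.9)  # four means, var W0, var W1, cov W0W1

# Covariates: one binary, one continuous.
# Assumptions: p_X1 = Pr(X1 = 1); sigma_X2 is a standard deviation.
Xbinary     <- rbinom(N, size = 1, prob = param_genX[1])
Xcontinuous <- rnorm(N, mean = param_genX[2], sd = param_genX[3])

# Covariance matrix of (W0, W1), common to both values of X1.
Sigma <- matrix(c(param_Z[5], param_Z[7],
                  param_Z[7], param_Z[6]), nrow = 2)
stopifnot(det(Sigma) > 0)  # variances and covariance must be mutually compatible

# Means of W0 and W1 depend on the binary covariate.
mu_W0 <- ifelse(Xbinary == 0, param_Z[1], param_Z[3])
mu_W1 <- ifelse(Xbinary == 0, param_Z[2], param_Z[4])

# Correlated normal draws (assumed distribution): if R = chol(Sigma),
# rows of Z %*% R have covariance t(R) %*% R = Sigma.
E  <- matrix(rnorm(2 * N), ncol = 2) %*% chol(Sigma)
w0 <- mu_W0 + E[, 1]
w1 <- mu_W1 + E[, 2]

# Empirical covariance, to compare with Sigma.
round(cov(cbind(w0, w1)), 2)
```

The det(Sigma) check is a quick way to catch a covariance that is too large for the chosen variances before simulating.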

Value

A data frame with the following columns:

y: The observed outcome.
d: The treatment.
w0, w1: The semi-IVs: w0 enters only the outcome for D=0 (Y0), and w1 only the outcome for D=1 (Y1).
Xbinary, Xcontinuous: Two covariates, one binary and one continuous.
y0, y1: The unobserved potential outcomes.
P: The unobserved true treatment probability.
latent, V, Ud, U0, U1: Unobserved terms. V is the selection shock and Ud is its normalized rank. U0 and U1 are the outcome shocks. latent is the latent utility term in the selection equation.
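
The sketch below illustrates how columns of this kind typically relate to each other in a textbook generalized Roy model. It is a hand-built toy, not the output of simul_data: the sign convention (treated when V is below the latent index), the normal distribution of V, and the independent shocks are assumptions chosen for brevity, and the coefficients simply reuse illustrative values.

```r
# Sketch only: a textbook selection step with an assumed sign convention;
# simul_data() may differ in its exact specification.
set.seed(2)
N <- 5000

w0 <- rnorm(N); w1 <- rnorm(N)   # stand-ins for the semi-IVs
U0 <- rnorm(N); U1 <- rnorm(N)   # outcome shocks (independent here, an assumption)
sd_V <- sqrt(1.5)
V  <- rnorm(N, sd = sd_V)        # selection shock (assumed normal)

latent <- 0 - 0.5 * w0 + 0.5 * w1  # index with param_p-style coefficients
Ud <- pnorm(V / sd_V)              # normalized rank of V, uniform on (0, 1)
P  <- pnorm(latent / sd_V)         # Pr(d = 1 | w0, w1) under this convention
d  <- as.integer(Ud <= P)          # same event as V <= latent

y0 <- 3.2 + 0.8 * w0 + U0          # potential outcome without treatment
y1 <- 3.6 + 0.5 * w1 + U1          # potential outcome with treatment
y  <- d * y1 + (1 - d) * y0        # only one potential outcome is observed

sim <- data.frame(y, d, w0, w1, y0, y1, P, latent, V, Ud, U0, U1)

# The share of treated units should be close to the average of P.
c(mean_d = mean(sim$d), mean_P = mean(sim$P))
```

In applied work only y, d, w0, w1 and the covariates would be available; the remaining columns exist in simulated data so that estimates can be compared with the truth.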

Details

This function simulates data from the Generalized Roy Model with semi-IVs, following the simulation specified in Bruneel-Zupanc (2024).
For more details about the exact specification, see the vignette by running vignette("simul_data", package = "semiIVreg"). The function can simulate the general model with heterogenous treatment effects, as well as the restricted model with homogenous treatment effects.
simul_data was used to simulate the datasets shipped with this package: data(roydata) contains the simulated model with heterogenous treatment effects, and data(roydata2) contains the simulated model with homogenous treatment effects.

References

Bruneel-Zupanc, C. (2023). Don't (fully) exclude me, it's not necessary! Identification with semi-IVs. arXiv preprint arXiv:2303.12667.

Andresen, M. E. (2018). Exploring marginal treatment effects: Flexible estimation using Stata. The Stata Journal, 18(1), 118-158.

Heckman, J. J., Urzua, S., & Vytlacil, E. (2006). Understanding instrumental variables in models with essential heterogeneity. The Review of Economics and Statistics, 88(3), 389-432.

Heckman, J. J., & Vytlacil, E. J. (2007). Econometric evaluation of social programs, part II: Using the marginal treatment effect to organize alternative econometric estimators to evaluate social programs, and to forecast their effects in new environments. Handbook of Econometrics, 6, 4875-5143.

Examples

library(semiIVreg)
N = 10000; set.seed(12345)

# Example 1: Heterogenous Treatment Effects.
model_type = "heterogenous"
param_error = c(1, 1, 0.6, 0.5) # var_u0, var_u1, cov_u0u1, var_cost
param_Z = c(0, 0, 0, 0, 1.5, 1.5, 0.9)
# means: W0 | Xbinary=0, W1 | Xbinary=0, W0 | Xbinary=1, W1 | Xbinary=1; then varW0, varW1, covW0W1
param_p = c(0, -0.7, 0.7, 0, 0, 0) # constant, W0, W1, W0xW1, Xbinary, Xcontinuous
param_y0 = c(3.2, 0.8, 0, 0) # intercept, W0, Xbinary, Xcontinuous
param_y1 = c(3.2+0.4, 0.5, 0, 0) # intercept (the +0.4 = ATE), W1, Xbinary, Xcontinuous
param_genX = c(0.4, 0, 2) # p_X1, mu_X2, sigma_X2

data = simul_data(N, model_type, param_y0, param_y1, param_p, param_Z, param_genX, param_error)


# Example 2: Homogenous Treatment Effects (constant MTE)
model_type = "homogenous"
param_error = c(1, 1.5, -0.6) # var_u, var_v, cov_uv
param_Z = c(0, 0, 0, 0, 1.5, 1.5, 0.9)
param_p = c(0, -0.5, 0.5, 0, 0, 0) # the constant <=> mean_V
param_y0 = c(3.2, 0.8, 0, 0)
param_y1 = c(3.2+0.4, 0.5, 0, 0)
param_genX = c(0.4, 0, 2)

data1 = simul_data(N, model_type, param_y0, param_y1, param_p, param_Z, param_genX, param_error)


# Set the effect of w1 (or w0) on its own outcome to zero if you want a valid IV, e.g.,
# param_y1 = c(3.2+0.4, 0, 0, 0) # w1 is a valid IV
# or: param_y0 = c(3.2, 0, 0, 0) # w0 is a valid IV