
A data frame of 100,000 observations drawn from a simulated Roy model with heterogeneous treatment effects, generated with simul_data().

Usage

data(roydata)

Format

The data frame contains the following variables, which would be observed in a standard dataset:

y

The observed outcome.

d

The binary treatment indicator.

w0, w1

The semi-IVs: w0 enters only the D=0 outcome equation and w1 only the D=1 outcome equation.

Xbinary, Xcontinuous

Two covariates, one binary and one continuous.

It also contains the potential outcomes and shocks, which are typically unobserved:

y0, y1

The unobserved potential outcomes.

P

The unobserved true treatment probability.

latent, V, Ud, U0, U1

V is the unobserved shock in the selection equation and Ud its normalized rank. U0 and U1 are the shocks in the outcome equations for D=0 and D=1. latent is the latent utility term in the selection equation.
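Because the simulated data include both potential outcomes, the true treatment effects can be computed directly and compared with what the observed variables alone would suggest. The sketch below uses a hypothetical helper, roy_summaries() (not part of the package), which checks the identity y = d*y1 + (1-d)*y0 and returns the average treatment effect (ATE), the effects on the treated (ATT) and on the untreated (ATU), and the naive difference in observed means. It assumes d is coded 0/1 and is demonstrated on a small made-up data frame.

```r
# Hypothetical helper (not part of the package). It relies on the simulated
# potential outcomes y0 and y1, which a real dataset would not contain.
roy_summaries <- function(df) {
  # Observed outcome should equal the potential outcome of the realized treatment
  consistent <- isTRUE(all.equal(df$y, df$d * df$y1 + (1 - df$d) * df$y0))
  te <- df$y1 - df$y0                       # individual treatment effects
  list(
    consistent = consistent,
    ATE   = mean(te),                       # average treatment effect
    ATT   = mean(te[df$d == 1]),            # average effect on the treated
    ATU   = mean(te[df$d == 0]),            # average effect on the untreated
    naive = mean(df$y[df$d == 1]) - mean(df$y[df$d == 0])  # observed difference in means
  )
}

# Made-up toy data with the same column names as roydata
toy <- data.frame(
  d  = c(0, 0, 1, 1),
  y0 = c(1.0, 2.0, 1.5, 2.5),
  y1 = c(1.5, 2.0, 3.0, 4.5)
)
toy$y <- toy$d * toy$y1 + (1 - toy$d) * toy$y0

s <- roy_summaries(toy)
str(s)
```

With the package installed, `data(roydata); roy_summaries(roydata)` would apply the same calculations to the full sample. Under selection on gains, as in a Roy model, the naive difference in means need not equal the ATE.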

The data was generated using the following R code:


library(semiIVreg)  # provides simul_data()

N = 100000
set.seed(1234)
model_type = "heterogenous"  # heterogeneous treatment effects
param_error = c(1, 1, 0.6, 0.5)
param_Z = c(0, 0, 0, 0, 1.5, 1.5, 0.9)
param_p = c(0, -0.7, 0.7, 0, 0, 0)
param_y0 = c(3.2, 0.8, 0, 0)
param_y1 = c(3.2+0.4, 0.5, 0, 0)
param_genX = c(0.4, 0, 2)

roydata = simul_data(N, model_type, param_y0, param_y1,
                     param_p, param_Z, param_genX, param_error)
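Since the simulated data also contain Ud, the true marginal treatment effect, MTE(u) = E[y1 - y0 | Ud = u], a common target of estimation in this setting, can be traced directly. A minimal sketch, assuming Ud lies in [0, 1]: the hypothetical helper empirical_mte() (not part of the package) averages the simulated individual effects y1 - y0 within equal-width bins of Ud. The toy data are made up for illustration.

```r
# Hypothetical helper (not part of the package): approximates
# MTE(u) = E[y1 - y0 | Ud = u] by bin averages over equal-width bins of Ud,
# assuming Ud lies in [0, 1].
empirical_mte <- function(df, n_bins = 20) {
  breaks <- seq(0, 1, length.out = n_bins + 1)
  bin <- cut(df$Ud, breaks = breaks, include.lowest = TRUE)
  te  <- df$y1 - df$y0                               # individual treatment effects
  data.frame(
    u   = (head(breaks, -1) + tail(breaks, -1)) / 2, # bin midpoints
    mte = as.numeric(tapply(te, bin, mean)),         # NA for empty bins
    n   = as.numeric(table(bin))                     # observations per bin
  )
}

# Made-up toy data with the same column names as roydata
toy <- data.frame(
  Ud = c(0.10, 0.20, 0.60, 0.90),
  y0 = c(1, 1, 1, 1),
  y1 = c(3, 2, 1.5, 1)
)
m <- empirical_mte(toy, n_bins = 2)
print(m)
```

With the package loaded, `plot(mte ~ u, data = empirical_mte(roydata))` would show the simulated MTE curve. Equal-width bins keep the sketch simple; a finer grid or a smoother would give a less noisy curve.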