
A data frame of 100,000 observations drawn from a simulated Roy model with homogeneous treatment effects, generated with simul_data().

Usage

data(roydata2)

Format

The data contain the following variables, which would be observed in a standard dataset:

y

The observed outcome.

d

The treatment.

w0, w1

The semi-IVs: w0 enters only the D=0 potential outcome and w1 enters only the D=1 potential outcome.

Xbinary, Xcontinuous

Two covariates, one binary and one continuous.

It also reports the typically unobserved potential outcomes and shocks:

y0, y1

The unobserved potential outcomes.

P

The unobserved true treatment probability.

latent, V, Ud, U0, U1

The unobserved shocks and related terms. V is the unobserved shock and Ud gives the normalized ranks of V. U0 and U1 are the outcome shocks. latent gives the latent utility term in the selection equation.
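
Because the dataset ships both the observed outcome y and the potential outcomes y0 and y1, one way to sanity-check the columns is to compare y with the potential outcome of the chosen treatment. The sketch below assumes the usual Roy-model switching rule y = d * y1 + (1 - d) * y0 and a 0/1 coding of d, neither of which is stated on this page. check_switching() and the toy data frame are hypothetical names introduced only for illustration; they are not part of the package.

```r
# Hypothetical helper (not part of the package): checks the ASSUMED
# switching rule y = d * y1 + (1 - d) * y0 on a data frame that has
# the columns y, d, y0 and y1 described above.
check_switching <- function(df, tol = 1e-8) {
  stopifnot(all(c("y", "d", "y0", "y1") %in% names(df)))
  implied <- df$d * df$y1 + (1 - df$d) * df$y0
  # TRUE when the observed outcome matches the implied one everywhere
  max(abs(df$y - implied)) < tol
}

# Toy data with the same column names, used only to illustrate the call
toy <- data.frame(
  d  = c(0, 1, 1, 0),
  y0 = c(3.1, 2.9, 3.4, 3.0),
  y1 = c(3.6, 3.3, 3.9, 3.5)
)
toy$y <- ifelse(toy$d == 1, toy$y1, toy$y0)

check_switching(toy)
```

With the package attached, the same helper could be applied after data(roydata2) by calling check_switching(roydata2). Whether it returns TRUE there depends on the assumed switching rule actually matching how simul_data() builds y.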

The data was generated using the following R code:


N = 100000; set.seed(1234)
model_type = "homogenous"
param_error = c(1, 1.5, -0.6)
param_Z = c(0, 0, 0, 0, 1.5, 1.5, 0.9)
param_p = c(0, -0.5, 0.5, 0, 0, 0)
param_y0 = c(3.2, 0.8, 0, 0)
param_y1 = c(3.2+0.4, 0.5, 0, 0)
param_genX = c(0.4, 0, 2)

roydata2 = simul_data(N, model_type, param_y0, param_y1,
                      param_p, param_Z, param_genX, param_error)