TSRF: Two Stage Curvature Identification with Random Forest

View source: R/Source-Random-Forest.R

Description

This function implements Two Stage Curvature Identification (TSCI) with random forests. It tests the IV strength, selects the violation space from a series of candidate spaces, and constructs a confidence interval for the treatment effect under the selected violation space.
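
To make "violation space" concrete, the data-generating model used in the Examples section can be written out as below. This is a sketch of that simulated model only, not a general statement of the model behind TSRF. The symbol eta stands for the vector `inter`, delta and epsilon for `Error[,1]` and `Error[,2]`, and the labels V_0 to V_3 for the default candidate spaces are introduced here for illustration.

```latex
\begin{aligned}
D_i &= f(Z_i) + X_i^\top \alpha + Z_i\, X_i^\top \eta + \delta_i, \\
Y_i &= D_i\,\beta + \tau Z_i + X_i^\top \gamma + \epsilon_i, \qquad
  \operatorname{Corr}(\delta_i, \epsilon_i) = 0.5, \\
\mathcal{V}_0 &= \{0\}, \quad
\mathcal{V}_1 = \operatorname{span}(Z), \quad
\mathcal{V}_2 = \operatorname{span}(Z^2, Z), \quad
\mathcal{V}_3 = \operatorname{span}(Z^3, Z^2, Z).
\end{aligned}
```

Because the term tau*Z enters Y directly, the instrument is invalid whenever tau is not 0; this violation lies in span(Z), and therefore in every larger candidate space.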

Usage

TSRF(
  Y,
  D,
  Z,
  X,
  vio.space = NULL,
  intercept = TRUE,
  A1.ind = NULL,
  num.trees = NULL,
  mtry = NULL,
  max.depth = NULL,
  min.node.size = NULL,
  str.thol = 10,
  alpha = 0.05
)

Arguments

Y

outcome with dimension n by 1

D

treatment with dimension n by 1

Z

instrumental variable with dimension n by 1

X

baseline covariates with dimension n by p

vio.space

a matrix or a list. If a matrix, each column corresponds to one violation form of Z. If a list, each element corresponds to one violation form of Z and must be a matrix with n rows, e.g. (Z^3, Z^2). If NULL, defaults to the n by 3 matrix (Z^3, Z^2, Z). Violation space selection is performed over the nested candidate spaces built from the provided violation space; in the default case, these are the null violation space, Z, (Z^2, Z), and (Z^3, Z^2, Z).

intercept

logical; whether to include an intercept in the outcome model; defaults to TRUE

A1.ind

the indices of the samples in A1, used to construct the point estimator and the confidence interval; defaults to round(2/3*n) indices sampled at random from 1 to n

num.trees

number of trees in the Random Forest; defaults to 200

mtry

number of variables considered for splitting at each node of a tree in the Random Forest; defaults to the sequence from round((p+1)/3) to round(2*(p+1)/3)

max.depth

maximal tree depth in the Random Forest; defaults to 0, meaning unlimited depth

min.node.size

minimal size of each leaf node in the Random Forest; defaults to the values 5, 10, and 15

str.thol

minimal value of the threshold for the IV strength test; defaults to 10

alpha

the significance level; defaults to 0.05
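
The defaults described above can be reconstructed in a few lines of base R. This is a minimal sketch under stated assumptions, not the package's internal code: the helpers `default_vio_space()` and `nested_candidates()` are hypothetical, the `mtry` sequence is assumed to step by 1, the complement of A1 (called `A2.ind` here) is a label introduced only for illustration, and how TSRF actually searches the tuning values is not specified on this page.

```r
# Hypothetical helpers reconstructing the documented defaults of TSRF().

# Default violation space: the n by 3 matrix (Z^3, Z^2, Z).
default_vio_space <- function(Z) {
  Z <- as.vector(Z)
  V <- cbind(Z^3, Z^2, Z)
  colnames(V) <- c("Z^3", "Z^2", "Z")
  V
}

# Nested candidate spaces compared during selection:
# null space (0 columns), Z, (Z^2, Z), (Z^3, Z^2, Z).
nested_candidates <- function(V) {
  Q <- ncol(V)
  c(list(V[, integer(0), drop = FALSE]),
    lapply(seq_len(Q), function(q) V[, (Q - q + 1):Q, drop = FALSE]))
}

set.seed(1)
n <- 100
p <- 10
Z <- runif(n)

V <- default_vio_space(Z)
cands <- nested_candidates(V)
sapply(cands, ncol)  # 0 1 2 3

# Default sample split: round(2/3 * n) indices drawn at random for A1;
# the remaining indices are labelled A2 here for illustration only.
A1.ind <- sample(seq_len(n), round(2 / 3 * n))
A2.ind <- setdiff(seq_len(n), A1.ind)

# Default Random Forest tuning values, assuming mtry steps by 1.
grid <- expand.grid(
  num.trees     = 200,
  mtry          = round((p + 1) / 3):round(2 * (p + 1) / 3),
  max.depth     = 0,              # 0 means unlimited depth
  min.node.size = c(5, 10, 15)
)
nrow(grid)  # 12
```

With p = 10, the assumed `mtry` sequence is 4:7, so the four `mtry` values crossed with the three `min.node.size` values give 12 combinations.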

Value

Coef.all

point estimators of the treatment effect for each candidate violation space and for OLS

sd.all

standard errors of Coef.all

CI.all

confidence intervals for the treatment effect for each candidate violation space and for OLS

Coef.robust

the point estimators corresponding to the violation space selected by the robust comparison

sd.robust

the standard errors of Coef.robust

CI.robust

confidence intervals for the treatment effect with the violation space selected by the robust comparison

iv.str

IV strength corresponding to different violation spaces

iv.thol

the thresholds of the IV strength test corresponding to different violation spaces

Qmax

the index of the largest violation space that passes the IV strength test. If -1, the IV strength test fails even for the null violation space, and OLS is run. If 0, the IV strength test passes only for the null violation space, and TSRF is run with the null violation space only. Otherwise, violation space selection is performed.

q.hat

the index of the estimated violation space, selected among the candidate spaces up to Qmax

invalidity

logical; indicates whether TSLS is invalid, i.e. whether the IV is detected as invalid. If TRUE, the IV is invalid; otherwise, the IV is valid.
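
The decision rule behind `Qmax` and the form of the reported intervals can be illustrated with toy numbers. This is a minimal sketch, not the package's code: `compute_Qmax()` and `wald_ci()` are hypothetical helpers, and it assumes that (i) the first entries of `iv.str` and `iv.thol` refer to the null violation space, (ii) `Qmax` is the largest index q such that the test passes for every space up to q, and (iii) the intervals are normal-approximation (Wald) intervals built from a point estimate and its standard error.

```r
# Hypothetical helper: Qmax from IV strengths and thresholds, where
# entry 1 is the null violation space (q = 0), entry 2 is q = 1, etc.
compute_Qmax <- function(iv.str, iv.thol) {
  pass <- iv.str >= iv.thol
  if (!pass[1]) return(-1L)  # test fails for the null space: fall back to OLS
  first.fail <- which(!pass)[1]
  if (is.na(first.fail)) {
    length(pass) - 1L        # every candidate space passes
  } else {
    first.fail - 2L          # last space before the first failure
  }
}

# Hypothetical helper: two-sided Wald interval at level 1 - alpha.
wald_ci <- function(coef, sd, alpha = 0.05) {
  z <- qnorm(1 - alpha / 2)
  cbind(lower = coef - z * sd, upper = coef + z * sd)
}

compute_Qmax(c(50, 30, 12, 4), rep(10, 4))  # 2
compute_Qmax(c(50, 8, 12, 4), rep(10, 4))   # 0
compute_Qmax(c(6, 30, 12, 4), rep(10, 4))   # -1
wald_ci(coef = 1.02, sd = 0.1)
```

Stopping at the first failure, rather than taking the largest passing index anywhere, is a design choice of this sketch; it keeps the selected spaces nested.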

Examples

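## Not run: 
library(MASS)  # provides mvrnorm()
library(TSCI)  # provides TSRF()
set.seed(1)    # make the simulated data reproducible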
# dimension
p = 10
# sample size
n = 100
# interaction value
inter.val = 1
# the IV strength
a = 1
# violation strength
tau = 1
f = function(x){a*(1*sin(2*pi*x) + 1.5*cos(2*pi*x))}
rho1=0.5
# function to generate the covariance matrix with entries rho^|i-j|
A1gen=function(rho,p){
  rho^abs(outer(1:p,1:p,"-"))
}
Cov=(A1gen(rho1,p+1))
mu=rep(0,p+1)
# true effect
beta=1
alpha=as.matrix(rep(-0.3,p))
gamma=as.matrix(rep(0.2,p))
inter=as.matrix(c(rep(inter.val,5),rep(0,p-5)))


# generate the data
mu.error=rep(0,2)
Cov.error=matrix(c(1,0.5,0.5,1),2,2)
Error=mvrnorm(n, mu.error, Cov.error)
W.original=mvrnorm(n, mu, Cov)
W=pnorm(W.original)
# instrument variable
Z=W[,1]
# baseline covariates
X=W[,-1]
# generate the treatment variable D
D=f(Z)+X%*%alpha+Z*X%*%inter+Error[,1]
# generate the outcome variable Y
Y=D*beta+tau*Z+X%*%gamma+Error[,2]


# Two Stage Random Forest
output.RF = TSRF(Y,D,Z,X)
# point estimates
output.RF$Coef.robust
# standard errors
output.RF$sd.robust
# confidence intervals
output.RF$CI.robust

## End(Not run)


zijguo/TSCI documentation built on May 23, 2022, 1:07 a.m.