TSRF: Two Stage Curvature Identification with Random Forest
In zijguo/TSCI: Two Stage Curvature Identification

TSRF	R Documentation

Two Stage Curvature Identification with Random Forest

Description

This function implements Two Stage Curvature Identification with Random Forest. It tests the IV strength, chooses the violation space from a series of candidate spaces, and constructs the confidence interval for the treatment effect using the selected violation space.

Usage

TSRF(
  Y,
  D,
  Z,
  X,
  vio.space = NULL,
  intercept = TRUE,
  A1.ind = NULL,
  num.trees = NULL,
  mtry = NULL,
  max.depth = NULL,
  min.node.size = NULL,
  str.thol = 10,
  alpha = 0.05
)

Arguments

`Y`	outcome with dimension n by 1
`D`	treatment with dimension n by 1
`Z`	instrumental variable with dimension n by 1
`X`	baseline covariates with dimension n by p
`vio.space`	a matrix or a list. If a matrix, each column corresponds to a violation form of Z. If a list, each element corresponds to a violation form of Z and must be a matrix with n rows, e.g. (Z^3, Z^2). If NULL, defaults to the n by 3 matrix (Z^3, Z^2, Z). Violation space selection is performed according to the provided violation space; in the default case, the candidates compared are the null violation space, Z, (Z^2, Z), and (Z^3, Z^2, Z)
`intercept`	logical, whether to include the intercept in the outcome model, defaults to TRUE
`A1.ind`	the indices of the samples in A1, used for constructing the point estimator and the confidence interval; if NULL, defaults to round(2/3*n) indices randomly selected from 1 to n
`num.trees`	number of trees in the Random Forest; if NULL, defaults to 200
`mtry`	number of covariates to possibly split at in each node of a tree in the Random Forest; if NULL, defaults to the sequence from round((p+1)/3) to round(2(p+1)/3)
`max.depth`	maximal tree depth in the Random Forest; if NULL, defaults to 0, which means unlimited depth
`min.node.size`	minimal size of each leaf node in the Random Forest; if NULL, defaults to the set 5, 10, 15
`str.thol`	minimal value of the threshold of the IV strength test, defaults to 10
`alpha`	the significance level, defaults to 0.05
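
As a rough illustration of the argument descriptions above, the following sketch builds inputs of the documented shapes using base R only. The simulated `Z`, the object names (`vio.matrix`, `vio.list`, `mtry.grid`, `min.node.size.grid`) and the reading of the list form as one single-column matrix per element are assumptions made for this example, not part of the package.

```r
# Illustrative inputs only; Z is simulated here and the object names are hypothetical.
set.seed(1)
n <- 100
p <- 10
Z <- runif(n)

# Matrix form: each column is one violation form of Z.
# This mirrors the documented default, the n by 3 matrix (Z^3, Z^2, Z).
vio.matrix <- cbind(Z^3, Z^2, Z)

# List form: each element is a matrix with n rows.
# Assumption: one single-column matrix per violation form.
vio.list <- list(matrix(Z^3, ncol = 1), matrix(Z^2, ncol = 1))

# A1.ind: indices of the samples in A1. The documented default draws
# round(2/3 * n) indices at random from 1 to n; fixing them by hand
# (with a seed) makes the sample split reproducible.
A1.ind <- sort(sample(seq_len(n), round(2 / 3 * n)))

# mtry: the documented default is the sequence from round((p+1)/3)
# to round(2(p+1)/3).
mtry.grid <- seq(round((p + 1) / 3), round(2 * (p + 1) / 3))

# min.node.size: the documented default set.
min.node.size.grid <- c(5, 10, 15)
```

Objects like these would presumably be passed through the arguments listed in Usage, for example `vio.space = vio.matrix`, `A1.ind = A1.ind`, `mtry = mtry.grid` and `min.node.size = min.node.size.grid`.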

Value

`Coef.all`	a series of point estimators of treatment effect corresponding to different violation spaces and the OLS
`sd.all`	standard errors of Coef.all
`CI.all`	confidence intervals for the treatment effect corresponding to different violation spaces and the OLS
`Coef.robust`	the point estimators corresponding to the violation space selected by the robust comparison
`sd.robust`	the standard errors of Coef.robust
`CI.robust`	confidence intervals for the treatment effect with the violation space selected by the robust comparison
`iv.str`	IV strength corresponding to different violation spaces
`iv.thol`	the threshold of IV strength test corresponding to different violation spaces
`Qmax`	the index of the largest violation space selected by the IV strength test. If -1, the IV strength test fails for the null violation space and OLS is run. If 0, the IV strength test fails for the first non-null violation space and TSRF is run only for the null violation space. In all other cases, violation space selection is performed
`q.hat`	the index of estimated violation space corresponding to Qmax
`invalidity`	invalidity of TSLS. If TRUE, the IV is invalid; otherwise, the IV is valid
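
The special values of `Qmax` listed above can be read off with a small helper. This is only a sketch: `describe_qmax` is a hypothetical function that is not part of TSCI, and `mock.output` is a hand-made list with made-up values standing in for a real TSRF result.

```r
# Hypothetical helper (not part of TSCI): maps Qmax to the action
# described in the Value section.
describe_qmax <- function(Qmax) {
  if (Qmax == -1) {
    "OLS is run"
  } else if (Qmax == 0) {
    "TSRF is run only for the null violation space"
  } else {
    "violation space selection is performed"
  }
}

# Mock output with made-up values, standing in for a TSRF() result.
mock.output <- list(Qmax = 2, q.hat = 1, invalidity = TRUE)

# Report which branch applies and whether the IV was flagged as invalid.
qmax.action <- describe_qmax(mock.output$Qmax)
iv.status <- if (mock.output$invalidity) "IV is invalid" else "IV is valid"
print(c(qmax.action, iv.status))
```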

Examples

## Not run: 
# dimension
p = 10
# sample size
n = 100
# interaction value
inter.val = 1
# the IV strength
a = 1
# violation strength
tau = 1
f = function(x){a*(1*sin(2*pi*x) + 1.5*cos(2*pi*x))}
rho1=0.5
# function to generate covariance matrix
A1gen=function(rho,p){
  A1=matrix(0,p,p)
  for(i in 1:p){
    for(j in 1:p){
      A1[i,j]=rho^(abs(i-j))
    }
  }
  A1
}
Cov=(A1gen(rho1,p+1))
mu=rep(0,p+1)
# true effect
beta=1
alpha=as.matrix(rep(-0.3,p))
gamma=as.matrix(rep(0.2,p))
inter=as.matrix(c(rep(inter.val,5),rep(0,p-5)))


# generate the data (mvrnorm() comes from the MASS package)
library(MASS)
mu.error=rep(0,2)
Cov.error=matrix(c(1,0.5,0.5,1),2,2)
Error=mvrnorm(n, mu.error, Cov.error)
W.original=mvrnorm(n, mu, Cov)
W=pnorm(W.original)
# instrument variable
Z=W[,1]
# baseline covariates
X=W[,-1]
# generate the treatment variable D
D=f(Z)+X%*%alpha+Z*X%*%inter+Error[,1]
# generate the outcome variable Y
Y=D*beta+tau*Z+X%*%gamma+Error[,2]


# Two Stage Random Forest
output.RF = TSRF(Y,D,Z,X)
# point estimates
output.RF$Coef.robust
# standard errors
output.RF$sd.robust
# confidence intervals
output.RF$CI.robust

## End(Not run)
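
The covariance generator `A1gen` in the example fills entry (i, j) with rho^|i-j| using a nested loop. A vectorized equivalent based on `outer()` might look like the sketch below; `A1gen_vec` is a hypothetical name introduced here for comparison and is not part of the package.

```r
# Loop version, copied from the example above.
A1gen <- function(rho, p) {
  A1 <- matrix(0, p, p)
  for (i in 1:p) {
    for (j in 1:p) {
      A1[i, j] <- rho^(abs(i - j))
    }
  }
  A1
}

# Vectorized sketch (hypothetical name): outer() builds the matrix of
# index differences i - j in one call, then rho is raised elementwise.
A1gen_vec <- function(rho, p) {
  idx <- seq_len(p)
  rho^abs(outer(idx, idx, "-"))
}

# Both versions should agree for the settings used in the example.
same.cov <- isTRUE(all.equal(A1gen(0.5, 11), A1gen_vec(0.5, 11)))
```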

zijguo/TSCI documentation built on May 23, 2022, 1:07 a.m.