In this article, we discuss the R implementation of the test statistic for the test of conditional independence of $X, Y$ given $Z$. In the present version the codes provided in this page can only be used when $X$ and $Y$ are real valued and $Z \in \mathbb{R}^d$ for $d\geq 1$.

To download the R package use the following in R:

library(devtools)
devtools::install_github(repo = "rohitpatra/BrownCondInd",ref='main')
library(BrownCondInd)

In the following, we generate $n=100$ i.i.d. copies of $(X,Y,Z)$ when $d=2$. Moreover, we assume that $Z\sim N(0, \sigma_z^2 \textbf{I}_{5\times 5})$, $X=W+Z_1+\epsilon$, and $X=W+Z_1+\epsilon'$, where $Z_1$ is the first coordinate of $Z$ and $\epsilon, \epsilon^\prime,$ and $W$ are three independent mean zero Gaussian random variables. Moreover, we assume that $\epsilon, \epsilon^\prime,$ and $W$ are independent of $Z,$ and $var(\epsilon)=var(\epsilon^\prime)=\sigma^2_E,$ and $var(W)= \sigma^2_W$. Note that this is the simulation scenario used in Section 4 of @npres. Note that $X$ and $Y$ are conditionally independent of $Z$ only when $\sigma_W=0$. In the following, we fixed $\sigma_W$ to be $0.1$.

n <-100
d <- 2
sigma.Z <- 0.3
sigma.W <- 0.1
sigma.E <- 0.2
Z <- matrix(rnorm(n*d,0,sigma.Z),nr=n, nc=d)
W <- rnorm(n,0,sigma.W)
eps <- rnorm(n,0,sigma.E)
eps.prime <- rnorm(n,0,sigma.E)
X <- W + Z[,1] +eps
Y <- W + Z[,1] +eps.prime
Data <- cbind(X,Y,Z)
colnames(Data) <- c('X', 'Y', paste('Z.', 1:d,sep=''))
head(Data)

In the following we calculate the test statistic $\hat{\mathcal{E}}_n$, see (3.7) of @npres.

test.stat <- npresid.statistics(Data,d)

As the limiting behavior of $\hat{\mathcal{E}}_n$ is unknown, in the following we approximate the asymptotic distribution through a model based bootstrap procedure (see Section 3.2.1 of @npres) and evaluate the $p$-value of the proposed test. In the following ``boot.replic" denotes the number of bootstrap replications. We recommend using a bootstrap replication of size $1000.$

out <- npresid.boot(Data,d,boot.replic=100)
str(out, max.level = 1)

Here ``cond.dist.obj'' is the the list conataining estimators of $F_{X|Z}$, $F_{Y|Z}$, and $F_Z$ evaluated at the data points (denoted by F.x_z, F.y_z, and F.z_hat) and the bandwidth used (denoted by Fbw.x_z, Fbw.y_z, and Fbw.z_z) to evaluate the conditional distribution functions. Note that we use the functions available in the "np'' package ( see @np) to compute the optimal bandwidths as well as the estimates of the conditional distribution functions.

str(out$cond.dist.obj, max.level = 1)

The estimated $p$-value of the test procedure is given through

p.value <- out$p.value 

The required R file ``Npres_Fucntions.R'' can be downloaded here. Note the functions require the following R-packages: boot, data.table, energy, np, and stats.

# References



rohitpatra/BrownCondInd documentation built on Feb. 25, 2021, 3:03 p.m.