In this article, we discuss the R implementation of the test statistic for the test of conditional independence of $X, Y$ given $Z$. In the present version the codes provided in this page can only be used when $X$ and $Y$ are real valued and $Z \in \mathbb{R}^d$ for $d\geq 1$.
To download the R package use the following in R:
library(devtools) devtools::install_github(repo = "rohitpatra/BrownCondInd",ref='main') library(BrownCondInd)
In the following, we generate $n=100$ i.i.d. copies of $(X,Y,Z)$ when $d=2$. Moreover, we assume that $Z\sim N(0, \sigma_z^2 \textbf{I}_{5\times 5})$, $X=W+Z_1+\epsilon$, and $X=W+Z_1+\epsilon'$, where $Z_1$ is the first coordinate of $Z$ and $\epsilon, \epsilon^\prime,$ and $W$ are three independent mean zero Gaussian random variables. Moreover, we assume that $\epsilon, \epsilon^\prime,$ and $W$ are independent of $Z,$ and $var(\epsilon)=var(\epsilon^\prime)=\sigma^2_E,$ and $var(W)= \sigma^2_W$. Note that this is the simulation scenario used in Section 4 of @npres. Note that $X$ and $Y$ are conditionally independent of $Z$ only when $\sigma_W=0$. In the following, we fixed $\sigma_W$ to be $0.1$.
n <-100 d <- 2 sigma.Z <- 0.3 sigma.W <- 0.1 sigma.E <- 0.2 Z <- matrix(rnorm(n*d,0,sigma.Z),nr=n, nc=d) W <- rnorm(n,0,sigma.W) eps <- rnorm(n,0,sigma.E) eps.prime <- rnorm(n,0,sigma.E) X <- W + Z[,1] +eps Y <- W + Z[,1] +eps.prime Data <- cbind(X,Y,Z) colnames(Data) <- c('X', 'Y', paste('Z.', 1:d,sep='')) head(Data)
In the following we calculate the test statistic $\hat{\mathcal{E}}_n$, see (3.7) of @npres.
test.stat <- npresid.statistics(Data,d)
As the limiting behavior of $\hat{\mathcal{E}}_n$ is unknown, in the following we approximate the asymptotic distribution through a model based bootstrap procedure (see Section 3.2.1 of @npres) and evaluate the $p$-value of the proposed test. In the following ``boot.replic" denotes the number of bootstrap replications. We recommend using a bootstrap replication of size $1000.$
out <- npresid.boot(Data,d,boot.replic=100) str(out, max.level = 1)
Here ``cond.dist.obj'' is the the list conataining estimators of $F_{X|Z}$, $F_{Y|Z}$, and $F_Z$ evaluated at the data points (denoted by F.x_z, F.y_z, and F.z_hat) and the bandwidth used (denoted by Fbw.x_z, Fbw.y_z, and Fbw.z_z) to evaluate the conditional distribution functions. Note that we use the functions available in the "np'' package ( see @np) to compute the optimal bandwidths as well as the estimates of the conditional distribution functions.
str(out$cond.dist.obj, max.level = 1)
The estimated $p$-value of the test procedure is given through
p.value <- out$p.value
The required R file ``Npres_Fucntions.R'' can be downloaded here. Note the functions require the following R-packages: boot, data.table, energy, np, and stats.
# References
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.