Conditional independence test

Description

Performs a conditional independence test between two variables given a conditioning set.

Usage

## S4 method for signature 'ExpressionSet'
qpCItest(X, i=1, j=2, Q=c(), exact.test=TRUE, use=c("complete.obs", "em"),
         tol=0.01, R.code.only=FALSE)
## S4 method for signature 'cross'
qpCItest(X, i=1, j=2, Q=c(), exact.test=TRUE, use=c("complete.obs", "em"),
         tol=0.01, R.code.only=FALSE)
## S4 method for signature 'data.frame'
qpCItest(X, i=1, j=2, Q=c(), I=NULL, long.dim.are.variables=TRUE,
         exact.test=TRUE, use=c("complete.obs", "em"), tol=0.01, R.code.only=FALSE)
## S4 method for signature 'matrix'
qpCItest(X, i=1, j=2, Q=c(), I=NULL, long.dim.are.variables=TRUE,
         exact.test=TRUE, use=c("complete.obs", "em"), tol=0.01, R.code.only=FALSE)
## S4 method for signature 'SsdMatrix'
qpCItest(X, i=1, j=2, Q=c(), R.code.only=FALSE)

Arguments

X

data set on which the test is performed. It can be an ExpressionSet object, a qtl::cross object, a data frame, a matrix or an SsdMatrix-class object. In the latter case, the input should be the sample covariance matrix of the data on which conditional independence is to be tested; such a matrix can be estimated with qpCov().

i

index or name of one of the two variables in X to test.

j

index or name of the other variable in X to test.

Q

indexes or names of the variables in X forming the conditioning set.

I

indexes or names of the variables in X that are discrete. See details below regarding this argument.

long.dim.are.variables

logical; if TRUE (default), when the data are in a data frame or a matrix, the longer dimension is assumed to index the random variables; if FALSE, the random variables are assumed to be in the columns of the data frame or matrix.

exact.test

logical; if FALSE, an asymptotic likelihood ratio test of conditional independence is employed with mixed (i.e., continuous and discrete) data; if TRUE (default), an exact likelihood ratio test of conditional independence is employed with mixed data. See details below regarding this argument.

use

a character string defining the way in which calculations are done in the presence of missing values. It can be either "complete.obs" (default) or "em".

tol

maximum tolerance controlling the convergence of the EM algorithm employed when use="em".

R.code.only

logical; if FALSE then the faster C implementation is used (default); if TRUE then only R code is executed.
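The long.dim.are.variables convention can be pictured with the small sketch below, which reorients a data set so that variables end up in the columns. The helper as_obs_by_var() is hypothetical and not part of qpgraph, and leaving square inputs unchanged is an assumption, since this page does not say how ties are handled.

```r
## Hypothetical helper (not part of qpgraph) mimicking the documented
## long.dim.are.variables convention: it returns the data as an
## observations-by-variables matrix (variables in the columns).
as_obs_by_var <- function(X, long.dim.are.variables = TRUE) {
  X <- as.matrix(X)
  ## Assumption: a square input is left unchanged.
  if (long.dim.are.variables && nrow(X) > ncol(X)) {
    X <- t(X)  # rows are the longer dimension, so rows hold the variables
  }
  X
}

## e.g., 1000 genes measured on 20 samples, stored genes-by-samples
expr <- matrix(rnorm(1000 * 20), nrow = 1000, ncol = 20)
dim(as_obs_by_var(expr))                                   # 20 1000
dim(as_obs_by_var(expr, long.dim.are.variables = FALSE))   # 1000 20
```

The default suits the typical microarray layout, where there are many more genes (variables) than samples (observations).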

Details

When the variables in i, j and Q are continuous and I=NULL, this function performs a conditional independence test using a t-test for a zero partial regression coefficient (Lauritzen, 1996, p. 150). Note that the size q of the conditioning set Q should be in the range 1 to min(p, n-3), where p is the number of variables and n the number of observations. The computational cost increases linearly with q.
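The continuous-data test can be sketched in base R as below. This is an illustration under stated assumptions, not qpgraph's implementation: the hypothetical partial_t_test() regresses i on j and Q with lm() and reads off the t-test for the coefficient of j (which variable plays the response role inside qpCItest is an assumption here), giving n-q-2 degrees of freedom. The hypothetical partial_t_from_cov() obtains the same t-statistic from a sample covariance matrix and n alone, through the partial correlation of i and j given Q. This is consistent with the SsdMatrix method needing only a covariance matrix; how that method obtains n is not described on this page.

```r
## Sketch in base R (not qpgraph's implementation): t-test for a zero partial
## regression coefficient of variable j in the regression of variable i on j
## and on the conditioning set Q. X has observations in rows, variables in columns.
partial_t_test <- function(X, i, j, Q = integer(0)) {
  n <- nrow(X)
  q <- length(Q)
  stopifnot(q <= n - 3)                 # keeps at least one residual degree of freedom
  y <- X[, i]
  x <- X[, j]
  Z <- X[, Q, drop = FALSE]
  fit <- if (q > 0) lm(y ~ x + Z) else lm(y ~ x)
  co <- summary(fit)$coefficients["x", ]
  list(statistic = unname(co["t value"]),
       parameter = fit$df.residual,     # n - q - 2 when the design has full rank
       p.value   = unname(co["Pr(>|t|)"]),
       estimate  = unname(co["Estimate"]))
}

## Same t-statistic from a sample covariance matrix S and the sample size n,
## via the partial correlation r of i and j given Q: t = r * sqrt(df / (1 - r^2)).
partial_t_from_cov <- function(S, n, i, j, Q = integer(0)) {
  idx <- c(i, j, Q)
  K <- solve(S[idx, idx, drop = FALSE])  # inverse of the marginal covariance
  r <- -K[1, 2] / sqrt(K[1, 1] * K[2, 2])
  df <- n - length(Q) - 2
  r * sqrt(df / (1 - r^2))
}

set.seed(1)
n <- 100
X <- matrix(rnorm(n * 4), nrow = n, ncol = 4)
X[, 3] <- X[, 1] + rnorm(n)   # 3 depends on 1
X[, 4] <- X[, 1] + rnorm(n)   # 4 depends on 1, so 3 and 4 are independent given 1
res <- partial_t_test(X, i = 3, j = 4, Q = 1)
res$p.value
partial_t_from_cov(cov(X), n, i = 3, j = 4, Q = 1)   # matches res$statistic up to rounding
```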

When the variables in i, j and Q are a mix of continuous and discrete ones (mixed data), with the discrete ones indicated through the I argument when X is a matrix or a data frame, mixed graphical model theory (Lauritzen and Wermuth, 1989) is employed; concretely, the data are assumed to come from a homogeneous conditional Gaussian distribution. By default (exact.test=TRUE), an exact likelihood ratio test for conditional independence is performed (Lauritzen, 1996, pp. 192-194; Tur, Roverato and Castelo, 2014); otherwise, an asymptotic one is used.

In this setting, further restrictions on the maximum value of q apply; concretely, it cannot be smaller than p plus the number of levels of the discrete variables involved in the marginal distributions employed by the algorithm.
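To illustrate the mixed case, the base-R sketch below handles only the simplest configuration: i discrete, and j and Q continuous. It assumes, without confirmation from this page, that under the homogeneous conditional Gaussian model the test reduces to comparing nested linear models for j with and without i, and it takes Lambda to be the ratio of residual sums of squares. qpCItest's actual definitions, and its handling of other configurations, may differ. The helper mixed_lr_test() is hypothetical. Under these assumptions, with k the number of levels of i, Lambda follows a Beta((n-q-k)/2, (k-1)/2) distribution under the null, -n log Lambda is asymptotically chi-square with k-1 degrees of freedom, and 1 - Lambda is the partial eta-squared of j explained by i.

```r
## Sketch in base R (not qpgraph's implementation) of the likelihood ratio test
## of i _||_ j | Q for the simplest mixed case: i discrete, j and Q continuous.
## Assumption: under a homogeneous conditional Gaussian model this amounts to
## comparing two nested linear models for j, without (H0) and with i.
mixed_lr_test <- function(xi, xj, Z = NULL) {
  xi <- droplevels(as.factor(xi))
  n <- length(xj)
  k <- nlevels(xi)                      # number of levels of the discrete variable
  q <- if (is.null(Z)) 0 else NCOL(Z)   # size of the conditioning set
  fit0 <- if (q > 0) lm(xj ~ Z) else lm(xj ~ 1)
  fit1 <- if (q > 0) lm(xj ~ Z + xi) else lm(xj ~ xi)
  ## Lambda taken as the ratio of residual sums of squares (alternative / null)
  lambda <- sum(residuals(fit1)^2) / sum(residuals(fit0)^2)
  stat <- -n * log(lambda)
  list(lambda   = lambda,
       exact.p  = pbeta(lambda, (n - q - k) / 2, (k - 1) / 2),  # Lambda ~ Beta under H0
       stat     = stat,
       asympt.p = pchisq(stat, df = k - 1, lower.tail = FALSE),  # chi-square, k - 1 df
       eta2     = 1 - lambda)           # partial eta-squared of j explained by i
}

set.seed(3)
n <- 60
Z <- matrix(rnorm(n * 2), nrow = n, ncol = 2)
xi <- factor(sample(c("a", "b", "c"), n, replace = TRUE))
xj <- Z[, 1] + rnorm(n)   # xj does not depend on xi given Z
res <- mixed_lr_test(xi, xj, Z)
res$exact.p
res$asympt.p
```

With a single discrete variable, the exact p-value in this sketch coincides with that of the classical F-test from anova() on the two nested models, which is a convenient check.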

Value

A list with class "htest" containing the following components:

statistic

for purely continuous data (I=NULL), the t-statistic for a zero partial regression coefficient; when I!=NULL, the value Lambda of the likelihood ratio if exact.test=TRUE, and -n log Lambda otherwise.

parameter

for purely continuous data (I=NULL), the degrees of freedom of the t-statistic (n-q-2); when I!=NULL, the degrees of freedom of the chi-square distribution of -n log Lambda under the null hypothesis if exact.test=FALSE, and the (a, b) parameters of the beta distribution of Lambda under the null hypothesis if exact.test=TRUE.

p.value

the p-value for the test.

estimate

for purely continuous data (I=NULL), the estimated partial regression coefficient. For mixed continuous and discrete data (I!=NULL), the estimated partial eta-squared: the fraction of the variance of i or j explained by the other tested variable, after excluding the variance explained by the variables in Q. If one of the tested variables i or j is discrete, the partial eta-squared is calculated on the tested continuous variable. If both i and j are continuous, it is calculated on variable i.

alternative

a character string describing the alternative hypothesis.

method

a character string indicating what type of conditional independence test was performed.

data.name

a character string giving the name(s) of the random variables involved in the conditional independence test.
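The components above follow the base-R "htest" convention, so results print with the standard method from the stats package. The sketch below assembles such an object by hand from an lm() fit; make_htest() and the labels "t", "df" and "beta" are illustrative choices, not qpgraph's code.

```r
## Sketch: wrapping test results in a base-R "htest" object with the
## components listed above. make_htest() is hypothetical.
make_htest <- function(statistic, parameter, p.value, estimate,
                       method, data.name, alternative = "two.sided") {
  structure(list(statistic   = c(t = statistic),
                 parameter   = c(df = parameter),
                 p.value     = p.value,
                 estimate    = c(beta = estimate),
                 alternative = alternative,
                 method      = method,
                 data.name   = data.name),
            class = "htest")
}

set.seed(4)
X <- matrix(rnorm(40 * 3), nrow = 40, ncol = 3)
co <- summary(lm(X[, 1] ~ X[, 2] + X[, 3]))$coefficients[2, ]  # row for X[, 2]
ht <- make_htest(co[["t value"]], 40 - 1 - 2, co[["Pr(>|t|)"]], co[["Estimate"]],
                 method = "t-test for a zero partial regression coefficient (sketch)",
                 data.name = "1 and 2 given {3}")
print(ht)   # formatted by the print method for class "htest"
ht$p.value
```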

Author(s)

R. Castelo and A. Roverato

References

Castelo, R. and Roverato, A. A robust procedure for Gaussian graphical model search from microarray data with p larger than n, J. Mach. Learn. Res., 7:2621-2650, 2006.

Lauritzen, S.L. Graphical models. Oxford University Press, 1996.

Lauritzen, S.L. and Wermuth, N. Graphical models for associations between variables, some of which are qualitative and some quantitative. Ann. Stat., 17(1):31-57, 1989.

Tur, I., Roverato, A. and Castelo, R. Mapping eQTL networks with mixed graphical Markov models. Genetics, 198:1377-1393, 2014.

See Also

qpCov, qpNrr, qpEdgeNrr

Examples

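library(qpgraph)   ## provides qpCItest() and qpG2Sigma()
library(mvtnorm)   ## provides rmvnorm()

set.seed(123)      ## for reproducibility
nObs <- 100        ## number of observations to simulate

## the following adjacency matrix describes an undirected graph
## where vertex 3 is conditionally independent of 4 given 1 AND 2
A <- matrix(c(FALSE,  TRUE,  TRUE,  TRUE,
              TRUE,  FALSE,  TRUE,  TRUE,
              TRUE,   TRUE, FALSE, FALSE,
              TRUE,   TRUE, FALSE, FALSE), nrow=4, ncol=4, byrow=TRUE)
Sigma <- qpG2Sigma(A, rho=0.5)

## rows are observations and columns are variables,
## hence long.dim.are.variables=FALSE below
X <- rmvnorm(nObs, sigma=as.matrix(Sigma))

## conditioning on vertex 1 alone leaves the path 3-2-4 open,
## so 3 and 4 are not separated in the graph
qpCItest(X, i=3, j=4, Q=1, long.dim.are.variables=FALSE)

## conditioning on vertices 1 and 2 separates 3 from 4 in the graph
qpCItest(X, i=3, j=4, Q=c(1,2), long.dim.are.variables=FALSE)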
