Tests of conditional independence

Description

Performs a test of conditional independence for every pair of variables.

Usage

## S4 method for signature 'matrix'
qpAllCItests(X, I=NULL, Q=NULL, pairup.i=NULL, pairup.j=NULL,
                                long.dim.are.variables=TRUE, exact.test=TRUE,
                                use=c("complete.obs", "em"), tol=0.01,
                                return.type=c("p.value", "statn", "all"), verbose=TRUE,
                                R.code.only=FALSE, clusterSize=1, estimateTime=FALSE,
                                nAdj2estimateTime=10)

Arguments

X

data set on which to perform the conditional independence tests. It can be an ExpressionSet object, a data frame or a matrix.

I

indexes or names of the variables in X that are discrete. See details below regarding this argument.

Q

indexes or names of the variables in X forming the conditioning set.

pairup.i

subset of vertices to pair up with subset pairup.j

pairup.j

subset of vertices to pair up with subset pairup.i

long.dim.are.variables

logical; if TRUE (default), when the data are in a data frame or a matrix, the longer dimension is assumed to be the one defining the random variables; if FALSE, the random variables are assumed to be in the columns of the data frame or matrix.

exact.test

logical; if FALSE an asymptotic conditional independence test is employed with mixed (i.e., continuous and discrete) data; if TRUE (default) then an exact conditional independence test with mixed data is employed. See details below regarding this argument.

use

a character string defining the way in which calculations are done in the presence of missing values. It can be either "complete.obs" (default) or "em".

tol

maximum tolerance controlling the convergence of the EM algorithm employed when the argument use="em".

return.type

type of value returned by this function. The default, "p.value", returns a list containing a matrix of p-values from all performed conditional independence (CI) tests. If return.type="statn", a list containing the matrix of test statistics and the matrix of sample sizes of each CI test is returned. If return.type="all", all of the previous matrices are returned within a list.

verbose

show progress on the calculations.

R.code.only

logical; if FALSE then the faster C implementation is used (default); if TRUE then only R code is executed.

clusterSize

size of the cluster of processors to employ to speed up the calculations by performing them in parallel. A value of 1 (default) implies a single-processor execution. Using a cluster of processors requires having previously loaded the packages snow and rlecuyer.

estimateTime

logical; if TRUE then the time for carrying out the calculations with the given parameters is estimated by calculating for a limited number of adjacencies, specified by nAdj2estimateTime, and extrapolating the elapsed time; if FALSE (default) calculations are performed normally till they finish.

nAdj2estimateTime

number of adjacencies to employ when estimating the time of calculations (estimateTime=TRUE). The default value is 10 adjacencies; larger values should provide more accurate estimates. This might be relevant when using a cluster facility.
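As a rough illustration of the return.type and estimateTime arguments described above, the following sketch simulates data the same way as the Examples section and requests all result matrices. The entry names p.value, statistic and n are taken from the Value section; the object names res and eta are only illustrative, and the code assumes the qpgraph and mvtnorm packages are installed.

```r
library(qpgraph)
library(mvtnorm)

set.seed(123)

## 50 variables, 30 observations: with long.dim.are.variables=TRUE (default)
## the longer dimension (the 50 columns) is taken as the variables
A <- qpRndGraph(p=50, d=3)
Sigma <- qpG2Sigma(A, rho=0.5)
X <- rmvnorm(30, sigma=as.matrix(Sigma))

## request p-values, statistics and sample sizes in a single call
res <- qpAllCItests(X, return.type="all", verbose=FALSE)
names(res)
dim(res$p.value)

## only estimate the running time from a limited number of adjacencies
eta <- qpAllCItests(X, estimateTime=TRUE, nAdj2estimateTime=20, verbose=FALSE)
eta
```

With estimateTime=TRUE no tests results are returned, so this call is mainly useful before launching a long computation on a larger data set.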

Details

When I is set to a value other than NULL, mixed graphical model theory is employed and, concretely, it is assumed that the data come from a homogeneous conditional Gaussian distribution. By default, with exact.test=TRUE, an exact test for conditional independence is employed; otherwise an asymptotic one is used. Full details on these features can be found in Tur, Roverato and Castelo (2014).

Value

A list with three entries called p.value, statistic and n, corresponding, respectively, to a dspMatrix-class symmetric matrix of p-values for the null hypothesis of conditional independence with the diagonal set to NA values, an analogous matrix of the statistics of each test, and an analogous matrix of the sample sizes. Which of these entries are returned depends on the argument return.type which, by default, returns only the matrix of p-values. If arguments pairup.i and pairup.j are employed, the cells outside the constrained pairs are also set to NA.

Note, however, that when estimateTime=TRUE, a vector specifying the estimated number of days, hours, minutes and seconds for completion of the calculations is returned instead of the matrices of test results.
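Because the p-value matrix is symmetric with NA values on the diagonal, downstream code usually works with the upper triangle only. The sketch below shows one possible way to turn such a matrix into a table of pairs with multiplicity-adjusted p-values. It uses a small mock matrix with hypothetical values so that it runs with base R alone; the names pv and pairs, and the choice of the Benjamini-Hochberg adjustment, are assumptions of this illustration and not part of the package.

```r
## mock stand-in for alltests$p.value: symmetric, NA diagonal (hypothetical values)
set.seed(1)
p <- 5
vars <- paste0("V", seq_len(p))
pv <- matrix(NA_real_, p, p, dimnames=list(vars, vars))
pv[upper.tri(pv)] <- runif(p * (p - 1) / 2)
pv[lower.tri(pv)] <- t(pv)[lower.tri(pv)]

## with a real result one would start instead from
## pv <- as.matrix(alltests$p.value)

## keep each pair once (upper triangle) and skip NA cells
idx <- which(upper.tri(pv) & !is.na(pv), arr.ind=TRUE)
pairs <- data.frame(i=rownames(pv)[idx[, "row"]],
                    j=colnames(pv)[idx[, "col"]],
                    p.value=pv[idx],
                    stringsAsFactors=FALSE)

## adjust for multiple testing and sort by raw p-value
pairs$p.adj <- p.adjust(pairs$p.value, method="BH")
pairs <- pairs[order(pairs$p.value), ]
head(pairs)
```

Skipping NA cells also drops any pairs excluded through pairup.i and pairup.j, so the adjustment is applied only to the tests that were actually performed.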

Author(s)

R. Castelo, A. Roverato and I. Tur

References

Castelo, R. and Roverato, A. A robust procedure for Gaussian graphical model search from microarray data with p larger than n, J. Mach. Learn. Res., 7:2621-2650, 2006.

Tur, I., Roverato, A. and Castelo, R. Mapping eQTL networks with mixed graphical Markov models. Genetics, 198:1377-1393, 2014.

See Also

qpCItest

Examples

library(qpgraph) ## provides qpRndGraph, qpG2Sigma and qpAllCItests
library(mvtnorm) ## provides rmvnorm

nVar <- 50  ## number of variables
maxCon <- 3 ## maximum connectivity per variable
nObs <- 30  ## number of observations to simulate

set.seed(123)

A <- qpRndGraph(p=nVar, d=maxCon)
Sigma <- qpG2Sigma(A, rho=0.5)
X <- rmvnorm(nObs, sigma=as.matrix(Sigma))

alltests <- qpAllCItests(X, verbose=FALSE)

## distribution of p-values for the present edges
summary(alltests$p.value[upper.tri(alltests$p.value) & A])

## distribution of p-values for the missing edges
summary(alltests$p.value[upper.tri(alltests$p.value) & !A])
