qpAllCItests: Tests of conditional independence
In rcastelo/qpgraph: Estimation of Genetic and Molecular Regulatory Networks from High-Throughput Genomics Data

Description

Performs a test of conditional independence for every pair of variables.
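
To make concrete what a single such test involves for purely continuous data, the following base-R sketch implements the textbook t-test for a zero partial correlation between two variables given a conditioning set. It is an illustration under stated assumptions, not the package's code: `ci_test_gaussian` is a hypothetical helper, and whether `qpAllCItests` computes exactly this statistic internally is an assumption.

```r
## Textbook t-test for zero partial correlation between columns i and j of X
## given the columns in Q (rows of X are observations).
## ci_test_gaussian is a hypothetical helper, not part of qpgraph.
ci_test_gaussian <- function(X, i, j, Q = integer(0)) {
  n <- nrow(X)
  q <- length(Q)
  ## inverse covariance of (i, j, Q); its scaled off-diagonal entry is the
  ## partial correlation of i and j given Q
  K <- solve(cov(X[, c(i, j, Q), drop = FALSE]))
  r <- -K[1, 2] / sqrt(K[1, 1] * K[2, 2])
  df <- n - q - 2
  t_stat <- r * sqrt(df / (1 - r^2))
  p_value <- 2 * pt(-abs(t_stat), df = df)
  c(statistic = t_stat, df = df, p.value = p_value)
}

set.seed(1)
n <- 200
z <- rnorm(n)
## x and y depend on each other only through z
X <- cbind(x = z + rnorm(n), y = z + rnorm(n), z = z)

marginal <- ci_test_gaussian(X, 1, 2)            ## Q empty: marginal test
conditional <- ci_test_gaussian(X, 1, 2, Q = 3)  ## test given z
```

The statistic of the conditional test coincides with the t value of the coefficient of `x` in `lm(y ~ x + z)`, which gives a quick way to check the sketch.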

Usage

## S4 method for signature 'matrix'
qpAllCItests(X, I=NULL, Q=NULL, pairup.i=NULL, pairup.j=NULL,
             long.dim.are.variables=TRUE, exact.test=TRUE,
             use=c("complete.obs", "em"), tol=0.01,
             return.type=c("p.value", "statn", "all"), verbose=TRUE,
             R.code.only=FALSE, clusterSize=1, estimateTime=FALSE,
             nAdj2estimateTime=10)

Arguments

`X`	data set on which to perform the conditional independence tests. It can be an ExpressionSet object, a data frame or a matrix.
`I`	indexes or names of the variables in `X` that are discrete. See details below regarding this argument.
`Q`	indexes or names of the variables in `X` forming the conditioning set.
`pairup.i`	subset of vertices to pair up with subset `pairup.j`
`pairup.j`	subset of vertices to pair up with subset `pairup.i`
`long.dim.are.variables`	logical; if `TRUE` it is assumed that when data are in a data frame or in a matrix, the longer dimension is the one defining the random variables (default); if `FALSE`, then random variables are assumed to be at the columns of the data frame or matrix.
`exact.test`	logical; if `FALSE` an asymptotic conditional independence test is employed with mixed (i.e., continuous and discrete) data; if `TRUE` (default) then an exact conditional independence test with mixed data is employed. See details below regarding this argument.
`use`	a character string defining the way in which calculations are done in the presence of missing values. It can be either `"complete.obs"` (default) or `"em"`.
`tol`	maximum tolerance controlling the convergence of the EM algorithm employed when `use="em"`.
`return.type`	type of value returned by this function. The default, `"p.value"`, returns a list containing a matrix of p-values from all performed conditional independence (CI) tests. If `return.type="statn"`, a list containing the matrix of test statistics and the matrix of sample sizes of each CI test is returned. If `return.type="all"`, all of the previous matrices are returned within a list.
`verbose`	show progress on the calculations.
`R.code.only`	logical; if `FALSE` then the faster C implementation is used (default); if `TRUE` then only R code is executed.
`clusterSize`	size of the cluster of processors to employ to speed up the calculations by performing them in parallel. A value of 1 (default) implies single-processor execution. Using a cluster of processors requires having previously loaded the packages `snow` and `rlecuyer`.
`estimateTime`	logical; if `TRUE` then the time for carrying out the calculations with the given parameters is estimated by calculating for a limited number of adjacencies, specified by `nAdj2estimateTime`, and extrapolating the elapsed time; if `FALSE` (default) calculations are performed normally till they finish.
`nAdj2estimateTime`	number of adjacencies to employ when estimating the time of calculations (`estimateTime=TRUE`). The default is 10 adjacencies; larger values should provide more accurate estimates. This might be relevant when using a cluster facility.
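
As an illustration of how several of these arguments combine, the sketch below restricts the tests to pairs formed between two subsets of variables, conditions on a small set `Q` and requests all returned matrices. It relies only on the signature shown under Usage and assumes that the `qpgraph` package is installed; the simulated data and the chosen indexes are arbitrary.

```r
library(qpgraph)

set.seed(123)

## 30 observations (rows) of 50 independent Gaussian variables (columns);
## with long.dim.are.variables=TRUE the longer dimension defines the variables
X <- matrix(rnorm(30 * 50), nrow=30, ncol=50)

## test variables 1 to 5 against variables 6 to 10, given variables 11 and 12
res <- qpAllCItests(X, Q=11:12, pairup.i=1:5, pairup.j=6:10,
                    return.type="all", verbose=FALSE)

names(res)               ## entries returned with return.type="all"
res$p.value[1:5, 6:10]   ## p-values of the tested pairs
```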

Details

When `I` is not `NULL`, mixed graphical model theory is employed; concretely, the data are assumed to come from a homogeneous conditional Gaussian distribution. By default, with `exact.test=TRUE`, an exact test for conditional independence is employed; otherwise an asymptotic one is used. Full details on these features can be found in Tur, Roverato and Castelo (2014).

Value

A list with up to three entries called `p.value`, `statistic` and `n`. These correspond, respectively, to a dspMatrix-class symmetric matrix of p-values for the null hypothesis of conditional independence with the diagonal set to `NA`, an analogous matrix of the statistics of each test, and an analogous matrix of the sample sizes. Which entries are returned depends on the argument `return.type` which, by default, returns only the matrix of p-values. If the arguments `pairup.i` and `pairup.j` are employed, the cells outside the constrained pairs are also set to `NA`.

Note, however, that when `estimateTime=TRUE`, instead of the list of test results, a vector specifying the estimated number of days, hours, minutes and seconds for completion of the calculations is returned.
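
Because the returned matrices are symmetric with `NA` on the diagonal, it is often convenient to flatten the upper triangle into a table with one row per tested pair. The following base-R sketch shows one way to do this. It uses a simulated stand-in matrix `pv` with the layout described above rather than real output, and it assumes that an actual `p.value` entry would first be converted with `as.matrix()`.

```r
set.seed(1)
p <- 6

## stand-in for the p.value entry: symmetric matrix with NA on the diagonal
pv <- matrix(NA_real_, nrow=p, ncol=p)
pv[upper.tri(pv)] <- runif(p * (p - 1) / 2)
pv[lower.tri(pv)] <- t(pv)[lower.tri(pv)]

## one row per pair (i < j), taken from the upper triangle
ut <- upper.tri(pv)
idx <- which(ut, arr.ind=TRUE)
pairs <- data.frame(i=idx[, "row"], j=idx[, "col"], p.value=pv[ut])

## smallest p-values first; NA cells (e.g., untested pairs) sort last
pairs <- pairs[order(pairs$p.value), ]
head(pairs)
```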

Author(s)

R. Castelo, A. Roverato and I. Tur

References

Castelo, R. and Roverato, A. A robust procedure for Gaussian graphical model search from microarray data with p larger than n, J. Mach. Learn. Res., 7:2621-2650, 2006.

Tur, I., Roverato, A. and Castelo, R. Mapping eQTL networks with mixed graphical Markov models. Genetics, 198:1377-1393, 2014.

Examples

library(qpgraph)
library(mvtnorm)

nVar <- 50  ## number of variables
maxCon <- 3 ## maximum connectivity per variable
nObs <- 30  ## number of observations to simulate

set.seed(123)

A <- qpRndGraph(p=nVar, d=maxCon)
Sigma <- qpG2Sigma(A, rho=0.5)
X <- rmvnorm(nObs, sigma=as.matrix(Sigma))

alltests <- qpAllCItests(X, verbose=FALSE)

## distribution of p-values for the present edges
summary(alltests$p.value[upper.tri(alltests$p.value) & A])

## distribution of p-values for the missing edges
summary(alltests$p.value[upper.tri(alltests$p.value) & !A])

rcastelo/qpgraph documentation built on June 14, 2025, 6:39 p.m.