Non-rejection rate estimation

Share:

Description

Estimates non-rejection rates for every pair of variables.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
## S4 method for signature 'ExpressionSet'
qpNrr(X, q=1, restrict.Q=NULL, fix.Q=NULL, nTests=100,
                                alpha=0.05, pairup.i=NULL, pairup.j=NULL,
                                verbose=TRUE, identicalQs=TRUE, exact.test=TRUE,
                                use=c("complete.obs", "em"), tol=0.01, R.code.only=FALSE,
                                clusterSize=1, estimateTime=FALSE, nAdj2estimateTime=10)
## S4 method for signature 'cross'
qpNrr(X, q=1, restrict.Q=NULL, fix.Q=NULL, nTests=100,
                         alpha=0.05, pairup.i=NULL, pairup.j=NULL, verbose=TRUE,
                         identicalQs=TRUE, exact.test=TRUE, use=c("complete.obs", "em"),
                         tol=0.01, R.code.only=FALSE, clusterSize=1, estimateTime=FALSE,
                         nAdj2estimateTime=10)
## S4 method for signature 'data.frame'
qpNrr(X, q=1, I=NULL, restrict.Q=NULL, fix.Q=NULL, nTests=100,
                             alpha=0.05, pairup.i=NULL, pairup.j=NULL,
                             long.dim.are.variables=TRUE, verbose=TRUE,
                             identicalQs=TRUE, exact.test=TRUE, use=c("complete.obs", "em"),
                             tol=0.01, R.code.only=FALSE, clusterSize=1,
                             estimateTime=FALSE, nAdj2estimateTime=10)
## S4 method for signature 'matrix'
qpNrr(X, q=1, I=NULL, restrict.Q=NULL, fix.Q=NULL, nTests=100,
                         alpha=0.05, pairup.i=NULL, pairup.j=NULL,
                         long.dim.are.variables=TRUE, verbose=TRUE, identicalQs=TRUE,
                         exact.test=TRUE, use=c("complete.obs", "em"), tol=0.01,
                         R.code.only=FALSE, clusterSize=1, estimateTime=FALSE,
                         nAdj2estimateTime=10)

Arguments

X

data set from where to estimate the non-rejection rates. It can be an ExpressionSet object, a qtl/cross object, a data.frame object or a matrix object.

q

partial-correlation order to be employed.

I

indexes or names of the variables in X that are discrete. See details below regarding this argument.

restrict.Q

indexes or names of the variables in X that restrict the sample space of conditioning subsets Q.

fix.Q

indexes or names of the variables in X that should be fixed within every conditioning conditioning subsets Q.

nTests

number of tests to perform for each pair for variables.

alpha

significance level of each test.

pairup.i

subset of vertices to pair up with subset pairup.j

pairup.j

subset of vertices to pair up with subset pairup.i

long.dim.are.variables

logical; if TRUE it is assumed that when data are in a data frame or in a matrix, the longer dimension is the one defining the random variables (default); if FALSE, then random variables are assumed to be at the columns of the data frame or matrix.

verbose

show progress on the calculations.

identicalQs

use identical conditioning subsets for every pair of vertices (default), otherwise sample a new collection of nTests subsets for each pair of vertices.

exact.test

logical; if FALSE an asymptotic conditional independence test is employed with mixed (i.e., continuous and discrete) data; if TRUE (default) then an exact conditional independence test with mixed data is employed. See details below regarding this argument.

use

a character string defining the way in which calculations are done in the presence of missing values. It can be either "complete.obs" (default) or "em".

tol

maximum tolerance controlling the convergence of the EM algorithm employed when the argument use="em".

R.code.only

logical; if FALSE then the faster C implementation is used (default); if TRUE then only R code is executed.

clusterSize

size of the cluster of processors to employ if we wish to speed-up the calculations by performing them in parallel. A value of 1 (default) implies a single-processor execution. The use of a cluster of processors requires having previously loaded the packages snow and rlecuyer.

estimateTime

logical; if TRUE then the time for carrying out the calculations with the given parameters is estimated by calculating for a limited number of adjacencies, specified by nAdj2estimateTime, and extrapolating the elapsed time; if FALSE (default) calculations are performed normally till they finish.

nAdj2estimateTime

number of adjacencies to employ when estimating the time of calculations (estimateTime=TRUE). By default this has a default value of 10 adjacencies and larger values should provide more accurate estimates. This might be relevant when using a cluster facility.

Details

Note that for pure continuous data the possible values of q should be in the range 1 to min(p, n-3), where p is the number of variables and n the number of observations. The computational cost increases linearly with q and quadratically in p. When setting identicalQs to FALSE the computational cost may increase between 2 times and one order of magnitude (depending on p and q) while asymptotically the estimation of the non-rejection rate converges to the same value. Full details on the calculation of the non-rejection rate can be found in Castelo and Roverato (2006).

When I is set different to NULL then mixed graphical model theory is employed and, concretely, it is assumed that the data comes from an homogeneous conditional Gaussian distribution. In this setting further restrictions to the maximum value of q apply, concretely, it cannot be smaller than p plus the number of levels of the discrete variables involved in the marginal distributions employed by the algorithm. By default, with exact.test=TRUE, an exact test for conditional independence is employed, otherwise an asymptotic one will be used. Full details on these features can be found in Tur, Roverato and Castelo (2014).

The argument I specifying what variables are discrete actually applies only when X is a matrix object since in the other cases data types are specified for each data columns or slot.

In the case that X is a qtl/cross object, the default NULL values in arguments pairup.i and pairup.j actually imply pairing all markers and phenotypes with numerical phenotypes only (including integer phenotypes). Likewise, the default argument restrict.Q=NULL implies setting restrict.Q to all numeric phenotypes. Setting these arguments to values other than NULL allows the user to use those particular values being set.

Value

A dspMatrix-class symmetric matrix of estimated non-rejection rates with the diagonal set to NA values. If arguments pairup.i and pairup.j are employed, those cells outside the constrained pairs will get also a NA value.

Note, however, that when estimateTime=TRUE, then instead of the matrix of estimated non-rejection rates, a vector specifying the estimated number of days, hours, minutes and seconds for completion of the calculations is returned.

Author(s)

R. Castelo, A. Roverato and I. Tur

References

Castelo, R. and Roverato, A. A robust procedure for Gaussian graphical model search from microarray data with p larger than n, J. Mach. Learn. Res., 7:2621-2650, 2006.

Tur, I., Roverato, A. and Castelo, R. Mapping eQTL networks with mixed graphical Markov models. Genetics, 198(4):1377-1393, 2014.

See Also

qpAvgNrr qpEdgeNrr qpHist qpGraphDensity qpClique

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
nVar <- 50  ## number of variables
maxCon <- 3 ## maximum connectivity per variable
nObs <- 30  ## number of observations to simulate

set.seed(123)

## simulate an undirected Gaussian graphical model
## determined by some random undirected d-regular graph
model <- rUGgmm(dRegularGraphParam(p=nVar, d=maxCon), rho=0.5)
model

## simulate data from this model
X <- rmvnorm(nObs, model)
dim(X)

## estimate non-rejection rates with q=3
nrr.estimates <- qpNrr(X, q=3, verbose=FALSE)

## create an adjacency matrix of the undirected graph
## determining the undirected Gaussian graphical model
A <- as(model$g, "matrix") == 1

## distribution of non-rejection rates for the present edges
summary(nrr.estimates[upper.tri(nrr.estimates) & A])

## distribution of non-rejection rates for the missing edges
summary(nrr.estimates[upper.tri(nrr.estimates) & !A])

## Not run: 
## using R code only this would take much more time
qpNrr(X, q=3, R.code.only=TRUE, estimateTime=TRUE)

## only for moderate and large numbers of variables the
## use of a cluster of processors speeds up the calculations

library(snow)
library(rlecuyer)

nVar <- 500
maxCon <- 3
model <- rUGgmm(dRegularGraphParam(p=nVar, d=maxCon), rho=0.5)
X <- rmvnorm(nObs, model)

system.time(nrr.estimates <- qpNrr(X, q=10, verbose=TRUE))
system.time(nrr.estimates <- qpNrr(X, q=10, verbose=TRUE, clusterSize=4))

## End(Not run)

Want to suggest features or report bugs for rdrr.io? Use the GitHub issue tracker.