qpNrr | R Documentation |
Estimates non-rejection rates for every pair of variables.
## S4 method for signature 'ExpressionSet'
qpNrr(X, q=1, restrict.Q=NULL, fix.Q=NULL, nTests=100,
alpha=0.05, pairup.i=NULL, pairup.j=NULL,
verbose=TRUE, identicalQs=TRUE, exact.test=TRUE,
use=c("complete.obs", "em"), tol=0.01, R.code.only=FALSE,
clusterSize=1, estimateTime=FALSE, nAdj2estimateTime=10)
## S4 method for signature 'cross'
qpNrr(X, q=1, restrict.Q=NULL, fix.Q=NULL, nTests=100,
alpha=0.05, pairup.i=NULL, pairup.j=NULL, verbose=TRUE,
identicalQs=TRUE, exact.test=TRUE, use=c("complete.obs", "em"),
tol=0.01, R.code.only=FALSE, clusterSize=1, estimateTime=FALSE,
nAdj2estimateTime=10)
## S4 method for signature 'data.frame'
qpNrr(X, q=1, I=NULL, restrict.Q=NULL, fix.Q=NULL, nTests=100,
alpha=0.05, pairup.i=NULL, pairup.j=NULL,
long.dim.are.variables=TRUE, verbose=TRUE,
identicalQs=TRUE, exact.test=TRUE, use=c("complete.obs", "em"),
tol=0.01, R.code.only=FALSE, clusterSize=1,
estimateTime=FALSE, nAdj2estimateTime=10)
## S4 method for signature 'matrix'
qpNrr(X, q=1, I=NULL, restrict.Q=NULL, fix.Q=NULL, nTests=100,
alpha=0.05, pairup.i=NULL, pairup.j=NULL,
long.dim.are.variables=TRUE, verbose=TRUE, identicalQs=TRUE,
exact.test=TRUE, use=c("complete.obs", "em"), tol=0.01,
R.code.only=FALSE, clusterSize=1, estimateTime=FALSE,
nAdj2estimateTime=10)
X |
data set from where to estimate the non-rejection rates.
It can be an |
q |
partial-correlation order to be employed. |
I |
indexes or names of the variables in |
restrict.Q |
indexes or names of the variables in |
fix.Q |
indexes or names of the variables in |
nTests |
number of tests to perform for each pair for variables. |
alpha |
significance level of each test. |
pairup.i |
subset of vertices to pair up with subset |
pairup.j |
subset of vertices to pair up with subset |
long.dim.are.variables |
logical; if |
verbose |
show progress on the calculations. |
identicalQs |
use identical conditioning subsets for every pair of vertices
(default), otherwise sample a new collection of |
exact.test |
logical; if |
use |
a character string defining the way in which calculations are done in the
presence of missing values. It can be either |
tol |
maximum tolerance controlling the convergence of the EM algorithm employed
when the argument |
R.code.only |
logical; if |
clusterSize |
size of the cluster of processors to employ if we wish to
speed-up the calculations by performing them in parallel. A value of 1
(default) implies a single-processor execution. The use of a cluster of
processors requires having previously loaded the packages |
estimateTime |
logical; if |
nAdj2estimateTime |
number of adjacencies to employ when estimating the
time of calculations ( |
Note that for pure continuous data the possible values of q
should be in the
range 1 to min(p, n-3)
, where p
is the number of variables and
n
the number of observations. The computational cost increases linearly
with q
and quadratically in p
. When setting identicalQs
to FALSE
the computational cost may increase between 2 times and one
order of magnitude (depending on p
and q
) while asymptotically
the estimation of the non-rejection rate converges to the same value. Full
details on the calculation of the non-rejection rate can be found in
Castelo and Roverato (2006).
When I
is set different to NULL
then mixed graphical model theory
is employed and, concretely, it is assumed that the data comes from an homogeneous
conditional Gaussian distribution. In this setting further restrictions to the
maximum value of q
apply, concretely, it cannot be smaller than
p
plus the number of levels of the discrete variables involved in the
marginal distributions employed by the algorithm. By default, with
exact.test=TRUE
, an exact test for conditional independence is employed,
otherwise an asymptotic one will be used. Full details on these features can
be found in Tur, Roverato and Castelo (2014).
The argument I
specifying what variables are discrete actually applies only
when X
is a matrix object since in the other cases data types are specified
for each data columns or slot.
In the case that X
is a qtl/cross
object, the default NULL
values in arguments pairup.i
and pairup.j
actually imply pairing
all markers and phenotypes with numerical phenotypes only (including integer phenotypes).
Likewise, the default argument restrict.Q=NULL
implies setting restrict.Q
to all numeric phenotypes. Setting these arguments to values other than NULL
allows the user to use those particular values being set.
A dspMatrix-class
symmetric matrix of estimated non-rejection
rates with the diagonal set to NA
values. If arguments pairup.i
and pairup.j
are employed, those cells outside the constrained pairs
will get also a NA
value.
Note, however, that when estimateTime=TRUE
, then instead of the matrix
of estimated non-rejection rates, a vector specifying the estimated number of
days, hours, minutes and seconds for completion of the calculations is returned.
R. Castelo, A. Roverato and I. Tur
Castelo, R. and Roverato, A. A robust procedure for Gaussian graphical model search from microarray data with p larger than n, J. Mach. Learn. Res., 7:2621-2650, 2006.
Tur, I., Roverato, A. and Castelo, R. Mapping eQTL networks with mixed graphical Markov models. Genetics, 198(4):1377-1393, 2014.
qpAvgNrr
qpEdgeNrr
qpHist
qpGraphDensity
qpClique
nVar <- 50 ## number of variables
maxCon <- 3 ## maximum connectivity per variable
nObs <- 30 ## number of observations to simulate
set.seed(123)
## simulate an undirected Gaussian graphical model
## determined by some random undirected d-regular graph
model <- rUGgmm(dRegularGraphParam(p=nVar, d=maxCon), rho=0.5)
model
## simulate data from this model
X <- rmvnorm(nObs, model)
dim(X)
## estimate non-rejection rates with q=3
nrr.estimates <- qpNrr(X, q=3, verbose=FALSE)
## create an adjacency matrix of the undirected graph
## determining the undirected Gaussian graphical model
A <- as(model$g, "matrix") == 1
## distribution of non-rejection rates for the present edges
summary(nrr.estimates[upper.tri(nrr.estimates) & A])
## distribution of non-rejection rates for the missing edges
summary(nrr.estimates[upper.tri(nrr.estimates) & !A])
## Not run:
## using R code only this would take much more time
qpNrr(X, q=3, R.code.only=TRUE, estimateTime=TRUE)
## only for moderate and large numbers of variables the
## use of a cluster of processors speeds up the calculations
library(snow)
library(rlecuyer)
nVar <- 500
maxCon <- 3
model <- rUGgmm(dRegularGraphParam(p=nVar, d=maxCon), rho=0.5)
X <- rmvnorm(nObs, model)
system.time(nrr.estimates <- qpNrr(X, q=10, verbose=TRUE))
system.time(nrr.estimates <- qpNrr(X, q=10, verbose=TRUE, clusterSize=4))
## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.