View source: R/advance_hierarchy.R
Hierarchical testing for a given built-in test function.
advance_hierarchy(
  x,
  y,
  dendr,
  test = c("QF", "hdi", "hdi.logistic", "F", "logistic"),
  clvar = NULL,
  alpha = 0.05,
  global.test = TRUE,
  hier.adj = FALSE,
  mt.adj = c("dpBF", "none"),
  agg.method = c("Tippett", "Stouffer"),
  verbose = FALSE,
  sort.parallel = TRUE,
  parallel = c("no", "multicore", "snow"),
  ncpus = 1L,
  cl = NULL
)
x |
a matrix or list of matrices for multiple data sets. The matrix or matrices have to be of type numeric and are required to have column names / variable names. The rows and the columns represent the observations and the variables, respectively. |
y |
a vector, a matrix with one column, or list of the aforementioned objects for multiple data sets. The vector, vectors, matrix, or matrices have to be of type numeric. |
dendr |
the output of one of the functions cluster_vars or cluster_positions. |
test |
a character string naming a 'built-in' group test function. See the 'Details' section. |
clvar |
a matrix or list of matrices of control variables. |
alpha |
the significance level at which the FWER is controlled. |
global.test |
a logical value indicating whether the global test should be performed. |
hier.adj |
a logical value indicating whether the p-values can only increase when going down some given branch. Strong FWER control holds as well if the argument is set to FALSE which is the default option. |
mt.adj |
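type of multiple testing correction to be used; either "dpBF" (depth-wise Bonferroni multiple adjustment) or "none" (no adjustment). See the 'Details' section. |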
agg.method |
a character string naming an aggregation method which aggregates the p-values over the different data sets for a given cluster; either "Tippett" or "Stouffer". |
verbose |
a logical value indicating whether the progress of the computation should be printed in the console. |
sort.parallel |
a logical indicating whether the blocks should be sorted with respect to the size of the block. This can reduce the run time for parallel computation. |
parallel |
type of parallel computation to be used. See the 'Details' section. |
ncpus |
number of processes to be run in parallel. |
cl |
an optional parallel or snow cluster used if parallel = "snow". |
Hierarchical testing is performed by going top down through the hierarchical tree. Testing in some branch only continues if at least one child of a given cluster is significant. The function advance_hierarchy requires the output of one of the functions cluster_vars or cluster_positions as an input (argument dendr).
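The top-down idea can be sketched in a few lines of base R. This is only a simplified illustration, not the package's implementation: the nested-list tree, the helper names leaf, p_value and descend, and the fixed p-values in p_table are all hypothetical, and no multiple testing adjustment is applied. A branch is followed only while its cluster is significant, and the smallest significant clusters are reported.

```r
# Hypothetical p-values for each cluster, keyed by its member variables.
p_table <- c("V1+V2+V3+V4" = 0.001, "V1+V2" = 0.004, "V3+V4" = 0.60,
             "V1" = 0.01, "V2" = 0.30, "V3" = 0.0001, "V4" = 0.50)
p_value <- function(vars) p_table[[paste(vars, collapse = "+")]]

# A cluster is a list with its variables and its child clusters.
leaf <- function(v) list(vars = v, children = NULL)
tree <- list(
  vars = c("V1", "V2", "V3", "V4"),
  children = list(
    list(vars = c("V1", "V2"), children = list(leaf("V1"), leaf("V2"))),
    list(vars = c("V3", "V4"), children = list(leaf("V3"), leaf("V4")))
  )
)

# Go top down: stop in a branch as soon as a cluster is not significant;
# report a cluster if it is significant but none of its children are.
descend <- function(node, p_value, alpha) {
  if (p_value(node$vars) > alpha) return(list())
  hits <- list()
  for (child in node$children) {
    hits <- c(hits, descend(child, p_value, alpha))
  }
  if (length(hits) == 0) hits <- list(node$vars)
  hits
}

found <- descend(tree, p_value, alpha = 0.05)
found
```

Note that "V3" is never tested in this sketch, although its hypothetical p-value is tiny, because its parent cluster is not significant.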
The user can choose one of the 'built-in' group test functions, which is applied to every group that is tested. The default is test = "QF" (inference for quadratic functionals in linear regression; see function QF in the R package SIHR). Alternatively, there are "hdi" (de-biased Lasso for linear regression; see function lasso.proj in the R package hdi), "hdi.logistic" (de-biased Lasso for logistic regression; see function lasso.proj in the R package hdi), "F" (classical partial F-test for low-dimensional data; see function anova), and "logistic" (likelihood ratio test for logistic regression for low-dimensional data; see function anova).
If one of the 'built-in' group test functions "QF", "hdi", or "hdi.logistic" is applied and control variables are specified using the argument clvar, then those variables are included in the model, an L1-penalty is imposed on them in the Lasso, and those covariates are not tested in the hierarchical procedure.
The user can specify which multiple testing adjustment is applied in the hierarchical procedure (argument mt.adj). The default is "dpBF" (depth-wise Bonferroni multiple adjustment). Alternatively, the user can choose "none" (no adjustment). The adjustment "dpBF" guarantees strong family-wise error control if the group test that is applied to a given group controls the type I error.
If the argument block was supplied when building the hierarchical tree (i.e. in the function call of either cluster_vars or cluster_positions), so that the second level of the hierarchical tree was given, the hierarchical testing step can be run in parallel across the different blocks by specifying the arguments parallel and ncpus. There is an optional argument cl if parallel = "snow". There are three possibilities to set the argument parallel: parallel = "no" for serial evaluation (default), parallel = "multicore" for parallel evaluation using forking, and parallel = "snow" for parallel evaluation using a parallel socket cluster. It is recommended to select RNGkind("L'Ecuyer-CMRG") and set a seed to ensure that the parallel computing of the package hierbase is reproducible. This way each processor gets a different substream of the pseudo random number generator stream, which makes the results reproducible if the arguments (such as sort.parallel and ncpus) remain unchanged. See the vignette or the reference for more details.
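The role of the "L'Ecuyer-CMRG" generator can be illustrated with base R alone. The sketch below does not call hierbase and is not how the package distributes its work; it merely emulates, serially, several workers that each draw from their own stream derived from a single seed. The function name draws_per_worker is hypothetical; RNGkind, set.seed and parallel::nextRNGStream are standard R functions.

```r
library(parallel)  # ships with R; provides nextRNGStream()

# Emulate n_workers processes, each drawing from its own L'Ecuyer-CMRG
# stream derived from one seed (serial illustration only).
draws_per_worker <- function(seed, n_workers) {
  old_kind <- RNGkind("L'Ecuyer-CMRG")
  on.exit(RNGkind(old_kind[1]))          # restore the previous generator
  set.seed(seed)
  stream <- .Random.seed
  draws <- vector("list", n_workers)
  for (i in seq_len(n_workers)) {
    # Hand the current stream to "worker" i, then move on to the next stream.
    assign(".Random.seed", stream, envir = globalenv())
    draws[[i]] <- rnorm(2)
    stream <- nextRNGStream(stream)
  }
  draws
}

run_a <- draws_per_worker(seed = 76, n_workers = 2)
run_b <- draws_per_worker(seed = 76, n_workers = 2)
identical(run_a, run_b)            # same seed, same number of workers
identical(run_a[[1]], run_a[[2]])  # different workers, different streams
```

Because the draws depend on which stream a piece of work is assigned to, changing the number of workers can change the random numbers used for a given task, which is consistent with the advice to keep arguments such as ncpus unchanged.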
Note that if Tippett's aggregation method is applied for multiple data sets, then very small p-values are set to machine precision. This is due to rounding in floating point arithmetic.
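The rounding issue can be seen with a small base R sketch. The functions tippett and stouffer below are hypothetical helpers written from the textbook definitions of the two rules (minimum-p and unweighted sum of z-scores); the package's own implementation may differ, for example in how data sets are weighted or where exactly small p-values are replaced.

```r
# Tippett's rule: based on the smallest of the k p-values.
tippett <- function(p) 1 - (1 - min(p))^length(p)

# Stouffer's rule (unweighted): combine the z-scores of the k p-values.
stouffer <- function(p) {
  z <- sum(qnorm(p, lower.tail = FALSE)) / sqrt(length(p))
  pnorm(z, lower.tail = FALSE)
}

p <- c(0.01, 0.20, 0.03)   # hypothetical p-values from three data sets
tippett(p)
stouffer(p)

# For very small p-values, 1 - min(p) rounds to exactly 1 in floating
# point arithmetic, so Tippett's formula returns 0.
tiny <- c(1e-20, 1e-18)
tippett(tiny)

# One way to avoid an exact zero is to bound the result from below by
# machine precision (an assumption about what "set to machine
# precision" could look like in practice).
max(tippett(tiny), .Machine$double.eps)
```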
The returned value is an object of class "hierBase", consisting of a data.frame with the result of the hierarchical testing.
The data.frame has the following columns:
block |
|
p.value |
The p-value of the significant cluster. |
significant.cluster |
The column names of the members of the significant cluster. |
There is a print method for this class; see print.hierBase.
Meinshausen, N. (2008). Hierarchical testing of variable importance. Biometrika, 95(2), 265-278.
Renaux, C., Buzdugan, L., Kalisch, M., and Bühlmann, P. (2020). Hierarchical inference for genome-wide association studies: a view on methodology with software. Computational Statistics, 35(1), 1-40.
cluster_vars, cluster_positions, and run_hierarchy.
## Low-dimensional example
n <- 100
p <- 50
library(MASS)
set.seed(3)
x <- mvrnorm(n, mu = rep(0, p), Sigma = diag(p))
colnames(x) <- paste0("Var", 1:p)
beta <- rep(0, p)
beta[c(5, 20, 46)] <- 1
y <- x %*% beta + rnorm(n)
dendr1 <- cluster_vars(x = x)
set.seed(76)
sign.clusters1 <- advance_hierarchy(x = x, y = y, dendr = dendr1,
test = "F")
## High-dimensional example
if (FALSE) {
n <- 50
p <- 80
library(MASS)
set.seed(3)
x <- mvrnorm(n, mu = rep(0, p), Sigma = diag(p))
colnames(x) <- paste0("Var", 1:p)
beta <- rep(0, p)
beta[c(5, 20, 46)] <- 1
y <- x %*% beta + rnorm(n)
dendr1 <- cluster_vars(x = x)
set.seed(76)
sign.clusters1 <- advance_hierarchy(x = x, y = y, dendr = dendr1,
test = "QF")
## With block
# I.e. second level of the hierarchical tree is specified by
# the user. This would allow to run the code in parallel; see the 'Details'
# section.
# The column names of the data frame block are optional.
block <- data.frame("var.name" = paste0("Var", 1:p),
"block" = rep(c(1, 2), each = p/2))
dendr2 <- cluster_vars(x = x, block = block)
set.seed(76)
sign.clusters2 <- advance_hierarchy(x = x, y = y, dendr = dendr2,
test = "QF")
# Access part of the return object or result
sign.clusters2[, "block"]
sign.clusters2[, "p.value"]
# Column names or variable names of the significant cluster in the first row.
sign.clusters2[[1, "significant.cluster"]]
}