Nothing
#' Hierarchical Testing
#'
#' Hierarchical testing for a given build-in test function.
#'
#' @param x a matrix or list of matrices for multiple data sets. The matrix or
#' matrices have to be of type numeric and are required to have column names
#' / variable names. The rows and the columns represent the observations and
#' the variables, respectively.
#' @param y a vector, a matrix with one column, or list of the aforementioned
#' objects for multiple data sets. The vector, vectors, matrix, or matrices
#' have to be of type numeric.
#' @param dendr the output of one of the functions
#' \code{\link{cluster_vars}} or \code{\link{cluster_positions}}.
#' @param test a character string naming a 'build-in' group test function.
#' See the 'Details' section.
#' @param clvar a matrix or list of matrices of control variables.
#' @param alpha the significant level at which the FWER is controlled.
#' @param global.test a logical value indicating whether the global test should
#' be performed.
#' @param hier.adj a logical value indicating whether the p-values can only
#' increase when going down some given branch. Strong FWER control holds as
#' well if the argument is set to FALSE which is the default option.
#' @param mt.adj type of multiple testing correction to be used;
#' either \code{"dpBF"} (depth-wise Bonferroni multiple adjustment) or \code{"none"}
#' (no adjustment). See the 'Details' section.
#' @param agg.method a character string naming an aggregation method which
#' aggregates the p-values over the different data sets for a given cluster;
#' either \code{"Tippett"} (Tippett's rule) or \code{"Stouffer"}
#' (Stouffer's rule). This argument is only relevant if multiple data sets
#' are specified in the function call.
#' @param verbose a logical value indicating whether the progress of the computation
#' should be printed in the console.
#' @param sort.parallel a logical indicating whether the blocks should be sorted with respect to
#' the size of the block. This can reduce the run time for parallel computation.
#' @param parallel type of parallel computation to be used. See the 'Details' section.
#' @param ncpus number of processes to be run in parallel.
#' @param cl an optional \strong{parallel} or \strong{snow} cluster used if
#' \code{parallel = "snow"}. If not supplied, a cluster on the local machine is created.
#'
#' @details Hierarchical testing is performed by going top down through the hierarchical
#' tree. Testing in some branch only continues if at least one child of a given cluster
#' is significant. The function \code{\link{advance_hierarchy}} requires the output
#' of one of the functions \code{\link{cluster_vars}} or
#' \code{\link{cluster_positions}} as an input (argument \code{dendr}).
#'
#' The user can choose one of the 'build-in' group test functions that is applied
#' to every group which is tested. The default is \code{test = "QF"} (inference for
#' quadratic functionals in linear regression; see function \code{\link[SIHR]{QF}} in
#' the R package \pkg{SIHR}). Alternatively, there are \code{"hdi"} (de-biased Lasso for
#' linear regression; see function \code{\link[hdi]{lasso.proj}} in the R package
#' \pkg{hdi}), \code{"hdi.logistic"} (de-biased Lasso for logistic regression; see function
#' \code{\link[hdi]{lasso.proj}} in the R package \pkg{hdi}), \code{"F"} (classical partial
#' F-Test for low-dimensional data; see function \code{\link[stats]{anova}}), and
#' \code{"logistic"} (likelihood ratio test for logistic regression for low-dimensional
#' data; see function \code{\link[stats]{anova}}).
#'
#' If one of the 'build-in' group test functions \code{"QF"}, \code{"hdi"}, or \code{"hdi.logistic"}
#' is applied and control variables are specified using the argument \code{clvar}, then
#' those variables are included in the model, there is an L1-penalty in the Lasso imposed on them,
#' and obviously those covariates are not tested in the hierarchical procedure.
#'
#' The user can specify which hierarchical multiple testing adjustment for the
#' hierarchical procedure is applied. The default is \code{"dpBF"} (depth-wise
#' Bonferroni multiple adjustment). Alternatively, the user can choose \code{"none"}
#' (no adjustment). The hierarchical multiple testing adjustment \code{"dpBF"}
#' guarantees strong family-wise error control if the group test, which is applied
#' for testing a given group, controls the type I error.
#'
#' If the argument \code{block} was supplied for the building
#' of the hierarchical tree (i.e. in the function call of either
#' \code{\link{cluster_vars}} or
#' \code{\link{cluster_positions}}), i.e. the second level of the
#' hierarchical tree was given, the hierarchical testing step can be run in
#' parallel across the different blocks by specifying the arguments
#' \code{parallel} and \code{ncpus}. There is an optional argument \code{cl} if
#' \code{parallel = "snow"}. There are three possibilities to set the
#' argument \code{parallel}: \code{parallel = "no"} for serial evaluation
#' (default), \code{parallel = "multicore"} for parallel evaluation
#' using forking, and \code{parallel = "snow"} for parallel evaluation
#' using a parallel socket cluster. It is recommended to select
#' \code{\link{RNGkind}("L'Ecuyer-CMRG")} and set a seed to ensure that
#' the parallel computing of the package \code{hierbase} is reproducible.
#' This way each processor gets a different substream of the pseudo random
#' number generator stream which makes the results reproducible if the arguments
#' (as \code{sort.parallel} and \code{ncpus}) remain unchanged. See the vignette
#' or the reference for more details.
#'
#' Note that if Tippett's aggregation method is applied for multiple data
#' sets, then very small p-values are set to machine precision. This is
#' due to rounding in floating point arithmetic.
#'
#' @return The returned value is an object of class \code{"hierBase"}, consisting
#' a data.frame with the result of the hierarchical testing.
#'
#' The data.frame has the following columns:
#' \item{block}{\code{NA} or the name of the block if the significant cluster
#' is a subcluster of the block or is the block itself.}
#' \item{p.value}{The p-value of the significant cluster.}
#' \item{significant.cluster}{The column names of the members of the significant
#' cluster.}
#'
#' There is a \code{print} method for this class; see
#' \code{\link{print.hierBase}}.
#'
#' @seealso \code{\link{cluster_vars}}, \code{\link{cluster_positions}},
#' and \code{\link{run_hierarchy}}.
#'
#' @examples
#' ## Low-dimensonal example
#' n <- 100
#' p <- 50
#' library(MASS)
#' set.seed(3)
#' x <- mvrnorm(n, mu = rep(0, p), Sigma = diag(p))
#' colnames(x) <- paste0("Var", 1:p)
#' beta <- rep(0, p)
#' beta[c(5, 20, 46)] <- 1
#' y <- x %*% beta + rnorm(n)
#'
#' dendr1 <- cluster_vars(x = x)
#' set.seed(76)
#' sign.clusters1 <- advance_hierarchy(x = x, y = y, dendr = dendr1,
#' test = "F")
#'
#' ## High-dimensional example
#' if (FALSE) {
#' n <- 50
#' p <- 80
#' library(MASS)
#' set.seed(3)
#' x <- mvrnorm(n, mu = rep(0, p), Sigma = diag(p))
#' colnames(x) <- paste0("Var", 1:p)
#' beta <- rep(0, p)
#' beta[c(5, 20, 46)] <- 1
#' y <- x %*% beta + rnorm(n)
#'
#' dendr1 <- cluster_vars(x = x)
#' set.seed(76)
#' sign.clusters1 <- advance_hierarchy(x = x, y = y, dendr = dendr1,
#' test = "QF")
#'
#' ## With block
#' # I.e. second level of the hierarchical tree is specified by
#' # the user. This would allow to run the code in parallel; see the 'Details'
#' # section.
#' # The column names of the data frame block are optional.
#'
#' block <- data.frame("var.name" = paste0("Var", 1:p),
#' "block" = rep(c(1, 2), each = p/2))
#' dendr2 <- cluster_vars(x = x, block = block)
#' set.seed(76)
#' sign.clusters2 <- advance_hierarchy(x = x, y = y, dendr = dendr2,
#' test = "QF")
#'
#' # Access part of the return object or result
#' sign.clusters2[, "block"]
#' sign.clusters2[, "p.value"]
#' # Column names or variable names of the significant cluster in the first row.
#' sign.clusters2[[1, "significant.cluster"]]
#' }
#'
#' @references
#' Meinshausen, N. (2008). Hierarchical testing of variable importance.
#' Biometrika, 95(2), 265-278.
#' Renaux, C., Buzdugan, L., Kalisch, M., and Bühlmann, P. (2020). Hierarchical inference for
#' genome-wide association studies: a view on methodology with software.
#' Computational Statistics, 35(1), 1-40.
#'
#' @name advance_hierarchy
#' @export
advance_hierarchy <- function(x, y, dendr,
test = c("QF", "hdi", "hdi.logistic", "F", "logistic"),
clvar = NULL,
alpha = 0.05, global.test = TRUE,
hier.adj = FALSE,
mt.adj = c("dpBF", "none"), # "SBH"
agg.method = c("Tippett", "Stouffer"),
verbose = FALSE, sort.parallel = TRUE,
parallel = c("no", "multicore", "snow"),
ncpus = 1L, cl = NULL) {
test <- match.arg(test)
mt.adj <- match.arg(mt.adj)
agg.method <- match.arg(agg.method)
parallel <- match.arg(parallel)
# Check dimensions if low-dimensional test is applied
# For simplicity, we only check x and omit checking clvar.
if (test %in% c("F", "logistic")) {
if(is.matrix(x)){ # if x is a matrix
if (nrow(x) <= ncol(x)) {
stop("You chose a test function for low-dimensional data but you have more variables than observations in your data.")
}
}
if (is.list(x)) { # if x is a list of matrices
len_x <- length(x)
for (i in seq_len(len_x)) {
if (nrow(x[[i]]) <= ncol(x[[i]])) {
stop("You chose a test function for low-dimensional data but you have more variables than observations in your data.")
}
}
}
}
# The test functions can be found in the R script test-functions.R
if (test == "F") {
test.func.tmp <- test.func.F
arg.all.tmp <- list(NULL)
arg.all.fix.tmp <- NULL
compMOD.same.tmp <- compMOD.same.F
compMOD.changing.tmp <- compMOD.NULL
} else if (test == "logistic") {
test.func.tmp <- test.func.LRT
arg.all.tmp <- list(NULL)
arg.all.fix.tmp <- NULL
compMOD.same.tmp <- compMOD.same.LRT
compMOD.changing.tmp <- compMOD.NULL
} else if (test == "hdi") {
test.func.tmp <- test.func.hdi
arg.all.tmp <- list(NULL)
arg.all.fix.tmp <- list(standardize = TRUE, # default
family = "gaussian")
compMOD.same.tmp <- compMOD.same.hdi
compMOD.changing.tmp <- compMOD.NULL
} else if (test == "hdi.logistic") {
test.func.tmp <- test.func.hdi
arg.all.tmp <- list(NULL)
arg.all.fix.tmp <- list(standardize = TRUE, # default
family = "binomial")
compMOD.same.tmp <- compMOD.same.hdi
compMOD.changing.tmp <- compMOD.NULL
}
else if (test == "QF") {
p <- if (is.null(clvar)) {ncol(x)} else {ncol(x) + ncol(clvar)}
test.func.tmp <- test.func.QF
arg.all.tmp <- list(NULL)
arg.all.fix.tmp <- list(Cov.weight = TRUE, # default
A = NULL,
# previous version of SIHR: diag(1, p),
# previous version of SIHR: intercept = TRUE,
tau.vec = 1,
lambda = NULL, # use glmnet CV with lambda.min
# lambda = "CV", # use glmnet CV with lambda.1se
mu = NULL, step = NULL,
resol = 1.5, maxiter = 6)
compMOD.same.tmp <- compMOD.same.QF
compMOD.changing.tmp <- compMOD.NULL
}
run_hierarchy(x = x, y = y, dendr = dendr,
test.func = test.func.tmp, # see above
clvar = clvar,
arg.all = arg.all.tmp, # see above
arg.all.fix = arg.all.fix.tmp, # see above
compMOD.same = compMOD.same.tmp, # see above
compMOD.changing = compMOD.changing.tmp, # see above
# block argument....
alpha = alpha, global.test = global.test,
hier.adj = hier.adj,
mt.adj = mt.adj,
agg.method = agg.method,
verbose = verbose, sort.parallel = sort.parallel,
parallel = parallel,
ncpus = ncpus, cl = cl)
} # {advance_hierarchy}
## TMP Parts of the help file which have been changed.
#
# @param mt.adj type of multiple testing correction to be used;
# either \code{"SBH"} (Sparse Branch Hierarchical multiple adjustment),
# \code{"dpBF"} (depth-wise Bonferroni multiple adjustment), or \code{"none"}
# (no adjusment). See the 'Details' section.
#
# The user can specify which hierarchical multiple testing adjustment for the
# hierarchical procedure is applied. The default is \code{mt.adj = "SBH"}
# (Sparse Branch Hierarchical multiple adjustment), which is an improvement
# over so-called depth-wise Bonferroni with respect to power.
# Alternatively, the user can choose \code{"dpBF"} (depth-wise Bonferroni multiple
# adjustment) or \code{"none"} (no adjustment). The hierarchical multiple testing
# adjustments \code{"SBH"} and \code{"dpBF"} guarantee strong family-wise error
# control if the group test, which is applied for testing a given group,
# controls the type I error.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.