test_only_hierarchy: Hierarchical Testing


View source: R/test-only-hierarchy.R

Description

Hierarchical Testing given the output of the function multisplit.

Usage

test_only_hierarchy(x, y, dendr, res.multisplit, clvar = NULL,
  family = c("gaussian", "binomial"), alpha = 0.05,
  global.test = TRUE, verbose = FALSE, sort.parallel = TRUE,
  parallel = c("no", "multicore", "snow"), ncpus = 1L, cl = NULL,
  check.input = TRUE, unique.colnames.x = NULL)

Arguments

x

a matrix or list of matrices for multiple data sets. The matrix or matrices have to be of type numeric and are required to have column names / variable names. The rows and the columns represent the observations and the variables, respectively.

y

a vector, a matrix with one column, or list of the aforementioned objects for multiple data sets. The vector, vectors, matrix, or matrices have to be of type numeric. For family = "binomial", the response is required to be a binary vector taking values 0 and 1.

dendr

the output of one of the functions cluster_var or cluster_position.

res.multisplit

the output of the function multisplit.

clvar

a matrix or list of matrices of control variables.

family

a character string naming a family of the error distribution; either "gaussian" or "binomial".

alpha

the significance level at which the FWER is controlled.

global.test

a logical value indicating whether the global test should be performed.

verbose

a logical value indicating whether the progress of the computation should be printed in the console.

sort.parallel

a logical value indicating whether the values are sorted with respect to the size of the block. This can reduce the run time of parallel computation.

parallel

type of parallel computation to be used. See the 'Details' section.

ncpus

number of processes to be run in parallel.

cl

an optional parallel or snow cluster used if parallel = "snow". If not supplied, a cluster on the local machine is created.

check.input

a logical value indicating whether the function should check the input. This argument is used to call test_only_hierarchy within test_hierarchy.

unique.colnames.x

a character vector containing the unique column names of x. This argument is used to call test_only_hierarchy within test_hierarchy.

Details

The function test_only_hierarchy requires the output of one of the functions cluster_var or cluster_position as an input (argument dendr). Furthermore, it requires the output of the function multisplit as an input (argument res.multisplit). Hierarchical testing is performed by going top-down through the hierarchical tree. Testing only continues if at least one child of a given cluster is significant.
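The top-down idea can be illustrated with a small base R sketch. This is not the package's implementation: the tree, the helper names node and top_down, and the p-values are made up for illustration, and the p-value adjustment that hierinf performs is left out entirely.

```r
# Toy tree node: a cluster name, a (made-up) p-value, and child clusters.
node <- function(name, p, children = list()) {
  list(name = name, p = p, children = children)
}

tree <- node("V1-V4", 0.001, list(
  node("V1-V2", 0.004, list(node("V1", 0.010), node("V2", 0.300))),
  node("V3-V4", 0.400, list(node("V3", 0.200), node("V4", 0.600)))
))

# Descend from the top. A non-significant cluster ends the search on its
# branch. A significant cluster is reported if none of its children is
# significant; otherwise the result of its significant children is reported.
top_down <- function(nd, alpha = 0.05) {
  if (nd$p > alpha) return(character(0))
  sig <- unlist(lapply(nd$children, top_down, alpha = alpha))
  if (length(sig) == 0) nd$name else sig
}

found <- top_down(tree)
print(found)
```

In this toy tree the branch V3-V4 is never explored below its top node because that node is not significant, while the branch V1-V2 is refined down to a single variable.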

If the argument block was supplied when building the hierarchical tree (i.e. in the call to either cluster_var or cluster_position), so that the second level of the hierarchical tree was given, the hierarchical testing step can be run in parallel across the different blocks by specifying the arguments parallel and ncpus. The argument parallel can be set in three ways: parallel = "no" for serial evaluation (default), parallel = "multicore" for parallel evaluation using forking, and parallel = "snow" for parallel evaluation using a parallel socket cluster. The optional argument cl is only used if parallel = "snow".

It is recommended to select RNGkind("L'Ecuyer-CMRG") and set a seed to ensure that the parallel computing of the package hierinf is reproducible. This way, each processor gets a different substream of the pseudo-random number generator stream, which makes the results reproducible as long as the arguments (such as sort.parallel and ncpus) remain unchanged. See the vignette or the reference for more details.
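The effect of RNGkind("L'Ecuyer-CMRG") can be seen in base R alone. The sketch below uses nextRNGStream from the parallel package to derive a second stream from the seeded one and checks that each stream replays the same numbers. The helper draw_from is hypothetical, and how hierinf assigns streams to processes internally is not shown here.

```r
library(parallel)

RNGkind("L'Ecuyer-CMRG")   # generator that supports independent streams
set.seed(76)               # fixes the starting point of the first stream

stream1 <- .Random.seed             # seed state for a first worker
stream2 <- nextRNGStream(stream1)   # seed state for a second worker

# Hypothetical helper: restore a seed state and draw n uniform numbers.
draw_from <- function(seed, n = 3) {
  assign(".Random.seed", seed, envir = globalenv())
  runif(n)
}

a1 <- draw_from(stream1); a2 <- draw_from(stream2)
b1 <- draw_from(stream1); b2 <- draw_from(stream2)

# Each stream is reproducible, and the two streams differ from each other.
print(identical(a1, b1) && identical(a2, b2))
print(identical(a1, a2))
```

In practice this means calling RNGkind("L'Ecuyer-CMRG") and set.seed once before test_only_hierarchy is run with parallel = "multicore" or parallel = "snow", and keeping ncpus and sort.parallel fixed between runs.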

Value

The returned value is an object of class "hierT", consisting of two elements, the result of the multi-sample splitting step "res.multisplit" and the result of the hierarchical testing "res.hierarchy".

The result of the multi-sample splitting step is a list whose number of elements corresponds to the number of data sets. Each element (corresponding to a data set) contains a list with two matrices. The first matrix contains the indices of the second half of the observations (which were not used to select the variables). The second matrix contains the column names / variable names of the selected variables.

The result of the hierarchical testing is a data frame of significant clusters with the following columns:

block

NA or the name of the block if the significant cluster is a subcluster of the block or is the block itself.

p.value

The p-value of the significant cluster.

significant.cluster

The column names of the members of the significant cluster.

There is a print method for this class; see print.hierT.
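A short sketch of how the three columns can be used once a result is available. The data frame below is a hand-built stand-in with made-up values, not real output, and it assumes that significant.cluster is stored as a list column, as the [[ access in the Examples suggests.

```r
# Stand-in for sign.clusters$res.hierarchy with made-up values.
res.hierarchy <- data.frame(block = c(NA, "2"),
                            p.value = c(0.003, 0.021),
                            stringsAsFactors = FALSE)
# Assumption: one character vector of variable names per row.
res.hierarchy$significant.cluster <- list("Var5", c("Var20", "Var46"))

# Number of variables in each significant cluster.
sizes <- lengths(res.hierarchy$significant.cluster)

# Members of the cluster with the smallest p-value.
best <- res.hierarchy[[which.min(res.hierarchy$p.value), "significant.cluster"]]

print(sizes)
print(best)
```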

References

Renaux, C. et al. (2018), Hierarchical inference for genome-wide association studies: a view on methodology with software. (arXiv:1805.02988)

See Also

cluster_var, cluster_position, multisplit, test_hierarchy, and compute_r2.

Examples

n <- 200
p <- 500
library(MASS)     # provides mvrnorm
library(hierinf)  # provides cluster_var, multisplit, and test_only_hierarchy
set.seed(3)
x <- mvrnorm(n, mu = rep(0, p), Sigma = diag(p))
colnames(x) <- paste0("Var", 1:p)
beta <- rep(0, p)
beta[c(5, 20, 46)] <- 1
y <- x %*% beta + rnorm(n)

dendr1 <- cluster_var(x = x)
set.seed(76)
res.multisplit1 <- multisplit(x = x, y = y, family = "gaussian")
sign.clusters1 <- test_only_hierarchy(x = x, y = y, dendr = dendr1,
                                      res.multisplit = res.multisplit1,
                                      family = "gaussian")

## With block
# The column names of the data frame block are optional.
block <- data.frame("var.name" = paste0("Var", 1:p),
                    "block" = rep(c(1, 2), each = p/2),
                    stringsAsFactors = FALSE)
dendr2 <- cluster_var(x = x, block = block)
# The output res.multisplit1 can be reused since the multi-sample
# splitting step is the same with or without blocks.
sign.clusters2 <- test_only_hierarchy(x = x, y = y, dendr = dendr2,
                                      res.multisplit = res.multisplit1,
                                      family = "gaussian")

# Access part of the object
sign.clusters2$res.hierarchy[, "block"]
sign.clusters2$res.hierarchy[, "p.value"]
# Column names or variable names of the significant cluster in the first row.
sign.clusters2$res.hierarchy[[1, "significant.cluster"]]

crbasel/hierinf documentation built on Nov. 5, 2018, 5:22 p.m.