Build a hierarchical tree based on hierarchical clustering of the variables.
x
    a matrix or list of matrices for multiple data sets. The matrix or
    matrices have to be of type numeric and are required to have column names
    / variable names. The rows and the columns represent the observations and
    the variables, respectively. Either the argument
d
    a dissimilarity matrix. This can be either a symmetric matrix of
    type numeric with column and row names or an object of class
block
    a data frame or matrix specifying the second level of the hierarchical
    tree. The first column is required to contain the variable names and to
    be of type character. The second column is required to contain the group
    assignment and to be a vector of type character or numeric. If not
    supplied, the second level is built based on the data.
method
    the agglomeration method to be used for the hierarchical
    clustering. See
use
    the method to be used for computing covariances in the presence
    of missing values. This is important for multiple data sets which do not
    measure exactly the same variables. If data is specified using the argument
sort.parallel
    a logical indicating whether the values are sorted with respect to the
    size of the block. This can reduce the run time for parallel computation.
parallel
    type of parallel computation to be used. See the 'Details' section.
ncpus
    number of processes to be run in parallel.
cl
    an optional parallel or snow cluster used if
The hierarchical tree is built by hierarchical clustering of the variables.
Either the data (using the argument x) or a dissimilarity matrix
(using the argument d) can be specified.
If one or multiple data sets are defined using the argument x,
the dissimilarity matrix is calculated as one minus the squared empirical
correlation. In the case of multiple data sets, a single hierarchical
tree is jointly estimated using hierarchical clustering. The argument
use matters here: missing values are introduced if the data sets do not
measure exactly the same variables, and use determines how the empirical
correlation is calculated in their presence.
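
The calculation described above can be sketched in base R. This is a minimal illustration of the stated formula (one minus squared empirical correlation) followed by hierarchical clustering, not the package's internal code. The toy data, the object names, and the choices use = "pairwise.complete.obs" and method = "average" are assumptions made for the example.

```r
# Toy data: 50 observations of 6 named variables (illustrative only).
set.seed(1)
x <- matrix(rnorm(50 * 6), nrow = 50,
            dimnames = list(NULL, paste0("Var", 1:6)))

# Introduce a few missing values, as can happen when data sets that
# do not measure exactly the same variables are combined.
x[1:5, "Var2"] <- NA

# Empirical correlation between the variables (columns); the 'use'
# value controls how missing values are handled.
r <- cor(x, use = "pairwise.complete.obs")

# Dissimilarity: one minus squared empirical correlation.
d <- as.dist(1 - r^2)

# Hierarchical clustering of the variables, converted to a dendrogram.
tree <- as.dendrogram(hclust(d, method = "average"))
```

Strongly correlated variables get a dissimilarity close to zero under this measure, regardless of the sign of the correlation, so they are merged early in the tree.
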
Alternatively, it is possible to specify a user-defined dissimilarity
matrix using the argument d.
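
As a sketch of the user-defined route, the following builds a symmetric numeric dissimilarity matrix with row and column names and passes it to cluster_var. The measure used here (one minus absolute correlation) is a hypothetical choice for illustration only, and the call is guarded so that the snippet also runs when hierinf is not installed.

```r
# Toy data: 40 observations of 5 named variables (illustrative only).
set.seed(2)
x <- matrix(rnorm(40 * 5), nrow = 40,
            dimnames = list(NULL, paste0("Var", 1:5)))

# A user-defined dissimilarity: one minus absolute correlation
# (a hypothetical choice, not the package default).
d_mat <- 1 - abs(cor(x))

# d_mat is a symmetric numeric matrix whose row and column names are
# the variable names, as the argument d requires.
if (requireNamespace("hierinf", quietly = TRUE)) {
  dendr <- hierinf::cluster_var(d = d_mat)
}
```
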
If the arguments x and block are supplied, i.e. the block defines the
second level of the hierarchical tree, the function can be run in parallel
across the different blocks by specifying the arguments parallel and
ncpus. There is an optional argument cl if parallel = "snow". There are
three possibilities to set the argument parallel: parallel = "no" for
serial evaluation (default), parallel = "multicore" for parallel
evaluation using forking, and parallel = "snow" for parallel evaluation
using a parallel socket cluster. It is recommended to select
RNGkind("L'Ecuyer-CMRG") and set a seed to ensure that the parallel
computing of the package hierinf is reproducible. This way each processor
gets a different substream of the pseudo random number generator stream,
which makes the results reproducible if the arguments (such as
sort.parallel and ncpus) remain unchanged. See the vignette or the
reference for more details.
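
The recommended setup can be sketched as follows. The toy data, the object names, and the choice of two processes are assumptions for illustration. The cluster_var calls use only the arguments described on this page and are guarded so that the snippet also runs when hierinf is not installed.

```r
# Select the L'Ecuyer-CMRG generator and set a seed, as recommended,
# before any parallel computation is started.
RNGkind("L'Ecuyer-CMRG")
set.seed(3)

# Toy data: 100 observations of 20 named variables in two blocks.
x <- matrix(rnorm(100 * 20), nrow = 100,
            dimnames = list(NULL, paste0("Var", 1:20)))
block <- data.frame(var.name = colnames(x),
                    block = rep(c(1, 2), each = 10),
                    stringsAsFactors = FALSE)

if (requireNamespace("hierinf", quietly = TRUE)) {
  # Socket cluster variant: create the cluster, use it, and always
  # shut it down again, even if the call fails.
  cl <- parallel::makeCluster(2)
  dendr_snow <- tryCatch(
    hierinf::cluster_var(x = x, block = block,
                         parallel = "snow", ncpus = 2, cl = cl),
    finally = parallel::stopCluster(cl)
  )

  # Forking variant (forking is not available on Windows):
  # dendr_fork <- hierinf::cluster_var(x = x, block = block,
  #                                    parallel = "multicore", ncpus = 2)
}
```

Keeping ncpus and sort.parallel fixed between runs matters because, as stated above, reproducibility is only expected if these arguments remain unchanged.
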
The returned value is an object of class "hierD", consisting of two
elements, the argument "block" and the hierarchical tree "res.tree".
The element "block" defines the second level of the hierarchical
tree if supplied.
The element "res.tree" contains a dendrogram for each of the blocks
defined in the argument block. If the argument block is NULL (i.e. not
supplied), the element contains only one dendrogram.
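
A short sketch of how the returned object might be inspected. The element names block and res.tree are taken from the description above; accessing them with $ assumes the object behaves like a list, which this page does not state explicitly. The stand-in dendrogram at the end is built with base R only, to show the kind of object involved.

```r
# Toy data: 60 observations of 8 named variables (illustrative only).
set.seed(4)
x <- matrix(rnorm(60 * 8), nrow = 60,
            dimnames = list(NULL, paste0("Var", 1:8)))

if (requireNamespace("hierinf", quietly = TRUE)) {
  dendr <- hierinf::cluster_var(x = x)
  class(dendr)      # class of the returned object
  dendr$block       # second level of the tree, if one was supplied
  dendr$res.tree    # the dendrogram(s)
}

# A stand-in dendrogram built with base R, to show how such an
# object can be inspected.
tree <- as.dendrogram(hclust(as.dist(1 - cor(x)^2)))
labels(tree)            # variable names in dendrogram order
attr(tree, "members")   # number of leaves below the root
```
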
Renaux, C. et al. (2018), Hierarchical inference for genome-wide association studies: a view on methodology with software. (arXiv:1805.02988)
See also cluster_position and test_hierarchy.
library(MASS)
library(hierinf)

# Simulate 200 observations of 500 independent standard normal variables.
x <- mvrnorm(200, mu = rep(0, 500), Sigma = diag(500))
colnames(x) <- paste0("Var", 1:500)

# Build the hierarchical tree from the data.
dendr1 <- cluster_var(x = x)

# The column names of the data frame block are optional.
block <- data.frame("var.name" = paste0("Var", 1:500),
                    "block" = rep(c(1, 2), each = 250),
                    stringsAsFactors = FALSE)
dendr2 <- cluster_var(x = x, block = block)

# The matrix x is first transposed because the function dist calculates
# distances between the rows.
d <- dist(t(x))
dendr3 <- cluster_var(d = d, method = "single")