compute.config.matrices: Compute configuration matrices

Description Usage Arguments Details Value Examples

Description

Given a list of n data matrices (corresponding to n datasets), this function computes the configuration matrix for each of these configuration matrices. By default inner product similarity is used, but other similarity (such as Jaccard similarity for binary data) can also be used (see the vignette 'A quick introduction to iTOP' for more information). In addition, the configuration matrices can be centered and prepared for use with the modified RV coefficient, both of which we will briefly explain here.

Usage

1
2
compute.config.matrices(data, similarity_fun = inner.product, center = TRUE,
  mod.rv = TRUE)

Arguments

data

List of datasets.

similarity_fun

Either a function pointer to the similarity function to be used for all datasets; or a list of function pointers, if different similarity functions need to be used for different datasets (default=inner.product).

center

Either a boolean indicating whether centering should be used for all datasets; or a list of booleans, if centering should be used for some datasets but not all of them (default=TRUE).

mod.rv

Either a boolean indicating whether the modified RV coefficient should be used for all datasets; or a list of booleans, if the modified RV should be used for some datasets but not all of them (default=TRUE).

Details

The RV coefficient often results in values very close to one when both datasets are not centered around zero, even for orthogonal data. For inner product similarity and Jaccard similarity, we recommend using centering. However, for some other similarity measures, centering may not be beneficial (for example, because the measure itself is already centered, such as in the case of Pearson correlation). For more information on centering of binary (and other non-continuous) data, for which we used kernel centering of the configuration matrix, we refer to our manuscript: Aben et al., 2018, doi.org/10.1101/293993.

The modified RV coefficient was proposed for high-dimensional data, as the regular RV coefficient would result in values close to one even for orthogonal data. We recommend always using the modified RV coefficient.

Value

A list of n configuration matrices, where n is the number of datasets.

Examples

1
2
3
4
5
6
7
8
9
set.seed(2)
n = 100
p = 100
x1 = matrix(rnorm(n*p), n, p)
x2 = x1 + matrix(rnorm(n*p), n, p)
x3 = x2 + matrix(rnorm(n*p), n, p)
data = list(x1=x1, x2=x2, x3=x3)
config_matrices = compute.config.matrices(data)
cors = rv.cor.matrix(config_matrices)

iTOP documentation built on May 2, 2019, 3:44 a.m.