groupByCorrelation: Group rows in a matrix based on their correlation
In EuracBiomedicalResearch/CompMetaboTools: Utility Functions from the Eurac Research Computational Metabolomics Team

groupByCorrelation

R Documentation

Group rows in a matrix based on their correlation

Description

groupByCorrelation groups the rows of a numeric matrix based on their correlation with each other.

Two types of groupings are available:

inclusive = FALSE (the default): the algorithm creates small groups of highly correlated members, all of which have a correlation >= threshold with each other. Note that with this algorithm, rows in x can still have a correlation >= threshold with one or more elements of a group they are not part of. See the Note section below for more information.
inclusive = TRUE: the algorithm creates large groups containing rows that have a correlation >= threshold with at least one element of that group. For example, if rows 1 and 3 have a correlation above the threshold, and rows 3 and 5 too (while the correlation between rows 1 and 5 is below the threshold), all three rows (1, 3 and 5) are put into the same group.
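The inclusive grouping described above behaves like finding connected components: rows end up in the same group if they are linked through a chain of correlations >= threshold. The following is a minimal base R sketch of that idea, not the package's implementation. The helper name `group_inclusive_sketch` and the toy correlation matrix are assumptions made for illustration only.

```r
## Hypothetical helper (not part of CompMetaboTools): group rows that are
## linked through a chain of correlations >= threshold.
group_inclusive_sketch <- function(cors, threshold = 0.9) {
    n <- nrow(cors)
    adj <- !is.na(cors) & cors >= threshold   # TRUE where two rows are linked
    grp <- rep(NA_integer_, n)
    current <- 0L
    for (i in seq_len(n)) {
        if (!is.na(grp[i])) next              # row already assigned
        current <- current + 1L
        queue <- i
        while (length(queue)) {
            j <- queue[1L]
            queue <- queue[-1L]
            if (!is.na(grp[j])) next
            grp[j] <- current
            ## add all not yet assigned rows linked to row j
            queue <- c(queue, which(adj[j, ] & is.na(grp)))
        }
    }
    factor(grp)
}

## Toy correlation matrix (made up): rows 1-3 and 3-5 are above the
## threshold, rows 1-5 are below it.
cors <- diag(5)
cors[1, 3] <- cors[3, 1] <- 0.95
cors[3, 5] <- cors[5, 3] <- 0.92
cors[1, 5] <- cors[5, 1] <- 0.50
res <- group_inclusive_sketch(cors)
res
```

In this toy example rows 1, 3 and 5 share one group although rows 1 and 5 are only weakly correlated, which is the trade-off of the inclusive mode.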

With parameter f it is also possible to pre-define groups of rows that are then further sub-grouped based on their correlation with each other. If f is provided, correlations are calculated only between rows with the same value in f. The returned factor is then f with the subgroup appended, separated by a ".". See the examples below.
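To illustrate how pre-defined groups and correlation-based subgroups combine into labels of the form "group.subgroup", here is a rough base R sketch. It does not use the package's grouping algorithm: as a stand-in, it uses complete linkage hierarchical clustering (`hclust` and `cutree` from the stats package) cut at `1 - threshold`, which also yields clusters in which all pairwise correlations are >= threshold. The helper name `subgroup_sketch` is hypothetical, and the resulting subgroups may differ from those of groupByCorrelation.

```r
## Hypothetical stand-in for the correlation-based sub-grouping: complete
## linkage clustering on 1 - correlation, cut at 1 - threshold.
subgroup_sketch <- function(x, threshold = 0.9) {
    if (nrow(x) < 2L) return(rep(1L, nrow(x)))
    d <- as.dist(1 - cor(t(x), use = "pairwise.complete.obs"))
    unname(cutree(hclust(d, method = "complete"), h = 1 - threshold))
}

x <- rbind(
    c(1, 3, 2, 5),
    c(2, 6, 4, 7),
    c(1, 1, 3, 1),
    c(1, 3, 3, 6),
    c(0, 4, 3, 1),
    c(1, 4, 2, 6),
    c(2, 8, 2, 12))
f <- c(1, 2, 2, 1, 1, 2, 2)

res <- character(nrow(x))
for (lvl in unique(f)) {
    idx <- which(f == lvl)
    ## correlations are only calculated among rows with the same value in f
    sub <- subgroup_sketch(x[idx, , drop = FALSE], threshold = 0.9)
    ## append the subgroup to the pre-defined group, separated by "."
    res[idx] <- paste(lvl, sub, sep = ".")
}
res <- factor(res)
res
```

Rows with different values in f can never share a label here, regardless of how strongly they are correlated.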

Usage

groupByCorrelation(
  x,
  method = "pearson",
  use = "pairwise.complete.obs",
  threshold = 0.9,
  f = NULL,
  inclusive = FALSE
)

Arguments

`x`	`numeric` `matrix` where rows should be grouped based on the correlation of their values across columns being `>= threshold`.
`method`	`character(1)` with the method to be used for correlation. See `cor()` for options.
`use`	`character(1)` defining which values should be used for the correlation. See `cor()` for details.
`threshold`	`numeric(1)` defining the cut-off value at or above which rows are considered to be correlated and hence grouped.
`f`	optional vector of length equal to `nrow(x)` pre-defining groups of rows in `x` that should be further sub-grouped. See description for details.
`inclusive`	`logical(1)` whether a version of the grouping algorithm should be used that leads to larger, more loosely correlated, groups. The default is `inclusive = FALSE`. See description for more information.
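The following sketch shows how `method` and `use` affect a row-wise correlation when values are missing. It assumes that the correlation between rows is calculated with `cor()` from the stats package on the transposed matrix; this is an assumption for illustration, the toy matrix is made up.

```r
## Toy matrix with one missing value in row "b"
x <- rbind(
    a = c(1, 2, 3, 4),
    b = c(2, 4, NA, 8),
    c = c(4, 3, 2, 1))

## cor() correlates columns, so the matrix is transposed to correlate rows.
## With "pairwise.complete.obs" each pair of rows uses the columns in which
## both rows have a non-missing value.
cors <- cor(t(x), method = "pearson", use = "pairwise.complete.obs")

## With use = "everything" a single NA makes all correlations of row "b" NA.
cors_all <- cor(t(x), method = "pearson", use = "everything")

## Pairs that would be considered correlated with threshold = 0.9
cors >= 0.9
```

With `use = "pairwise.complete.obs"` rows with some missing values can still be grouped, at the cost of correlations being based on fewer columns.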

Value

factor of length nrow(x) with the group each row is assigned to.

Note

Implementation note of the grouping algorithm:

all correlations between rows in x which are >= threshold are identified and sorted decreasingly.
starting with the pair with the highest correlation, groups are defined:
if neither of the two is in a group, both are put into the same new group.
if one of the two is already in a group, the other is put into the same group if all of its correlations with members of that group are >= threshold (and are not NA).
if both are already in the same group, nothing is done.
if both are in different groups: an element is put into the group of the other if a) all of its correlations with members of the other's group are not NA and >= threshold and b) its average correlation with the other group is larger than its average correlation with its own group.

This ensures that groups are defined in which all elements have a correlation >= threshold with each other and the correlation between members of the same group is maximized.
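The steps above can be sketched in base R as follows. This is a simplified illustration written from the description, not the package's code: the function name `group_pairs_sketch` is hypothetical, the input is a pre-computed (made-up) correlation matrix, and putting rows that were never assigned into their own single-element groups is an assumption.

```r
## Hypothetical sketch of the grouping steps (not the package's code).
group_pairs_sketch <- function(cors, threshold = 0.9) {
    n <- nrow(cors)
    grp <- rep(NA_integer_, n)
    ## TRUE if all correlations of row i with the members of group g are
    ## not NA and >= threshold
    fits <- function(i, g) {
        vals <- cors[i, setdiff(which(grp == g), i)]
        !anyNA(vals) && all(vals >= threshold)
    }
    ## average correlation of row i with the (other) members of group g
    avg <- function(i, g) {
        m <- setdiff(which(grp == g), i)
        if (length(m)) mean(cors[i, m]) else -Inf
    }
    ## pairs with a correlation >= threshold, sorted decreasingly
    idx <- which(upper.tri(cors) & !is.na(cors) & cors >= threshold,
                 arr.ind = TRUE)
    idx <- idx[order(cors[idx], decreasing = TRUE), , drop = FALSE]
    for (k in seq_len(nrow(idx))) {
        a <- idx[k, 1L]
        b <- idx[k, 2L]
        ga <- grp[a]
        gb <- grp[b]
        if (is.na(ga) && is.na(gb)) {
            ## neither is in a group: create a new group for both
            grp[c(a, b)] <- max(0L, grp, na.rm = TRUE) + 1L
        } else if (is.na(ga)) {
            if (fits(a, gb)) grp[a] <- gb
        } else if (is.na(gb)) {
            if (fits(b, ga)) grp[b] <- ga
        } else if (ga != gb) {
            ## both are in different groups: move one only if it fits the
            ## other group and is, on average, better correlated with it
            if (fits(a, gb) && avg(a, gb) > avg(a, ga)) {
                grp[a] <- gb
            } else if (fits(b, ga) && avg(b, ga) > avg(b, gb)) {
                grp[b] <- ga
            }
        }
    }
    ## assumption: rows never assigned form their own single-element groups
    unassigned <- which(is.na(grp))
    grp[unassigned] <- max(0L, grp, na.rm = TRUE) + seq_along(unassigned)
    factor(grp)
}

## Toy correlation matrix (made up): row 3 is highly correlated with rows
## 1 and 4, but not with row 2.
cors <- diag(4)
cors[1, 2] <- cors[2, 1] <- 0.99
cors[1, 3] <- cors[3, 1] <- 0.95
cors[2, 3] <- cors[3, 2] <- 0.50
cors[3, 4] <- cors[4, 3] <- 0.93
res <- group_pairs_sketch(cors)
res
```

In this toy example row 3 is not added to the group of rows 1 and 2 because its correlation with row 2 is below the threshold; it is grouped with row 4 instead, even though its correlation with row 1 is >= threshold.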

Author(s)

Johannes Rainer

Examples


x <- rbind(
    c(1, 3, 2, 5),
    c(2, 6, 4, 7),
    c(1, 1, 3, 1),
    c(1, 3, 3, 6),
    c(0, 4, 3, 1),
    c(1, 4, 2, 6),
    c(2, 8, 2, 12))

## define which rows have a high correlation with each other
groupByCorrelation(x)

## assuming we have some prior grouping of rows, further sub-group them
## based on pairwise correlation.
f <- c(1, 2, 2, 1, 1, 2, 2)
groupByCorrelation(x, f = f)

EuracBiomedicalResearch/CompMetaboTools documentation built on Jan. 31, 2024, 1:14 p.m.
