View source: R/group_features.R
groupByCorrelation groups rows in a numeric matrix based on
their correlation with each other.
Two types of groupings are available:
inclusive = FALSE (the default): the algorithm creates small groups of
highly correlated members, all of which have a correlation with each other
>= threshold. Note that with this algorithm, rows in x can still have a
correlation >= threshold with one or more elements of a
group they are not part of. See notes below for more information.
inclusive = TRUE: the algorithm creates large groups containing rows that
have a correlation
>= threshold with at least one element of that group.
For example, if rows 1 and 3 have a correlation above the threshold and
rows 3 and 5 do too (but the correlation between rows 1 and 5 is below the
threshold), all 3 are put into the same group (i.e. rows 1, 3 and 5).
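The chaining behaviour of inclusive = TRUE described above can be illustrated with a minimal base R sketch. It is not the package's implementation: group_inclusive_sketch is a hypothetical helper that takes a pre-computed correlation matrix and assigns rows to groups by following every link with a correlation >= threshold (i.e. connected components). The toy correlation values are made up to reproduce the rows 1, 3 and 5 example.

```r
# Hypothetical helper (not part of the package): chain rows into groups
# whenever they are linked by a correlation >= threshold.
group_inclusive_sketch <- function(cors, threshold = 0.9) {
    adj <- !is.na(cors) & cors >= threshold   # TRUE where two rows are linked
    n <- nrow(cors)
    grp <- rep(NA_integer_, n)
    current <- 0L
    for (i in seq_len(n)) {
        if (!is.na(grp[i])) next              # row already assigned
        current <- current + 1L
        queue <- i
        while (length(queue)) {
            j <- queue[1L]
            queue <- queue[-1L]
            if (!is.na(grp[j])) next
            grp[j] <- current
            # add all still unassigned rows linked to row j
            queue <- c(queue, which(adj[j, ] & is.na(grp)))
        }
    }
    factor(grp)
}

# Toy correlation matrix (assumed values): rows 1-3 and 3-5 are above the
# threshold, rows 1-5 are below it.
cors <- matrix(0.1, nrow = 5, ncol = 5)
diag(cors) <- 1
cors[1, 3] <- cors[3, 1] <- 0.95
cors[3, 5] <- cors[5, 3] <- 0.92
cors[1, 5] <- cors[5, 1] <- 0.5

res <- group_inclusive_sketch(cors, threshold = 0.9)
res
```

Rows 1, 3 and 5 end up in the same group although rows 1 and 5 are not directly correlated above the threshold. For a numeric matrix x, a row-wise correlation matrix could be computed with cor(t(x), use = "pairwise.complete.obs").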
Note that with parameter f it is also possible to pre-define groups of
rows that should be further sub-grouped based on correlation with each other.
In other words, if f is provided, correlations are calculated only between
rows with the same value in f and hence these pre-defined groups of rows
are further sub-grouped based on pairwise correlation. The returned factor
is then f with the additional subgroup appended (and separated with a
"."). See examples below.
groupByCorrelation(
  x,
  method = "pearson",
  use = "pairwise.complete.obs",
  threshold = 0.9,
  f = NULL,
  inclusive = FALSE
)
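How sub-grouping with f composes the returned labels can be sketched as follows. This is only an illustration of the labelling scheme (pre-defined group, then ".", then subgroup), not the package's code: subgroup_by_f_sketch and toy_group_fun are hypothetical names, and the exact label formatting used by groupByCorrelation may differ.

```r
# Hypothetical helper: apply a grouping function separately within each
# pre-defined group in f and append the subgroup, separated by ".".
subgroup_by_f_sketch <- function(x, f, group_fun) {
    f <- as.character(f)
    res <- character(nrow(x))
    for (lvl in unique(f)) {
        idx <- which(f == lvl)
        # group only the rows sharing the same value in f
        sub <- group_fun(x[idx, , drop = FALSE])
        res[idx] <- paste(lvl, as.character(sub), sep = ".")
    }
    factor(res)
}

# Toy grouping function standing in for the correlation-based grouping:
# rows with a positive first column form subgroup 1, all others subgroup 2.
toy_group_fun <- function(m) ifelse(m[, 1] > 0, 1L, 2L)

x_toy <- cbind(c(1, -1, 1, 1), c(0, 0, 0, 0))
f_toy <- c("a", "b", "b", "a")
labels <- subgroup_by_f_sketch(x_toy, f_toy, toy_group_fun)
labels
```

Rows are never compared across different values of f, so two rows can only share a final label if they share both the pre-defined group and the subgroup.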
f: optional vector of length equal to nrow(x) that pre-defines groups of rows to be further sub-grouped.
A factor with the same length as nrow(x), giving the group each row
is assigned to.
Implementation note of the grouping algorithm:
All correlations between rows in x which are >= threshold are
identified and sorted decreasingly.
Starting with the pair with the highest correlation, groups are defined:
If neither of the two is in a group, both are put into the same new group.
If one of the two is already in a group, the other is put into the same
group if all correlations of it to that group are >= threshold
(and are not NA).
If both are already in the same group, nothing is done.
If both are in different groups: an element is put into the group of the
other if a) all correlations of it to members of the other's group are
>= threshold and b) the average correlation to the
other group is larger than the average correlation to its own group.
This ensures that groups are defined in which all elements have a correlation
>= threshold with each other and the correlation between members of the
same group is maximized.
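The first step of the implementation note above (identify all pairwise correlations >= threshold and sort them decreasingly) could look roughly like this in base R. candidate_pairs_sketch is a hypothetical helper, not a function of the package, and it covers only this first step, not the subsequent group assignment. The correlation values are made up for illustration.

```r
# Hypothetical helper: list each pair of rows once (upper triangle) whose
# correlation is >= threshold, sorted by decreasing correlation.
candidate_pairs_sketch <- function(cors, threshold = 0.9) {
    keep <- upper.tri(cors) & !is.na(cors) & cors >= threshold
    idx <- which(keep, arr.ind = TRUE)
    pairs <- data.frame(row_a = idx[, 1], row_b = idx[, 2], cor = cors[idx])
    pairs <- pairs[order(pairs$cor, decreasing = TRUE), , drop = FALSE]
    rownames(pairs) <- NULL
    pairs
}

# Toy correlation matrix with assumed values.
cors <- matrix(0.1, nrow = 5, ncol = 5)
diag(cors) <- 1
cors[1, 3] <- cors[3, 1] <- 0.95
cors[3, 5] <- cors[5, 3] <- 0.92
cors[2, 4] <- cors[4, 2] <- NA      # missing correlations are skipped

pairs <- candidate_pairs_sketch(cors, threshold = 0.9)
pairs
```

The group assignment would then walk through pairs from top to bottom, applying the four cases listed above to each pair in turn.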
x <- rbind(
    c(1, 3, 2, 5),
    c(2, 6, 4, 7),
    c(1, 1, 3, 1),
    c(1, 3, 3, 6),
    c(0, 4, 3, 1),
    c(1, 4, 2, 6),
    c(2, 8, 2, 12))

## define which rows have a high correlation with each other
groupByCorrelation(x)

## assuming we have some prior grouping of rows, further sub-group them
## based on pairwise correlation.
f <- c(1, 2, 2, 1, 1, 2, 2)
groupByCorrelation(x, f = f)