opt_grouped_hierarchical: Find optimal weights for Gower's metric and return the...

Description Usage Arguments Details Value Note Author(s) References See Also Examples

Description

This function is a wrapper function for find_grouped_weights that returns both the hierarchical clustering and the result from find_grouped_weights.

Usage

1
2
3
4
5
opt_grouped_hierarchical(data, combined_indices = as.list(1:ncol(data)),
            start_values = rep(1 / ncol(data), length(combined_indices) - 1),
            n_iterate = 10, clust_method = "average",
            bounds = c(1 / (3 * ncol(data)),1 - (ncol(data) - 1) / (3 * ncol(data))),
            minimal_memory_mode = F)

Arguments

The arguments are the same as those of find_grouped_weights: see that for details.

data

the data that needs to be clustered, provided as a dataframe or (numeric) matrix. It is assumed that rows correspond to instances and columns correspond to features.

combined_indices

the indices that are combined into groups. Present this as a list. Each list entry should contain, in vector format, the indices of the variables that are contained in that group

start_values

a vector containing the initial values of the weights. Defaults to 1/ncol(data) for all variables. They must not be negative. Furthermore, 1 - sum(start_values) >= bounds[1] must hold. Due to the way the algorithm is programmed, you only have to supply values for the first ncol(data) - 1 variables!

n_iterate

the maximum number of iterations used by the quasi-newton method optim. Defaults to 10.

clust_method

a string containing the type of linkage function used by hclust. Defaults to average linkage

bounds

a vector of size 2 containing the lower and upper bound in position 1 and 2 respectively. The lower bound must not be lower than 0 and not higher than 1/ncol(data). The upper bound will be set to the minimum of its current value and 1-(ncol(data)-1)*bounds[1]. For more information, see details.

minimal_memory_mode

logical that determines whether the algorithm calculates the differences for each instance and each variables beforehand or calculates them live each time. The first will be chosen when this variable is FALSE, the second one will be chosen when this variable is TRUE. Note that this requires k vectors of size n(n-1)/2 to be stored, where k is the number of columns and n the number of rows. If you have the required memory, setting this to FALSE is definitely worth the speed increase, which seems to be a factor of something between 2 and 3.

Details

See find_grouped_weights for more details.

Value

This function returns a list of two named lists:

clustering

the hierarchical clustering. See hclust for more information.

opt_result

the results of the optimisation algorithm. See find_grouped_weights for more information.

Note

This package requires the cluster package.

Author(s)

Jeroen van den Hoven

References

Clustering with optimised weights for Gower's metric: Using hierarchical clustering and Quasi-Newton methods to maximise the cophenetic correlation coefficient, Jeroen van den Hoven.

See Also

find_grouped_weights

Examples

1
2
3
4
5
6
7
## Basic example
data(mtcars)
opt_grouped_hierarchical(mtcars)

## Using custom bounds
opt_grouped_hierarchical(mtcars, combined_indices = (c(list(c(1,2)),as.list(3:11))),
        bounds = c(0, 1))

Jeroentjeh/opthierarch documentation built on May 26, 2019, 7:28 a.m.