Home

/

GitHub

/

ornithos/nectr

/

clsTurnRes: TURN-RES Clustering

clsTurnRes: TURN-RES Clustering
In ornithos/nectr: Exploring Datasets via Clustering with TURN-RES.

Description Usage Arguments Details Value Note Author(s) References See Also Examples

View source: R/clsTurnRes.R

Implementation of the TURN-RES clustering algorithm (Foss, 2002). TURN-RES is a density based clustering algorithm, but achieves superior efficiency and usability over other methods such as DBSCAN. Neighbour estimation is achieved through cyclically sorting the dataset over all its dimensions, but note that each datapoint is given only one neighbour in every perpendicular direction, and not necessarily the closest one.

1 2	clsTurnRes(data, r, summarise = F, min.size = "Auto", base.cls = "None", phi = 0.8)

`data`	required. A numeric data frame or matrix where each column is a dimension to be clustered over. Alternatively a cTurn object; ie a previous output of this function.
`r`	required. Resolution parameter for TURN-RES. Think of this like the adjustment wheel of a microscope. The smaller the value, the higher the granularity of the clustering. A high resolution will quantize data to a coarser grid. The purpose of the clsMRes function is inform the value of this parameter.
`summarise`	Output can be summarised if purpose of clustering was for top level metrics. When clsMRes calls `clsTurnRes`, it only requires statistics of the run rather than the full output, so a summary is returned. There are two levels of summary. (1) is the more highly summarised - simply n, k, (2) also returns the cluster vector.
`min.size`	The minimum size a cluster must be for classification as a cluster. Any clusters smaller than `min.size` will be considered as noise. The default value is `n/100`, so a cluster must represent at least 1% of the dataset. A numeric value must be supplied is interpreted absolutely rather than as a proportion.
`base.cls`	A somewhat experimental notion. A vector of the known cluster membership of the dataset, such that each row of the dataset corresponds to the respective row in the vector, where an integer will specify the cluster number, and `NA` denotes unknown cluster membership. This is intended to force the algorithm to separate or agglomerate clusters based on prior information, but the cutoffs between the forced separation can be an unnatural shape.
`phi`	Another parameter of the algorithm which determines the density required to agglomerate points. In theory, the choice of this parameter is arbitrary (see references), as it effectively 'scales' the resolution parameter. There has been no formal proof of this, hence the option to tweak it.

While not completely parameterless, the user is only required to specify r, the resolution of the clustering. However, the algorithm has much more power when paired with its parent function clsMRes, which iterates through a sequence of values to reveal the structure of the data and aid with parameter selection.

Desired clusters may be found across different values of the parameter r. While formaliseClusters is designed to take in arguments of a clsMR object across multiple resolutions, there may be instances where the analyst wants to split open a giant cluster for a given resolution. The function clsSplit can be called to partition a specified cluster(s) into k separate clusters.

In the TURN paper below, a second algorithm , TURN-CUT was proposed to automatically determine the choice of r. This algorithm is in principal similar to the 'elbow method' of determining number of clusters. This has been omitted due to concerns of over-fit and a proposal that an exploratory approach would anyway be preferred. Philosophically, there may be no "best" choice of parameter, as even a given objective may yield a number of different "best" parameters on the same dataset.

An object of class cTurn. The cluster membership vector can be found in the slot $cluster. cTurn objects have a number of generic functions available: print, summary and plot.

In order to avoid copying the dataset to each cTurn object, instead the name is saved as item $dataset.name. The data will then be retrieved in function calls via get(dataset.name, env = .GlobalEnv), which means that the user must ensure that the dataset variable name is not changed. This is obviously a suboptimal procedure, but given the package is to be used with large datasets, it is also inadvisable to make a copy for every object, particularly if dozens of different cluster calls are to be made in quick succession.

Alex Bird, alex.bird@boots.co.uk

Foss, A. (2002) A Parameterless Method for Efficiently Discover Clusters of Arbitrary Shape in Large Datasets. University of Alberta Canada.

clsMRes for determining the resolution; clsSplit for splitting a given cluster into k clusters

#Toy Example
data <- matrix(runif(200),100,2)
cls <- clsTurnRes(data, r = 0.1)

#Cluster Summary
summary(cls)

#Parallel Coordinate Plot
plot(cls)

ornithos/nectr documentation built on May 24, 2019, 3:57 p.m.

ornithos/nectr index

README.md

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

ornithos/nectr
Exploring Datasets via Clustering with TURN-RES.

clsTurnRes: TURN-RES Clustering
In ornithos/nectr: Exploring Datasets via Clustering with TURN-RES.

Description

Usage

Arguments

Details

Value

Note

Author(s)

References

See Also

Examples

Related to clsTurnRes in ornithos/nectr...

R Package Documentation

Browse R Packages

We want your feedback!

ornithos/nectr Exploring Datasets via Clustering with TURN-RES.

clsTurnRes: TURN-RES Clustering In ornithos/nectr: Exploring Datasets via Clustering with TURN-RES.

Description

Usage

Arguments

Details

Value

Note

Author(s)

References

See Also

Examples

Related to clsTurnRes in ornithos/nectr...

R Package Documentation

Browse R Packages

We want your feedback!

ornithos/nectr
Exploring Datasets via Clustering with TURN-RES.

clsTurnRes: TURN-RES Clustering
In ornithos/nectr: Exploring Datasets via Clustering with TURN-RES.