isopam | R Documentation
Isopam classification is performed either as a hierarchical, divisive method or as non-hierarchical partitioning. It optimizes clusters, and optionally the number of clusters, for maximum performance of group indicators. Isopam was developed for matrices representing species abundances in plots, with a diagnostic species approach in mind, and therefore optimizes the concentration of indicative species in groups. Apart from the default auto-pilot mode, predefined indicative species and cluster medoids can be supplied for a supervised classification.
isopam(dat, c.fix = FALSE, c.opt = TRUE, c.max = 6,
l.max = FALSE, stopat = c(1,7), sieve = TRUE,
Gs = 3.5, ind = NULL, centers = NULL, distance = 'bray',
k.max = 100, d.max = 7, juice = FALSE,
wordy = TRUE, ...)
isopamp(dat, c.fix = FALSE, c.opt = TRUE, c.max = 6,
l.max = FALSE, stopat = c(1,7), sieve = TRUE,
Gs = 3.5, ind = NULL, centers = NULL, distance = 'bray',
k.max = 100, d.max = 7, juice = FALSE,
wordy = TRUE, ...)
## S3 method for class 'isopam'
identify(x, ...)
## S3 method for class 'isopam'
plot(x, ...)
## S3 method for class 'isopam'
summary(object, ...)
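The supervised mode mentioned above can be sketched as follows, assuming the isopam package (which provides the andechs example data) is installed. Picking the two most frequent species as indicators is purely hypothetical and only serves to produce a valid ind vector; in a real analysis it would hold diagnostic species known in advance. The centers argument is left out because this page does not say whether it expects row names or row indices.

```r
library(isopam)
data(andechs)

# Hypothetical indicator choice: the two species occurring in most plots
freq   <- colSums(andechs > 0)
my_ind <- names(sort(freq, decreasing = TRUE))[1:2]

# Supervised run: predefined indicator species are passed via 'ind'
ip_ind <- isopam(andechs, ind = my_ind)

ip_ind$ind_usr   # indicators suggested by the user
ip_ind$flat      # resulting group affiliation of the plots
```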
dat |
data matrix: each row corresponds to an object (typically a plot), each column corresponds to a descriptor (typically a species). All variables must be numeric. Missing values (NAs) are not allowed. At least 3 rows (plots) are required. |
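c.fix |
number of clusters (defaults to FALSE). Setting a fixed number of clusters results in non-hierarchical partitioning. |
c.opt |
if TRUE (the default), the number of clusters per partition is optimized (up to c.max). |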
c.max |
maximum number of clusters per partition.
Applies to all partitioning steps if |
l.max |
maximum number of hierarchy levels. Defaults to FALSE, i.e. no maximum is set. |
stopat |
vector with stopping rules for hierarchical clustering. Its two values determine whether a partition is retained: the first sets how many indicators must be present per cluster, the second sets the standardized G value these indicators must reach. |
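sieve |
logical. If TRUE (the default), only descriptors (species) reaching the threshold Gs are considered in the search for a good clustering solution. |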
Gs |
threshold (standardized G value) for descriptors (species) to be considered in the search for a good clustering solution. Effective with sieve = TRUE. |
ind |
optional vector of column names from dat used as predefined indicators (supervised classification). |
centers |
optional vector with observations used as cluster cores (supervised classification). |
distance |
name of a dissimilarity index for the distance matrix used as the starting point for Isomap. Any distance measure available in the vegan package (predefined or defined as a designdist equation) or in the proxy package can be used (see details). |
k.max |
maximum Isomap k. |
d.max |
maximum number of Isomap dimensions. |
juice |
logical. If |
wordy |
logical. If TRUE (the default), isopam prints verbose output. |
... |
other arguments used by juice or passed to S3 methods |
x |
an object of class isopam. |
object |
an object of class isopam. |
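The stopat and Gs rules described above can be illustrated with a small base-R sketch. This is not isopam's internal code: the helper partition_retained() and the matrix g of standardized G values are hypothetical, "reach" is assumed to mean greater than or equal to, and a species is assumed to pass the sieve if it reaches Gs in at least one cluster.

```r
# Hypothetical helper illustrating the stopat rule (not isopam's own code).
# g: standardized G values, rows = descriptors (species), cols = clusters.
partition_retained <- function(g, stopat = c(1, 7)) {
  # count descriptors per cluster whose G value reaches stopat[2]
  n_ind <- colSums(g >= stopat[2])
  # retain the partition only if every cluster has at least stopat[1] of them
  all(n_ind >= stopat[1])
}

# Hypothetical G values for three species in a two-cluster partition
g <- matrix(c(8.2, 1.0,
              2.5, 7.4,
              0.3, 3.1),
            ncol = 2, byrow = TRUE,
            dimnames = list(c("sp1", "sp2", "sp3"), c("c1", "c2")))

partition_retained(g)                    # TRUE: sp1 marks c1, sp2 marks c2
partition_retained(g, stopat = c(2, 7))  # FALSE: no cluster has two

# Sieve with Gs = 3.5, assuming a species passes if it reaches Gs somewhere
sieved <- rownames(g)[apply(g, 1, max) >= 3.5]
sieved                                   # "sp1" "sp2"
```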
Isopam is described in Schmidtlein et al. (2010). It consists of dimensionality reduction (Isomap: Tenenbaum et al. 2000; isomap in vegan) and partitioning of the resulting ordination space (PAM: Kaufman & Rousseeuw 1990; pam in cluster). The classification is performed either as a hierarchical, divisive method, or as non-hierarchical partitioning. Compared to other clustering methods, it has the following features: (a) it optimizes partitions for the performance of group indicators (typically species) or for maximum average 'fidelity' of descriptors to groups; (b) it optionally selects the number of clusters per division; (c) the shapes of groups in feature space are not limited to spherical or other regular geometric shapes (thanks to the underlying Isomap algorithm) and (d) the distance measure used for the initial distance matrix can be freely defined.
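The two building blocks can be sketched with the packages named above: vegan's isomap() on a Bray-Curtis matrix, followed by pam() from cluster on the Isomap scores. This is a single step at arbitrary settings (k = 7, three dimensions, three clusters) on vegan's dune data; isopam itself searches over Isomap k, dimensions and cluster numbers and scores each result by indicator performance, which this sketch does not attempt.

```r
library(vegan)    # vegdist(), isomap() and the dune example data
library(cluster)  # pam()

data(dune)
d   <- vegdist(dune, method = "bray")  # Bray-Curtis dissimilarities
iso <- isomap(d, ndim = 3, k = 7)      # Isomap embedding, k and ndim arbitrary
fit <- pam(iso$points, k = 3)          # partition the Isomap ordination space

fit$clustering   # cluster membership of each plot
fit$id.med       # row indices of the medoid plots
```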
isopamp is the parallelised version of isopam. Depending on your hardware, parallelisation can be considerably faster for most large data sets but offers no advantage, or even a slow-down, for small ones.
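Whether isopamp pays off on a given machine can be checked by timing both versions on the same data. This is a sketch assuming isopam is installed; this page does not describe how isopamp distributes work across cores, so no parallel backend is configured here.

```r
library(isopam)
data(andechs)

# Elapsed time of the serial and the parallelised version on the same data
t_serial   <- system.time(ip  <- isopam(andechs))
t_parallel <- system.time(ipp <- isopamp(andechs))

c(serial = t_serial[["elapsed"]], parallel = t_parallel[["elapsed"]])
```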
plot creates (and invisibly returns) an object of class dendrogram and calls the S3 plot method for that class. identify works just like identify.hclust.
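Because plot returns the dendrogram invisibly, it can be captured for further use, for example to inspect or re-draw the tree. A minimal sketch, assuming isopam is installed; identify is interactive and is therefore only shown as a comment.

```r
library(isopam)
data(andechs)
ip <- isopam(andechs)

dend <- plot(ip)   # draws the hierarchy and returns the dendrogram invisibly
class(dend)

# identify(ip)     # interactive: click on branches in the open plot window
```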
The preset distance measure is Bray-Curtis (Odum 1950). Distance measures are passed to vegdist or to designdist in vegan. If this fails, they are passed to dist in proxy. Measures available in vegan are listed on the vegdist help page. Measures registered in proxy can be listed with summary(pr_DB). New measures can be defined and registered as described in ?pr_DB. Isopam does not accept distance matrices as a replacement for the original data matrix because it operates on individual descriptors (species).
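For reference, the Bray-Curtis dissimilarity between two plots x and y is the sum of |x_i - y_i| over all species divided by the sum of x_i + y_i. The helper bray_curtis() below is hypothetical and only makes the formula concrete; if vegan is installed, the result is cross-checked with vegdist().

```r
# Bray-Curtis dissimilarity between two abundance vectors:
# sum of absolute differences divided by the sum of all abundances.
bray_curtis <- function(x, y) sum(abs(x - y)) / sum(x + y)

x <- c(3, 0, 1, 4)   # hypothetical abundances of four species in plot A
y <- c(1, 2, 1, 0)   # the same species in plot B

bc <- bray_curtis(x, y)
bc                   # 8 / 12 = 0.6666667

# Optional cross-check against vegan, if installed
if (requireNamespace("vegan", quietly = TRUE)) {
  print(vegan::vegdist(rbind(x, y), method = "bray"))
}
```

Within isopam, another measure is selected by name through the distance argument, for example distance = 'jaccard', which is one of the vegdist methods.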
Progress bars are built using progressor in progressr. Use handlers to switch them on (handlers(global = TRUE)) and off, or to customize them.
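A sketch of progress reporting with progressr, assuming both isopam and progressr are installed. with_progress() enables progress output for a single call; the session-wide switch is shown as comments.

```r
library(isopam)
library(progressr)   # provides with_progress() and handlers()
data(andechs)

handlers("txtprogressbar")   # use a plain text progress bar

# Report progress for this call only
with_progress(ip <- isopam(andechs))

# Alternatively, switch progress reporting on for the whole session:
# handlers(global = TRUE)
# ...and off again:
# handlers(global = FALSE)
```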
call |
generating call |
distance |
distance measure used by Isomap |
flat |
observations (plots) with group affiliation. Running group numbers for each level of the hierarchy. |
hier |
observations (plots) with group affiliation. Group identifiers reflect the cluster hierarchy. Not present with only one level of partitioning. |
medoids |
observations (plots) representing the medoids of the resulting groups. |
analytics |
table summarizing parameter settings for the final partitioning steps. |
centers_usr |
cluster centers suggested by the user. |
ind_usr |
indicators suggested by the user. |
indicators |
indicators used in each partition. |
dendro |
an object of class |
dat |
data used |
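The components listed above are accessed like elements of a list. A minimal sketch on the andechs example data, assuming isopam is installed; hier is only present when the result has more than one level of partitioning.

```r
library(isopam)
data(andechs)
ip <- isopam(andechs)

names(ip)        # components returned by isopam
head(ip$flat)    # running group numbers per hierarchy level
ip$hier          # hierarchical group identifiers (several levels only)
ip$medoids       # plots serving as medoids of the groups
ip$analytics     # parameter settings of the final partitioning steps
ip$indicators    # indicators used in each partition
```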
For large datasets, consider using the isopamp function instead of isopam. Even so, the optimization procedure (selection of Isomap dimensions and k, and optionally of cluster numbers) relies on a brute-force search that can take considerable time with large data sets. If isopam is used with data that do not represent species in plots, make sure that the indicator approach is appropriate.
With very small datasets, the indicator-based optimization may fail. In such cases, consider using sieve = FALSE instead of the default method.
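Switching off the G-value sieve is a one-argument change (sieve defaults to TRUE in the usage above). The sketch runs it on andechs only so that it is self-contained, assuming isopam is installed; for a genuinely small dataset, andechs would be replaced by that data matrix.

```r
library(isopam)
data(andechs)

# All species are considered, not only those reaching the Gs threshold
ip_nosieve <- isopam(andechs, sieve = FALSE)
ip_nosieve$flat
```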
Sebastian Schmidtlein with contributions from Jason Collison and Lubomir Tichý
Odum, E.P. (1950): Bird populations in the Highlands (North Carolina) plateau in relation to plant succession and avian invasion. Ecology 31: 587–605.
Kaufman, L., Rousseeuw, P.J. (1990): Finding groups in data. Wiley.
Schmidtlein, S., Tichý, L., Feilhauer, H., Faude, U. (2010): A brute force approach to vegetation classification. Journal of Vegetation Science 21: 1162–1171.
Tenenbaum, J.B., de Silva, V., Langford, J.C. (2000): A global geometric framework for nonlinear dimensionality reduction. Science 290: 2319–2323.
isotab for a table of descriptor (species) frequencies in clusters.
## load the package and the example data
library(isopam)
data(andechs)
## call isopam with the standard options
ip <- isopam(andechs)
## examine cluster hierarchy
plot(ip)
## examine grouping
ip$flat
## examine frequency table (second hierarchy level)
isotab(ip, 2)
## non-hierarchical partitioning
ip <- isopam(andechs, c.fix = 3)
ip$flat