prabclust: Clustering for biotic elements or for species delimitation...
In prabclus: Functions for Clustering and Testing of Presence-Absence, Abundance and Multilocus Genetic Data

prabclust

R Documentation

Clustering for biotic elements or for species delimitation (mixture method)

Description

Clusters a presence-absence matrix object (for clustering ranges/finding biotic elements, Hennig and Hausdorf, 2004) or an object of genetic information (for species delimitation, Hausdorf and Hennig, 2010) by computing a multidimensional scaling (MDS) configuration from the distances and applying maximum likelihood Gaussian mixture clustering with "noise" (package mclust) to the MDS points. The solution is plotted. A standard execution (using the default distance of prabinit) is
prabmatrix <- prabinit(file="path/prabmatrixfile", neighborhood="path/neighborhoodfile")
clust <- prabclust(prabmatrix)
print(clust)
Examples for species delimitation are given in the Examples section. Note: Data formats are described on the prabinit and alleleinit help pages. You may also consider the example datasets kykladspecreg.dat, nb.dat, Heterotrigona_indoFO.txt or MartinezOrtega04AFLP.dat.
Note: prabclust calls the function mclustBIC in package mclust. An alternative is the use of hprabclust.

Usage

prabclust(prabobj, mdsmethod = "classical", mdsdim = 4, nnk =
ceiling(prabobj$n.species/40), nclus = 0:9, modelid = "all", permutations=0)

## S3 method for class 'prabclust'
print(x, bic=FALSE, ...)

Arguments

`prabobj`	object of class `prab` as generated by `prabinit`. Presence-absence data to be analyzed. (This can be geographical information for range clustering.) It can also be an object of class `alleleobject` as generated by `alleleinit`.
`mdsmethod`	`"classical"`, `"kruskal"`, or `"sammon"`. The MDS method to transform the distances to data points. `"classical"` indicates metric MDS by function `cmdscale`, `"kruskal"` is non-metric MDS by function `isoMDS`, and `"sammon"` is Sammon's non-linear mapping by function `sammon`.
`mdsdim`	integer. Dimension of the MDS points. For `mdsmethod=="kruskal"`, `stressvals` can be used to see how the stress depends on `mdsdim` in order to choose `mdsdim` to get a small stress (smaller than 5%, say).
`nnk`	integer. Number of nearest neighbors to determine the initial noise estimation by `NNclean`. `nnk=0` fits the model without a noise component.
`nclus`	vector of integers. Numbers of clusters to perform the mixture estimation.
`modelid`	string. Model name for `mclustBIC` (see the corresponding help page; all models or combinations of models mentioned there are possible). `modelid="all"` compares all possible models. Additionally, `"noVVV"` is possible, which fits all models except `"VVV"`.
`permutations`	integer. Occasionally, depending on the order of observations, the algorithms `isoMDS` and `mclustBIC` have been found to converge to different solutions. This is because these methods require an ordering of the distances, which may depend on the order of observations if equal distance values are involved. `prabclust` uses a standard ordering, which should give a reproducible solution in these cases as well. However, if `permutations>0`, the algorithm is run once for each of that number of random permutations of the observations, and the best solution (in terms of the BIC, based on the lowest-stress MDS configuration) is returned. For many datasets this changes nothing except to increase the computing time.
`x`	object of class `prabclust`. Output of `prabclust`.
`bic`	logical. If `TRUE`, information about the BIC criterion to choose the model is displayed.
`...`	further arguments; required for compatibility with the `print` generic.
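
The advice given for `mdsdim` above can be sketched as follows. This is only an illustration, not the package's prescribed workflow: it assumes that `stressvals` returns a list with a component `MDSstress` holding one stress value (in percent) per requested dimension, and the names `dims`, `stress`, `ok` and `chosen` are made up for this example.

```r
library(prabclus)

# Dominant marker data, as in the Examples section (first 50 individuals only).
data(veronica)
vei <- prabinit(prabmatrix = veronica[1:50, ], distance = "jaccard")

# Kruskal stress for MDS dimensions 1 to 4.
# Assumption: the stress values are returned in component MDSstress.
dims <- 1:4
sv <- stressvals(vei, mdsdim = dims)
stress <- sv$MDSstress
print(stress)

# Pick the smallest dimension with stress below 5 (percent);
# if none qualifies, fall back to the dimension with the lowest stress.
ok <- dims[stress < 5]
chosen <- if (length(ok) > 0) min(ok) else dims[which.min(stress)]
print(chosen)
```

The value of `chosen` could then be passed on as `prabclust(vei, mdsmethod = "kruskal", mdsdim = chosen)`.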

Details

Note that if mdsmethod!="classical", zero distances between non-identical objects are replaced by the smallest nonzero distance divided by 10 to prevent the MDS methods from producing an error.
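
The replacement rule described above can be illustrated in base R. This is a minimal sketch of the idea, not the code used inside `prabclust`; the helper `replace_zero_distances` is hypothetical and not part of the package.

```r
# Hypothetical helper illustrating the rule; not part of prabclus.
replace_zero_distances <- function(dmat) {
  dmat <- as.matrix(dmat)
  # Only off-diagonal entries compare two different objects.
  offdiag <- row(dmat) != col(dmat)
  positive <- dmat[offdiag & dmat > 0]
  if (length(positive) == 0) stop("all off-diagonal distances are zero")
  # Smallest nonzero distance divided by 10.
  eps <- min(positive) / 10
  dmat[offdiag & dmat == 0] <- eps
  dmat
}

# Objects 1 and 2 are at distance zero; both are at distance 2 from object 3.
d <- matrix(c(0, 0, 2,
              0, 0, 2,
              2, 2, 0), nrow = 3, byrow = TRUE)
d_fixed <- replace_zero_distances(d)
print(d_fixed)
```

Here the zero distance between objects 1 and 2 becomes 2/10 = 0.2, while the diagonal and the nonzero distances are left unchanged.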

Value

print.prabclust does not produce output. prabclust generates an object of class prabclust. This is a list with components

`clustering`	vector of integers indicating the cluster memberships of the species. Noise can be recognized by output component `symbols`.
`clustsummary`	output object of `summary.mclustBIC`. A list giving the optimal (according to BIC) parameters, conditional probabilities ‘z’, and loglikelihood, together with the associated classification and its uncertainty. Note that the numbering of clusters may differ from `clustering`, see `csreorder`.
`bicsummary`	output object of `mclustBIC`. Bayesian Information Criterion for the specified mixture models and numbers of clusters.
`points`	numerical matrix. MDS configuration.
`nnk`	see above.
`mdsdim`	see above.
`mdsmethod`	see above.
`symbols`	vector of characters, similar to `clustering`, but indicating estimated noise and points belonging to one-point-components (which should be interpreted as some kind of noise as well) by `"N"`.
`permchange`	logical. If `TRUE`, `permutations>0` has been used and the best solution is different from the one obtained by the standard ordering. (This is just for information and has no further operational consequences.)
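
As a brief illustration of how the components listed above might be inspected, the following sketch reuses the range clustering data from the Examples section. The variable names `clust` and `noise` are chosen for this example only.

```r
library(prabclus)

# Range clustering data, as in the Examples section.
data(kykladspecreg)
data(nb)
set.seed(1234)
x <- prabinit(prabmatrix = kykladspecreg, neighborhood = nb)
clust <- prabclust(x)

# Cluster sizes according to the clustering component.
print(table(clust$clustering))

# Species flagged as noise (including one-point components) carry "N" in symbols.
noise <- clust$symbols == "N"
print(sum(noise))

# MDS configuration: one row per species.
print(dim(clust$points))

# Settings stored with the result.
print(clust$mdsmethod)
print(clust$mdsdim)
print(clust$nnk)
```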

Note

Note that we used mdsmethod="kruskal" in our publications, but mdsmethod="classical" is now the default, because of occasional numerical instabilities of the isoMDS-implementation for Jaccard, Kulczynski or geco distance matrices.

Sometimes, prabclust produces an error because mclustBIC cannot handle all models properly. In this case we recommend changing the modelid parameter. "noVVV" and "VVV" are reasonable alternative choices (one of these is expected to reproduce the error, but the other one might work).
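
One way to follow this recommendation is to try the model choices in turn and keep the first fit that succeeds. The loop below is a sketch under that assumption, not a feature of the package; the names `fit` and `mid` are illustrative.

```r
library(prabclus)

data(kykladspecreg)
data(nb)
set.seed(1234)
x <- prabinit(prabmatrix = kykladspecreg, neighborhood = nb)

# Try modelid = "all" first, then the two alternatives named in the note.
fit <- NULL
for (mid in c("all", "noVVV", "VVV")) {
  fit <- tryCatch(
    prabclust(x, modelid = mid),
    error = function(e) {
      message("modelid = \"", mid, "\" failed: ", conditionMessage(e))
      NULL
    }
  )
  if (!is.null(fit)) break
}

if (is.null(fit)) stop("prabclust failed for all modelid choices tried")
print(fit)
```

Note that a fit obtained with `"noVVV"` or `"VVV"` is selected from a restricted set of models, so its BIC is not directly comparable to a run with `modelid = "all"`.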

Author(s)

Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en

References

Fraley, C. and Raftery, A. E. (1998) How many clusters? Which clustering method? - Answers via Model-Based Cluster Analysis. Computer Journal 41, 578-588.

Hausdorf, B. and Hennig, C. (2010) Species Delimitation Using Dominant and Codominant Multilocus Markers. Systematic Biology, 59, 491-503.

Hennig, C. and Hausdorf, B. (2004) Distance-based parametric bootstrap tests for clustering of species ranges. Computational Statistics and Data Analysis 45, 875-896. http://stat.ethz.ch/Research-Reports/110.html.

Examples


# Biotic element/range clustering:
data(kykladspecreg)
data(nb)
set.seed(1234)
x <- prabinit(prabmatrix=kykladspecreg, neighborhood=nb)
# If you want to use your own ASCII data files, use
# x <- prabinit(file="path/prabmatrixfile",
# neighborhood="path/neighborhoodfile")
print(prabclust(x))

# Here is an example for species delimitation with codominant markers;
# only 50 individuals were used in order to have a fast example. 
data(tetragonula)
ta <- alleleconvert(strmatrix=tetragonula[1:50,])
tai <- alleleinit(allelematrix=ta)
print(prabclust(tai))

# Here is an example for species delimitation with dominant markers;
# only 50 individuals were used in order to have a fast example.
# You may want to use stressvals to choose mdsdim.
data(veronica)
vei <- prabinit(prabmatrix=veronica[1:50,],distance="jaccard")
print(prabclust(vei,mdsmethod="kruskal",mdsdim=3))

prabclus documentation built on Sept. 24, 2024, 5:07 p.m.