prabclust: Clustering for biotic elements or for species delimitation...

prabclust R Documentation

Clustering for biotic elements or for species delimitation (mixture method)

Description

Clusters a presence-absence matrix object (for clustering ranges/finding biotic elements, Hennig and Hausdorf, 2004) or an object of genetic information (for species delimitation, Hausdorf and Hennig, 2010) by calculating an MDS from the distances, and applying maximum likelihood Gaussian mixtures clustering with "noise" (package mclust) to the MDS points. The solution is plotted. A standard execution (using the default distance of prabinit) will be
prabmatrix <- prabinit(file="path/prabmatrixfile", neighborhood="path/neighborhoodfile")
clust <- prabclust(prabmatrix)
print(clust)
Examples for species delimitation are given below in the examples section. Note: Data formats are described on the prabinit and alleleinit help pages. You may also consider the example datasets kykladspecreg.dat, nb.dat, Heterotrigona_indoFO.txt or MartinezOrtega04AFLP.dat.
Note: prabclust calls the function mclustBIC in package mclust. An alternative is the use of hprabclust.
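The first two stages of the procedure described above (distances, then MDS) can be sketched in base R as follows. This is only an illustration under stated assumptions: the toy matrix pa and the helper jaccard_dist are made up for this sketch and are not part of prabclus, and the Jaccard distance is used for simplicity rather than the default distance of prabinit. The mixture step, which prabclust delegates to mclustBIC, is not reproduced here.

```r
# Toy presence-absence matrix (made-up data): rows are regions, columns are species.
pa <- matrix(c(1, 1, 0, 0, 0,
               1, 1, 1, 0, 0,
               0, 1, 1, 0, 0,
               0, 0, 0, 1, 0,
               0, 0, 1, 1, 1,
               0, 0, 0, 1, 1),
             nrow = 6, byrow = TRUE,
             dimnames = list(paste0("region", 1:6), paste0("sp", 1:5)))

# Hypothetical helper: Jaccard distance between species (columns).
jaccard_dist <- function(m) {
  shared <- crossprod(m)                      # regions shared by each pair of species
  sizes  <- diag(shared)                      # number of regions per species
  union  <- outer(sizes, sizes, "+") - shared # regions occupied by either species
  1 - shared / union
}

dm  <- jaccard_dist(pa)
pts <- cmdscale(as.dist(dm), k = 2)  # classical (metric) MDS, one row per species
print(round(dm, 2))
print(pts)
```

In prabclust, the MDS points (here pts) would then be passed on to the Gaussian mixture clustering with a noise component.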

Usage

prabclust(prabobj, mdsmethod = "classical", mdsdim = 4,
          nnk = ceiling(prabobj$n.species/40), nclus = 0:9,
          modelid = "all", permutations = 0)

## S3 method for class 'prabclust'
print(x, bic=FALSE, ...)

Arguments

prabobj

object of class prab as generated by prabinit. Presence-absence data to be analyzed. (This can be geographical information for range clustering.) It can also be an object of class alleleobject as generated by alleleinit.

mdsmethod

"classical", "kruskal", or "sammon". The MDS method used to transform the distances to data points. "classical" indicates metric MDS by function cmdscale, "kruskal" is non-metric MDS by function isoMDS, and "sammon" is Sammon's non-linear mapping by function sammon.

mdsdim

integer. Dimension of the MDS points. For mdsmethod=="kruskal", stressvals can be used to see how the stress depends on mdsdim in order to choose mdsdim to get a small stress (smaller than 5%, say).

nnk

integer. Number of nearest neighbors to determine the initial noise estimation by NNclean. nnk=0 fits the model without a noise component.

nclus

vector of integers. Numbers of clusters for which the mixture estimation is performed.

modelid

string. Model name for mclustBIC (see the corresponding help page; all models or combinations of models mentioned there are possible). modelid="all" compares all possible models. Additionally, "noVVV" is possible, which fits all models except "VVV".

permutations

integer. Occasionally, the algorithms isoMDS and mclustBIC converge to different solutions depending on the order of the observations. This is because these methods require an ordering of the distances, which may depend on the order of the observations if equal distance values are involved. prabclust uses a standard ordering, which should give a reproducible solution in these cases as well. However, if permutations>0, the algorithm is run for that number of random permutations of the observations, and the best solution (in terms of the BIC, based on the lowest-stress MDS configuration) is returned. For many datasets this changes nothing except increasing the computing time.

x

object of class prabclust. Output of prabclust.

bic

logical. If TRUE, information about the BIC criterion to choose the model is displayed.

...

further arguments; necessary for consistency with the generic print method.
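The bookkeeping behind the permutations argument can be illustrated with a minimal sketch: permute the observations, run the MDS on the permuted distance matrix, and map the resulting points back to the original order. The data and all variable names below are made up for illustration, and classical MDS (cmdscale) is used as a stand-in; this is not the internal code of prabclust.

```r
set.seed(42)
x  <- matrix(rnorm(30), nrow = 10)   # made-up data: 10 observations
dm <- as.matrix(dist(x))

# MDS in the original order of the observations.
pts_standard <- cmdscale(dm, k = 2)

# MDS after a random permutation of the observations.
perm     <- sample(nrow(dm))
pts_perm <- cmdscale(dm[perm, perm], k = 2)

# Row i of pts_perm belongs to observation perm[i]; order(perm) inverts that.
pts_back <- pts_perm[order(perm), , drop = FALSE]

# Compare the two configurations through their interpoint distances,
# which do not depend on the sign of the MDS axes.
max_diff <- max(abs(dist(pts_back) - dist(pts_standard)))
print(max_diff)
```

Classical MDS should give matching configurations here; the order dependence described for the permutations argument concerns isoMDS and mclustBIC, where tied distances can lead to different solutions.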

Details

Note that if mdsmethod!="classical", zero distances between non-identical objects are replaced by the smallest nonzero distance divided by 10 to prevent the MDS methods from producing an error.
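The replacement rule described above can be written as a short sketch. The function name replace_zero_distances and the example matrix are hypothetical and chosen for illustration only; the actual implementation inside prabclust may differ.

```r
# Replace zero distances between different objects by (smallest nonzero distance)/10.
replace_zero_distances <- function(dm) {
  dm <- as.matrix(dm)
  off_diag <- row(dm) != col(dm)               # entries comparing different objects
  eps <- min(dm[off_diag & dm > 0]) / 10       # smallest nonzero distance, divided by 10
  dm[off_diag & dm == 0] <- eps
  dm
}

# Made-up distance matrix in which objects 1 and 2 have distance zero.
d <- matrix(c(0.0, 0.0, 0.5,
              0.0, 0.0, 0.2,
              0.5, 0.2, 0.0), nrow = 3, byrow = TRUE)
d_fixed <- replace_zero_distances(d)
print(d_fixed)
```

The diagonal is left at zero; only the off-diagonal zero entries are replaced, here by 0.2/10.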

Value

print.prabclust does not produce output. prabclust generates an object of class prabclust. This is a list with components

clustering

vector of integers indicating the cluster memberships of the species. Noise can be identified from the output component symbols.

clustsummary

output object of summary.mclustBIC. A list giving the optimal (according to BIC) parameters, conditional probabilities 'z', and loglikelihood, together with the associated classification and its uncertainty. Note that the numbering of clusters may differ from clustering, see csreorder.

bicsummary

output object of mclustBIC. Bayesian Information Criterion for the specified mixture models and numbers of clusters.

points

numerical matrix. MDS configuration.

nnk

see above.

mdsdim

see above.

mdsmethod

see above.

symbols

vector of characters, similar to clustering, but indicating estimated noise and points belonging to one-point-components (which should be interpreted as some kind of noise as well) by "N".

permchange

logical. If TRUE, permutations>0 has been used and the best solution is different from the one obtained by the standard ordering. (This is just for information and has no further operational consequences.)
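As a small illustration of how the components clustering and symbols might be used together, the sketch below separates noise from clustered species. The list res is a made-up stand-in with invented values, not real prabclust output; in particular, the integer codes shown for the noise points are an assumption.

```r
# Made-up stand-in for the relevant components of a prabclust result.
res <- list(
  clustering = c(1L, 1L, 2L, 2L, 3L, 0L),        # invented values
  symbols    = c("1", "1", "2", "2", "N", "N")   # "N" marks noise and one-point components
)

is_noise <- res$symbols == "N"

# Cluster sizes among the species not flagged as noise.
cluster_sizes <- table(res$clustering[!is_noise])
print(cluster_sizes)
print(which(is_noise))   # positions of the species treated as noise
```

With a real result, res would be replaced by the object returned by prabclust.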

Note

Note that we used mdsmethod="kruskal" in our publications, but mdsmethod="classical" is now the default because of occasional numerical instabilities of the isoMDS implementation for Jaccard, Kulczynski or geco distance matrices.
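If mdsmethod="kruskal" is used nevertheless, it can help to look at how the stress changes with the MDS dimension, as suggested for the mdsdim argument. The sketch below calls isoMDS from package MASS directly on made-up Euclidean data; it is a rough stand-in for what stressvals is documented to provide, not a reproduction of that function.

```r
library(MASS)  # provides isoMDS (Kruskal's non-metric MDS)

set.seed(1)
x <- matrix(rnorm(100), nrow = 20)   # made-up data: 20 points in 5 dimensions
d <- dist(x)

dims <- 2:4
# isoMDS reports the stress in percent; trace = FALSE suppresses iteration output.
stress <- sapply(dims, function(k) isoMDS(d, k = k, trace = FALSE)$stress)

print(data.frame(mdsdim = dims, stress_percent = stress))
```

A dimension could then be chosen where the stress falls below the desired level (for example the 5% mentioned for mdsdim).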

Sometimes, prabclust produces an error because mclustBIC cannot handle all models properly. In this case we recommend changing the modelid parameter. "noVVV" and "VVV" are reasonable alternative choices (one of these is expected to reproduce the error, but the other one might work).
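One way to automate this recommendation is to try the modelid values in turn and keep the first one that does not fail. The helper try_modelids and the stand-in function fake_fit below are hypothetical and exist only for this sketch; fake_fit simulates a fit that fails for "all" and "VVV" so that the example runs without the package.

```r
# Try each modelid in order; return the first fit that does not raise an error.
try_modelids <- function(fit, modelids = c("all", "noVVV", "VVV")) {
  for (id in modelids) {
    result <- tryCatch(fit(id), error = function(e) NULL)
    if (!is.null(result)) {
      return(list(modelid = id, result = result))
    }
  }
  stop("none of the modelid values produced a fit")
}

# Stand-in for a clustering call that fails for some model choices.
fake_fit <- function(modelid) {
  if (modelid %in% c("all", "VVV")) stop("simulated mclustBIC failure")
  paste("fitted with", modelid)
}

out <- try_modelids(fake_fit)
print(out$modelid)
```

With real data, fit could be a function such as function(id) prabclust(x, modelid = id), where x is a prab object created by prabinit.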

Author(s)

Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en

References

Fraley, C. and Raftery, A. E. (1998) How many clusters? Which clustering method? - Answers via Model-Based Cluster Analysis. Computer Journal 41, 578-588.

Hausdorf, B. and Hennig, C. (2010) Species Delimitation Using Dominant and Codominant Multilocus Markers. Systematic Biology, 59, 491-503.

Hennig, C. and Hausdorf, B. (2004) Distance-based parametric bootstrap tests for clustering of species ranges. Computational Statistics and Data Analysis 45, 875-896. http://stat.ethz.ch/Research-Reports/110.html.

See Also

mclustBIC, summary.mclustBIC, NNclean, cmdscale, isoMDS, sammon, prabinit, hprabclust, alleleinit, stressvals.

Examples


# Biotic element/range clustering:
data(kykladspecreg)
data(nb)
set.seed(1234)
x <- prabinit(prabmatrix=kykladspecreg, neighborhood=nb)
# If you want to use your own ASCII data files, use
# x <- prabinit(file="path/prabmatrixfile",
# neighborhood="path/neighborhoodfile")
print(prabclust(x))

# Here is an example for species delimitation with codominant markers;
# only 50 individuals were used in order to have a fast example. 
data(tetragonula)
ta <- alleleconvert(strmatrix=tetragonula[1:50,])
tai <- alleleinit(allelematrix=ta)
print(prabclust(tai))

# Here is an example for species delimitation with dominant markers;
# only 50 individuals were used in order to have a fast example.
# You may want to use stressvals to choose mdsdim.
data(veronica)
vei <- prabinit(prabmatrix=veronica[1:50,],distance="jaccard")
print(prabclust(vei,mdsmethod="kruskal",mdsdim=3))


prabclus documentation built on Sept. 24, 2024, 5:07 p.m.