twinspan | R Documentation |
Two-Way Indicator Species Analysis (TWINSPAN) is a divisive
classification method that works by splitting the first
Correspondence Analysis axis into two classes, and then recursively
working with each split subset. The current function is based on the
original TWINSPAN (Hill 1979) and uses much of its FORTRAN code.
twinspan is the main function of this package, but it works silently
and prints very little information: you must use separate support
functions to extract various aspects of the result.
twinspan(
x,
cutlevels = c(0, 2, 5, 10, 20),
indmax = 7,
groupmin = 5,
levmax = 6,
lind,
lwgt,
noind
)
x | Input data, usually a species community data set where columns give the species and rows the sampling units. |
cutlevels | Cut levels used to split quantitative data into binary pseudospecies. At most 9 cut levels can be used. |
indmax | Maximum number of indicators for a division (15 or less). |
groupmin | Minimum group size for a division (2 or larger). |
levmax | Maximum depth of levels of divisions (15 or less). |
lind | Weights for levels of pseudospecies. For example indicator potentials |
lwgt | Weights for the levels of pseudospecies. For example weights |
noind | Numbers (indices) of species that you wish to omit from the list of potential indicators. Species omitted from this list are used in the calculation, but cannot appear as indicators. |
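As a rough illustration of what cutlevels does, the following base R sketch builds a stacked 0/1 pseudospecies matrix from a small abundance table. The helper make_pseudospecies and the toy data are hypothetical and not part of the package. The rule used here (a pseudospecies is present when the abundance is positive and at least the cut level) is an assumption about the convention, not the package's actual code.

```r
## Hypothetical helper for illustration only; not a twinspan function.
make_pseudospecies <- function(x, cutlevels = c(0, 2, 5, 10, 20)) {
  x <- as.matrix(x)
  blocks <- lapply(seq_along(cutlevels), function(k) {
    ## Assumption: level k is present when abundance > 0 and >= cut level k
    b <- (x > 0 & x >= cutlevels[k]) * 1
    colnames(b) <- paste0(colnames(x), k)
    b
  })
  stacked <- do.call(cbind, blocks)
  ## drop pseudospecies that never occur
  stacked[, colSums(stacked) > 0, drop = FALSE]
}

## toy abundance data: 3 quadrats (rows) by 3 species (columns)
abund <- matrix(c(0, 3, 12,
                  1, 0, 25,
                  6, 2,  0),
                nrow = 3, byrow = TRUE,
                dimnames = list(paste0("q", 1:3), c("A", "B", "C")))

ps <- make_pseudospecies(abund)
ps
```

Each species contributes one column per cut level it reaches, so an abundant species such as C appears as several pseudospecies while a sparse one appears only at the lowest levels.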
twinspan may not print anything when it runs, but it returns a
result that you should save for later use. The functions that
reproduce most of the traditional printout are summary.twinspan and
twintable. The summary prints the history of divisions with
eigenvalues, signed indicator pseudospecies and the threshold of
indicator scores for each division, and for terminal groups it prints
the group size and group members (quadrats, species). Function
twintable prints the classified community table. In addition,
plot.twinspan shows the dendrogram corresponding to the summary, with
division numbers and the sizes and ID numbers of terminal groups.
Function image.twinspan provides a graphical overview of the major
structure of the classification as a prelude to twintable. With
function as.dendrogram.twinspan it is possible to construct a
dendrogram of the complete classification down to the final units
(quadrats, species), and as.hclust.twinspan constructs an hclust tree
down to the final groups.
The twinspan function performs the classic TWINSPAN with fixed
levels of hierarchy, but with other functions in this package it is
also possible to perform the modified method of Roleček et al.
(2009): the topology of the tree is the same, but the division
heights are based on the heterogeneity of the group, and the groups
are extracted in the order of their heterogeneity. The measure of
heterogeneity is the total standardized chi-square (or inertia) of
the divided group. This is based on the same matrix as used
internally in the twinspan code (see support functions twintotalchi,
twin2stack and twin2specstack).
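The heterogeneity measure can be illustrated in base R. The sketch below computes the total inertia (chi-square statistic divided by the grand total) of a plain matrix and checks that it equals the sum of the correspondence analysis eigenvalues. The helper total_inertia and the toy matrix are hypothetical. The package's own twintotalchi works on the internal pseudospecies matrix, which this sketch does not try to reproduce.

```r
## Hypothetical helper for illustration only; not a twinspan function.
total_inertia <- function(m) {
  m <- as.matrix(m)
  ## empty rows and columns carry no information and would divide by zero
  m <- m[rowSums(m) > 0, colSums(m) > 0, drop = FALSE]
  p <- m / sum(m)
  expected <- outer(rowSums(p), colSums(p))
  sum((p - expected)^2 / expected)
}

## toy data with a gradient from the first to the last row
m <- matrix(c(1, 1, 0, 0,
              1, 1, 1, 0,
              0, 1, 1, 1,
              0, 0, 1, 1), nrow = 4, byrow = TRUE)

## the same quantity as the sum of squared singular values of the
## standardized residuals, i.e. the sum of all CA eigenvalues
p <- m / sum(m)
e <- outer(rowSums(p), colSums(p))
z <- (p - e) / sqrt(e)
c(chi_square_form = total_inertia(m), eigenvalue_sum = sum(svd(z)$d^2))

## a more homogeneous subset of rows has lower inertia than the whole
c(whole = total_inertia(m), subset = total_inertia(m[1:2, ]))
```

In the modified method the group with the highest such value is the next one to be divided, so a measure of this kind sets the order and the heights of the divisions.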
The classification at any level of division can be extracted with
cut.twinspan, and the most heterogeneous groups with cuth. Function
predict.twinspan provides a similar classification vector, but based
on indicator pseudospecies, and it can also be used with new data
that were not used in twinspan. These two classifications are often
in conflict, and misclassified will detect those cases and the
divisions where the two classifications diverged. Function eigenvals
extracts the eigenvalues of divisions, and twintotalchi finds the
“sum of all eigenvalues” or the standardized chi-square of each
division or final group.
Function twinsform transforms the data in the same way as twinspan
and can be used to reproduce the results of any single division.
Functions twin2mat and twin2stack extract the internal data matrices
in standard R format from the twinspan result.
The function returns an object of class "twinspan" with the following items:
Function call.
Defined cut levels. These will be used in predict.twinspan.
Maximum level depth of divisions. The divisions end when this depth is reached.
Number of species.
Number of quadrats.
Pseudospecies data in the internal format used in twinspan. Functions twin2mat and twin2stack can change this into a more usable format.
Results for quadrats (described below).
Results for species (described below).
The results of the analysis are stored in items quadrat and species, which have a similar structure, but species has only the items iclass, eig, labels and index of the following:
iclass: ID numbers of final classes at the lowest level of the hierarchy. These can be extracted with cut.twinspan, which can also transform them to any higher level of the hierarchy.
eig: Eigenvalues of divisions.
labels: Name labels of units (species, quadrats).
index: An index to order the units in twinspan displays, e.g., in twintable.
A matrix of dimensions maximum number of indicators × maximum number of divisions, giving the signed indices of indicator species for each division. These are shown in labels in summary.twinspan and used by predict.twinspan.
Lowest value of indicator score for the positive group in a division.
Labels of pseudospecies.
Index from pseudospecies to the corresponding species.
TWINSPAN is very complicated and has several obscure details. It is not explained in detail in this manual, and you should consult the source code or the literature. Hill (1979) is the most authoritative source, but may be difficult to find. Kent & Coker (1992) do a great job in explaining the method, including many obscure details.
A strong simplification (but often sufficient to understand the basic principles) is that TWINSPAN is a divisive clustering method based on splitting the first correspondence analysis axis and applying the same method recursively to the resulting classes. The same method is used first for quadrats and then for species. In addition, it finds the species abundance levels (called ‘pseudospecies’) that best indicate these divisions, which helps ecologists understand the classes. The species classification is performed so that it agrees as closely as possible with the preceding quadrat classification, also in terms of species composition.
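The simplified view can be sketched in base R. The code below is not the TWINSPAN algorithm: it has no pseudospecies, no downweighting and no indicator species, and it splits each group where the first correspondence analysis axis changes sign. The functions ca_axis1 and twin_sketch, the toy data, and the numbering of the two children of group g as 2g and 2g + 1 are all assumptions made for this illustration. The arguments levmax and groupmin only mimic the meaning of the twinspan arguments of the same names, and every quadrat is assumed to contain at least one species.

```r
## Hypothetical helpers for illustration only; not twinspan functions.

## Row scores on the first correspondence analysis axis (NULL if none)
ca_axis1 <- function(m) {
  m <- m[, colSums(m) > 0, drop = FALSE]
  if (nrow(m) < 2L || ncol(m) < 2L) return(NULL)
  p <- m / sum(m)
  e <- outer(rowSums(p), colSums(p))
  s <- svd((p - e) / sqrt(e))
  if (s$d[1] < 1e-8) return(NULL)      # no variation left to split on
  s$u[, 1] / sqrt(rowSums(p))
}

## Recursive two-way splitting at the zero point of the first axis
twin_sketch <- function(m, levmax = 2, groupmin = 2) {
  cl <- rep(1L, nrow(m))
  recurse <- function(ids, label, level) {
    cl[ids] <<- label
    if (level > levmax || length(ids) < groupmin) return(invisible(NULL))
    score <- ca_axis1(m[ids, , drop = FALSE])
    if (is.null(score)) return(invisible(NULL))
    neg <- ids[score < 0]
    pos <- ids[score >= 0]
    if (length(neg) == 0L || length(pos) == 0L) return(invisible(NULL))
    recurse(neg, 2L * label, level + 1L)        # "negative" child
    recurse(pos, 2L * label + 1L, level + 1L)   # "positive" child
  }
  recurse(seq_len(nrow(m)), 1L, 1L)
  setNames(cl, rownames(m))
}

## toy data: quadrats q1-q3 and q4-q6 share few species
m <- rbind(q1 = c(5, 4, 3, 0, 0, 0),
           q2 = c(4, 5, 2, 1, 0, 0),
           q3 = c(3, 4, 5, 0, 1, 0),
           q4 = c(0, 0, 1, 5, 4, 3),
           q5 = c(0, 1, 0, 4, 5, 4),
           q6 = c(0, 0, 0, 3, 4, 5))

twin_sketch(m, levmax = 1)   # one division: two groups
twin_sketch(m, levmax = 2)   # each group divided once more
```

The sign of a correspondence analysis axis is arbitrary, so which of the two groups receives the lower number can differ between platforms. Only the membership of the groups is meaningful here.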
The following details are more technical. The analysis starts by
splitting species abundance data into discrete abundance levels
called pseudospecies. With these the function constructs a stacked
binary matrix with values of 0 and 1, where value 1 means that the
species occurs at the given threshold (called ‘cut level’) in a
quadrat. Then the pseudospecies (abundance levels) that occur in
fewer than 1/5 of quadrats are downweighted so that their presences
(values 1) are reduced linearly towards a minimum value of 0.01
according to their frequencies. This reduces their impact in
correspondence analysis, which is regarded as being sensitive to
rare species. Each cut level of a species is downweighted
independently. The first axis of correspondence analysis is found
for the downweighted data. This initial step can be reproduced with
the help of functions twinsform or twin2stack. However, the division
is not based on this step only. Next the method finds the best
indicator pseudospecies for the division. Further, it polarizes the
ordination by using indicator scores for all species to find the
final classes for quadrats. It does not just mechanically split the
axis in the middle, but finds the cutpoint so that the indicator
scores from the indicator pseudospecies and the final split are as
concordant as possible. Then the analysis is repeated for both
resulting groups, including downweighting within the subset of
quadrats.
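The downweighting step can be sketched as follows. The text above gives only the frequency limit (1/5 of quadrats), the minimum value (0.01) and the fact that the reduction is linear in frequency. The exact formula below, a straight line from 0.01 at frequency zero to 1 at the limit, is an assumption of this sketch and may differ from what the FORTRAN code does. The helper downweight_sketch and the toy matrix are hypothetical.

```r
## Hypothetical helper for illustration only; not a twinspan function.
downweight_sketch <- function(b, fraction = 1/5, wmin = 0.01) {
  b <- as.matrix(b)
  freq <- colSums(b > 0)            # number of quadrats with the pseudospecies
  limit <- nrow(b) * fraction       # frequency below which weights drop
  ## Assumed formula: linear from wmin (frequency 0) to 1 (frequency = limit)
  w <- ifelse(freq < limit, wmin + (1 - wmin) * freq / limit, 1)
  sweep(b, 2, w, "*")
}

## toy stacked binary matrix: 10 quadrats, 3 pseudospecies
b <- cbind(common = rep(c(1, 0), 5),          # in 5 of 10 quadrats
           rare   = c(1, rep(0, 9)),          # in 1 of 10 quadrats
           medium = c(1, 1, 1, rep(0, 7)))    # in 3 of 10 quadrats

dw <- downweight_sketch(b)
apply(dw, 2, max)   # presences of the rare pseudospecies are reduced
```

Because the analysis is repeated within each subset of quadrats, the frequencies and hence the weights would be recomputed for every division.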
After quadrat classification, TWINSPAN constructs species data
which are completely different from the data used in quadrat
classification. Species values depend on their ability to
discriminate quadrat classes at any level of classification. Then
species are classified in the same way and with the same code as
the quadrats. In this way species classification is concordant with
quadrat classification, and good indicators of quadrat classes are
grouped together. The species classification can be reproduced with
function twin2specstack, which also provides a more detailed
description of the data structure used at this stage.
Function twinspan performs only the classical TWINSPAN, but with
support functions the modified method of Roleček et al. (2009) can
be performed (see cuth, as.hclust.twinspan).
Hill, M.O. (1979). TWINSPAN - a FORTRAN program for arranging multivariate data in an ordered two-way table by classification of individuals and attributes. Cornell Univ., Dept of Ecology and Systematics.
Kent, M. & Coker, P. (1992). Vegetation description and analysis: A practical approach. John Wiley & Sons.
Roleček, J., Tichý, L., Zelený, D. & Chytrý, M. (2009). Modified TWINSPAN classification in which the hierarchy respects cluster heterogeneity. J Veg Sci 20: 596–602.
library(twinspan)
data(ahti)
## default cut levels
(tw <- twinspan(ahti))
## visual look at the divisions and group numbers
plot(tw)
## Braun-Blanquet scale
(twb <- twinspan(ahti, cutlevels = c(0, 0.1, 1, 5, 25, 50, 75)))
plot(twb)
## compare confusion
table(cut(tw, level=3), cut(twb, level=3))
## modified method of Roleček et al. (2009)
plot(twb, height="chi", main = "Rolecek tree")
## compare against the default by hierarchy levels
table(cuth(twb, ngroups=8), cut(twb, level=3))