twinspan: Two-Way Indicator Species Analysis


Description

Two-Way Indicator Species Analysis (TWINSPAN) is a divisive classification method that splits the data into two classes along the first Correspondence Analysis axis and then works recursively with each resulting subset. The current function is based on, and uses much of, the FORTRAN code of the original TWINSPAN (Hill 1979). twinspan is the main function of this package, but it works silently and prints very little information: you must use separate support functions to extract various aspects of the result.

Usage

twinspan(
  x,
  cutlevels = c(0, 2, 5, 10, 20),
  indmax = 7,
  groupmin = 5,
  levmax = 6,
  lind,
  lwgt,
  noind
)

Arguments

x

Input data, usually a species community data set where columns give the species and rows the sampling units.

cutlevels

Cut levels used to split quantitative data into binary pseudospecies. At most 9 cut levels can be used.

indmax

Maximum number of indicators for division (15 or less).

groupmin

Minimum group size for division (2 or larger).

levmax

Maximum depth of division levels (15 or less).

lind

Indicator potentials for the levels of pseudospecies. For example, indicator potentials c(1, 0, 0, 1, 0) signify that pseudospecies at levels 1 and 4 can be used as indicators, but those at other levels cannot. By default, pseudospecies at all levels are available as indicators.

lwgt

Weights for the levels of pseudospecies. For example, weights c(1, 2, 2, 2) signify that pseudospecies corresponding to the three higher cut levels are given twice the weight of pseudospecies at the lowest level.

noind

Numbers (indices) of species that you wish to omit from the list of potential indicators. These species are still used in the calculation, but they cannot appear as indicators.
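The way cutlevels and lwgt turn quantitative data into a stacked pseudospecies matrix can be sketched in a few lines of base R. The helper make_pseudospecies below is hypothetical and is not part of the twinspan package (twinsform is the package's own tool). It assumes that a pseudospecies is present when the abundance is strictly greater than the cut level, so that cut level 0 means plain presence; the exact comparison rule of the FORTRAN code may differ. lind is not shown because it only restricts which pseudospecies may serve as indicators and does not change the matrix.

```r
## Hypothetical helper (not part of the twinspan package): build a
## stacked pseudospecies matrix from quantitative data.
## Assumption: a pseudospecies is present when abundance > cut level,
## so cut level 0 simply means "the species is present".
make_pseudospecies <- function(x, cutlevels = c(0, 2, 5, 10, 20),
                               lwgt = rep(1, length(cutlevels))) {
  x <- as.matrix(x)
  if (length(cutlevels) > 9)
    stop("at most 9 cut levels can be used")
  spnames <- colnames(x)
  if (is.null(spnames))
    spnames <- paste0("sp", seq_len(ncol(x)))
  ## one block of columns per cut level; presences get that level's weight
  blocks <- lapply(seq_along(cutlevels), function(k) {
    b <- (x > cutlevels[k]) * lwgt[k]
    colnames(b) <- paste0(spnames, k)
    b
  })
  pseudo <- do.call(cbind, blocks)
  ## drop pseudospecies that occur in no quadrat
  pseudo[, colSums(pseudo) > 0, drop = FALSE]
}

x <- matrix(c(0, 3, 12,
              1, 0, 25), nrow = 2, byrow = TRUE,
            dimnames = list(c("q1", "q2"), c("A", "B", "C")))
p <- make_pseudospecies(x)
colnames(p)   # pseudospecies A1, B1, C1, B2, C2, C3, C4, C5 remain
```

With lwgt = c(1, 2, 2, 2, 2), the presences at cut levels 2 to 5 would be coded as 2 instead of 1, which is one simple way to read "twice the weight".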

Details

twinspan usually prints nothing while it runs, but it returns a result object that you should save for later use. The functions that reproduce most of the traditional printout are summary.twinspan and twintable. The summary prints the history of divisions with eigenvalues, signed indicator pseudospecies and the threshold of indicator scores for each division, and for terminal groups it prints the group size and group members (quadrats, species). Function twintable prints the classified community table. In addition, plot.twinspan shows the dendrogram corresponding to the summary, with division numbers and sizes and the id numbers of terminal groups. Function image.twinspan provides a graphical overview of the major structure of the classification as a prelude to twintable. With function as.dendrogram.twinspan it is possible to construct a dendrogram of the complete classification down to the final units (quadrats, species), and as.hclust.twinspan constructs an hclust tree down to the final groups.

The twinspan function performs the classic TWINSPAN with fixed levels of hierarchy, but with other functions in this package it is also possible to perform the modified method of Roleček et al. (2009): the topology of the tree is the same, but the division heights are based on the heterogeneity of the group, and the groups are extracted in the order of their heterogeneity. Heterogeneity is measured as the total standardized chi-square (or inertia) of the divided group. It is computed from the same matrix that is used internally in the twinspan code (see support functions twintotalchi, twin2stack and twin2specstack).
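The heterogeneity measure can be illustrated with a small base R function. This is a sketch of the standard formula only, not the twintotalchi code: total_inertia is a hypothetical name, and it works on whatever matrix it is given, whereas the package uses its internal pseudospecies matrix. Total inertia is Pearson's chi-square statistic divided by the grand total of the table, which also equals the sum of all correspondence analysis eigenvalues.

```r
## Total inertia (total standardized chi-square) of a data matrix:
## Pearson's chi-square statistic divided by the grand total, which also
## equals the sum of all correspondence analysis eigenvalues.
total_inertia <- function(x) {
  x <- as.matrix(x)
  ## empty rows and columns carry no information and would divide by zero
  x <- x[rowSums(x) > 0, colSums(x) > 0, drop = FALSE]
  p <- x / sum(x)                      # relative frequencies
  e <- outer(rowSums(p), colSums(p))   # expected under independence
  sum((p - e)^2 / e)
}

total_inertia(diag(2))          # 1: two perfectly separated groups
total_inertia(matrix(1, 3, 4))  # 0, up to rounding: a homogeneous table
```

A group whose quadrats share the same species composition has inertia near zero, so in the modified method it would be among the last groups to be divided.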

The classification at any level of division can be extracted with cut.twinspan, and the most heterogeneous groups with cuth. Function predict.twinspan provides a similar classification vector, but based on indicator pseudospecies, and it can also be used with new data that were not used in twinspan. These two classifications are often in conflict, and function misclassified detects those cases and the divisions where the two classifications diverged. Function eigenvals extracts the eigenvalues of divisions, and twintotalchi finds the “sum of all eigenvalues”, or the standardized chi-square, of each division or final group.
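Cutting a classification to a higher level is simple if groups are numbered like a binary heap: the whole data set is group 1 and group k is divided into groups 2k and 2k + 1. That numbering is an assumption here, suggested by the division numbers shown by plot.twinspan, and cut_to_level is a hypothetical helper, not the source of cut.twinspan. Final classes that stopped dividing above the requested level are left unchanged.

```r
## Hypothetical helper (not the cut.twinspan() source). Assumes groups are
## numbered like a binary heap: the whole data set is group 1, group k is
## divided into groups 2k and 2k + 1, so groups at division level L have
## numbers from 2^L to 2^(L + 1) - 1.
cut_to_level <- function(iclass, level) {
  id <- as.integer(iclass)
  ## replace each class deeper than `level` by its parent (k %/% 2)
  ## until every class is at `level` or above
  deep <- id >= 2^(level + 1)
  while (any(deep)) {
    id[deep] <- id[deep] %/% 2L
    deep <- id >= 2^(level + 1)
  }
  id
}

## final classes from levels 2 and 3 of a hypothetical classification
cl <- c(8, 9, 5, 12, 13, 7)
cut_to_level(cl, 1)  # 2 2 2 3 3 3
cut_to_level(cl, 2)  # 4 4 5 6 6 7
```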

Function twinsform transforms the data in the same way as twinspan and can be used to reproduce the results of any single division. Functions twin2mat and twin2stack extract the internal data matrices from the twinspan result in standard R format.

Value

The function returns an object of class "twinspan" with the following items:

call

Function call.

cutlevels

Defined cutlevels. These will be used in predict.twinspan.

levelmax

Maximum level depth of divisions. The divisions will end when this depth is achieved.

nspecies

Number of species.

nquadrat

Number of quadrats.

idat

Pseudospecies data in the internal format used in twinspan. Functions twin2mat and twin2stack can convert this into a more usable format.

quadrat

Results for quadrats (described below).

species

Results for species (described below).

The results of the analysis are stored in items quadrat and species, which have a similar structure, except that species has only the items iclass, eig, labels and index of the following:

iclass

ID numbers of the final classes at the lowest level of hierarchy. These can be extracted with cut.twinspan, which can also transform them to any higher level of hierarchy.

eig

Eigenvalues of divisions.

labels

Name labels of units (species, quadrats).

index

An index to order the units in twinspan displays, e.g., in twintable.

indicators

A matrix of dimensions (maximum number of indicators) × (maximum number of divisions) giving the signed indices of indicator species for each division. These are shown as labels in summary.twinspan and used by predict.twinspan.

positivelimit

Lowest indicator score of the positive group in a division.

indlabels

Labels of pseudospecies.

pseudo2species

Index from pseudospecies to the corresponding species.
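The items indicators and positivelimit can be combined into a simple assignment rule. The sketch below assumes the classical TWINSPAN indicator score: each positive indicator pseudospecies present in a quadrat adds one, each negative indicator present subtracts one, and quadrats scoring at least positivelimit go to the positive group. It also assumes that unused slots in a column of indicators are zero. indicator_side is a hypothetical helper that only mimics the idea behind predict.twinspan for a single division; it is not the package code.

```r
## Hypothetical sketch: assign quadrats to the negative or positive side
## of one division from signed indicator pseudospecies. `pseudo` is a
## binary quadrat x pseudospecies matrix; `ind` holds signed column
## indices (positive = indicator of the positive group, negative = of the
## negative group; 0 = unused slot, an assumption); `poslim` is the
## lowest indicator score of the positive group.
indicator_side <- function(pseudo, ind, poslim) {
  ind <- ind[ind != 0]
  pos <- ind[ind > 0]
  neg <- -ind[ind < 0]
  ## +1 for every positive indicator present, -1 for every negative one
  score <- rowSums(pseudo[, pos, drop = FALSE]) -
           rowSums(pseudo[, neg, drop = FALSE])
  ifelse(score >= poslim, "positive", "negative")
}

pseudo <- matrix(c(1, 0, 1,
                   0, 1, 1,
                   1, 1, 0,
                   0, 0, 1), nrow = 4, byrow = TRUE)
indicator_side(pseudo, ind = c(1, -2, 0), poslim = 1)
```

Because this rule looks only at a few indicators, it can disagree with the final split found by the ordination, which is the kind of conflict that misclassified reports.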

Method

TWINSPAN is very complicated and has several obscure details. It is not explained in detail in this manual; consult the source code or the literature instead. Hill (1979) is the most authoritative source, but may be difficult to find. Kent & Coker (1992) do a great job of explaining the method, including many obscure details.

A strong simplification (but often sufficient to understand the basic principles) is that TWINSPAN is a divisive clustering based on splitting the first correspondence analysis axis and applying the same method recursively to the resulting classes. The same method is used first for quadrats and then for species. In addition, it finds the species abundance levels (called ‘pseudospecies’) that best indicate these divisions, which helps ecologists interpret the classes. Species are classified so that their classification corresponds as closely as possible to the preceding quadrat classification.
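The simplified view above, recursive splitting along the first correspondence analysis axis, can be sketched in base R. This is not TWINSPAN: it omits pseudospecies, downweighting, indicator species and the refined choice of cutpoint, and simply splits each group at the centroid (score 0) of its first CA axis. The names ca_axis1 and split_ca are hypothetical, the data are assumed to have no empty quadrats, and levmax and groupmin only mimic the arguments of the same name.

```r
## First correspondence analysis axis from the SVD of the matrix of
## standardized residuals; returns the eigenvalue and quadrat scores.
ca_axis1 <- function(x) {
  p <- x / sum(x)
  r <- rowSums(p)
  e <- outer(r, colSums(p))
  sv <- svd((p - e) / sqrt(e), nu = 1, nv = 0)
  ## the scores u / sqrt(r) have weighted mean 0, so 0 is the centroid
  list(eig = sv$d[1]^2, scores = sv$u[, 1] / sqrt(r))
}

## Simplified divisive clustering: split each group at the centroid of
## its first CA axis and recurse. Groups are numbered 1, then 2k and
## 2k + 1 for the two halves of group k; because the sign of an SVD axis
## is arbitrary, which half gets 2k is also arbitrary.
split_ca <- function(x, id = 1L, depth = 0L, levmax = 3L, groupmin = 5L) {
  x <- as.matrix(x)
  x <- x[, colSums(x) > 0, drop = FALSE]       # drop species absent here
  leaf <- setNames(rep(id, nrow(x)), rownames(x))
  if (depth >= levmax || nrow(x) < groupmin || ncol(x) < 2)
    return(leaf)
  ca <- ca_axis1(x)
  neg <- ca$scores < 0
  if (ca$eig < 1e-8 || all(neg) || !any(neg))  # nothing left to split
    return(leaf)
  c(split_ca(x[neg, , drop = FALSE], 2L * id, depth + 1L, levmax, groupmin),
    split_ca(x[!neg, , drop = FALSE], 2L * id + 1L, depth + 1L, levmax, groupmin))
}

## two blocks of quadrats with no species in common
x <- rbind(matrix(c(5, 5, 0, 0), nrow = 6, ncol = 4, byrow = TRUE),
           matrix(c(0, 0, 5, 5), nrow = 6, ncol = 4, byrow = TRUE))
rownames(x) <- paste0("q", 1:12)
split_ca(x, levmax = 2)[rownames(x)]
```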

The following details are more technical. The analysis starts by splitting species abundance data into discrete abundance levels called pseudospecies. From these, the function constructs a stacked binary matrix of 0s and 1s, where 1 means that the species occurs at a given threshold (called ‘cut level’) in a quadrat. Pseudospecies (abundance levels) that occur in fewer than 1/5 of the quadrats are then downweighted: their presences (values 1) are reduced linearly with their frequency towards a minimum value of 0.01. This reduces their impact in correspondence analysis, which is regarded as being sensitive to rare species. Each cut level of a species is downweighted independently. The first axis of correspondence analysis is found for the downweighted data. This initial step can be reproduced with the help of functions twinsform or twin2stack.

However, the division is not based on this step alone. Next, the method finds the best indicator pseudospecies for the division. Further, it polarizes the ordination by using indicator scores for all species to find the final classes for quadrats. It does not mechanically split the axis in the middle, but finds the cutpoint so that the indicator scores from the indicator pseudospecies and the final split are as concordant as possible. The analysis is then repeated for both resulting groups, including downweighting within the subset of quadrats.
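The downweighting of rare pseudospecies described above can be sketched as a column-wise weight. The exact formula is an assumption: here a pseudospecies occurring in f of n quadrats keeps weight 1 when f is at least n/5, otherwise its presences are multiplied by f/(n/5), but never by less than 0.01. The FORTRAN code may interpolate differently. downweight is a hypothetical helper; twinsform is the package's own way to reproduce this step.

```r
## Hypothetical sketch of downweighting rare pseudospecies.
## Assumption: weight = f / (n / 5) for a pseudospecies present in f of n
## quadrats, capped at 1 and never below 0.01.
downweight <- function(pseudo) {
  n <- nrow(pseudo)
  f <- colSums(pseudo > 0)                 # frequency of each pseudospecies
  w <- pmax(pmin(f / (n / 5), 1), 0.01)    # linear weight with floor 0.01
  sweep(pseudo, 2, w, `*`)                 # scale each column by its weight
}

pseudo <- matrix(0, nrow = 10, ncol = 3)
pseudo[, 1] <- 1        # common: present in all 10 quadrats
pseudo[1, 2] <- 1       # rare: 1 of 10 quadrats, below 1/5
pseudo[1:2, 3] <- 1     # exactly 1/5 of the quadrats
downweight(pseudo)[1, ] # weights 1, 0.5 and 1
```

Because the frequencies are recomputed from whatever rows are passed in, calling the same function on a subset of quadrats mirrors the statement that downweighting is repeated within each group.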

After quadrat classification, TWINSPAN constructs species data which are completely different from the data used in quadrat classification. Species values depend on their ability to discriminate quadrat classes at any level of classification. Then species are classified in the same way and with the same code as the quadrats. In this way species classification is concordant with quadrat classification, and good indicators of quadrat classes are grouped together. The species classification can be reproduced with function twin2specstack which also provides a more detailed description of the data structure used at this stage.

Function twinspan performs only the classical TWINSPAN, but with support functions the modified method of Roleček et al. (2009) can be performed (see cuth, as.hclust.twinspan).

References

Hill, M.O. (1979). TWINSPAN - a FORTRAN program for arranging multivariate data in an ordered two-way table by classification of individuals and attributes. Cornell Univ., Dept of Ecology and Systematics.

Kent, M. & Coker, P. (1992). Vegetation description and analysis: A practical approach. John Wiley & Sons.

Roleček, J., Tichý, L., Zelený, D. & Chytrý, M. (2009). Modified TWINSPAN classification in which the hierarchy respects cluster heterogeneity. J Veg Sci 20: 596–602.

Examples


data(ahti)
## default cut levels
(tw <- twinspan(ahti))
## visual look at the divisions and group numbers
plot(tw)
## Braun-Blanquet scale
(twb <- twinspan(ahti, cutlevels = c(0, 0.1, 1, 5, 25, 50, 75)))
plot(twb)
## compare confusion
table(cut(tw, level=3), cut(twb, level=3))
## modified method of Roleček et al. (2009)
plot(twb, height="chi", main = "Rolecek tree")
## compare against the default by hierarchy levels
table(cuth(twb, ngroups=8), cut(twb, level=3))



jarioksa/twinspan documentation built on Nov. 23, 2024, 2:49 p.m.