addAlpha: Estimate alpha diversity indices

addAlphaR Documentation

Estimate alpha diversity indices

Description

The function estimates alpha diversity indices optionally using rarefaction, then stores results in colData.

Usage

addAlpha(
  x,
  assay.type = "counts",
  index = c("coverage_diversity", "fisher_diversity", "faith_diversity",
    "gini_simpson_diversity", "inverse_simpson_diversity",
    "log_modulo_skewness_diversity", "shannon_diversity", "absolute_dominance",
    "dbp_dominance", "core_abundance_dominance", "gini_dominance", "dmn_dominance",
    "relative_dominance", "simpson_lambda_dominance", "camargo_evenness",
    "pielou_evenness", "simpson_evenness", "evar_evenness", "bulla_evenness",
    "ace_richness", "chao1_richness", "hill_richness", "observed_richness"),
  name = index,
  niter = NULL,
  ...
)

## S4 method for signature 'SummarizedExperiment'
addAlpha(
  x,
  assay.type = "counts",
  index = c("coverage_diversity", "fisher_diversity", "faith_diversity",
    "gini_simpson_diversity", "inverse_simpson_diversity",
    "log_modulo_skewness_diversity", "shannon_diversity", "absolute_dominance",
    "dbp_dominance", "core_abundance_dominance", "gini_dominance", "dmn_dominance",
    "relative_dominance", "simpson_lambda_dominance", "camargo_evenness",
    "pielou_evenness", "simpson_evenness", "evar_evenness", "bulla_evenness",
    "ace_richness", "chao1_richness", "hill_richness", "observed_richness"),
  name = index,
  niter = NULL,
  ...
)

Arguments

x

a SummarizedExperiment object.

assay.type

the name of the assay used for calculation of the sample-wise estimates. (Default: "counts")

index

a character vector, specifying the alpha diversity indices to be calculated.

name

a name for the column(s) of the colData the results should be stored in. By default this will use the original names of the calculated indices. (Default: index)

niter

NULL or a single integer value for the number of rarefaction rounds. Rarefaction is not applied when niter=NULL (see Details section). (Default: NULL)

...

optional arguments:

  • sample: A numeric value specifying the rarefaction depth i.e. the number of counts drawn from each sample. (Default: min(colSums2(assay(x, assay.type))))

  • tree.name: (Faith's index) A single character value for specifying which rowTree will be used. (Default: "phylo")

  • node.label: (Faith's index) NULL or a character vector specifying the links between rows and node labels of phylogeny tree specified by tree.name. If a certain row is not linked with the tree, missing instance should be noted as NA. When NULL, all the rownames should be found from the tree. (Default: NULL)

  • only.tips: (Faith's index) A boolean value specifying whether to remove internal nodes when Faith's index is calculated. When only.tips=TRUE, those rows that are not tips of tree are removed. (Default: FALSE)

  • threshold: (coverage and all evenness indices) A numeric value in the unit interval determining the threshold for coverage and evenness indices. When evenness indices are calculated values under or equal to this threshold are denoted as zeroes. For coverage index, see details. (Default: 0.5 for coverage, 0 for evenness indices)

  • quantile: (log modulo skewness index) Arithmetic abundance classes are evenly cut up to to this quantile of the data. The assumption is that abundances higher than this are not common, and they are classified in their own group. (Default: 0.5)

  • nclasses: (log modulo skewness index) The number of arithmetic abundance classes from zero to the quantile cutoff indicated by quantile. (Default: 50)

  • ntaxa: (absolute and relative indices) The n-th position of the dominant taxa to consider. (Default: 1)

  • aggregate: (absolute, dbp, dmn, and relative indices) Aggregate the values for top members selected by ntaxa or not. If TRUE, then the sum of relative abundances is returned. Otherwise the relative abundance is returned for the single taxa with the indicated rank (default: aggregate = TRUE).

  • detection: (observed index) A numeric value for selecting detection threshold for the abundances. (Default: 0)

Details

Diversity

Alpha diversity is a joint quantity that combines elements or community richness and evenness. Diversity increases, in general, when species richness or evenness increase.

The following diversity indices are available:

  • 'coverage': Number of species needed to cover a given fraction of the ecosystem (50 percent by default). Tune this with the threshold argument.

  • 'faith': Faith's phylogenetic alpha diversity index measures how long the taxonomic distance is between taxa that are present in the sample. Larger values represent higher diversity. Using this index requires rowTree. (Faith 1992)

    If the data includes features that are not in tree's tips but in internal nodes, there are two options. First, you can keep those features, and prune the tree to match features so that each tip can be found from the features. Other option is to remove all features that are not tips. (See only.tips parameter)

  • 'fisher': Fisher's alpha; as implemented in vegan::fisher.alpha. (Fisher et al. 1943)

  • 'gini_simpson': Gini-Simpson diversity i.e. 1 - lambda, where lambda is the Simpson index, calculated as the sum of squared relative abundances. This corresponds to the diversity index 'simpson' in vegan::diversity. This is also called Gibbs–Martin, or Blau index in sociology, psychology and management studies. The Gini-Simpson index (1-lambda) should not be confused with Simpson's dominance (lambda), Gini index, or inverse Simpson index (1/lambda).

  • 'inverse_simpson': Inverse Simpson diversity: 1/lambda where lambda=sum(p^2) and p refers to relative abundances. This corresponds to the diversity index 'invsimpson' in vegan::diversity. Don't confuse this with the closely related Gini-Simpson index

  • 'log_modulo_skewness': The rarity index characterizes the concentration of species at low abundance. Here, we use the skewness of the frequency distribution of arithmetic abundance classes (see Magurran & McGill 2011). These are typically right-skewed; to avoid taking log of occasional negative skews, we follow Locey & Lennon (2016) and use the log-modulo transformation that adds a value of one to each measure of skewness to allow logarithmization.

  • 'shannon': Shannon diversity (entropy).

Dominance

A dominance index quantifies the dominance of one or few species in a community. Greater values indicate higher dominance.

Dominance indices are in general negatively correlated with alpha diversity indices (species richness, evenness, diversity, rarity). More dominant communities are less diverse.

The following community dominance indices are available:

  • 'absolute': Absolute index equals to the absolute abundance of the most dominant n species of the sample (specify the number with the argument ntaxa). Index gives positive integer values.

  • 'dbp': Berger-Parker index (See Berger & Parker 1970) calculation is a special case of the 'relative' index. dbp is the relative abundance of the most abundant species of the sample. Index gives values in interval 0 to 1, where bigger value represent greater dominance.

    dbp = \frac{N_1}{N_{tot}}

    where N_1 is the absolute abundance of the most dominant species and N_{tot} is the sum of absolute abundances of all species.

  • 'core_abundance': Core abundance index is related to core species. Core species are species that are most abundant in all samples, i.e., in whole data set. Core species are defined as those species that have prevalence over 50\ species must be prevalent in 50\ calculate the core abundance index. Core abundance index is sum of relative abundances of core species in the sample. Index gives values in interval 0 to 1, where bigger value represent greater dominance.

    core_abundance = \frac{N_{core}}{N_{tot}}

    where N_{core} is the sum of absolute abundance of the core species and N_{tot} is the sum of absolute abundances of all species.

  • 'gini': Gini index is probably best-known from socio-economic contexts (Gini 1921). In economics, it is used to measure, for example, how unevenly income is distributed among population. Here, Gini index is used similarly, but income is replaced with abundance.

    If there is small group of species that represent large portion of total abundance of microbes, the inequality is large and Gini index closer to 1. If all species has equally large abundances, the equality is perfect and Gini index equals 0. This index should not be confused with Gini-Simpson index, which quantifies diversity.

  • 'dmn': McNaughton’s index is the sum of relative abundances of the two most abundant species of the sample (McNaughton & Wolf, 1970). Index gives values in the unit interval:

    dmn = (N_1 + N_2)/N_tot

    where N_1 and N_2 are the absolute abundances of the two most dominant species and N_{tot} is the sum of absolute abundances of all species.

  • 'relative': Relative index equals to the relative abundance of the most dominant n species of the sample (specify the number with the argument ntaxa). This index gives values in interval 0 to 1.

    relative = N_1/N_tot

    where N_1 is the absolute abundance of the most dominant species and N_{tot} is the sum of absolute abundances of all species.

  • 'simpson_lambda': Simpson's (dominance) index or Simpson's lambda is the sum of squared relative abundances. This index gives values in the unit interval. This value equals the probability that two randomly chosen individuals belongs to the same species. The higher the probability, the greater the dominance (See e.g. Simpson 1949).

    lambda = \sum(p^2)

    where p refers to relative abundances.

    There is also a more advanced Simpson dominance index (Simpson 1949). However, this is not provided and the simpler squared sum of relative abundances is used instead as the alternative index is not in the unit interval and it is highly correlated with the simpler variant implemented here.

Evenness

Evenness is a standard index in community ecology, and it quantifies how evenly the abundances of different species are distributed. The following evenness indices are provided:

By default, this function returns all indices.

The available evenness indices include the following (all in lowercase):

  • 'camargo': Camargo's evenness (Camargo 1992)

  • 'simpson_evenness': Simpson’s evenness is calculated as inverse Simpson diversity (1/lambda) divided by observed species richness S: (1/lambda)/S.

  • 'pielou': Pielou's evenness (Pielou, 1966), also known as Shannon or Shannon-Weaver/Wiener/Weiner evenness; H/ln(S). The Shannon-Weaver is the preferred term; see Spellerberg and Fedor (2003).

  • 'evar': Smith and Wilson’s Evar index (Smith & Wilson 1996).

  • 'bulla': Bulla’s index (O) (Bulla 1994).

Desirable statistical evenness metrics avoid strong bias towards very large or very small abundances; are independent of richness; and range within the unit interval with increasing evenness (Smith & Wilson 1996). Evenness metrics that fulfill these criteria include at least camargo, simpson, smith-wilson, and bulla. Also see Magurran & McGill (2011) and Beisel et al. (2003) for further details.

Richness

The richness is calculated per sample. This is a standard index in community ecology, and it provides an estimate of the number of unique species in the community. This is often not directly observed for the whole community but only for a limited sample from the community. This has led to alternative richness indices that provide different ways to estimate the species richness.

Richness index differs from the concept of species diversity or evenness in that it ignores species abundance, and focuses on the binary presence/absence values that indicate simply whether the species was detected.

The function takes all index names in full lowercase. The user can provide the desired spelling through the argument name (see examples).

The following richness indices are provided.

  • 'ace': Abundance-based coverage estimator (ACE) is another nonparametric richness index that uses sample coverage, defined based on the sum of the probabilities of the observed species. This method divides the species into abundant (more than 10 reads or observations) and rare groups in a sample and tends to underestimate the real number of species. The ACE index ignores the abundance information for the abundant species, based on the assumption that the abundant species are observed regardless of their exact abundance. We use here the bias-corrected version (O'Hara 2005, Chiu et al. 2014) implemented in estimateR. For an exact formulation, see estimateR. Note that this index comes with an additional column with standard error information.

  • 'chao1': This is a nonparametric estimator of species richness. It assumes that rare species carry information about the (unknown) number of unobserved species. We use here the bias-corrected version (O'Hara 2005, Chiu et al. 2014) implemented in estimateR. This index implicitly assumes that every taxa has equal probability of being observed. Note that it gives a lower bound to species richness. The bias-corrected for an exact formulation, see estimateR. This estimator uses only the singleton and doubleton counts, and hence it gives more weight to the low abundance species. Note that this index comes with an additional column with standard error information.

  • 'hill': Effective species richness aka Hill index (see e.g. Chao et al. 2016). Currently only the case 1D is implemented. This corresponds to the exponent of Shannon diversity. Intuitively, the effective richness indicates the number of species whose even distribution would lead to the same diversity than the observed community, where the species abundances are unevenly distributed.

  • 'observed': The observed richness gives the number of species that is detected above a given detection threshold in the observed sample (default 0). This is conceptually the simplest richness index. The corresponding index in the vegan package is "richness".

Value

x with additional colData column(s) named code

References

Beisel J-N. et al. (2003) A Comparative Analysis of Diversity Index Sensitivity. Internal Rev. Hydrobiol. 88(1):3-15. https://portais.ufg.br/up/202/o/2003-comparative_evennes_index.pdf

Berger WH & Parker FL (1970) Diversity of Planktonic Foraminifera in Deep-Sea Sediments. Science 168(3937):1345-1347. doi: 10.1126/science.168.3937.1345

Bulla L. (1994) An index of diversity and its associated diversity measure. Oikos 70:167–171

Camargo, JA. (1992) New diversity index for assessing structural alterations in aquatic communities. Bull. Environ. Contam. Toxicol. 48:428–434.

Chao A. (1984) Non-parametric estimation of the number of classes in a population. Scand J Stat. 11:265–270.

Chao A, Chun-Huo C, Jost L (2016). Phylogenetic Diversity Measures and Their Decomposition: A Framework Based on Hill Numbers. Biodiversity Conservation and Phylogenetic Systematics, Springer International Publishing, pp. 141–172, doi:10.1007/978-3-319-22461-9_8.

Chiu, C.H., Wang, Y.T., Walther, B.A. & Chao, A. (2014). Improved nonparametric lower bound of species richness via a modified Good-Turing frequency formula. Biometrics 70, 671-682.

Faith D.P. (1992) Conservation evaluation and phylogenetic diversity. Biological Conservation 61(1):1-10.

Fisher R.A., Corbet, A.S. & Williams, C.B. (1943) The relation between the number of species and the number of individuals in a random sample of animal population. Journal of Animal Ecology 12, 42-58.

Gini C (1921) Measurement of Inequality of Incomes. The Economic Journal 31(121): 124-126. doi: 10.2307/2223319

Locey KJ and Lennon JT. (2016) Scaling laws predict global microbial diversity. PNAS 113(21):5970-5975; doi:10.1073/pnas.1521291113.

Magurran AE, McGill BJ, eds (2011) Biological Diversity: Frontiers in Measurement and Assessment (Oxford Univ Press, Oxford), Vol 12.

McNaughton, SJ and Wolf LL. (1970). Dominance and the niche in ecological systems. Science 167:13, 1–139

O'Hara, R.B. (2005). Species richness estimators: how many species can dance on the head of a pin? J. Anim. Ecol. 74, 375-386.

Pielou, EC. (1966) The measurement of diversity in different types of biological collections. J Theoretical Biology 13:131–144.

Simpson EH (1949) Measurement of Diversity. Nature 163(688). doi: 10.1038/163688a0

Smith B and Wilson JB. (1996) A Consumer's Guide to Evenness Indices. Oikos 76(1):70-82.

Spellerberg and Fedor (2003). A tribute to Claude Shannon (1916 –2001) and a plea for more rigorous use of species richness, species diversity and the ‘Shannon–Wiener’ Index. Alpha Ecology & Biogeography 12, 177–197.

See Also

  • plotColData

  • estimateR

  • diversity

Examples


data("GlobalPatterns")
tse <- GlobalPatterns

# Calculate the default Shannon index with no rarefaction
tse <- addAlpha(tse, index = "shannon")

# Shows the estimated Shannon index
tse$shannon

# Calculate observed richness with 10 rarefaction rounds
tse <- addAlpha(tse,
   assay.type = "counts",
   index = "observed_richness",
   sample = min(colSums(assay(tse, "counts")), na.rm = TRUE),
   niter=10)

# Shows the estimated observed richness
tse$observed_richness


FelixErnst/mia documentation built on July 15, 2024, 9:21 p.m.