A Quick Introduction to iNEXT via Examples

knitr::opts_chunk$set(collapse = TRUE, comment = "", 
                      fig.retina=2,
                      fig.align='center',
                      fig.width = 8, fig.height = 6,
                      out.width = "100%")
library(iNEXT)
library(ggplot2)
data("spider")
data("ant")
data("ciliates")

iNEXT (iNterpolation and EXTrapolation) is an R package modified from the original version which was supplied in the Supplement of Chao et al. (2014). In the latest updated version, we have added more user‐friendly features and refined the graphic displays. In this document, we provide a quick introduction demonstrating how to run iNEXT. Detailed information about iNEXT functions is provided in the iNEXT Manual, also available in CRAN. See Chao & Jost (2012), Colwell et al. (2012) and Chao et al. (2014) for methodologies. A short review of the theoretical background and a brief description of methods are included in an application paper by Hsieh, Ma & Chao (2016). An online version of iNEXT (http://chao.stat.nthu.edu.tw/wordpress/software_download/inext-online/) is also available for users without an R background.

iNEXT focuses on three measures of Hill numbers of order q: species richness (q = 0), Shannon diversity (q = 1, the exponential of Shannon entropy) and Simpson diversity (q = 2, the inverse of Simpson concentration). For each diversity measure, iNEXT uses the observed sample of abundance or incidence data (called the “reference sample”) to compute diversity estimates and the associated 95% confidence intervals for the following two types of rarefaction and extrapolation (R/E):

  1. Sample‐size‐based R/E sampling curves: iNEXT computes diversity estimates for rarefied and extrapolated samples up to an appropriate size. This type of sampling curve plots the diversity estimates with respect to sample size.
  2. Coverage‐based R/E sampling curves: iNEXT computes diversity estimates for rarefied and extrapolated samples with sample completeness (as measured by sample coverage) up to an appropriate coverage. This type of sampling curve plots the diversity estimates with respect to sample coverage.

iNEXT also plots the above two types of sampling curves and a sample completeness curve. The sample completeness curve provides a bridge between these two types of curves.

SOFTWARE NEEDED TO RUN INEXT IN R

HOW TO RUN INEXT:

The iNEXT package is available on CRAN and can be downloaded with a standard R installation procedure using the following commands. For a first‐time installation, an additional visualization extension package (ggplot2) must be loaded.

## install iNEXT package from CRAN
install.packages("iNEXT")

## install the latest version from github
install.packages('devtools')
library(devtools)
install_github('JohnsonHsieh/iNEXT')

## import packages
library(iNEXT)
library(ggplot2)

Remark: In order to install devtools package, you should update R to the latest version. Also, to get install_github to work, you should install the httr package.

MAIN FUNCTION: iNEXT()

We first describe the main function iNEXT() with default arguments:

iNEXT(x, q=0, datatype="abundance", size=NULL, endpoint=NULL, knots=40, se=TRUE, conf=0.95, nboot=50)

The arguments of this function are briefly described below, and will be explained in more details by illustrative examples in later text. This main function computes diversity estimates of order q, the sample coverage estimates and related statistics for K (if knots=K) evenly‐spaced knots (sample sizes) between size 1 and the endpoint, where the endpoint is described below. Each knot represents a particular sample size for which diversity estimates will be calculated. By default, endpoint = double the reference sample size (total sample size for abundance data; total sampling units for incidence data). For example, if endpoint = 10, knot = 4, diversity estimates will be computed for a sequence of samples with sizes (1, 4, 7, 10).

Argument Description
x a matrix, data.frame, lists of species abundances, or lists of incidence frequencies (see data format/information below).
q a number or vector specifying the diversity order(s) of Hill numbers.
datatype type of input data, "abundance", "incidence_raw" or "incidence_freq".
size an integer vector of sample sizes for which diversity estimates will be computed. If NULL, then diversity estimates will be calculated for those sample sizes determined by the specified/default endpoint and knots.
endpoint an integer specifying the sample size that is the endpoint for R/E calculation; If NULL, then endpoint=double the reference sample size.
knots an integer specifying the number of equally‐spaced knots between size 1 and the endpoint.
se a logical variable to calculate the bootstrap standard error and conf confidence interval.
conf a positive number < 1 specifying the level of confidence interval.
nboot an integer specifying the number of bootstrap replications.

This function returns an "iNEXT" object which can be further used to make plots using the function ggiNEXT() to be described below.

DATA FORMAT/INFORMATION

Three types of data are supported:

  1. Individual‐based abundance data (datatype="abundance"): Input data for each assemblage/site include samples species abundances in an empirical sample of n individuals (“reference sample”). When there are N assemblages, input data consist of an S by N abundance matrix, or N lists of species abundances.

  2. Sampling‐unit‐based incidence data: There are two kinds of input data.
    (1) Incidence‐raw data (datatype="incidence_raw"): for each assemblage, input data for a reference sample consist of a species‐by‐sampling‐unit matrix; when there are N assemblages, input data consist of N lists of matrices, and each matrix is a species‐by‐sampling‐unit matrix.
    (2) Incidence‐frequency data (datatype="incidence_freq"): input data for each assemblage consist of species sample incidence frequencies (row sums of each incidence matrix). When there are N assemblages, input data consist of an S by N matrix, or N lists of species incidence frequencies. The first entry of each list must be the total number of sampling units, followed by the species incidence frequencies.

RAREFACTION/EXTRAPOLATION VIA EXAMPLES

Two data sets (spider for abundance data and ant for incidence data) are included in iNEXT package. See Chao et al. (2014) for analysis details and data interpretations. The spider data consist of abundance data from two canopy manipulation treatments (“Girdled” and “Logged”) of hemlock trees (Ellison et al. 2010). For these data, the following commands display the sample species abundances and run the iNEXT() function for q = 0.

data(spider)
str(spider)
iNEXT(spider, q=0, datatype="abundance")

The iNEXT() function returns the "iNEXT" object including three data frames: $DataInfo for summarizing data information; $iNextEst for showing diversity estimates along with related statistics for a series of rarefied and extrapolated samples; and $AsyEst for showing asymptotic diversity estimates along with related statistics. $DataInfo, as shown below, returns basic data information including the reference sample size (n), observed species richness (S.obs), a sample coverage estimate (SC), and the first ten frequency counts (f1‐f10). This part of output can also be computed by the function DataInfo()

$DataInfo: basic data information
     site   n S.obs     SC f1 f2 f3 f4 f5 f6 f7 f8 f9 f10
1 Girdled 168    26 0.9289 12  4  0  1  0  2  0  1  1   0
2  Logged 252    37 0.9446 14  4  4  3  1  0  3  2  0   1

For incidence data, the list $DataInfo includes the reference sample size (T), observed species richness (S.obs), total number of incidences (U), a sample coverage estimate (SC), and the first ten incidence frequency counts (Q1‐Q10).

In the Girdled treatment site, by default, 40 equally spaced knots (samples sizes) between 1 and 336 (= 2 x 168, double reference sample size, Chao et al. 2014) are selected. Diversity estimates and related statistics are computed for these 40 knots (corresponding to sample sizes m = 1, 10, 19, …, 336), which locates the reference sample at the mid‐point of the selected knots. If the argument se=TRUE, then the bootstrap method is applied to obtain the 95% confidence intervals for each diversity and sample coverage estimates. For the sample size corresponding to each knot, the list `$iNextEst` (as shown below for the Girdled treatment site) includes the sample size (`m`, i.e., each of the 40 knots), the method (`interpolated`, `observed`, or `extrapolated`, depending on whether the size `m` is less than, equal to, or greater than the reference sample size), the diversity order, the diversity estimate of order q (`qD`), the 95% lower and upper confidence limits of diversity (`qD.LCL`, `qD.UCL`), and the sample coverage estimate (`SC`) along with the 95% lower and upper confidence limits of sample coverage (`SC.LCL`, `SC.UCL`). These sample coverage estimates with confidence intervals are used for plotting the sample completeness curve and coverage-based R/E curves. wzxhzdk:5 `$AsyEst` lists the observed diversity, asymptotic estimates, estimated bootstrap s.e. and 95% confidence intervals for Hill numbers with q = 0, 1, and 2. The estimated asymptotes are calculated via the functions `ChaoSpecies()` for q = 0, `ChaoEntropy()` for q = 1 and `EstSimpson()` for q = 2; see Chao et al. (2014) for asymptotic estimators. The output for the spider data is shown below. All row and column variables are self‐explanatory. wzxhzdk:6 The user may specify an integer sample size for the argument endpoint to designate the maximum sample size of R/E calculation. For species richness, the extrapolation method is reliable up to the double reference sample size; beyond that, the prediction bias may be large. However, for measures of q = 1 and 2, the extrapolation can usually be safely extended to the asymptote if data are not sparse; thus there is no limit for the value of endpoint for these two measures. The user may also specify the number of knots in the range of sample size between 1 and the endpoint. If you choose a large number of knots, then it may take a long time to obtain the output due to the time‐consuming bootstrap method. Alternatively, the user may specify a series of sample sizes for R/E computation, as in the following example: wzxhzdk:7 Further, `iNEXT` can simultaneously run R/E computation for Hill numbers with q = 0, 1, and 2 by specifying a vector for the argument `q` as follows: wzxhzdk:8 Remark that a data.frame input format for abundance-based analysis also allows in iNEXT: wzxhzdk:9 ## GRAPHIC DISPLAYS: FUNCTION ggiNEXT() The function `ggiNEXT()`, which extends `ggplot2` to the `"iNEXT"` object with default arguments, is described as follows: wzxhzdk:10 Here `x` is an `"iNEXT"` object. Three types of curves are allowed: (1) Sample-size-based R/E curve (`type=1`): see Figs. 1a and 2a in the main text. This curve plots diversity estimates with confidence intervals (if `se=TRUE`) as a function of sample size up to double the reference sample size, by default, or a user‐specified `endpoint`. (2) Sample completeness curve (`type=2`) with confidence intervals (if `se=TRUE`): see Figs. 1b and 2b in the main text. This curve plots the sample coverage with respect to sample size for the same range described in (1). (3) Coverage-based R/E curve (`type=3`): see Figs. 1c and 2c in the main text. This curve plots the diversity estimates with confidence intervals (if `se=TRUE`) as a function of sample coverage up to the maximum coverage obtained from the maximum size described in (1). The argument `facet.var=("none", "order", "site" or "both")` is used to create a separate plot for each value of the specified variable. For example, the following code displays a separate plot (in Figs 1a and 1c) for each value of the diversity order q. The user may also use the argument `grey=TRUE` to plot black/white figures. The usage of color.var is illustrated in the incidence data example described in later text. The `ggiNEXT()` function is a wrapper around `ggplot2` package to create a R/E curve using a single line of code. The resulting object is of class `"ggplot"`, so can be manipulated using the `ggplot2` tools. wzxhzdk:11 The argument `facet.var="site"` in `ggiNEXT` function creates a separate plot for each site as shown below: wzxhzdk:12 wzxhzdk:13 The argument `facet.var="order"` and `color.var="site"` creates a separate plot for each diversity order site, and within each plot, different colors are used for two sites. wzxhzdk:14 The following commands return the sample completeness curve in which different colors are used for the two sites: wzxhzdk:15 The following commands return the coverage‐based R/E sampling curves in which different colors are used for the two sites (`facet.var="site"`) and for three orders (`facet.var="order"`) wzxhzdk:16 wzxhzdk:17 ## INCIDENCE DATA For illustration, we use the tropical ant data (in the dataset ant included in the package) at five elevations (50m, 500m, 1070m, 1500m, and 2000m) collected by Longino & Colwell (2011) from Costa Rica. The 5 lists of incidence frequencies are shown below. The first entry of each list must be the total number of sampling units, followed by the species incidence frequencies. wzxhzdk:18 The argument `color.var = ("none", "order", "site" or "both")` is used to display curves in different colors for values of the specified variable. For example, the following code using the argument `color.var="site"` displays the sampling curves in different colors for the five sites. Note that `theme_bw()` is a ggplot2 function to modify display setting from grey background to black‐and‐white. The following commands return three types R/E sampling curves for ant data. wzxhzdk:19 wzxhzdk:20 wzxhzdk:21 ## POINT ESTIMATION FUNCTION: estimateD() We also supply the function wzxhzdk:22 to compute diversity estimates with q = 0, 1, 2 for any particular level of sample size (`base="size"`) or any specified level of sample coverage (`base="coverage"`) for either abundance data (`datatype="abundance"`) or incidence data (`datatype="incidence_freq" or "incidence_raw"`). If `level=NULL`, this function computes the diversity estimates for the minimum sample size/coverage among all sites. For example, the following command returns the species diversity with a specified level of sample coverage of 98.5% for the ant data. For some sites, this coverage value corresponds to the rarefaction part whereas the others correspond to extrapolation, as indicated in the method of the output. wzxhzdk:23 ## RAW INCIDENCE DATA FUNCTION: incidence_raw Note that `datatype="incidence_raw"` is a new feature in iNEXT version 2.0.6. We here demonstrate its use via the ciliates data from three coastal dune habitats. The data set (`ciliates`) included in the package is a list of three matrices; each matrix is a species by plots data.frame. Run the following commands to get the output graphics as shown below. wzxhzdk:24 wzxhzdk:25 ## Hacking ggiNEXT() The `ggiNEXT()` function is a wrapper around `ggplot2` package to create a R/E curve using a single line of code. The resulting object is of class `"ggplot"`, so can be manipulated using the `ggplot2` tools. The following are some useful examples for customizing graphs. ### remove legend wzxhzdk:26 ### change to theme and legend.position wzxhzdk:27 ### display black‐white theme wzxhzdk:28 ### free the scale of axis wzxhzdk:29 ### change the shape of reference sample size wzxhzdk:30 ## General customization The data visualization package [`ggplot2`](https://cran.r-project.org/package=ggplot2) provides `scale_` function to customize data which is mapped into an aesthetic property of a `geom_`. The following functions would help user to customize `ggiNEXT` output. - change point shape: `scale_shape_manual` - change line type : `scale_linetype_manual` - change line color: `scale_colour_manual` - change band color: `scale_fill_manual` see [quick reference](http://sape.inf.usi.ch/quick-reference/ggplot2/scale) for style setting. ### Example: `spider` data To show how to custmized `ggiNEXT` output, we use abundance-based data `spider` as an example. wzxhzdk:31 ### Change shapes, line types and colors wzxhzdk:32 ## Customize point/line size by hacking In order to chage the size of reference sample point or rarefaction/extrapolation curve, user need modify `ggplot` object. - change point size: the reference sample size point is drawn on the first layer by `ggiNEXT`. Hacking point size by the following wzxhzdk:33 - change line width (size): the reference sample size point is drawn on the second layer by `ggiNEXT`. Hacking point size by the following wzxhzdk:34 wzxhzdk:35 ## Customize theme A `ggplot` object can be themed by adding a theme. User could run `help(theme_grey)` to show the default themes in `ggplot2`. Further, some extra themes provided by [`ggthemes`](https://cran.r-project.org/package=ggthemes) package. Examples shown in the following: wzxhzdk:36 wzxhzdk:37 ### Black-White theme The following are custmized themes for black-white figure. To modifiy legend, see [Cookbook for R](http://www.cookbook-r.com/Graphs/Legends_(ggplot2)/) for more details. wzxhzdk:38 ### Draw R/E curves by yourself In [`iNEXT`]( https://cran.r-project.org/package=iNEXT), we provide a S3 `ggplot2::fortify` method for class `iNEXT`. The function `fortify` offers a single plotting interface for rarefaction/extrapolation curves. Set argument `type = 1, 2, 3` to plot the corresponding rarefaction/extrapolation curves. wzxhzdk:39 ## License The iNEXT package is licensed under the GPLv3. To help refine `iNEXT`, your comments or feedbacks would be welcome (please send them to Anne Chao or report an issue on iNEXT github [reop](https://github.com/JohnsonHsieh/iNEXT)). ## References - Chao, A., Gotelli, N.J., Hsieh, T.C., Sander, E.L., Ma, K.H., Colwell, R.K. & Ellison, A.M. (2014) Rarefaction and extrapolation with Hill numbers: a framework for sampling and estimation in species diversity studies. Ecological Monographs, 84, 45–67. - Chao, A. & Jost, L. (2012) Coverage‐based rarefaction and extrapolation: standardizing samples by completeness rather than size. Ecology, 93, 2533–2547. - Colwell, R.K., Chao, A., Gotelli, N.J., Lin, S.‐Y., Mao, C.X., Chazdon, R.L. & Longino, J.T. (2012) Models and estimators linking individual‐based and sample‐based rarefaction, extrapolation and comparison of assemblages. Journal of Plant Ecology, 5, 3–21. - Ellison, A.M., Barker?Plotkin, A.A., Foster, D.R. & Orwig, D.A. (2010) Experimentally testing the role of foundation species in forests: the Harvard Forest Hemlock Removal Experiment. Methods in Ecology and Evolution, 1, 168–179. - Hsieh, T.C., Ma, K.H. & Chao, A. (2016) iNEXT: An R package for interpolation and extrapolation of species diversity (Hill numbers). Under revision, Methods in Ecology and Evolution. - Longino, J.T. & Colwell, R.K. (2011) Density compensation, species composition, and richness of ants on a neotropical elevational gradient. Ecosphere, 2:art29.

Try the iNEXT package in your browser

Any scripts or data that you put into this service are public.

iNEXT documentation built on May 29, 2017, 2:48 p.m.