knitr::opts_chunk$set(collapse = TRUE, comment = "", fig.retina=2, fig.align='center', fig.width = 6, fig.height = 4, warning = FALSE, message = FALSE) options("width"=200) if("gridExtra" %in% rownames(installed.packages()) == FALSE) {install.packages("gridExtra")} library(iNEXT.3D) library(ggplot2)
iNEXT.3D
(INterpolation and EXTrapolation for three dimensions of biodiversity) is a sequel to iNEXT (Hsieh et al., 2016).
Here the three dimensions (3D) of diversity include taxonomic diversity (TD), phylogenetic diversity (PD) and functional diversity (FD). An online version "iNEXT.3D Online"
(https://chao.shinyapps.io/iNEXT_3D/) is also available for users without an R background.
A unified framework based on Hill numbers (for TD) and their generalizations (Hill-Chao numbers, for PD and FD) is adopted to quantify 3D. In this framework, TD quantifies the effective number of species, PD quantifies the effective total branch length, mean-PD (PD divided by tree depth) quantifies the effective number of lineages, and FD quantifies the effective number of virtual functional groups (or functional "species"). Thus, TD, mean-PD, and FD are all in the same units of species/lineage equivalents and can be meaningfully compared; see Chao et al. (2014) for the basic standardization theory for TD, and Chao et al. (2021) for a review of the unified theory for 3D.
For each of the three dimensions of biodiversity, iNEXT.3D
features two statistical analyses (non-asymptotic and asymptotic):
(1) A non-asymptotic approach based on interpolation and extrapolation for 3D diversity (i.e., Hill-Chao numbers)
iNEXT.3D
computes the estimated 3D diversity for standardized samples with a common sample size or sample completeness. This approach aims to compare diversity estimates for equally-large (with a common sample size) or equally-complete (with a common sample coverage) samples; it is based on the seamless rarefaction and extrapolation (R/E) sampling curves of Hill-Chao numbers for q = 0, 1 and 2. For each dimension of biodiversity, iNEXT.3D
offers three types of R/E sampling curves:
Sample-size-based (or size-based) R/E sampling curves: This type of sampling curve plots the diversity estimates with respect to sample size.
Coverage-based R/E sampling curves: This type of sampling curve plots the diversity estimates with respect to sample coverage.
Sample completeness curve: This curve depicts how sample coverage varies with sample size. The sample completeness curve provides a bridge between the size- and coverage-based R/E sampling curves.
(2) An asymptotic approach to infer asymptotic 3D diversity (i.e., Hill-Chao numbers)
iNEXT.3D
computes the estimated asymptotic 3D diversity and also plots 3D diversity profiles (q-profiles) for q between 0 and 2, in comparison with the observed diversity. Typically, the asymptotic estimates for q $\geq$ 1 are reliable, but for q < 1 (especially for q = 0, species richness), the asymptotic estimates represent only lower bounds. iNEXT.3D
also features a time-profile (which depicts the observed and asymptotic estimate of PD or mean PD with respect to reference times), and a tau-profile (which depicts the observed and asymptotic estimate of FD with respect to threshold level tau).
If you publish your work based on results from iNEXT.3D
package, you should make references to the following methodology paper and the package:
Chao, A., Henderson, P. A., Chiu, C.-H., Moyes, F., Hu, K-H., Dornelas, M and. Magurran, A. E. (2021). Measuring temporal change in alpha diversity: a framework integrating taxonomic, phylogenetic and functional diversity and the iNEXT.3D standardization. Methods in Ecology and Evolution, 12, 1926-1940.
Chao, A. and Hu, K.-H. (2023). The iNEXT.3D package: interpolation and extrapolation for three dimensions of biodiversity. R package available from CRAN.
The iNEXT.3D
package can be downloaded from CRAN or Anne Chao's iNEXT.3D_github using the commands below. For a first-time installation, some additional packages must be installed and loaded; see package manual.
## install iNEXT.3D package from CRAN install.packages("iNEXT.3D") ## or install the latest version from github install.packages('devtools') library(devtools) install_github('AnneChao/iNEXT.3D') ## import packages library(iNEXT.3D)
There are six main functions in this package:
Two functions for non-asymptotic analysis with graphical displays:
iNEXT3D computes standardized 3D diversity estimates of order q = 0, 1 and 2 for rarefied and extrapolated samples at specified sample coverage values and sample sizes.
ggiNEXT3D visualizes the output from the function iNEXT3D
.
Two functions for point estimation and basic data information
estimate3D computes 3D diversity of order q = 0, 1 and 2 with a particular set of user-specified level of sample sizes or sample coverage values.
DataInfo3D provides basic data information based on the observed data.
Two functions for asymptotic analysis with graphical displays:
ObsAsy3D computes observed and asymptotic diversity of order q between 0 and 2 (in increments of 0.2) for 3D diversity; it also computes observed and asymptotic PD for specified reference times, and observed and asymptotic FD for specified threshold levels.
ggObsAsy3D visualizes the output from the function ObsAsy3D
.
Although species identities/names are not required to assess TD or compare TD across individual assemblages (as in the iNEXT
package), they are required for PD and FD. Thus, for iNEXT.3D
package, information on species identity (or any unique identification code) and assemblage affiliation is required. Two types of species abundance/incidence data are supported:
Individual-based abundance data (datatype = "abundance"
): When there are multiple assemblages, in addition to the assemblage/site names (as column names) and the species names (as row names), species abundance data (reference sample) can be input as a species (in rows) by assemblage (in columns) matrix/data.frame or a list of species abundance vectors. In the special case that there is only one assemblage, all data should be read in one column.
Sampling-unit-based incidence data: Incidence-raw data (datatype = "incidence_raw"
): for each assemblage, input data for a reference sample consist of a species-by-sampling-unit matrix, in addition to the sampling-unit names (as column names) and the species names (as row names). When there are N assemblages, input data consist of N lists of matrices, and each matrix is a species-by-sampling-unit matrix. Each element in the incidence raw matrix is 1 for a detection, and 0 for a non-detection. Input a matrix which combines data for all assemblages is allowed, but the argument nT
in the function iNEXT3D
must be specified so that the number of sampling units in each assemblage is specified.
For example, the dataset Brazil_rainforest_abun_data
included in the iNEXT.3D
package consists of species sample abundances of two assemblages/habitats: "Edge" and "Interior". Run the following code to view the first 15 rows of the abundance data.
data("Brazil_rainforest_abun_data") Brazil_rainforest_abun_data
data("Brazil_rainforest_abun_data") Brazil_rainforest_abun_data[1:15,]
We use data (Fish_incidence_data
) collected from two time periods, namely "2013-2015"
and "2016-2018"
, as an example. Each time period is designated as an assemblage. The purpose was to compare 3D diversity of the two time periods. In each time period, species incidence/occurrence was recorded in 36 sampling units in each assemblage; each sampling unit represents a sampling date. Thus, there are 36 columns in each time period. Run the following code to view the first 6 rows and 6 columns for each matrix.
data("Fish_incidence_data") Fish_incidence_data
data("Fish_incidence_data") lapply(Fish_incidence_data, function(x) x[1:6,1:6])
To perform PD analysis, the phylogenetic tree (in Newick format) spanned by species observed in the pooled data is required. For the dataset Fish_incidence_data
, the phylogenetic tree for all observed species (including species in both time periods) is stored in the file fish_phylo_tree
; for the dataset Brazil_rainforest_abun_data
, the phylogenetic tree for all observed species (including species in both Edge and Interior habitats) is stored in the file Brazil_rainforest_phylo_tree
. A partial list of the tip labels and node labels are shown below.
data("Brazil_rainforest_phylo_tree") Brazil_rainforest_phylo_tree
To perform FD analysis, the species-pairwise distance matrix (Gower distance computed from species traits) for species observed in the pooled data is required in a matrix/data.frame format. For the dataset Fish_incidence_data
, the distance matrix for all observed species (including species in both time periods) is stored in the file fish_dist_matrix
; for the dataset Brazil_rainforest_abun_data
, the distance matrix for all species (including species in both Edge and Interior habitats) is stored in the file Brazil_rainforest_dist_matrix
. The distance matrix for the first 3 Brazil rainforest tree species is shown below.
data("Brazil_rainforest_distance_matrix") Brazil_rainforest_distance_matrix
data("Brazil_rainforest_distance_matrix") round(Brazil_rainforest_distance_matrix[1:3,1:3], 3)
We first describe the main function iNEXT3D()
with default arguments:
iNEXT3D(data, diversity = 'TD', q = c(0,1,2), datatype = "abundance", size = NULL, endpoint = NULL, knots = 40, nboot = 50, conf = 0.95, nT = NULL, PDtree = NULL, PDreftime = NULL, PDtype = 'meanPD', FDdistM, FDtype = 'AUC', FDtau = NULL, FDcut_number = 50)
The arguments of this function are briefly described below, and will be explained in more details by illustrative examples in later text. This main function computes standardized 3D diversity estimates of order q = 0, 1 and 2, the sample coverage estimates, and related statistics for K (if knots = K
in the specified argument) evenly-spaced knots (sample sizes) between size 1 and the endpoint
, where the endpoint is described below. Each knot represents a particular sample size for which 3D diversity estimates will be calculated. By default, endpoint
= double the reference sample size for abundance data or double the total sampling units for incidence data. For example, if endpoint = 10
, knot = 4
is specified, diversity estimates will be computed for a sequence of samples with sizes (1, 4, 7, 10).
Argument | Description |
---|---|
data |
(a) For datatype = "abundance" , data can be input as a vector of species abundances (for a single assemblage), matrix/data.frame (species by assemblages), or a list of species abundance vectors. \cr
(b) For datatype = "incidence_raw" , data can be input as a list of matrices/data.frames (species by sampling units); data can also be input as a single matrix/data.frame by merging all sampling units across assemblages based on species identity; in this case, the number of sampling units (nT, see below) must be specified. |
diversity |
selection of diversity type: 'TD' = Taxonomic diversity, 'PD' = Phylogenetic diversity, and 'FD' = Functional diversity. |
q |
a numerical vector specifying the diversity orders. Default is c(0, 1, 2) . |
datatype |
data type of input data: individual-based abundance data (datatype = "abundance" ), or species by sampling-units incidence/occurrence matrix (datatype = "incidence_raw" ) with all entries being 0 (non-detection) or 1 (detection). |
size |
an integer vector of sample sizes (number of individuals or sampling units) for which diversity estimates will be computed.
If NULL , then diversity estimates will be computed for those sample sizes determined by the specified/default endpoint and knots . |
endpoint |
an integer specifying the sample size that is the endpoint for rarefaction/extrapolation.
If NULL, then endpoint = double the reference sample size. |
knots |
an integer specifying the number of equally-spaced knots (say K, default is 40) between size 1 and the endpoint ; each knot represents a particular sample size for which diversity estimate will be calculated.
If the endpoint is smaller than the reference sample size, then iNEXT3D() computes only the rarefaction estimates for approximately K evenly spaced knots .
If the endpoint is larger than the reference sample size, then iNEXT3D() computes rarefaction estimates for approximately K/2 evenly spaced knots between sample size 1 and the reference sample size, and computes extrapolation estimates for approximately K/2 evenly spaced knots between the reference sample size and the endpoint . |
nboot |
a positive integer specifying the number of bootstrap replications when assessing sampling uncertainty and constructing confidence intervals. Enter 0 to skip the bootstrap procedures. Default is 50. |
conf |
a positive number < 1 specifying the level of confidence interval. Default is 0.95. |
nT |
(required only when datatype = "incidence_raw" and input data in a single matrix/data.frame) a vector of nonnegative integers specifying the number of sampling units in each assemblage. If assemblage names are not specified(i.e., names(nT) = NULL ), then assemblages are automatically named as "assemblage1", "assemblage2",..., etc. |
PDtree |
(required argument for diversity = "PD" ), a phylogenetic tree in Newick format for all observed species in the pooled assemblage. |
PDreftime |
(argument only for diversity = "PD" ), a vector of numerical values specifying reference times for PD. Default is NULL (i.e., the age of the root of PDtree). |
PDtype |
(argument only for diversity = "PD" ), select PD type: PDtype = "PD" (effective total branch length) or PDtype = "meanPD" (effective number of equally divergent lineages). Default is "meanPD" , where meanPD = PD/tree depth . |
FDdistM |
(required argument for diversity = "FD" ), a species pairwise distance matrix for all species in the pooled assemblage. |
FDtype |
(argument only for diversity = "FD" ), select FD type: FDtype = "tau_values" for FD under specified threshold values, or FDtype = "AUC" (area under the curve of tau-profile) for an overall FD which integrates all threshold values between zero and one. Default is "AUC" . |
FDtau |
(argument only for diversity = "FD" and FDtype = "tau_values" ), a numerical vector between 0 and 1 specifying tau values (threshold levels). If NULL (default), then threshold is set to be the mean distance between any two individuals randomly selected from the pooled assemblage (i.e., quadratic entropy). |
FDcut_number |
(argument only for diversity = "FD" and FDtype = "AUC" ), a numeric number to cut [0, 1] interval into equal-spaced sub-intervals to obtain the AUC value by integrating the tau-profile. Equivalently, the number of tau values that will be considered to compute the integrated AUC value. Default is FDcut_number = 50 . A larger value can be set to obtain more accurate AUC value. |
For each dimension of diversity (TD
, PD
, FD
), the main function iNEXT3D()
returns the iNEXT3D
object, which can be further used to make plots using the function ggiNEXT3D()
to be described below. The "iNEXT3D"
object includes three lists:
(1) $TDInfo
($PDInfo
,or $FDInfo
) for summarizing data information.
(2) $TDiNextEst
($PDiNextEst
, or $FDiNextEst
) for showing diversity estimates along with related statistics for a series of rarefied and extrapolated samples; there are two data frames ($size_based
and $coverage_based
) conditioning on standardized sample size or sample coverage, respectively.
(3) $TDAsyEst
($PDAsyEst
, or $FDAsyEst
) for showing asymptotic diversity estimates along with related statistics.
The function ggiNEXT3D()
, which extends ggplot2
with default arguments, is described as follows:
ggiNEXT3D(output, type = 1:3, facet.var = "Assemblage", color.var = "Order.q")
Here output
is the iNEXT3D()
object. Three types of curves are allowed for 3D diversity:
(1) Sample-size-based R/E curve (type = 1
): This curve plots diversity estimates with confidence intervals as a function of sample size.
(2) Sample completeness curve (type = 2
): This curve plots the sample coverage with respect to sample size.
(3) Coverage-based R/E curve (type = 3
): This curve plots the diversity estimates with confidence intervals as a function of sample coverage.
The argument facet.var = "Order.q"
, facet.var = "Assemblage"
, facet.var = "Both"
, or facet.var = "None"
is used to create a separate plot for each value of the specified variable.
The ggiNEXT3D()
function is a wrapper with the package ggplot2
to create a rarefaction/extrapolation sampling curve in a single line of code. The figure object is of class "ggplot"
, so it can be manipulated by using the ggplot2
tools.
data(Brazil_rainforest_abun_data) data(Brazil_rainforest_phylo_tree) data(Brazil_rainforest_distance_matrix) data <- Brazil_rainforest_abun_data tree <- Brazil_rainforest_phylo_tree distM <- Brazil_rainforest_distance_matrix output_TD_abun <- iNEXT3D(data, diversity = "TD", q = c(0, 1, 2), datatype = "abundance") output_PD_abun <- iNEXT3D(data, diversity = 'PD', q = c(0, 1, 2), datatype = "abundance", nboot = 20, PDtree = tree) output_FD_abun <- iNEXT3D(data, diversity = 'FD', datatype = "abundance", nboot = 10, FDdistM = distM, FDtype = 'AUC') data(Fish_incidence_data) data(Fish_phylo_tree) data(Fish_distance_matrix) data <- Fish_incidence_data tree <- Fish_phylo_tree distM <- Fish_distance_matrix output_TD_inci <- iNEXT3D(data, diversity = 'TD', q = c(0, 1, 2), datatype = "incidence_raw") output_PD_inci <- iNEXT3D(data, diversity = 'PD', q = c(0, 1, 2), datatype = "incidence_raw", nboot = 20, PDtree = tree) output_FD_inci <- iNEXT3D(data, diversity = 'FD', datatype = "incidence_raw", nboot = 20, FDdistM = distM, FDtype = 'AUC')
Based on the dataset (Brazil_rainforest_abun_data
) included in the package, the following commands return all numerical results for TD
. The first list of the output ($TDInfo
) returns basic data information including the name of the Assemblage, sample size (n
), observed species richness (S.obs
), sample coverage estimate of the reference sample with size n (SC(n)
), sample coverage estimate of the extrapolated sample with size 2n (SC(2n)
) as well as the first five species abundance frequency counts in the reference sample (f1-f5
). The output is identical to that based on the function DataInfo3D()
by specifying diversity = 'TD'
and datatype = "abundance"
; see later text). Thus, if only data information is required, the simpler function DataInfo3D()
(see later text) can be used to obtain the same output. More information about the observed diversity (for any order q between 0 and 2) can be obtained by function ObsAsy3D()
, which will be introduced later.
data(Brazil_rainforest_abun_data) output_TD_abun <- iNEXT3D(Brazil_rainforest_abun_data, diversity = 'TD', q = c(0,1,2), datatype = "abundance") output_TD_abun$TDInfo
output_TD_abun[1]$TDInfo$`SC(n)` <- round(output_TD_abun[1]$TDInfo$`SC(n)`, 3) output_TD_abun[1]$TDInfo$`SC(2n)` <- round(output_TD_abun[1]$TDInfo$`SC(2n)`, 3) output_TD_abun[1]
The second list of the output ($TDiNextEst
) includes size- and coverage-based standardized diversity estimates and related statistics computed for 40 knots by default (for example in the "Edge" assemblage, corresponding to the target sample size m
= 1, 95, 189, ..., 1699, 1794, 1795, 1899, ..., 3588), which locates the reference sample size at the mid-point of the selected knots. There are two data frames ($size_based
and $coverage_based
).
The first data frame ($size_based
) includes the name of the Assemblage, diversity order (Order.q
), the target sample size (m
), the Method
(Rarefaction
, Observed
, or Extrapolation
, depending on whether the size m
is less than, equal to, or greater than the reference sample size), the diversity estimate of order q (qTD
), the lower and upper confidence limits of diversity (qTD.LCL
and qTD.UCL
) conditioning on the sample size, and the corresponding sample coverage estimate (SC
) along with the lower and upper confidence limits of sample coverage (SC.LCL
and SC.UCL
). These sample coverage estimates with confidence intervals are used for plotting the sample completeness curve. If the argument nboot
is greater than zero, then a bootstrap method is applied to obtain the confidence intervals for the diversity and sample coverage estimates. Otherwise, all confidence intervals will not be computed. Here only the first six rows of the $size_based
output are displayed:
output_TD_abun$TDiNextEst$size_based
output_TD_abun$TDiNextEst$size_based$qTD <- round(output_TD_abun$TDiNextEst$size_based$qTD, 3) output_TD_abun$TDiNextEst$size_based$qTD.LCL <- round(output_TD_abun$TDiNextEst$size_based$qTD.LCL, 3) output_TD_abun$TDiNextEst$size_based$qTD.UCL <- round(output_TD_abun$TDiNextEst$size_based$qTD.UCL, 3) output_TD_abun$TDiNextEst$size_based$SC <- round(output_TD_abun$TDiNextEst$size_based$SC, 3) output_TD_abun$TDiNextEst$size_based$SC.LCL <- round(output_TD_abun$TDiNextEst$size_based$SC.LCL, 3) output_TD_abun$TDiNextEst$size_based$SC.UCL <- round(output_TD_abun$TDiNextEst$size_based$SC.UCL, 3) output_TD_abun$TDiNextEst$size_based[1:6,]
The second data frame ($coverage_based
) includes the name of assemblage, the diversity order (Order.q
), the target sample coverage value (SC
), the corresponding sample size (m
), the Method
(Rarefaction
, Observed
, or Extrapolation
, depending on whether the coverage SC
is less than, equal to, or greater than the reference sample coverage), the diversity estimate of order q (qTD
), the lower and upper confidence limits of diversity (qTD.LCL
and qTD.UCL
) conditioning on the target sample coverage value. Here only the first six rows of the $coverage_based
output are displayed below: (Note for a fixed coverage value, the confidence interval in the $coverage_based
table is wider than the corresponding interval in the $size_based
table. This is because, for a given coverage value, the sample size needed to attain a fixed coverage value varies with bootstrap replication, leading to higher uncertainty on the resulting diversity estimate.)
output_TD_abun$TDiNextEst$coverage_based
output_TD_abun$TDiNextEst$coverage_based$qTD <- round(output_TD_abun$TDiNextEst$coverage_based$qTD, 3) output_TD_abun$TDiNextEst$coverage_based$qTD.LCL <- round(output_TD_abun$TDiNextEst$coverage_based$qTD.LCL, 3) output_TD_abun$TDiNextEst$coverage_based$qTD.UCL <- round(output_TD_abun$TDiNextEst$coverage_based$qTD.UCL, 3) output_TD_abun$TDiNextEst$coverage_based$SC <- round(output_TD_abun$TDiNextEst$coverage_based$SC, 3) output_TD_abun$TDiNextEst$coverage_based$m <- round(output_TD_abun$TDiNextEst$coverage_based$m, 3) output_TD_abun$TDiNextEst$coverage_based[1:6,]
The third list of the output ($TDAsyEst
) includes the name of the Assemblage, diversity label (qTD
, species richness for q = 0, Shannon diversity for q = 1, and Simpson diversity for q = 2), the observed diversity (TD_obs
), asymptotic diversity estimate (TD_asy
) and its estimated bootstrap standard error (s.e.
) as well as the confidence intervals for asymptotic diversity (qTD.LCL
and qTD.UCL
). These statistics are computed only for q = 0, 1 and 2. More detailed information about asymptotic and observed diversity estimates for any order q between 0 and 2 can be obtained from function ObsAsy3D()
. The output for $TDAsyEst
is shown below:
output_TD_abun$TDAsyEst
tmp = output_TD_abun$TDAsyEst tmp[,-(1:2)] = round(tmp[,-(1:2)], 3) tmp
The ggiNEXT3D
function can be used to make graphical displays for rarefaction and extrapolation sampling curves. When facet.var = "Assemblage"
is specified in the ggiNEXT3D
function, it creates a separate plot for each assemblage; within each assemblage, different color curves represent diversity of different orders. An example for showing sample-size-based rarefaction/extrapolation curves (type = 1
) is given below:
# TD sample-size-based R/E curves, separating by "Assemblage" ggiNEXT3D(output_TD_abun, type = 1, facet.var = "Assemblage")
out <- ggiNEXT3D(output_TD_abun, type = 1, facet.var = "Assemblage") out + theme(text = element_text(size = 12))
When facet.var = "Order.q"
is specified in the ggiNEXT3D
function, it creates a separate plot for each diversity order; within each plot, different color curves represent different assemblages. An example is shown below:
# TD sample-size-based R/E curves, separating by "Order.q" ggiNEXT3D(output_TD_abun, type = 1, facet.var = "Order.q")
out <- ggiNEXT3D(output_TD_abun, type = 1, facet.var = "Order.q") out + theme(text = element_text(size = 12))
The following commands return the sample completeness (sample coverage) curve (type = 2
) in which different colors represent different assemblages.
# Sample completeness curves for abundance data, separating by "Assemblage" ggiNEXT3D(output_TD_abun, type = 2, color.var = "Assemblage")
out <- ggiNEXT3D(output_TD_abun, type = 2, facet.var = "Assemblage") out + theme(text = element_text(size = 12))
The following commands return the coverage-based rarefaction/extrapolation sampling curves in which different color curves represent three diversity orders within each assemblage (facet.var = "Assemblage"
), or represent two assemblages within each diversity order (facet.var = "Order.q"
), respectively.
# TD coverage-based R/E curves, separating by "Assemblage" ggiNEXT3D(output_TD_abun, type = 3, facet.var = "Assemblage")
out <- ggiNEXT3D(output_TD_abun, type = 3, facet.var = "Assemblage") out + theme(text = element_text(size = 12))
# TD coverage-based R/E curves, separating by "Order.q" ggiNEXT3D(output_TD_abun, type = 3, facet.var = "Order.q")
out <- ggiNEXT3D(output_TD_abun, type = 1, facet.var = "Order.q") out + theme(text = element_text(size = 12))
Based on the dataset (Fish_incidence_data
) included in the package, the following commands return all numerical results for TD
. The first list of the output ($TDInfo
) returns basic data information including the name of the Assemblage, number of sampling units (T
), total number of incidences (U
), observed species richness (S.obs
), sample coverage estimate of the reference sample with size T (SC(T)
), sample coverage estimate of the extrapolated sample with size 2T (SC(2T)
) as well as the first five species incidence frequency counts in the reference sample (Q1-Q5
). The output is identical to that based on the function DataInfo3D()
by specifying diversity = 'TD'
and datatype = "incidence_raw"
; see later text). Thus, if only data information is required, the simpler function DataInfo3D()
(see later text) can be used to obtain the same output. More information about the observed diversity (for any order q between 0 and 2) can be obtained by function ObsAsy3D()
, which will be introduced later.
data(Fish_incidence_data) output_TD_inci <- iNEXT3D(Fish_incidence_data, diversity = 'TD', q = c(0, 1, 2), datatype = "incidence_raw") output_TD_inci$TDInfo
output_TD_inci[1]$TDInfo$`SC(T)` <- round(output_TD_inci[1]$TDInfo$`SC(T)`,3) output_TD_inci[1]$TDInfo$`SC(2T)` <- round(output_TD_inci[1]$TDInfo$`SC(2T)`,3) output_TD_inci[1]
The second list of the output ($TDiNextEst
) includes size- and coverage-based standardized diversity estimates and related statistics computed for 40 knots by default (for example in the "2013-2015"
time period, corresponding to the target number of sample units mT
= 1, 2, 4, ..., 34, 36, 37, 38, ..., 72), which locates the reference sampling units at the mid-point of the selected knots. There are two data frames ($size_based
and $coverage_based
).
The first data frame ($size_based
) includes the name of the Assemblage, diversity order (Order.q
), the target number of sampling units (mT
), the Method
(Rarefaction
, Observed
, or Extrapolation
, depending on whether the target number of sample units mT
is less than, equal to, or greater than the number of sampling units in the reference sample), the diversity estimate of order q (qTD
), the lower and upper confidence limits of diversity (qTD.LCL
and qTD.UCL
) conditioning on the sample size, and the corresponding sample coverage estimate (SC
) along with the lower and upper confidence limits of sample coverage (SC.LCL
and SC.UCL
). These sample coverage estimates with confidence intervals are used for plotting the sample completeness curve. If the argument nboot
is greater than zero, then a bootstrap method is applied to obtain the confidence intervals for the diversity and sample coverage estimates. Otherwise, all confidence intervals will not be computed. Here only the first six rows of the $size_based
output are displayed:
output_TD_inci$TDiNextEst$size_based
output_TD_inci$TDiNextEst$size_based$qTD <- round(output_TD_inci$TDiNextEst$size_based$qTD, 3) output_TD_inci$TDiNextEst$size_based$qTD.LCL <- round(output_TD_inci$TDiNextEst$size_based$qTD.LCL, 3) output_TD_inci$TDiNextEst$size_based$qTD.UCL <- round(output_TD_inci$TDiNextEst$size_based$qTD.UCL, 3) output_TD_inci$TDiNextEst$size_based$SC <- round(output_TD_inci$TDiNextEst$size_based$SC, 3) output_TD_inci$TDiNextEst$size_based$SC.LCL <- round(output_TD_inci$TDiNextEst$size_based$SC.LCL, 3) output_TD_inci$TDiNextEst$size_based$SC.UCL <- round(output_TD_inci$TDiNextEst$size_based$SC.UCL, 3) output_TD_inci$TDiNextEst$size_based[1:6,]
The second data frame ($coverage_based
) includes the name of assemblage, the diversity order (Order.q
), the target sample coverage value (SC
), the corresponding number of sampling units (mT
), the Method
(Rarefaction
, Observed
, or Extrapolation
, depending on whether the coverage SC
is less than, equal to, or greater than the reference sample coverage), the diversity estimate of order q (qTD
), the lower and upper confidence limits of diversity (qTD.LCL
and qTD.UCL
) conditioning on the target sample coverage value. Here only the first six rows of the $coverage_based
output are displayed below: (Note for a fixed coverage value, the confidence interval in the $coverage_based
table is wider than the corresponding interval in the $size_based
table. This is because, for a given coverage value, the sample size needed to attain a fixed coverage value varies with bootstrap replication, leading to higher uncertainty on the resulting diversity estimate.)
output_TD_inci$TDiNextEst$coverage_based
output_TD_inci$TDiNextEst$coverage_based$qTD <- round(output_TD_inci$TDiNextEst$coverage_based$qTD, 3) output_TD_inci$TDiNextEst$coverage_based$qTD.LCL <- round(output_TD_inci$TDiNextEst$coverage_based$qTD.LCL, 3) output_TD_inci$TDiNextEst$coverage_based$qTD.UCL <- round(output_TD_inci$TDiNextEst$coverage_based$qTD.UCL, 3) output_TD_inci$TDiNextEst$coverage_based$SC <- round(output_TD_inci$TDiNextEst$coverage_based$SC, 3) output_TD_inci$TDiNextEst$coverage_based$mT <- round(output_TD_inci$TDiNextEst$coverage_based$mT, 3) output_TD_inci$TDiNextEst$coverage_based[1:6,]
The third list of the output ($TDAsyEst
) includes the name of the Assemblage, diversity label (qTD
, species richness for q = 0, Shannon diversity for q = 1, and Simpson diversity for q = 2), the observed diversity (TD_obs
), asymptotic diversity estimate (TD_asy
) and its estimated bootstrap standard error (s.e.
) as well as the confidence intervals for asymptotic diversity (qTD.LCL
and qTD.UCL
). These statistics are computed only for q = 0, 1 and 2. More detailed information about asymptotic and observed diversity estimates for any order q between 0 and 2 can be obtained from function ObsAsy3D()
. The output is shown below:
output_TD_inci$TDAsyEst
tmp = output_TD_inci$TDAsyEst tmp[,-(1:2)] = round(tmp[,-(1:2)], 3) tmp
The ggiNEXT3D
function can be used to make graphical displays for rarefaction and extrapolation sampling curves. When facet.var = "Assemblage"
is specified in the ggiNEXT3D
function, it creates a separate plot for each assemblage; within each assemblage, different color curves represent diversity of different orders. An example for showing sample-size-based rarefaction/extrapolation curves (type = 1
) for incidence data is given below:
# TD sample-size-based R/E curves for incidence data, separating by "Assemblage" ggiNEXT3D(output_TD_inci, type = 1, facet.var = "Assemblage")
out <- ggiNEXT3D(output_TD_inci, type = 1, facet.var = "Assemblage") out + theme(text = element_text(size = 12))
When facet.var = "Order.q"
is specified in the ggiNEXT3D
function, it creates a separate plot for each diversity order; within each plot, different color curves represent different assemblages. An example is shown below:
# TD sample-size-based R/E curves for incidence data, separating by "Order.q" ggiNEXT3D(output_TD_inci, type = 1, facet.var = "Order.q")
out <- ggiNEXT3D(output_TD_inci, type = 1, facet.var = "Order.q") out + theme(text = element_text(size = 12))
The following commands return the sample completeness (sample coverage) curve (type = 2
) in which different colors are used for different assemblages.
# Sample completeness curves for incidence data, separating by "Assemblage" ggiNEXT3D(output_TD_inci, type = 2, color.var = "Assemblage")
out <- ggiNEXT3D(output_TD_inci, type = 2, facet.var = "Assemblage") out + theme(text = element_text(size = 12))
The following commands return the coverage-based rarefaction/extrapolation sampling curves in which different color curves represent three diversity orders within each assemblage (facet.var = "Assemblage"
), or represent two assemblages within each diversity order (facet.var = "Order.q"
), respectively.
# TD coverage-based R/E curves for incidence data, separating by "Assemblage" ggiNEXT3D(output_TD_inci, type = 3, facet.var = "Assemblage")
out <- ggiNEXT3D(output_TD_inci, type = 3, facet.var = "Assemblage") out + theme(text = element_text(size = 12))
# TD coverage-based R/E curves for incidence data, separating by "Order.q" ggiNEXT3D(output_TD_inci, type = 3, facet.var = "Order.q")
out <- ggiNEXT3D(output_TD_inci, type = 3, facet.var = "Order.q") out + theme(text = element_text(size = 12))
Based on the dataset (Brazil_rainforest_abun_data
) and the phylogentic tree (Brazil_rainforest_phylo_tree
) included in the package, the following commands return all numerical results for PD
. The first list of the output ($PDInfo
) returns basic data information including the name of the Assemblage, sample size (n
), observed species richness (S.obs
), sample coverage estimate of the reference sample with size n (SC(n)
), sample coverage estimate of the extrapolated sample with size 2n (SC(2n)
), the observed total branch length in the phylogenetic tree spanned by all observed specise (PD.obs
), the number of singletons and doubletons in the node/branch abundance set (f1*,f2*
), the total branch length of those singletons and doubletons in the node/branch abundance set (g1,g2
), and the reference time (Reftime
). The output is identical to that based on the function DataInfo3D()
by specifying diversity = 'PD'
and datatype = "abundance"
; see later text). Thus, if only data information is required, the simpler function DataInfo3D()
(see later text) can be used to obtain the same output. More information about the observed diversity (for any order q between 0 and 2) can be obtained by function ObsAsy3D()
, which will be introduced later.
The required argument for performing PD analysis is PDtree
. For example, the phylogenetic tree for all observed species (including species in both Edge and Interior habitats) is stored in Brazil_rainforest_phylo_tree
. Then we enter the argument PDtree = Brazil_rainforest_phylo_tree
. Two optional arguments are: PDtype
and PDreftime
. There are two options for PDtype
: "PD"
(effective total branch length) or "meanPD"
(effective number of equally divergent lineages, meanPD
= PD/tree depth). Default is PDtype = "meanPD"
. PDreftime
is a numerical value specifying a reference time for computing phylogenetic diversity. By default (PDreftime = NULL
), the reference time is set to the tree depth, i.e., age of the root of the phylogenetic tree. Run the following code to perform PD analysis.
data(Brazil_rainforest_abun_data) data(Brazil_rainforest_phylo_tree) data <- Brazil_rainforest_abun_data tree <- Brazil_rainforest_phylo_tree output_PD_abun <- iNEXT3D(data, diversity = 'PD', q = c(0, 1, 2), datatype = "abundance", nboot = 20, PDtree = tree) output_PD_abun$PDInfo
output_PD_abun[1]$PDInfo$`SC(n)` <- round(output_PD_abun[1]$PDInfo$`SC(n)`,3) output_PD_abun[1]$PDInfo$`SC(2n)` <- round(output_PD_abun[1]$PDInfo$`SC(2n)`,3) output_PD_abun[1]
The second list of the output ($PDiNextEst
) includes size- and coverage-based standardized diversity estimates and related statistics computed for 40 knots by default (for example in the "Edge" assemblage, corresponding to the target sample size m
= 1, 95, 189, ..., 1699, 1794, 1795, 1899, ..., 3588), which locates the reference sample size at the mid-point of the selected knots. There are two data frames ($size_based
and $coverage_based
).
The first data frame ($size_based
) includes the name of the Assemblage, diversity order (Order.q
), the target sample size (m
), the Method
(Rarefaction
, Observed
, or Extrapolation
, depending on whether the size m
is less than, equal to, or greater than the reference sample size), the diversity estimate of order q (qPD
), the lower and upper confidence limits of diversity (qPD.LCL
and qPD.UCL
) conditioning on the sample size, the corresponding sample coverage estimate (SC
) along with the lower and upper confidence limits of sample coverage (SC.LCL
and SC.UCL
), the reference time (Reftime
) and the type of PD (Type
). These sample coverage estimates with confidence intervals are used for plotting the sample completeness curve. If the argument nboot
is greater than zero, then a bootstrap method is applied to obtain the confidence intervals for the diversity and sample coverage estimates. Otherwise, all confidence intervals will not be computed. Here only the first six rows of the $size_based
output are displayed:
output_PD_abun$PDiNextEst$size_based
output_PD_abun$PDiNextEst$size_based$qPD <- round(output_PD_abun$PDiNextEst$size_based$qPD, 3) output_PD_abun$PDiNextEst$size_based$qPD.LCL <- round(output_PD_abun$PDiNextEst$size_based$qPD.LCL, 3) output_PD_abun$PDiNextEst$size_based$qPD.UCL <- round(output_PD_abun$PDiNextEst$size_based$qPD.UCL, 3) output_PD_abun$PDiNextEst$size_based$SC <- round(output_PD_abun$PDiNextEst$size_based$SC, 3) output_PD_abun$PDiNextEst$size_based$SC.LCL <- round(output_PD_abun$PDiNextEst$size_based$SC.LCL, 3) output_PD_abun$PDiNextEst$size_based$SC.UCL <- round(output_PD_abun$PDiNextEst$size_based$SC.UCL, 3) output_PD_abun$PDiNextEst$size_based[1:6,]
The second data frame ($coverage_based
) includes the name of assemblage, the diversity order (Order.q
), the target sample coverage value (SC
), the corresponding sample size (m
), the Method
(Rarefaction
, Observed
, or Extrapolation
, depending on whether the coverage SC
is less than, equal to, or greater than the reference sample coverage), the diversity estimate of order q (qPD
), the lower and upper confidence limits of diversity (qPD.LCL
and qPD.UCL
) conditioning on the target sample coverage value, the reference times (Reftime
) and the type of PD (Type
). Here only the first six rows of the $coverage_based
output are displayed below: (Note for a fixed coverage value, the confidence interval in the $coverage_based
table is wider than the corresponding interval in the $size_based
table. This is because, for a given coverage value, the sample size needed to attain a fixed coverage value varies with bootstrap replication, leading to higher uncertainty on the resulting diversity estimate.)
output_PD_abun$PDiNextEst$coverage_based
output_PD_abun$PDiNextEst$coverage_based$qPD <- round(output_PD_abun$PDiNextEst$coverage_based$qPD, 3) output_PD_abun$PDiNextEst$coverage_based$qPD.LCL <- round(output_PD_abun$PDiNextEst$coverage_based$qPD.LCL, 3) output_PD_abun$PDiNextEst$coverage_based$qPD.UCL <- round(output_PD_abun$PDiNextEst$coverage_based$qPD.UCL, 3) output_PD_abun$PDiNextEst$coverage_based$SC <- round(output_PD_abun$PDiNextEst$coverage_based$SC, 3) output_PD_abun$PDiNextEst$coverage_based$m <- round(output_PD_abun$PDiNextEst$coverage_based$m, 3) output_PD_abun$PDiNextEst$coverage_based[1:6,]
The third list of the output ($PDAsyEst
) includes the name of the Assemblage, PD (or meanPD) for q = 0, 1, and 2 (qPD
), the observed diversity (PD_obs
), asymptotic diversity estimates (PD_asy
), estimated asymptotic bootstrap standard error (s.e.
) as well as the confidence intervals for asymptotic diversity with q = 0, 1, and 2 (qPD.LCL
and qPD.UCL
), the reference times (Reftime
) and the type of PD (Type
). These statistics are computed only for q = 0, 1 and 2. More detailed information about asymptotic and observed diversity estimates for any order q between 0 and 2 can be obtained from function ObsAsy3D()
. The output is shown below:
output_PD_abun$PDAsyEst
tmp = output_PD_abun$PDAsyEst tmp[,-c(1:2,9)] = round(tmp[,-c(1:2,9)], 3) tmp
The ggiNEXT3D
function can be used to make graphical displays for rarefaction and extrapolation sampling curves. When facet.var = "Assemblage"
is specified in the ggiNEXT3D
function, it creates a separate plot for each assemblage; within each assemblage, different color curves represent diversity of different orders. An example for showing sample-size-based rarefaction/extrapolation curves (type = 1
) is given below:
# PD sample-size-based R/E curves, separating by "Assemblage" ggiNEXT3D(output_PD_abun, type = 1, facet.var = "Assemblage")
out <- ggiNEXT3D(output_PD_abun, type = 1, facet.var = "Assemblage") out + theme(text = element_text(size = 12))
When facet.var = "Order.q"
is specified in the ggiNEXT3D
function, it creates a separate plot for each diversity order; within each plot, different color curves represent different assemblages. An example is shown below:
# PD sample-size-based R/E curves, separating by "Order.q" ggiNEXT3D(output_PD_abun, type = 1, facet.var = "Order.q")
out <- ggiNEXT3D(output_PD_abun, type = 1, facet.var = "Order.q") out + theme(text = element_text(size = 12))
The following commands return the sample completeness (sample coverage) curve (type = 2
) in which different colors are used for different assemblages.
# Sample completeness curves for abundance data, separating by "Assemblage" ggiNEXT3D(output_PD_abun, type = 2, color.var = "Assemblage")
out <- ggiNEXT3D(output_PD_abun, type = 2, facet.var = "Assemblage") out + theme(text = element_text(size = 12))
The following commands return the coverage-based rarefaction/extrapolation sampling curves in which different color curves represent three diversity orders within each assemblage (facet.var = "Assemblage"
), or represent two assemblages within each diversity order (facet.var = "Order.q"
), respectively.
# PD coverage-based R/E curves, separating by "Assemblage" ggiNEXT3D(output_PD_abun, type = 3, facet.var = "Assemblage")
out <- ggiNEXT3D(output_PD_abun, type = 3, facet.var = "Assemblage") out + theme(text = element_text(size = 12))
# PD coverage-based R/E curves, separating by "Order.q" ggiNEXT3D(output_PD_abun, type = 3, facet.var = "Order.q")
out <- ggiNEXT3D(output_PD_abun, type = 3, facet.var = "Order.q") out + theme(text = element_text(size = 12))
Based on the dataset (Fish_incidence_data
) included in the package and the phylogentic tree (Fish_phylo_tree
), the following commands return all numerical results for PD
. The first list of the output ($PDInfo
) returns basic data information including the name of the Assemblage, number of sampling units (T
), total number of incidences (U
), observed species richness (S.obs
), sample coverage estimate of the reference sample with size T (SC(T)
), sample coverage estimate of the extrapolated sample with size 2T (SC(2T)
), the observed total branch length in the phylogenetic tree spanned by all observed species (PD.obs
), the singletons/doubletons in the sample branch incidence (Q1*,Q2*
), the total branch length of those singletons/doubletons in the sample branch incidence (R1,R2
), and the reference time (Reftime
). The output is identical to that based on the function DataInfo3D()
by specifying diversity = 'PD'
and datatype = "incidence_raw"
; see later text). Thus, if only data information is required, the simpler function DataInfo3D()
(see later text) can be used to obtain the same output. More information about the observed diversity (for any order q between 0 and 2) can be obtained by function ObsAsy3D()
, which will be introduced later.
The required argument for performing PD analysis is PDtree
. For example, the phylogenetic tree for all observed species (including species in both "2013-2015"
and "2016-2018"
time periods) is stored in Fish_phylo_tree
. Then we enter the argument PDtree = Fish_phylo_tree
. Two optional arguments are: PDtype
and PDreftime
. There are two options for PDtype
: "PD"
(effective total branch length) or "meanPD"
(effective number of equally divergent lineages, meanPD
= PD/tree depth). Default is PDtype = "meanPD"
. PDreftime
is a numerical value specifying a reference time for computing phylogenetic diversity. By default (PDreftime = NULL
), the reference time is set to the tree depth, i.e., age of the root of the phylogenetic tree. Run the following code to perform PD analysis.
data(Fish_incidence_data) data(Fish_phylo_tree) data <- Fish_incidence_data tree <- Fish_phylo_tree output_PD_inci <- iNEXT3D(data, diversity = 'PD', q = c(0, 1, 2), datatype = "incidence_raw", nboot = 20, PDtree = tree) output_PD_inci$PDInfo
output_PD_inci[1]$PDInfo$`SC(T)` <- round(output_PD_inci[1]$PDInfo$`SC(T)`, 3) output_PD_inci[1]$PDInfo$`SC(2T)` <- round(output_PD_inci[1]$PDInfo$`SC(2T)`, 3) output_PD_inci[1]$PDInfo$R1 <- round(output_PD_inci[1]$PDInfo$R1, 3) output_PD_inci[1]$PDInfo$R2 <- round(output_PD_inci[1]$PDInfo$R2, 3) output_PD_inci[1]$PDInfo$Reftime <- round(output_PD_inci[1]$PDInfo$Reftime, 3) output_PD_inci[1]
The second list of the output ($PDiNextEst
) includes size- and coverage-based standardized diversity estimates and related statistics computed for 40 knots by default (for example in the "2013-2015"
time period, corresponding to the target number of sample units mT
= 1, 2, 4, ..., 34, 36, 37, 38, ..., 72), which locates the reference sampling units at the mid-point of the selected knots. There are two data frames ($size_based
and $coverage_based
).
The first data frame ($size_based
) includes the name of the Assemblage, diversity order (Order.q
), the target number of sample units (mT
), the Method
(Rarefaction
, Observed
, or Extrapolation
, depending on whether the target number of sample units mT
is less than, equal to, or greater than the number of sampling units in the reference sample), the diversity estimate of order q (qPD
), the lower and upper confidence limits of diversity (qPD.LCL
and qPD.UCL
) conditioning on the sample size, the corresponding sample coverage estimate (SC
) along with the lower and upper confidence limits of sample coverage (SC.LCL
and SC.UCL
), the reference time (Reftime
) and the type of PD (Type
). These sample coverage estimates with confidence intervals are used for plotting the sample completeness curve. If the argument nboot
is greater than zero, then a bootstrap method is applied to obtain the confidence intervals for the diversity and sample coverage estimates. Otherwise, all confidence intervals will not be computed. Here only the first six rows of the $size_based
output are displayed:
output_PD_inci$PDiNextEst$size_based
output_PD_inci$PDiNextEst$size_based$qPD <- round(output_PD_inci$PDiNextEst$size_based$qPD, 3) output_PD_inci$PDiNextEst$size_based$qPD.LCL <- round(output_PD_inci$PDiNextEst$size_based$qPD.LCL, 3) output_PD_inci$PDiNextEst$size_based$qPD.UCL <- round(output_PD_inci$PDiNextEst$size_based$qPD.UCL, 3) output_PD_inci$PDiNextEst$size_based$SC <- round(output_PD_inci$PDiNextEst$size_based$SC, 3) output_PD_inci$PDiNextEst$size_based$SC.LCL <- round(output_PD_inci$PDiNextEst$size_based$SC.LCL, 3) output_PD_inci$PDiNextEst$size_based$SC.UCL <- round(output_PD_inci$PDiNextEst$size_based$SC.UCL, 3) output_PD_inci$PDiNextEst$size_based$Reftime <- round(output_PD_inci$PDiNextEst$size_based$Reftime, 3) output_PD_inci$PDiNextEst$size_based[1:6,]
The second data frame ($coverage_based
) includes the name of assemblage, the diversity order (Order.q
), the target sample coverage value (SC
), the corresponding number of sample units (mT
), the Method
(Rarefaction
, Observed
, or Extrapolation
, depending on whether the coverage SC
is less than, equal to, or greater than the reference sample coverage), the diversity estimate of order q (qPD
), the lower and upper confidence limits of diversity (qPD.LCL
and qPD.UCL
) conditioning on the target sample coverage value, the reference time (Reftime
) and the type of PD (Type
). Here only the first six rows of the $coverage_based
output are displayed below: (Note for a fixed coverage value, the confidence interval in the $coverage_based
table is wider than the corresponding interval in the $size_based
table. This is because, for a given coverage value, the sample size needed to attain a fixed coverage value varies with bootstrap replication, leading to higher uncertainty on the resulting diversity estimate.)
output_PD_inci$PDiNextEst$coverage_based
output_PD_inci$PDiNextEst$coverage_based$qPD <- round(output_PD_inci$PDiNextEst$coverage_based$qPD, 3) output_PD_inci$PDiNextEst$coverage_based$qPD.LCL <- round(output_PD_inci$PDiNextEst$coverage_based$qPD.LCL, 3) output_PD_inci$PDiNextEst$coverage_based$qPD.UCL <- round(output_PD_inci$PDiNextEst$coverage_based$qPD.UCL, 3) output_PD_inci$PDiNextEst$coverage_based$SC <- round(output_PD_inci$PDiNextEst$coverage_based$SC, 3) output_PD_inci$PDiNextEst$coverage_based$mT <- round(output_PD_inci$PDiNextEst$coverage_based$mT, 3) output_PD_inci$PDiNextEst$coverage_based$Reftime <- round(output_PD_inci$PDiNextEst$coverage_based$Reftime, 3) output_PD_inci$PDiNextEst$coverage_based[1:6,]
The third list of the output ($PDAsyEst
) includes the name of the Assemblage, PD (or meanPD) for q = 0, 1, and 2 (qPD
), the observed diversity (PD_obs
), asymptotic diversity estimate (PD_asy
) and its estimated bootstrap standard error (s.e.
), the confidence intervals for asymptotic diversity (qPD.LCL
and qPD.UCL
), the reference time (Reftime
) and the type of PD (Type
) . These statistics are computed only for q = 0, 1 and 2. More detailed information about asymptotic and observed diversity estimates for any order q between 0 and 2 can be obtained from function ObsAsy3D()
. The output is shown below:
output_PD_inci$PDAsyEst
tmp = output_PD_inci$PDAsyEst tmp[,-c(1:2,9)] = round(tmp[,-c(1:2,9)], 3) tmp
The ggiNEXT3D
function can be used to make graphical displays for rarefaction and extrapolation sampling curves. When facet.var = "Assemblage"
is specified in the ggiNEXT3D
function, it creates a separate plot for each assemblage; within each assemblage, different color curves represent diversity of different orders. An example for showing sample-size-based rarefaction/extrapolation curves (type = 1
) is given below:
# PD sample-size-based R/E curves for incidence data, separating by "Assemblage" ggiNEXT3D(output_PD_inci, type = 1, facet.var = "Assemblage")
out <- ggiNEXT3D(output_PD_inci, type = 1, facet.var = "Assemblage") out + theme(text = element_text(size = 12))
When facet.var = "Order.q"
is specified in the ggiNEXT3D
function, it creates a separate plot for each diversity order; within each plot, different color curves represent different assemblages. An example is shown below:
# PD sample-size-based R/E curves for incidence data, separating by "Order.q" ggiNEXT3D(output_PD_inci, type = 1, facet.var = "Order.q")
out <- ggiNEXT3D(output_PD_inci, type = 1, facet.var = "Order.q") out + theme(text = element_text(size = 12))
The following commands return the sample completeness (sample coverage) curve (type = 2
) in which different colors are used for different assemblages.
# Sample completeness curves for incidence data, separating by "Assemblage" ggiNEXT3D(output_PD_inci, type = 2, color.var = "Assemblage")
out <- ggiNEXT3D(output_PD_inci, type = 2, facet.var = "Assemblage") out + theme(text = element_text(size = 12))
The following commands return the coverage-based rarefaction/extrapolation sampling curves in which different color curves represent three diversity orders within each assemblage (facet.var = "Assemblage"
), or represent two assemblages within each diversity order (facet.var = "Order.q"
), respectively.
# PD coverage-based R/E curves for incidence data, separating by "Assemblage" ggiNEXT3D(output_PD_inci, type = 3, facet.var = "Assemblage")
out <- ggiNEXT3D(output_PD_inci, type = 3, facet.var = "Assemblage") out + theme(text = element_text(size = 12))
# PD coverage-based R/E curves for incidence data, separating by "Order.q" ggiNEXT3D(output_PD_inci, type = 3, facet.var = "Order.q")
out <- ggiNEXT3D(output_PD_inci, type = 3, facet.var = "Order.q") out + theme(text = element_text(size = 12))
Based on the dataset (Brazil_rainforest_abun_data
) and the the distance matrix (Brazil_rainforest_distance_matrix
) included in the package, the following commands return all numerical results for FD
. The first list of the output ($FDInfo
) returns basic data information including the name of the Assemblage, sample size (n
), observed species richness (S.obs
), sample coverage estimate of the reference sample with size n (SC(n)
), sample coverage estimate of the extrapolated sample with size 2n (SC(2n)
), and the minimum, mean, and maximum distance among all non-diagonal elements in the distance matrix(dmin, dmean, dmax
). The output is identical to that based on the function DataInfo3D()
by specifying diversity = 'FD'
and datatype = "abundance"
; see later text). Thus, if only data information is required, the simpler function DataInfo3D()
(see later text) can be used to obtain the same output. More information about the observed diversity (for any order q between 0 and 2) can be obtained by function ObsAsy3D()
, which will be introduced later.
The required argument for performing FD analysis is FDdistM
. For example, the distance matrix for all species (including species in both Edge and Interior habitats) is stored in Brazil_rainforest_distance_matrix
. Then we enter the argument FDdistM = Brazil_rainforest_distance_matrix
Three optional arguments are (1) FDtype
: FDtype = "AUC"
means FD is computed from the area under the curve of a tau-profile by integrating all plausible threshold values between zero and one; FDtype = "tau_values"
means FD is computed under specific threshold values to be specified in the argument FD_tau
. (2) FD_tau
: a numerical value specifying the tau value (threshold level) that will be used to compute FD. If FDtype = "tau_values"
and FD_tau = NULL
, then the threshold level is set to be the mean distance between any two individuals randomly selected from the pooled data over all data (i.e., quadratic entropy).
data(Brazil_rainforest_abun_data) data(Brazil_rainforest_distance_matrix) data <- Brazil_rainforest_abun_data distM <- Brazil_rainforest_distance_matrix output_FD_abun <- iNEXT3D(data, diversity = 'FD', datatype = "abundance", nboot = 10, FDdistM = distM, FDtype = 'AUC') output_FD_abun$FDInfo
output_FD_abun[1]$FDInfo$`SC(n)` <- round(output_FD_abun[1]$FDInfo$`SC(n)`, 3) output_FD_abun[1]$FDInfo$`SC(2n)` <- round(output_FD_abun[1]$FDInfo$`SC(2n)`, 3) output_FD_abun[1]$FDInfo$dmean <- round(output_FD_abun[1]$FDInfo$dmean, 3) output_FD_abun[1]$FDInfo$dmax <- round(output_FD_abun[1]$FDInfo$dmax, 3) output_FD_abun[1]
The second list of the output ($FDiNextEst
) includes size- and coverage-based standardized diversity estimates and related statistics computed for 40 knots by default (for example in the "Edge" assemblage, corresponding to the target sample size m
= 1, 95, 189, ..., 1699, 1794, 1795, 1899, ..., 3588), which locates the reference sample size at the mid-point of the selected knots. There are two data frames ($size_based
and $coverage_based
).
The first data frame ($size_based
) includes the name of the Assemblage, diversity order (Order.q
), the target sample size (m
), the Method
(Rarefaction
, Observed
, or Extrapolation
, depending on whether the size m
is less than, equal to, or greater than the reference sample size), the diversity estimate of order q (qFD
), the lower and upper confidence limits of diversity (qFD.LCL
and qFD.UCL
) conditioning on the sample size, and the corresponding sample coverage estimate (SC
) along with the lower and upper confidence limits of sample coverage (SC.LCL
and SC.UCL
). These sample coverage estimates with confidence intervals are used for plotting the sample completeness curve. If the argument nboot
is greater than zero, then a bootstrap method is applied to obtain the confidence intervals for the diversity and sample coverage estimates. Otherwise, all confidence intervals will not be computed. Here only the first six rows of the $size_based
output are displayed:
output_FD_abun$FDiNextEst$size_based
output_FD_abun$FDiNextEst$size_based$qFD <- round(output_FD_abun$FDiNextEst$size_based$qFD, 3) output_FD_abun$FDiNextEst$size_based$qFD.LCL <- round(output_FD_abun$FDiNextEst$size_based$qFD.LCL, 3) output_FD_abun$FDiNextEst$size_based$qFD.UCL <- round(output_FD_abun$FDiNextEst$size_based$qFD.UCL, 3) output_FD_abun$FDiNextEst$size_based$SC <- round(output_FD_abun$FDiNextEst$size_based$SC, 3) output_FD_abun$FDiNextEst$size_based$SC.LCL <- round(output_FD_abun$FDiNextEst$size_based$SC.LCL, 3) output_FD_abun$FDiNextEst$size_based$SC.UCL <- round(output_FD_abun$FDiNextEst$size_based$SC.UCL, 3) output_FD_abun$FDiNextEst$size_based[1:6,]
The second data frame ($coverage_based
) includes the name of assemblage, the diversity order (Order.q
), the target sample coverage value (SC
), the corresponding sample size (m
), the Method
(Rarefaction
, Observed
, or Extrapolation
, depending on whether the coverage SC
is less than, equal to, or greater than the reference sample coverage), the diversity estimate of order q (qFD
), and the lower and upper confidence limits of diversity (qFD.LCL
and qFD.UCL
) conditioning on the target sample coverage value. Here only the first six rows of the $coverage_based
output are displayed below: (Note for a fixed coverage value, the confidence interval in the $coverage_based
table is wider than the corresponding interval in the $size_based
table. This is because, for a given coverage value, the sample size needed to attain a fixed coverage value varies with bootstrap replication, leading to higher uncertainty on the resulting diversity estimate.)
output_FD_abun$FDiNextEst$coverage_based
output_FD_abun$FDiNextEst$coverage_based$qFD <- round(output_FD_abun$FDiNextEst$coverage_based$qFD, 3) output_FD_abun$FDiNextEst$coverage_based$qFD.LCL <- round(output_FD_abun$FDiNextEst$coverage_based$qFD.LCL, 3) output_FD_abun$FDiNextEst$coverage_based$qFD.UCL <- round(output_FD_abun$FDiNextEst$coverage_based$qFD.UCL, 3) output_FD_abun$FDiNextEst$coverage_based$SC <- round(output_FD_abun$FDiNextEst$coverage_based$SC, 3) output_FD_abun$FDiNextEst$coverage_based$m <- round(output_FD_abun$FDiNextEst$coverage_based$m, 3) output_FD_abun$FDiNextEst$coverage_based[1:6,]
The third list of the output ($FDAsyEst
) includes the name of the Assemblage, FD for q = 0, 1, and 2 (qFD
), the observed diversity (FD_obs
), asymptotic diversity estimate (FD_asy
) and its estimated bootstrap standard error (s.e.
) as well as the confidence intervals for asymptotic diversity (qFD.LCL
and qFD.UCL
). These statistics are computed only for q = 0, 1 and 2. More detailed information about asymptotic and observed diversity estimates for any order q between 0 and 2 can be obtained from function ObsAsy3D()
. The output is shown below:
output_FD_abun$FDAsyEst
tmp = output_FD_abun$FDAsyEst tmp[,-(1:2)] = round(tmp[,-(1:2)], 3) tmp
The ggiNEXT3D
function can be used to make graphical displays for rarefaction and extrapolation sampling curves. When facet.var = "Assemblage"
is specified in the ggiNEXT3D
function, it creates a separate plot for each assemblage; within each assemblage, different color curves represent diversity of different orders. An example for showing sample-size-based rarefaction/extrapolation curves (type = 1
) is given below:
# FD sample-size-based R/E curves, separating by "Assemblage" ggiNEXT3D(output_FD_abun, type = 1, facet.var = "Assemblage")
out <- ggiNEXT3D(output_FD_abun, type = 1, facet.var = "Assemblage") out + theme(text = element_text(size = 12))
When facet.var = "Order.q"
is specified in the ggiNEXT3D
function, it creates a separate plot for each diversity order; within each plot, different color curves represent different assemblages. An example is shown below:
# FD sample-size-based R/E curves, separating by "Order.q" ggiNEXT3D(output_FD_abun, type = 1, facet.var = "Order.q")
out <- ggiNEXT3D(output_FD_abun, type = 1, facet.var = "Order.q") out + theme(text = element_text(size = 12))
The following commands return the sample completeness (sample coverage) curve (type = 2
) in which different colors are used for different assemblages.
# Sample completeness curves for abundance data, separating by "Assemblage" ggiNEXT3D(output_FD_abun, type = 2, color.var = "Assemblage")
out <- ggiNEXT3D(output_FD_abun, type = 2, facet.var = "Assemblage") out + theme(text = element_text(size = 12))
The following commands return the coverage-based rarefaction/extrapolation sampling curves in which different color curves represent three diversity orders within each assemblage (facet.var = "Assemblage"
), or represent two assemblages within each diversity order (facet.var = "Order.q"
), respectively.
# FD coverage-based R/E curves, separating by "Assemblage" ggiNEXT3D(output_FD_abun, type = 3, facet.var = "Assemblage")
out <- ggiNEXT3D(output_FD_abun, type = 3, facet.var = "Assemblage") out + theme(text = element_text(size = 12))
# FD coverage-based R/E curves, separating by "Order.q" ggiNEXT3D(output_FD_abun, type = 3, facet.var = "Order.q")
out <- ggiNEXT3D(output_FD_abun, type = 3, facet.var = "Order.q") out + theme(text = element_text(size = 12))
Based on the dataset (Fish_incidence_data
) and the the distance matrix (Fish_distance_matrix
) included in the package, the following commands return all numerical results for FD
. The first list of the output ($FDInfo
) returns basic data information including the name of the Assemblage, number of sampling units (T
), total number of incidences (U
), observed species richness (S.obs
), sample coverage estimate of the reference sample with size T (SC(T)
), sample coverage estimate of the reference sample with size 2T (SC(2T)
), and the minimum, mean, and maximum distance among all non-diagonal elements in the distance matrix(dmin, dmean, dmax
). The output is identical to that based on the function DataInfo3D()
by specifying diversity = 'FD'
and datatype = "incidence_raw"
; see later text). Thus, if only data information is required, the simpler function DataInfo3D()
(see later text) can be used to obtain the same output. More information about the observed diversity (for any order q between 0 and 2) can be obtained by function ObsAsy3D()
, which will be introduced later.
The required argument for performing FD analysis is FDdistM
. For example, the distance matrix for all species (including species in both "2013-2015"
and "2016-2018"
time periods) is stored in Fish_distance_matrix
. Then we enter the argument FDdistM = Fish_distance_matrix
Three optional arguments are (1) FDtype
: FDtype = "AUC"
means FD is computed from the area under the curve of a tau-profile by integrating all plausible threshold values between zero and one; FDtype = "tau_values"
means FD is computed under specific threshold values to be specified in the argument FD_tau. (2) FD_tau
: a numerical value specifying the tau value (threshold level) that will be used to compute FD. If FDtype = "tau_values"
and FD_tau = NULL
, then the threshold level is set to be the mean distance between any two individuals randomly selected from the pooled data over all data (i.e., quadratic entropy).
data(Fish_incidence_data) data(Fish_distance_matrix) data <- Fish_incidence_data distM <- Fish_distance_matrix output_FD_inci <- iNEXT3D(data, diversity = 'FD', datatype = "incidence_raw", nboot = 20, FDdistM = distM, FDtype = 'AUC') output_FD_inci$FDInfo
output_FD_inci[1]$FDInfo$`SC(T)` <- round(output_FD_inci[1]$FDInfo$`SC(T)`, 3) output_FD_inci[1]$FDInfo$`SC(2T)` <- round(output_FD_inci[1]$FDInfo$`SC(2T)`, 3) output_FD_inci[1]$FDInfo$S.obs <- round(output_FD_inci[1]$FDInfo$S.obs, 3) output_FD_inci[1]$FDInfo$dmin <- round(output_FD_inci[1]$FDInfo$dmin, 3) output_FD_inci[1]$FDInfo$dmean <- round(output_FD_inci[1]$FDInfo$dmean, 3) output_FD_inci[1]$FDInfo$dmax <- round(output_FD_inci[1]$FDInfo$dmax, 3) output_FD_inci[1]
The second list of the output ($FDiNextEst
) includes size- and coverage-based standardized diversity estimates and related statistics computed for 40 knots by default (for example in the "2013-2015"
time period, corresponding to the target number of sample units mT
= 1, 2, 4, ..., 34, 36, 37, 38, ..., 72), which locates the reference sampling units at the mid-point of the selected knots. There are two data frames ($size_based
and $coverage_based
).
The first data frame ($size_based
) includes the name of the Assemblage, diversity order (Order.q
), the target number of sample units (mT
), the Method
(Rarefaction
, Observed
, or Extrapolation
, depending on whether the target number of sample units mT
is less than, equal to, or greater than the number of sampling units in the reference sample), the diversity estimate of order q (qFD
), the lower and upper confidence limits of diversity (qFD.LCL
and qFD.UCL
) conditioning on the sample size, and the corresponding sample coverage estimate (SC
) along with the lower and upper confidence limits of sample coverage (SC.LCL
and SC.UCL
). These sample coverage estimates with confidence intervals are used for plotting the sample completeness curve. If the argument nboot
is greater than zero, then a bootstrap method is applied to obtain the confidence intervals for the diversity and sample coverage estimates. Otherwise, all confidence intervals will not be computed. Here only the first six rows of the $size_based
output are displayed:
output_FD_inci$FDiNextEst$size_based
output_FD_inci$FDiNextEst$size_based$qFD <- round(output_FD_inci$FDiNextEst$size_based$qFD, 3) output_FD_inci$FDiNextEst$size_based$qFD.LCL <- round(output_FD_inci$FDiNextEst$size_based$qFD.LCL, 3) output_FD_inci$FDiNextEst$size_based$qFD.UCL <- round(output_FD_inci$FDiNextEst$size_based$qFD.UCL, 3) output_FD_inci$FDiNextEst$size_based$SC <- round(output_FD_inci$FDiNextEst$size_based$SC, 3) output_FD_inci$FDiNextEst$size_based$SC.LCL <- round(output_FD_inci$FDiNextEst$size_based$SC.LCL, 3) output_FD_inci$FDiNextEst$size_based$SC.UCL <- round(output_FD_inci$FDiNextEst$size_based$SC.UCL, 3) output_FD_inci$FDiNextEst$size_based[1:6,]
The second data frame ($coverage_based
) includes the name of assemblage, the diversity order (Order.q
), the target sample coverage value (SC
), the corresponding number of sample units (mT
), the Method
(Rarefaction
, Observed
, or Extrapolation
, depending on whether the coverage SC
is less than, equal to, or greater than the reference sample coverage), the diversity estimate of order q (qFD
), and the lower and upper confidence limits of diversity (qFD.LCL
and qFD.UCL
) conditioning on the target sample coverage value. Here only the first six rows of the $coverage_based
output are displayed below: (Note for a fixed coverage value, the confidence interval in the $coverage_based
table is wider than the corresponding interval in the $size_based
table. This is because, for a given coverage value, the sample size needed to attain a fixed coverage value varies with bootstrap replication, leading to higher uncertainty on the resulting diversity estimate.)
output_FD_inci$FDiNextEst$coverage_based
output_FD_inci$FDiNextEst$coverage_based$qFD <- round(output_FD_inci$FDiNextEst$coverage_based$qFD, 3) output_FD_inci$FDiNextEst$coverage_based$qFD.LCL <- round(output_FD_inci$FDiNextEst$coverage_based$qFD.LCL, 3) output_FD_inci$FDiNextEst$coverage_based$qFD.UCL <- round(output_FD_inci$FDiNextEst$coverage_based$qFD.UCL, 3) output_FD_inci$FDiNextEst$coverage_based$SC <- round(output_FD_inci$FDiNextEst$coverage_based$SC, 3) output_FD_inci$FDiNextEst$coverage_based$mT <- round(output_FD_inci$FDiNextEst$coverage_based$mT, 3) output_FD_inci$FDiNextEst$coverage_based[1:6,]
The third list of the output ($FDAsyEst
) includes the name of the Assemblage, FD for q = 0, 1, and 2 (qFD
), the observed diversity (FD_obs
), asymptotic diversity estimate (FD_asy
) and its estimated bootstrap standard error (s.e.
), and the confidence intervals for asymptotic diversity (qFD.LCL
and qFD.UCL
). These statistics are computed only for q = 0, 1 and 2. More detailed information about asymptotic and observed diversity estimates for any order q between 0 and 2 can be obtained from function ObsAsy3D()
. The output is shown below:
output_FD_inci$FDAsyEst
tmp = output_FD_inci$FDAsyEst tmp[,-c(1:2)] = round(tmp[,-c(1:2)], 3) tmp
The ggiNEXT3D
function can be used to make graphical displays for rarefaction and extrapolation sampling curves. When facet.var = "Assemblage"
is specified in the ggiNEXT3D
function, it creates a separate plot for each assemblage; within each assemblage, different color curves represent diversity of different orders. An example for showing sample-size-based rarefaction/extrapolation curves (type = 1
) is given below:
# FD sample-size-based R/E curves for incidence data, separating by "Assemblage" ggiNEXT3D(output_FD_inci, type = 1, facet.var = "Assemblage")
out <- ggiNEXT3D(output_FD_inci, type = 1, facet.var = "Assemblage") out + theme(text = element_text(size = 12))
When facet.var = "Order.q"
is specified in the ggiNEXT3D
function, it creates a separate plot for each diversity order; within each plot, different color curves represent different assemblages. An example is shown below:
# FD sample-size-based R/E curves for incidence data, separating by "Order.q" ggiNEXT3D(output_FD_inci, type = 1, facet.var = "Order.q")
out <- ggiNEXT3D(output_FD_inci, type = 1, facet.var = "Order.q") out + theme(text = element_text(size = 12))
The following commands return the sample completeness (sample coverage) curve (type = 2
) in which different colors are used for different assemblages.
# Sample completeness curves for incidence data, separating by "Assemblage" ggiNEXT3D(output_FD_inci, type = 2, color.var = "Assemblage")
out <- ggiNEXT3D(output_FD_inci, type = 2, facet.var = "Assemblage") out + theme(text = element_text(size = 12))
The following commands return the coverage-based rarefaction/extrapolation sampling curves in which different color curves represent three diversity orders within each assemblage (facet.var = "Assemblage"
), or represent two assemblages within each diversity order (facet.var = "Order.q"
), respectively.
# FD coverage-based R/E curves for incidence data, separating by "Assemblage" ggiNEXT3D(output_FD_inci, type = 3, facet.var = "Assemblage")
out <- ggiNEXT3D(output_FD_inci, type = 3, facet.var = "Assemblage") out + theme(text = element_text(size = 12))
# FD coverage-based R/E curves for incidence data, separating by "Order.q" ggiNEXT3D(output_FD_inci, type = 3, facet.var = "Order.q")
out <- ggiNEXT3D(output_FD_inci, type = 3, facet.var = "Order.q") out + theme(text = element_text(size = 12))
The function DataInfo3D()
provides basic data information for the reference sample in each individual assemblage. The function DataInfo3D()
with default arguments is shown below:
DataInfo3D(data, diversity = "TD", datatype = "abundance", nT = NULL, PDtree, PDreftime = NULL, FDdistM, FDtype = "AUC", FDtau = NULL)
All arguments in the above function are the same as those for the main function iNEXT3D
. Running the DataInfo3D()
function returns basic data information including sample size, observed species richness, two sample coverage estimates (SC(n)
and SC(2n)
) as well as other relevant information in each of the three dimensions of diversity. We use Brazil_rainforest_abun_data and Fish_incidence_data to demo the function for each dimension of diversity.
data(Brazil_rainforest_abun_data) DataInfo3D(Brazil_rainforest_abun_data, diversity = 'TD', datatype = "abundance")
tmp <- DataInfo3D(Brazil_rainforest_abun_data, diversity = 'TD', datatype = "abundance") tmp$`SC(n)` <- round(tmp$`SC(n)`, 3) tmp$`SC(2n)` <- round(tmp$`SC(2n)`, 3) tmp
Output description:
Assemblage
= assemblage name.
n
= number of observed individuals in the reference sample (sample size).
S.obs
= number of observed species in the reference sample.
SC(n)
= sample coverage estimate of the reference sample with size n.
SC(2n)
= sample coverage estimate of the reference sample with size 2n.
f1
-f5
= the first five species abundance frequency counts in the reference sample.
data(Fish_incidence_data) DataInfo3D(Fish_incidence_data, diversity = 'TD', datatype = "incidence_raw")
tmp <- DataInfo3D(Fish_incidence_data, diversity = 'TD', datatype = "incidence_raw") tmp$`SC(T)` <- round(tmp$`SC(T)`, 3) tmp$`SC(2T)` <- round(tmp$`SC(2T)`, 3) tmp
Output description:
Assemblage
= assemblage name.
T
= number of sampling units in the reference sample (sample size for incidence data).
U
= total number of incidences in the reference sample.
S.obs
= number of observed species in the reference sample.
SC(T)
= sample coverage estimate of the reference sample with size T.
SC(2T)
= sample coverage estimate of the reference sample with size 2T.
Q1
-Q5
= the first five species incidence frequency counts in the reference sample.
data(Brazil_rainforest_abun_data) data(Brazil_rainforest_phylo_tree) data <- Brazil_rainforest_abun_data tree <- Brazil_rainforest_phylo_tree DataInfo3D(data, diversity = 'PD', datatype = "abundance", PDtree = tree)
tmp <- DataInfo3D(Brazil_rainforest_abun_data, diversity = 'PD', datatype = "abundance", PDtree = Brazil_rainforest_phylo_tree) tmp$`SC(n)` <- round(tmp$`SC(n)`, 3) tmp$`SC(2n)` <- round(tmp$`SC(2n)`, 3) tmp
Output description:
Assemblage
, n
, S.obs
, SC(n)
and SC(2n)
: definitions are the same as in the TD abundance output and thus are omitted.
PD.obs
= the observed total branch length in the phylogenetic tree spanned by all observed species.
f1*
,f2*
= the number of singletons and doubletons in the node/branch abundance set.
g1
,g2
= the total branch length of those singletons/doubletons in the node/branch abundance set.
Reftime
= reference time for phylogenetic diversity (the age of the root of phylogenetic tree).
data(Fish_incidence_data) data(Fish_phylo_tree) data <- Fish_incidence_data tree <- Fish_phylo_tree DataInfo3D(data, diversity = 'PD', datatype = "incidence_raw", PDtree = tree)
tmp <- DataInfo3D(Fish_incidence_data, diversity = 'PD', datatype = "incidence_raw", PDtree = Fish_phylo_tree) tmp$`SC(T)` <- round(tmp$`SC(T)`, 3) tmp$`SC(2T)` <- round(tmp$`SC(2T)`, 3) tmp$PD.obs <- round(tmp$PD.obs, 3) tmp$R1 <- round(tmp$R1, 3) tmp$R2 <- round(tmp$R2, 3) tmp
Output description:
Assemblage
, T
, U
, S.obs
, SC(T)
and SC(2T)
: definitions are the same as in the TD incidence output and thus are omitted.
PD.obs
= the observed total branch length in the phylogenetic tree spanned by all observed species.
Q1*
,Q2*
= the singletons/doubletons in the sample branch incidence.
R1
,R2
= the total branch length of those singletons/doubletons in the sample branch incidence.
Reftime
= reference time.
data(Brazil_rainforest_abun_data) data(Brazil_rainforest_distance_matrix) data <- Brazil_rainforest_abun_data distM <- Brazil_rainforest_distance_matrix DataInfo3D(data, diversity = 'FD', datatype = "abundance", FDdistM = distM, FDtype = 'AUC')
tmp <- DataInfo3D(Brazil_rainforest_abun_data, diversity = 'FD', datatype = "abundance", FDtype = "AUC", FDdistM = Brazil_rainforest_distance_matrix) tmp$`SC(n)` <- round(tmp$`SC(n)`, 3) tmp$`SC(2n)` <- round(tmp$`SC(2n)`, 3) tmp$dmean <- round(tmp$dmean, 3) tmp$dmax <- round(tmp$dmax, 3) tmp
Output description:
Assemblage
, n
, S.obs
, SC(n)
and SC(2n)
: definitions are the same as in TD abundance output and thus are omitted.
dmin
= the minimum distance among all non-diagonal elements in the distance matrix.
dmean
= the mean distance between any two individuals randomly selected from each assemblage.
dmax
= the maximum distance among all elements in the distance matrix.
data(Fish_incidence_data) data(Fish_distance_matrix) data <- Fish_incidence_data distM <- Fish_distance_matrix DataInfo3D(data, diversity = 'FD', datatype = "incidence_raw", FDdistM = distM, FDtype = 'AUC')
tmp <- DataInfo3D(Fish_incidence_data, diversity = 'FD', datatype = "incidence_raw", FDtype = "AUC", FDdistM = Fish_distance_matrix) tmp$`SC(T)` <- round(tmp$`SC(T)`, 3) tmp$`SC(2T)` <- round(tmp$`SC(2T)`, 3) tmp$dmin <- round(tmp$dmin, 3) tmp$dmean <- round(tmp$dmean, 3) tmp$dmax <- round(tmp$dmax, 3) tmp
Output description:
Assemblage
, T
, U
, S.obs
, SC(T)
and SC(2T)
: definitions are the same as in the TD incidence output and thus are omitted.
dmin
= the minimum distance among all non-diagonal elements in the distance matrix.
dmean
= the mean distance between any two individuals randomly selected from each assemblage.
dmax
= the maximum distance among all elements in the distance matrix.
estimate3D
is used to compute 3D diversity (TD, PD, FD) estimates with q = 0, 1, 2 under any specified levels of sample size (when base = "size"
) and sample coverage values (when base = "coverage"
) for abundance data (datatype = "abundance"
) or incidence data (datatype = "incidence_raw"
). When base = "size"
, level
can be specified with a particular vector of sample sizes (greater than 0); if level = NULL
, this function computes the diversity estimates for the minimum sample size among all samples extrapolated to the double reference sizes. When base = "coverage"
, level
can be specified with a particular vector of sample coverage values (between 0 and 1); if level = NULL
, this function computes the diversity estimates for the minimum sample coverage among all samples extrapolated to the double reference sizes. All arguments in the function are the same as those for the main function iNEXT3D
.
estimate3D(data, diversity = "TD", q = c(0, 1, 2), datatype = "abundance", base = "coverage", level = NULL, nboot = 50, conf = 0.95, nT = NULL, PDtree, PDreftime = NULL, PDtype = "meanPD", FDdistM, FDtype = "AUC", FDtau = NULL, FDcut_number = 50)
The following commands return the TD estimates with two specified levels of sample coverage (93% and 97%) based on the Brazil_rainforest_abun_data
.
data(Brazil_rainforest_abun_data) output_est_TD_abun <- estimate3D(Brazil_rainforest_abun_data, diversity = 'TD', q = c(0,1,2), datatype = "abundance", base = "coverage", level = c(0.93, 0.97)) output_est_TD_abun
output_est_TD_abun <- estimate3D(Brazil_rainforest_abun_data, diversity = 'TD', q = c(0,1,2), datatype = "abundance", base = "coverage", level = c(0.93, 0.97)) output_est_TD_abun[,c('SC', 'm', 'qTD', 's.e.', 'qTD.LCL', 'qTD.UCL')] = round(output_est_TD_abun[,c('SC', 'm', 'qTD', 's.e.', 'qTD.LCL', 'qTD.UCL')], 3) output_est_TD_abun
The following commands return the TD estimates with two specified levels of sample coverage (97.5% and 99%) for the Fish_incidence_data
.
data(Fish_incidence_data) output_est_TD_inci <- estimate3D(Fish_incidence_data, diversity = 'TD', q = c(0, 1, 2), datatype = "incidence_raw", base = "coverage", level = c(0.975, 0.99)) output_est_TD_inci
output_est_TD_inci <- estimate3D(Fish_incidence_data, diversity = 'TD', q = c(0, 1, 2), datatype = "incidence_raw", base = "coverage", level = c(0.975, 0.99)) output_est_TD_inci[,c('SC', 'mT', 'qTD', 's.e.', 'qTD.LCL', 'qTD.UCL')] = round(output_est_TD_inci[,c('SC', 'mT', 'qTD', 's.e.', 'qTD.LCL', 'qTD.UCL')], 3) output_est_TD_inci
The following commands return the PD estimates with two specified levels of sample sizes (1500 and 3500) for the Brazil_rainforest_abun_data
.
data(Brazil_rainforest_abun_data) data(Brazil_rainforest_phylo_tree) data <- Brazil_rainforest_abun_data tree <- Brazil_rainforest_phylo_tree output_est_PD_abun <- estimate3D(data, diversity = 'PD', datatype = "abundance", base = "size", level = c(1500, 3500), PDtree = tree) output_est_PD_abun
data(Brazil_rainforest_abun_data) data(Brazil_rainforest_phylo_tree) data <- Brazil_rainforest_abun_data tree <- Brazil_rainforest_phylo_tree output_est_PD_abun <- estimate3D(data, diversity = 'PD', datatype = "abundance", base = "size", level = c(1500, 3500), PDtree = tree) output_est_PD_abun[,c('m', 'SC', 'qPD', 's.e.', 'qPD.LCL', 'qPD.UCL')] = round(output_est_PD_abun[,c('m', 'SC', 'qPD', 's.e.', 'qPD.LCL', 'qPD.UCL')], 3) output_est_PD_abun
The following commands return the PD estimates with two specified levels of sample coverage (97.5% and 99%) for the Fish_incidence_data
.
data(Fish_incidence_data) data(Fish_phylo_tree) data <- Fish_incidence_data tree <- Fish_phylo_tree output_est_PD_inci <- estimate3D(data, diversity = 'PD', datatype = "incidence_raw", base = "coverage", level = c(0.975, 0.99), PDtree = tree) output_est_PD_inci
data(Fish_incidence_data) data(Fish_phylo_tree) data <- Fish_incidence_data tree <- Fish_phylo_tree output_est_PD_inci <- estimate3D(data, diversity = 'PD', datatype = "incidence_raw", base = "coverage", level = c(0.975, 0.99), PDtree = tree) output_est_PD_inci[,c('SC', 'mT', 'qPD', 's.e.', 'qPD.LCL', 'qPD.UCL')] = round(output_est_PD_inci[,c('SC', 'mT', 'qPD', 's.e.', 'qPD.LCL', 'qPD.UCL')], 3) output_est_PD_inci
The following commands return the FD estimates with two specified levels of sample coverage (93% and 97%) for the Brazil_rainforest_abun_data
.
data(Brazil_rainforest_abun_data) data(Brazil_rainforest_distance_matrix) data <- Brazil_rainforest_abun_data distM <- Brazil_rainforest_distance_matrix output_est_FD_abun <- estimate3D(data, diversity = 'FD', datatype = "abundance", base = "coverage", level = c(0.93, 0.97), nboot = 10, FDdistM = distM, FDtype = 'AUC') output_est_FD_abun
data(Brazil_rainforest_abun_data) data(Brazil_rainforest_distance_matrix) data <- Brazil_rainforest_abun_data distM <- Brazil_rainforest_distance_matrix output_est_FD_abun <- estimate3D(data, diversity = 'FD', datatype = "abundance", base = "coverage", level = c(0.93, 0.97), nboot = 10, FDdistM = distM, FDtype = 'AUC') output_est_FD_abun[,c('SC', 'm', 'qFD', 's.e.', 'qFD.LCL', 'qFD.UCL')] = round(output_est_FD_abun[,c('SC', 'm', 'qFD', 's.e.', 'qFD.LCL', 'qFD.UCL')], 3) output_est_FD_abun
The following commands return the FD estimates with two specified levels of sample sizes (30 and 70) for the Fish_incidence_data
.
data(Fish_incidence_data) data(Fish_distance_matrix) data <- Fish_incidence_data distM <- Fish_distance_matrix output_est_FD_inci <- estimate3D(data, diversity = 'FD', datatype = "incidence_raw", base = "size", level = c(30, 70), nboot = 10, FDdistM = distM, FDtype = 'AUC') output_est_FD_inci
data(Fish_incidence_data) data(Fish_distance_matrix) data <- Fish_incidence_data distM <- Fish_distance_matrix output_est_FD_inci <- estimate3D(data, diversity = 'FD', datatype = "incidence_raw", base = "size", level = c(30, 70), nboot = 10, FDdistM = distM, FDtype = 'AUC') output_est_FD_inci[,c('mT', 'SC', 'qFD', 's.e.', 'qFD.LCL', 'qFD.UCL')] = round(output_est_FD_inci[,c('mT', 'SC', 'qFD', 's.e.', 'qFD.LCL', 'qFD.UCL')], 3) output_est_FD_inci
ObsAsy3D(data, diversity = "TD", q = seq(0, 2, 0.2), datatype = "abundance", nboot = 50, conf = 0.95, nT = NULL, method = c("Asymptotic", "Observed"), PDtree, PDreftime = NULL, PDtype = "meanPD", FDdistM, FDtype = "AUC", FDtau = NULL, FDcut_number = 50 )
All arguments in the above function are the same as those for the main function iNEXT3D
(except that the default of q
here is seq(0, 2, 0.2)
). The function ObsAsy3D()
computes observed and asymptotic diversity of order q between 0 and 2 (in increments of 0.2) for 3D diversity; these 3D values with different order q can be used to depict a q-profile in the ggObsAsy3D
function.
It also computes observed and asymptotic PD for various reference times by specifying the argument PDreftime
; these PD values with different reference times can be used to depict a time-profile in the ggObsAsy3D
function.
It also computes observed and asymptotic FD for various threshold tau levels by specifying the argument FDtau
; these FD values with different threshold levels can be used to depict a tau-profile in the ggObsAsy3D
function.
For each dimension, by default, both the observed and asymptotic diversity estimates will be computed.
ggObsAsy3D(output, profile = "q")
ggObsAsy3D
is a ggplot2 extension for an ObsAsy3D
object to plot 3D q-profile (which depicts the observed diversity and asymptotic diversity estimate with respect to order q) for q between 0 and 2 (in increments of 0.2).
It also plots time-profile (which depicts the observed and asymptotic estimate of PD or mean PD with respect to reference times when diversity = "PD"
specified in the ObsAsy3D function), and tau-profile (which depicts the observed and asymptotic estimate of FD with respect to threshold level tau when diversity = "FD"
and FDtype = "tau_values"
specified in the ObsAsy3D
function) based on the output from the function ObsAsy3D
.
In the plot of profiles, only confidence intervals of the asymptotic diversity will be shown when both the observed and asymptotic diversity estimates are computed.
The following commands returns the observed and asymptotic taxonomic diversity ('TD') for the Brazil_rainforest_abun_data
, along with its confidence interval for diversity order q between 0 to 2. Here only the first ten rows of the output are shown.
data(Brazil_rainforest_abun_data) output_ObsAsy_TD_abun <- ObsAsy3D(Brazil_rainforest_abun_data, diversity = 'TD', datatype = "abundance") output_ObsAsy_TD_abun
output_ObsAsy_TD_abun <- ObsAsy3D(Brazil_rainforest_abun_data, diversity = 'TD', datatype = "abundance") tmp = output_ObsAsy_TD_abun tmp[,(3:6)] <- round(tmp[,(3:6)], 3) tmp[1:10,]
The following commands plot the corresponding q-profiles, along with its confidence interval for q between 0 to 2.
# q-profile curves ggObsAsy3D(output_ObsAsy_TD_abun)
The following commands return the observed and asymptotic taxonomic diversity ('TD') estimates for the Fish_incidence_data
, along with its confidence interval for diversity order q between 0 to 2. Here only the first ten rows of the output are shown.
data(Fish_incidence_data) output_ObsAsy_TD_inci <- ObsAsy3D(Fish_incidence_data, diversity = 'TD', datatype = "incidence_raw") output_ObsAsy_TD_inci
output_ObsAsy_TD_inci <- ObsAsy3D(Fish_incidence_data, diversity = 'TD', datatype = "incidence_raw") tmp = output_ObsAsy_TD_inci tmp[,(3:6)] <- round(tmp[,(3:6)], 3) tmp[1:10,]
The following commands plot the corresponding q-profiles, along with its confidence interval for q between 0 to 2.
# q-profile curves ggObsAsy3D(output_ObsAsy_TD_inci)
The following commands return the observed and asymptotic phylogenetic diversity ('PD') estimates for the Brazil_rainforest_abun_data
, along with its confidence interval for diversity order q = 0, 1, 2 under reference times from 0.01 to 400 (tree height). Here only the first ten rows of the output are shown.
data(Brazil_rainforest_abun_data) data(Brazil_rainforest_phylo_tree) data <- Brazil_rainforest_abun_data tree <- Brazil_rainforest_phylo_tree output_ObsAsy_PD_abun <- ObsAsy3D(data, diversity = 'PD', q = c(0, 1, 2), PDreftime = seq(0.01, 400, length.out = 20), datatype = "abundance", nboot = 20, PDtree = tree) output_ObsAsy_PD_abun
data(Brazil_rainforest_abun_data) data(Brazil_rainforest_phylo_tree) data <- Brazil_rainforest_abun_data tree <- Brazil_rainforest_phylo_tree output_ObsAsy_PD_abun <- ObsAsy3D(data, diversity = 'PD', q = c(0, 1, 2), PDreftime = seq(0.1, 400, length.out = 40), datatype = "abundance", nboot = 20, PDtree = tree) tmp = output_ObsAsy_PD_abun tmp[,c(3:6,8)] <- round(tmp[,c(3:6,8)], 3) tmp[1:10,]
The argument profile = "time"
in the ggObsAsy3D
function creates a separate plot for each diversity order q = 0, 1, and 2 with x-axis being "Reference time". Different assemblages will be represented by different color lines.
# time-profile curves ggObsAsy3D(output_ObsAsy_PD_abun, profile = "time")
The following commands return the observed and asymptotic taxonomic diversity ('PD') estimates for the Fish_incidence_data
, along with its confidence interval for diversity order q between 0 to 2. Here only the first ten rows of the output are shown.
data(Fish_incidence_data) data(Fish_phylo_tree) data <- Fish_incidence_data tree <- Fish_phylo_tree output_ObsAsy_PD_inci <- ObsAsy3D(data, diversity = 'PD', q = seq(0, 2, 0.2), datatype = "incidence_raw", nboot = 20, PDtree = tree, PDreftime = NULL) output_ObsAsy_PD_inci
data(Fish_incidence_data) data(Fish_phylo_tree) data <- Fish_incidence_data tree <- Fish_phylo_tree output_ObsAsy_PD_inci <- ObsAsy3D(data, diversity = 'PD', q = seq(0, 2, 0.2), datatype = "incidence_raw", nboot = 20, PDtree = tree, PDreftime = NULL) tmp = output_ObsAsy_PD_inci tmp[,c(3:6,8)] <- round(tmp[,c(3:6,8)], 3) tmp[1:10,]
The following commands plot the corresponding q-profiles, along with its confidence interval for q between 0 to 2, for the default reference time = 0.977 (the tree depth).
# q-profile curves ggObsAsy3D(output_ObsAsy_PD_inci, profile = "q")
The following commands returns observed and asymptotic functional diversity ('FD') for Brazil_rainforest_abun_data
, along with its confidence interval at diversity order q = 0, 1, 2 under tau values from 0 to 1. Here only the first ten rows of the output are shown.
data(Brazil_rainforest_abun_data) data(Brazil_rainforest_distance_matrix) data <- Brazil_rainforest_abun_data distM <- Brazil_rainforest_distance_matrix output_ObsAsy_FD_abun_tau <- ObsAsy3D(data, diversity = 'FD', q = c(0, 1, 2), datatype = "abundance", nboot = 10, FDdistM = distM, FDtype = 'tau_values', FDtau = seq(0, 1, 0.05)) output_ObsAsy_FD_abun_tau
data(Brazil_rainforest_abun_data) data(Brazil_rainforest_distance_matrix) data <- Brazil_rainforest_abun_data distM <- Brazil_rainforest_distance_matrix output_ObsAsy_FD_abun_tau <- ObsAsy3D(data, diversity = 'FD', q = c(0, 1, 2), datatype = "abundance", nboot = 10, FDdistM = distM, FDtype = 'tau_values', FDtau = seq(0, 1, 0.05)) tmp = output_ObsAsy_FD_abun_tau tmp[,(3:6)] <- round(tmp[,(3:6)], 3) tmp[1:10,]
The following commands plot the corresponding tau-profiles, along with its confidence interval for diversity order q = 0, 1, 2.
# tau-profile curves ggObsAsy3D(output_ObsAsy_FD_abun_tau, profile = "tau")
out <- ggObsAsy3D(output_ObsAsy_FD_abun_tau, profile = "tau") out + theme(text = element_text(size = 12))
The following commands returns the observed and asymptotic taxonomic diversity ('FD') for the Brazil_rainforest_abun_data
, along with its confidence interval for diversity order q between 0 to 2 with FDtype = 'AUC'
. Here only the first ten rows of the output are shown.
data(Brazil_rainforest_abun_data) data(Brazil_rainforest_distance_matrix) data <- Brazil_rainforest_abun_data distM <- Brazil_rainforest_distance_matrix output_ObsAsy_FD_abun <- ObsAsy3D(data, diversity = 'FD', q = seq(0, 2, 0.5), datatype = "abundance", nboot = 10, FDdistM = distM, FDtype = 'AUC') output_ObsAsy_FD_abun
data(Brazil_rainforest_abun_data) data(Brazil_rainforest_distance_matrix) data <- Brazil_rainforest_abun_data distM <- Brazil_rainforest_distance_matrix output_ObsAsy_FD_abun <- ObsAsy3D(data, diversity = 'FD', q = seq(0, 2, 0.5), datatype = "abundance", nboot = 10, FDdistM = distM, FDtype = 'AUC') tmp = output_ObsAsy_FD_abun tmp[,(3:6)] <- round(tmp[,(3:6)], 3) tmp[1:10,]
The following commands plot the corresponding q-profiles, along with its confidence interval for q between 0 to 2.
# q-profile curves ggObsAsy3D(output_ObsAsy_FD_abun, profile = "q")
The following commands returns observed and asymptotic functional diversity ('FD') for Fish_incidence_data
, along with its confidence interval at diversity order q from 0 to 2. Here only the first ten rows of the output are shown.
data(Fish_incidence_data) data(Fish_distance_matrix) data <- Fish_incidence_data distM <- Fish_distance_matrix output_ObsAsy_FD_inci <- ObsAsy3D(data, diversity = 'FD', datatype = "incidence_raw", nboot = 20, FDdistM = distM, FDtype = 'AUC') output_ObsAsy_FD_inci
data(Fish_incidence_data) data(Fish_distance_matrix) data <- Fish_incidence_data distM <- Fish_distance_matrix output_ObsAsy_FD_inci <- ObsAsy3D(data, diversity = 'FD', datatype = "incidence_raw", nboot = 20, FDdistM = distM, FDtype = 'AUC') tmp = output_ObsAsy_FD_inci tmp[,(3:6)] <- round(tmp[,(3:6)], 3) tmp[1:10,]
The following commands plot the corresponding q-profiles, along with its confidence interval for q between 0 to 2.
# q-profile curves ggObsAsy3D(output_ObsAsy_FD_inci, profile = "q")
The iNEXT.3D package is licensed under the GPLv3. To help refine iNEXT.3D
, your comments or feedback would be welcome (please send them to Anne Chao or report an issue on the iNEXT.3D github iNEXT.3D_github.
Chao, A., Henderson, P. A., Chiu, C.-H., Moyes, F., Hu, K.-H., Dornelas, M. and Magurran, A. E. (2021). Measuring temporal change in alpha diversity: a framework integrating taxonomic, phylogenetic and functional diversity and the iNEXT.3D standardization. Methods in Ecology and Evolution, 12, 1926-1940.
Hsieh, T. C., Ma, K-H, and Chao, A. (2016). iNEXT: An R package for rarefaction and extrapolation of species diversity (Hill numbers). Methods in Ecology and Evolution, 7, 1451-1456.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.