subsample: Subsampling wrapper function
In divDyn: Diversity Dynamics using Fossil Sampling Data

Description

The function takes a function that has an occurrence dataset as an argument and reruns it iteratively on subsets of the dataset.
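The idea can be illustrated with a minimal base-R sketch. The function name `subsampleSketch`, the toy dataset `occ` and the toy `richness` function are hypothetical. The sketch assumes classical rarefaction only, drops bins below the quota, and averages a fixed-length numeric output arithmetically, so it ignores `keep`, `useFailed` and the other options of `subsample`.

```r
# Hypothetical base-R sketch of the wrapper idea; NOT divDyn's implementation.
# Assumptions: classical rarefaction (q occurrences per bin, drawn without
# replacement), bins below the quota are left out, and FUN returns a numeric
# vector of fixed length that is averaged arithmetically across trials.
subsampleSketch <- function(x, q, bin, FUN, iter = 50, ...) {
  x <- x[!is.na(x[[bin]]), , drop = FALSE]      # rows without a bin are omitted
  results <- vector("list", iter)
  for (i in seq_len(iter)) {
    # one trial: q random occurrences from every bin that reaches the quota
    byBin <- split(seq_len(nrow(x)), x[[bin]])
    rows <- unlist(lapply(byBin, function(idx) {
      if (length(idx) < q) return(integer(0))    # bin fails the quota
      idx[sample.int(length(idx), q)]
    }))
    results[[i]] <- FUN(x[rows, , drop = FALSE], ...)  # run FUN on the trial
  }
  Reduce(`+`, results) / iter                    # arithmetic mean of the trials
}

# usage with a hypothetical toy dataset and a toy FUN (richness in bins 1..3)
set.seed(1)
occ <- data.frame(stg = rep(1:3, times = c(40, 60, 10)),
                  genus = sample(letters[1:15], 110, replace = TRUE))
richness <- function(x) {
  # number of distinct genera per bin; bins absent from the trial give NA
  as.numeric(tapply(x$genus, factor(x$stg, levels = 1:3),
                    function(g) length(unique(g))))
}
res <- subsampleSketch(occ, q = 20, bin = "stg", FUN = richness, iter = 10)
```

In the toy call, bin 3 has only 10 occurrences, fewer than `q = 20`, so it never appears in a trial and its averaged richness is `NA`.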

Usage

subsample(
  x,
  q,
  tax = NULL,
  bin = NULL,
  FUN = divDyn,
  coll = NULL,
  iter = 50,
  type = "cr",
  keep = NULL,
  rem = NULL,
  duplicates = TRUE,
  output = "arit",
  useFailed = FALSE,
  FUN.args = NULL,
  na.rm = FALSE,
  counter = TRUE,
  ...
)

Arguments

`x`	(`data.frame`): Occurrence dataset containing the columns named by the `bin`, `tax` and `coll` arguments.
`q`	(`numeric`): Subsampling level argument (mandatory). Its meaning depends on the subsampling type: it is the number of occurrences for `"cr"`, and the number of desired occurrences raised to the power of `xexp` for O^xW (`"oxw"`). It is also the quorum of the SQS method (`"sqs"`).
`tax`	(`character`): The name of the taxon variable.
`bin`	(`character`): The name of the subsetting variable (has to be integer). For time series, this is the time-slice variable. Rows with `NA` entries in this column will be omitted.
`FUN`	(`function`): The function to be executed iteratively on the datasets resulting from the subsampling trials. If set to `NULL`, no function is executed, and the subsampled datasets are returned as a `list`. By default this is the `divDyn` function. The function must have an argument called `x` that represents the dataset resulting from a subsampling trial (or the entire dataset). The arguments of the `subsample` call are searched for potential arguments of this function, which means that already provided variables (e.g. `bin` and `tax`) are also used. You can also provide additional arguments (similarly to the `apply` iterator). Functions that pass arguments through (i.e. that have a `...` argument) are not allowed, nor are functions that share argument names with `subsample` but would require different values for them; see also `FUN.args`.
`coll`	(`character`): The variable name of the collection identifiers.
`iter`	(`numeric`): The number of iterations to be executed.
`type`	(`character`): The type of subsampling to be implemented. By default this is classical rarefaction (`"cr"`). `"oxw"` stands for occurrence-weighted by-list subsampling. If set to `"sqs"`, the program executes the shareholder quorum subsampling algorithm as suggested by Alroy (2010). Setting the argument to `"none"` invokes no subsampling, but the applied function is still iterated on the trials.
`keep`	(`numeric`): The bins that will not be subsampled but will be added to the subsampling trials. By default, a bin whose number of occurrences does not reach the subsampling quota is not represented in the subsampling trials. You can force the inclusion of such bins separately with the `keep` argument (to include all of them, see the `useFailed` argument).
`rem`	(`numeric`): The bins that will be removed from the dataset before the subsampling trials.
`duplicates`	(`logical`): Toggles whether multiple entries with the same taxon (`tax`) and collection (`coll`) values are allowed. This is useful for omitting multiple species-level occurrences of the same genus within a collection. By default these entries are kept in the analyses (`duplicates=TRUE`); setting this to `FALSE` omits them and requires you to provide a collection variable (`coll`).
`output`	(`character`): If the output of `FUN` is a `vector`, `matrix` or `data.frame`, `"arit"` and `"geom"` trigger simple averaging across the trials with arithmetic or geometric means, respectively. For such outputs, setting `output="dist"` returns the results of every trial, organized in a `list` of independent variables (e.g. if the function output is a single value, the return will be a single `vector`; if it is a `vector`, the output will be a `list` of `vector`s; if it is a `data.frame`, the output will be a `list` of `matrix` objects). If `output="list"`, the structure of the original function output is retained, and the results of the individual trials are concatenated into a `list`.
`useFailed`	(`logical`): If a bin does not reach the subsampling quota, should it still be used?
`FUN.args`	(`list`): Arguments passed to the applied function `FUN` but not used by the subsampling wrapper. Normally, the arguments of `FUN` can be added to the call of `subsample`, but if you want to use different values for the same argument, the values given here are used for the call of `FUN`. For instance, if you want to call `subsample` with `bin=NULL` but run `FUN=divDyn` with a valid `bin` column, you can add the column name here, e.g. `FUN.args=list(bin="stg")`.
`na.rm`	(`logical`): The function call can refer to several columns that might contain missing values. If this flag is set to `TRUE`, all rows that have missing values in any of the specified columns are dropped. This might exclude data you do not want to exclude.
`counter`	(`logical`): Should the loop counter be displayed?
`...`	Arguments passed to `FUN` and to the type-specific subsampling functions: `subtrialCR`, `subtrialOXW`, `subtrialSQS`.
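The aggregation modes of `output` can be pictured with a few toy trial results. This is a hypothetical base-R illustration, not divDyn's internal code: `trials`, `distOut`, `aritOut` and `geomOut` are made-up names, and the trial outputs only mimic a `data.frame` with one row per bin and a single `divRT` column.

```r
# Hypothetical illustration of the `output` modes; divDyn's own code may differ.
# Toy trial results: each trial returns a data.frame with one row per bin.
trials <- list(
  data.frame(bin = 1:3, divRT = c(10, 12, 8)),
  data.frame(bin = 1:3, divRT = c(14, 12, 8)),
  data.frame(bin = 1:3, divRT = c(12, 12, 8))
)
vars <- setdiff(names(trials[[1]]), "bin")   # variables to aggregate

# output = "list": the trial outputs are simply kept in a list (`trials`)

# output = "dist": one matrix per variable, rows = bins, columns = trials
distOut <- lapply(setNames(vars, vars),
                  function(v) sapply(trials, function(tr) tr[[v]]))

# output = "arit": arithmetic mean across trials for every bin and variable
aritOut <- data.frame(bin = trials[[1]]$bin, lapply(distOut, rowMeans))

# output = "geom": geometric mean across trials, exp(mean(log(values)))
geomOut <- data.frame(bin = trials[[1]]$bin,
                      lapply(distOut, function(m) exp(rowMeans(log(m)))))
```

The geometric mean is only meaningful for positive values: a zero in any trial pulls the mean to zero, and negative values give `NaN`.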

Details

The `subsample` function implements the iterative framework of the sampling standardization procedure. The function
1. takes the dataset `x`,
2. runs `FUN` on the entire dataset and creates a container for the results of the trials,
3. runs one of the subsampling trial functions (e.g. `subtrialCR`) to get a subsampled 'trial dataset',
4. runs `FUN` on the trial dataset (steps 3 and 4 are repeated `iter` times), and
5. averages the results of the trials if the output of step 4 is simple, such as a vector, matrix or data.frame.
For averaging, the vectors and matrices returned for the trials must have the same dimensions as those returned for the original dataset. For data.frames, the bin-specific information has to be in rows, and the bin numbers have to be given in a `bin` variable in the output of `FUN`. For a detailed treatment of what the function does, please see the vignette ('Handout to the R package 'divDyn' v0.5.0 for diversity dynamics from fossil occurrence data'). Currently, classical rarefaction (`"cr"`, Raup, 1975), occurrence-weighted by-list subsampling (`"oxw"`, Alroy et al., 2001) and shareholder quorum subsampling (`"sqs"`, Alroy, 2010) are implemented.
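The per-bin logic of the `"oxw"` and `"sqs"` trial types can be sketched as follows. `oxwTrial`, `sqsTrial` and the toy `bin1` dataset are hypothetical names; they are simplified illustrations of the cited methods, not the package's `subtrialOXW` or `subtrialSQS`. The SQS sketch in particular leaves out refinements of Alroy's (2010) published procedure, and the O^xW sketch simply keeps the collection that crosses the quota, a boundary that implementations may treat differently.

```r
# Hypothetical single-bin trial sketches; NOT divDyn's subtrialOXW/subtrialSQS.

# O^xW by-list trial (after Alroy et al. 2001): collections are drawn in random
# order, each adding (its number of occurrences)^xexp to a running total, until
# the total reaches q. This sketch keeps the collection that crosses the quota.
oxwTrial <- function(x, q, coll, xexp = 1) {
  sizes <- table(as.character(x[[coll]]))
  ord <- names(sizes)[sample.int(length(sizes))]   # random list order
  total <- cumsum(as.numeric(sizes[ord])^xexp)
  if (length(total) == 0 || max(total) < q) return(x[0, , drop = FALSE])
  chosen <- ord[seq_len(which(total >= q)[1])]
  x[as.character(x[[coll]]) %in% chosen, , drop = FALSE]
}

# Simplified SQS trial (after the idea in Alroy 2010): occurrences are drawn in
# random order, and each newly drawn taxon adds its coverage-adjusted frequency
# until the quorum q is met. Refinements of the published method (e.g. for the
# most frequent taxon and single-reference taxa) are omitted.
sqsTrial <- function(x, q, tax) {
  taxa <- as.character(x[[tax]])
  if (length(taxa) == 0) return(x)
  counts <- table(taxa)
  u <- 1 - sum(counts == 1) / length(taxa)   # Good's u: coverage estimate
  if (u < q) return(x[0, , drop = FALSE])    # quorum cannot be reached
  share <- u * counts / length(taxa)         # coverage-adjusted frequencies
  drawn <- sample.int(length(taxa))          # random draw order
  isNew <- !duplicated(taxa[drawn])          # first draw of each taxon
  covered <- cumsum(ifelse(isNew, as.numeric(share[taxa[drawn]]), 0))
  k <- which(covered >= q - 1e-9)[1]         # tolerance for rounding error
  x[drawn[seq_len(k)], , drop = FALSE]
}

# usage on a hypothetical single-bin toy dataset
set.seed(2)
bin1 <- data.frame(
  genus = sample(c("A", "B", "C", "D", "E"), 60, replace = TRUE,
                 prob = c(0.4, 0.3, 0.15, 0.1, 0.05)),
  collection_no = sample(1:12, 60, replace = TRUE))
uw <- oxwTrial(bin1, q = 3, coll = "collection_no", xexp = 0)   # 3 collections
ow <- oxwTrial(bin1, q = 20, coll = "collection_no", xexp = 1)  # >= 20 occurrences
sq <- sqsTrial(bin1, q = 0.4, tax = "genus")
```

With `xexp = 0` every collection adds 1 to the total, so `q = 3` draws three collections, in line with the unweighted by-list example (`UW`) in the Examples section.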

References:

Alroy, J., Marshall, C. R., Bambach, R. K., Bezusko, K., Foote, M., Fürsich, F. T., … Webber, A. (2001). Effects of sampling standardization on estimates of Phanerozoic marine diversification. Proceedings of the National Academy of Sciences, 98(11), 6261-6266.

Alroy, J. (2010). The Shifting Balance of Diversity Among Major Marine Animal Groups. Science, 329, 1191-1194. https://doi.org/10.1126/science.1189910

Raup, D. M. (1975). Taxonomic Diversity Estimation Using Rarefaction. Paleobiology, 1, 333-342. https://doi.org/10.2307/2400135

Value

Either a `list` of replicates (the subsampled datasets themselves if `FUN=NULL`) or an object matching the class of the output of `FUN`.

Examples


library(divDyn)
data(corals)
data(stages)
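# Example 1 - calculate metrics of diversity dynamics
  # raw (unsubsampled) metrics
  dd <- divDyn(corals, tax="genus", bin="stg")
  # classical rarefaction: 50 occurrences per bin, 30 trials,
  # bin 95 is kept without subsampling
  rarefDD <- subsample(corals, iter=30, q=50,
    tax="genus", bin="stg", output="dist", keep=95)
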
# plotting
  tsplot(stages, shading="series", boxes="sys", xlim=c(260,0), 
  ylab="range-through diversity (genera)", ylim=c(0,230))
  lines(stages$mid, dd$divRT, lwd=2)
  shades(stages$mid, rarefDD$divRT, col="blue")
  legend("topleft", legend=c("raw","rarefaction"),
    col=c("black", "blue"), lwd=c(2,2), bg="white")
  

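# Example 2 - SIB diversity
# draft a simple function to calculate SIB (sampled-in-bin) diversity
sib <- function(x, bin, tax){
  # number of distinct taxa in every bin
  calc <- tapply(INDEX=x[,bin], X=x[,tax], function(y){
    length(levels(factor(y)))
  })
  # align to all stages of the global 'stages' table (absent bins become NA)
  return(calc[as.character(stages$stg)])
}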
sibDiv<-sib(corals, bin="stg", tax="genus")

# calculate it with subsampling
rarefSIB<-subsample(corals,iter=25, q=50,
  tax="genus", bin="stg", output="arit", keep=95, FUN=sib)
rarefDD<-subsample(corals,iter=25, q=50,
  tax="genus", bin="stg", output="arit", keep=95)

# plot
tsplot(stages, shading="series", boxes="sys", xlim=c(260,0), 
  ylab="SIB diversity (genera)", ylim=c(0,230))

lines(stages$mid, rarefDD$divSIB, lwd=2, col="black")
lines(stages$mid, rarefSIB, lwd=2, col="blue")


# Example 3 - different subsampling types with default function (divDyn)
# compare different subsampling types
  # classical rarefaction
  cr<-subsample(corals,iter=25, q=20,tax="genus", bin="stg", output="dist", keep=95)
  # by-list subsampling (unweighted) - 3 collections
  UW<-subsample(corals,iter=25, q=3,tax="genus", bin="stg", coll="collection_no", 
    output="dist", keep=95, type="oxw", xexp=0)
  # occurrence weighted by list subsampling
  OW<-subsample(corals,iter=25, q=20,tax="genus", bin="stg", coll="collection_no", 
    output="dist", keep=95, type="oxw", xexp=1)
 
  # shareholder quorum subsampling (quorum: 0.4)
  SQS<-subsample(corals,iter=25, q=0.4,tax="genus", bin="stg", output="dist", keep=95, type="sqs")

# plot
  tsplot(stages, shading="series", boxes="sys", xlim=c(260,0), 
  ylab="range-through diversity (genera)", ylim=c(0,100))
  shades(stages$mid, cr$divRT, col="red")
  shades(stages$mid, UW$divRT, col="blue")
  shades(stages$mid, OW$divRT, col="green")
  shades(stages$mid, SQS$divRT, col="cyan")
  
  legend("topleft", bg="white", legend=c("CR (20)", "UW (3)", "OW (20)", "SQS (0.4)"), 
    col=c("red", "blue", "green", "cyan"), lty=c(1,1,1,1), lwd=c(2,2,2,2))

divDyn documentation built on April 3, 2025, 5:57 p.m.