View source: R/disparity_resample.R
| disparity_resample | R Documentation |
Provides a unified interface to obtain resampled (bootstrap or rarefied) estimates of several disparity / morphospace occupation statistics.
disparity_resample(
Data,
group = NULL,
n_resamples = 1000,
statistic = "multivariate_variance",
CI = 0.95,
bootstrap_rarefaction = "bootstrap",
sample_size = NULL
)
Data |
A data frame, matrix, vector, or 3D array. Observations (specimens) must be in rows (if a 3D array is supplied, the third dimension is assumed to index specimens). |
group |
A factor or a vector indicating group membership (will be coerced to factor). If 'NULL' (default) a single analysis is performed on the full dataset. |
n_resamples |
Number of resampling replicates (default 1000). |
statistic |
Character string identifying the statistic to compute. One of '"multivariate_variance"', '"mean_pairwise_euclidean_distance"', '"convex_hull_volume"', '"claramunt_proper_variance"'. Default is '"multivariate_variance"'. Ignored for univariate data (vector input). |
CI |
Desired two-sided confidence interval level (default 0.95) used for percentile confidence intervals. Use CI=1 to display the full range (minimum to maximum) of resampled values. |
bootstrap_rarefaction |
Either '"bootstrap"' (default) for resampling with replacement or '"rarefaction"' for resampling without replacement. |
sample_size |
Either 'NULL' (default), a positive integer indicating the number of rows to sample, or the character '"smallest"' to use the size of the smallest group (all groups resampled to that size). For 'bootstrap_rarefaction=="rarefaction"', sampling is without replacement and this parameter is required (not 'NULL'). For 'bootstrap_rarefaction=="bootstrap"', sampling is with replacement; if 'NULL', uses original group sizes, otherwise uses the specified sample size. If '"smallest"' is supplied but no groups are defined, an error is returned. |
The function allows choosing among the following multivariate statistics:
Multivariate variance (trace of the covariance matrix; sum of variances)
Mean pairwise Euclidean distance
Convex hull volume (n-dimensional)
Claramunt proper variance (Claramunt 2010) based on a linear shrinkage covariance matrix
If the input 'Data' is univariate (i.e., a vector), the analysis defaults to computing the (univariate) variance within each group (the attribute 'statistic' is ignored).
If 'bootstrap_rarefaction=="bootstrap"', the function performs resampling with replacement (i.e., classical bootstrap) by sampling rows of the data matrix / data frame. Optionally, the user can specify a custom sample size via the 'sample_size' argument. This allows comparison of bootstrap confidence intervals at the same sample size (essentially, this is rarefaction sampling with replacement), which can be useful to compare bootstrapped confidence intervals across different groups for statistics which are sensitive to sample size (at the expense of broader than necessary confidence intervals for groups that are larger).
If 'bootstrap_rarefaction=="rarefaction"', the function performs resampling without replacement at the sample size indicated in 'sample_size' (numeric) or, if 'sample_size=="smallest"', at the size of the smallest group (all groups are resampled to that size). Rarefaction requires specifying a valute to the attribute 'sample_size'; an error is returned otherwise.
This function uses the future framework for parallel processing,
requiring packages future and future.apply.
Users should set up their preferred parallelization strategy using
future::plan() before calling this function.
For example:
future::plan(future::sequential) for sequential
processing
future::plan(future::multisession, workers = 4) for
parallel processing with 4 workers (works on all platforms
including Windows)
future::plan(future::multicore, workers = 4) for
forked processes (Unix-like systems)
future::plan(future::cluster, workers =
c("host1", "host2")) for cluster computing
If no plan is set or the future packages are not available, the function will use sequential processing.
A list containing:
Character vector of length 1 with the human-readable name of the statistic used.
A data frame with columns 'group', 'average', 'CI_min', 'CI_max'. One row per group. When CI=1, 'CI_min' and 'CI_max' represent the minimum and maximum of resampled values rather than confidence intervals.
If a single group: numeric vector of length 'n_resamples' with the resampled values. If multiple groups: a named list with one numeric vector (length 'n_resamples') per group.
The CI level used (between 0 and 1). When CI=1, ranges are computed instead of confidence intervals.
The returned object has class "disparity_resample" and comes with associated S3 methods for convenient display and visualization:
print.disparity_resample: Prints a formatted
summary of results including confidence interval overlap
assessment for multiple groups
plot.disparity_resample: Creates a confidence
interval plot using ggplot2
This function automatically uses parallel processing via the future
framework, when the packages future and future.apply are installed.
This is particularly useful for large datasets, large number of
resamples or computationally intensive statistics (e.g., convex
hull volume).
The parallelization strategy is determined by the user's choice of
future plan, providing flexibility across different computing
environments (local multicore, cluster, etc.).
The function performs parallelization at the level of individual
bootstrap/rarefaction replicates within each group. The future plan
should be set up by the user before calling this function using
future::plan() (see examples). If no plan is set or the
future packages are not available, the function will use sequential
processing.
For bootstrap resampling, the average estimate reported is the mean of the bootstrap resampled values (for consistency across all bootstrap scenarios). For rarefaction, because the purpose is to make groups comparable at a common (smaller) sample size, the average estimate reported is the mean of the rarefied resampled values for that group (i.e., the mean across all rarefaction replicates).
'Data' can be a data frame, a matrix, a vector, or a 3D array of landmark coordinates (e.g., p landmarks x k dimensions x n specimens). In the latter case, the array is converted internally to a 2D matrix with specimens in rows and (landmark * dimension) variables in columns.
Because of how the computation works, convex hull volume computation requires the number of observations (specimens) to be (substantially) greater than the number of variables (dimensions). In case of shape or similar, consider using the scores along the first (few/several) principal components. Sometimes errors are thrown due to near-zero components, in this case try reducing the number of principal components used. Examples of use of this statistic with geometric morphometric data include Drake & Klingenberg 2010 (American Naturalist), Fruciano et al. 2012 (Environmental Biology of Fishes) and Fruciano et al. 2014 (Biological Journal of the Linnean Society). Because of the sensitivity of this statistic to outliers, usually rarefaction is preferred to bootstrapping.
"Multivariate variance" is also called "total variance", "Procrustes variance" (in geometric morphometrics) and "sum of univariate variances". Note how the computation here does not divide variance by sample size (other than the normal division performed in the computation of variances).
Drake AG, Klingenberg CP. 2010. Large-scale diversification of skull shape in domestic dogs: disparity and modularity. American Naturalist 175:289-301.
Claramunt S. 2010. Discovering exceptional diversifications at continental scales: the case of the endemic families of Neotropical Suboscine passerines. Evolution 64:2004-2019.
Fruciano C, Tigano C, Ferrito V. 2012. Body shape variation and colour change during growth in a protogynous fish. Environmental Biology of Fishes 94:615-622.
Fruciano C, Pappalardo AM, Tigano C, Ferrito V. 2014. Phylogeographical relationships of Sicilian brown trout and the effects of genetic introgression on morphospace occupation. Biological Journal of the Linnean Society 112:387-398.
Fruciano C, Franchini P, Raffini F, Fan S, Meyer A. 2016. Are sympatrically speciating Midas cichlid fish special? Patterns of morphological and genetic variation in the closely related species Archocentrus centrarchus. Ecology and Evolution 6:4102-4114.
disparity_test,
print.disparity_resample,
plot.disparity_resample
set.seed(123)
# Simulate two groups with different means but same covariance
if (requireNamespace("MASS", quietly = TRUE)) {
X1 = MASS::mvrnorm(20, mu=rep(0, 10), Sigma=diag(10))
X2 = MASS::mvrnorm(35, mu=rep(2, 10), Sigma=diag(10))
Data = rbind(X1, X2)
grp = factor(c(rep("A", nrow(X1)), rep("B", nrow(X2))))
# Sequential processing
# future::plan(future::sequential) # Default sequential processing
# Parallel processing (uncomment to use)
# future::plan(future::multisession, workers = 2) # Use 2 workers
# Bootstrap multivariate variance
boot_res = disparity_resample(
Data, group=grp, n_resamples=200,
statistic="multivariate_variance",
bootstrap_rarefaction="bootstrap"
)
# Direct access to results table
boot_res$results
# Using the print method for formatted output
print(boot_res)
# Using the plot method to visualize results
# plot(boot_res) # Uncomment to create confidence interval plot
# Rarefaction (to the smallest group size) of mean pairwise
# Euclidean distance
rar_res = disparity_resample(
Data, group=grp, n_resamples=200,
statistic="mean_pairwise_euclidean_distance",
bootstrap_rarefaction="rarefaction", sample_size="smallest"
)
# Now simulate a third group with larger variance
X3 = MASS::mvrnorm(15, mu=rep(0, 10), Sigma=diag(10)*1.5)
grp2 = factor(
c(rep("A", nrow(X1)), rep("B", nrow(X2)), rep("C", nrow(X3)))
)
boot_res2 = disparity_resample(
Data=rbind(X1, X2, X3), group=grp2, n_resamples=1000,
statistic="multivariate_variance",
bootstrap_rarefaction="bootstrap"
)
print(boot_res2)
# plot(boot_res2)
# Plot of the obtained (95%) confidence intervals (uncomment to plot)
# Reset to sequential processing when done (optional)
# future::plan(future::sequential)
}
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.