sampsd: Sampling Simulated Data and Estimation of Multivariate...

sampsd    R Documentation

Sampling Simulated Data and Estimation of Multivariate Standard Errors

Description

For each simulated data set, this function performs repeated sampling across a range of effort levels and estimates the corresponding MultSE (pseudo-multivariate standard error) using dissimilarity-based methods.

Usage

sampsd(dat.sim, Par, transformation, method, n, m, k)

Arguments

dat.sim

A list of simulated data sets generated by simdata.

Par

A list of parameters estimated by assempar.

transformation

Mathematical transformation to reduce the influence of dominant species: one of "square root", "fourth root", "Log (X+1)", "P/A", or "none".
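As a rough illustration of what these options do to an abundance matrix, the following base R sketch applies each one. The helper transform_abund and the toy matrix are hypothetical and not part of SSP; the natural logarithm is assumed for "Log (X+1)".

```r
# Hypothetical helper (not part of SSP): apply one of the documented
# transformations to a units-by-species abundance matrix.
transform_abund <- function(x, transformation = "none") {
  switch(transformation,
    "square root" = sqrt(x),
    "fourth root" = x^0.25,
    "Log (X+1)"   = log(x + 1),   # natural log is an assumption here
    "P/A"         = (x > 0) * 1,  # presence/absence coded as 1/0
    "none"        = x,
    stop("unknown transformation: ", transformation)
  )
}

# Toy counts: 2 sampling units by 3 species
counts <- matrix(c(0, 1, 16, 81, 0, 256), nrow = 2, byrow = TRUE,
                 dimnames = list(c("u1", "u2"), c("sp1", "sp2", "sp3")))

transform_abund(counts, "fourth root")  # compresses large counts strongly
transform_abund(counts, "P/A")          # keeps only occurrence information
```

The stronger the transformation, the less weight dominant species carry in the resulting dissimilarities.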

method

Dissimilarity metric to use, passed to vegdist (e.g., "bray", "jaccard", "gower").

n

Maximum number of sampling units per site (must be <= total units available).

m

Maximum number of sites to sample per data set (must be <= total number of sites).

k

Number of repetitions of each sampling configuration (samples × sites) for each data set.

Details

For multi-site simulations, the function selects subsets of sites (from 2 to m) and then draws sampling units within each selected site (up to n per site) using a two-stage sampling method with inclusion probabilities (Tillé, 2006). For single-site simulations, repeated samples of 2 to n sampling units are taken without replacement.
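The two-stage idea can be sketched in base R as follows. This is a simplified illustration, not the package's implementation: it uses equal-probability sampling without replacement at both stages instead of the inclusion-probability method cited above, and the helper two_stage_sample and the toy data layout are hypothetical.

```r
set.seed(42)

# Toy simulated data set: 3 sites with 20 sampling units each (hypothetical layout)
sim <- data.frame(site = rep(1:3, each = 20), unit = rep(1:20, times = 3))

# Hypothetical helper: pick m_sub sites, then n_sub units within each picked
# site, both without replacement and with equal probabilities (a simplification).
two_stage_sample <- function(dat, m_sub, n_sub) {
  sites <- unique(dat$site)
  picked_sites <- sites[sample.int(length(sites), m_sub)]
  rows <- unlist(lapply(picked_sites, function(s) {
    idx <- which(dat$site == s)          # rows belonging to this site
    idx[sample.int(length(idx), n_sub)]  # units drawn within the site
  }))
  dat[rows, , drop = FALSE]
}

# One sampling configuration: 2 sites, 5 units per site
smp <- two_stage_sample(sim, m_sub = 2, n_sub = 5)
table(smp$site)
```

Repeating such a draw k times for every combination of sites and units gives the set of samples on which MultSE is then estimated.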

Each sample undergoes the selected transformation and a dissimilarity matrix is computed. MultSE is estimated using:

  • Single site: pseudo-variance V, with MultSE = \sqrt{V/n}

  • Multiple sites: mean squares from a PERMANOVA model (residual and site effects)
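For the single-site case, the calculation can be sketched in base R following the definition in Anderson & Santana-Garcon (2015): the squared dissimilarities between all pairs of units are summed and divided by n, the result is divided by n - 1 to give the pseudo-variance V, and MultSE is the square root of V/n. The helper multse_single is hypothetical and is not the package's internal code; base dist is used in place of vegdist only to keep the sketch free of dependencies. The multi-site PERMANOVA case is not covered here.

```r
# Hypothetical helper: MultSE for one group of n units from a "dist" object
multse_single <- function(d) {
  n  <- attr(d, "Size")          # number of sampling units
  ss <- sum(as.vector(d)^2) / n  # squared dissimilarities (pairs i < j) over n
  v  <- ss / (n - 1)             # pseudo-variance
  sqrt(v / n)
}

# Sanity check: with Euclidean distance on a single variable, MultSE
# coincides with the classical standard error of the mean.
x <- c(2, 4, 4, 7, 9)
multse_single(dist(x))
sd(x) / sqrt(length(x))
```

The same helper accepts any object of class "dist", so a Bray-Curtis or Jaccard matrix could be passed in the same way.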

This procedure is computationally intensive, especially with large k. Start with low values for exploration.
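To get a feel for how the workload scales, the sketch below counts sampling configurations for the settings used in the multi-site example. It assumes one MultSE estimate per data set, repetition, and effort combination, with effort ranging over 2 to m sites and 2 to n units; the exact internal layout of the function may differ.

```r
# Settings from the multi-site example: 3 simulated data sets, k = 3, m = 3, n = 10
cases <- 3; k <- 3; m <- 3; n <- 10

# Assumed effort grid: every data set x repetition x number of sites x units per site
grid <- expand.grid(case = seq_len(cases), rep = seq_len(k),
                    sites = 2:m, units = 2:n)
n_estimates <- nrow(grid)
n_estimates

# The count grows linearly with k: the same grid with k = 100
n_estimates / k * 100
```

Under these assumptions each row corresponds to one transformation, one dissimilarity matrix, and one MultSE calculation, which is why small k is advisable for exploration.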

Value

A matrix containing the estimated MultSE values for each simulated data set, sampling effort combination, and repetition. This matrix is used by summary_ssp.

Note

For quick exploratory analysis, use a small k. Once a plausible range of sampling effort has been identified, rerun with a larger k (e.g., 100) to obtain more stable estimates. Computation time increases accordingly.

References

Anderson, M. J., & Santana-Garcon, J. (2015). Measures of precision for dissimilarity-based multivariate analysis of ecological communities. Ecology Letters, 18(1), 66–73.

Guerra-Castro, E. J., Cajas, J. C., Simoes, N., Cruz-Motta, J. J., & Mascaro, M. (2021). SSP: An R package to estimate sampling effort in studies of ecological communities. Ecography, 44(4), 561–573. doi:10.1111/ecog.05284

Tillé, Y. (2006). Sampling Algorithms. Springer, New York.

See Also

assempar, simdata, summary_ssp, vegdist

Examples

## Single site example
data(micromollusk)
par.mic <- assempar(data = micromollusk, type = "P/A", Sest.method = "average")
sim.mic <- simdata(par.mic, cases = 3, N = 20, sites = 1)
sam.mic <- sampsd(dat.sim = sim.mic, Par = par.mic, transformation = "P/A",
                  method = "jaccard", n = 10, m = 1, k = 3)

## Multiple site example
data(sponges)
par.spo <- assempar(data = sponges, type = "counts", Sest.method = "average")
sim.spo <- simdata(par.spo, cases = 3, N = 20, sites = 3)
sam.spo <- sampsd(dat.sim = sim.spo, Par = par.spo, transformation = "square root",
                  method = "bray", n = 10, m = 3, k = 3)


SSP documentation built on June 8, 2025, 11:41 a.m.