simdata: Simulation of Data Sets
In edlinguerra/SSP: Simulated Sampling Procedure for Community Ecology


The function simulates as many data sets as requested, using the parameters estimated by `assempar`. It returns a list containing all the simulated data sets, to be used by `datquality` and `sampsd`.

simdata(Par, cases, N, sites)

`Par`	A list of parameters estimated by `assempar`
`cases`	Number of data sets to be simulated
`N`	Total number of samples to be simulated in each site
`sites`	Total number of sites to be simulated in each data set

The presence/absence (P/A) of each species at each site is simulated with Bernoulli trials, with a probability of success equal to the empirical frequency of occurrence of that species among sites in the pilot data. For each site where a species is present, a second set of Bernoulli trials simulates its distribution among the samples of that site, with a probability of success equal to its estimated empirical frequency within the sites where it appears.

Once created, the P/A matrices are converted to abundance matrices by replacing presences with random values drawn from a suitable statistical distribution, with parameters equal to those estimated from the pilot data. Counts of individuals are generated using Poisson or negative binomial distributions, depending on the degree of aggregation of each species in the pilot data (McArdle & Anderson 2004; Anderson & Walsh 2013). Continuous variables (e.g. coverage, biomass) are generated using the log-normal distribution. The simulation procedure is repeated to generate as many simulated data matrices as needed.
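The procedure described above can be sketched in base R. This is only an illustration under stated assumptions, not the code used by `simdata`: the parameter table `pars`, its column names, its values and the helper `simulate_species` are all hypothetical and do not reflect the structure of the list returned by `assempar`. The sketch covers counts only; continuous variables would use `rlnorm` instead.

```r
# Illustrative sketch only: names and values below are hypothetical and are
# NOT the internal objects used by assempar() or simdata().
set.seed(1)

n_sites <- 3   # sites per simulated data set
n_samp  <- 10  # samples per site

# Assumed per-species parameters, as if they had been estimated from pilot data
pars <- data.frame(
  species    = c("sp1", "sp2", "sp3"),
  f_sites    = c(1.0, 0.6, 0.3),   # frequency of occurrence among sites
  f_within   = c(0.9, 0.5, 0.4),   # frequency of occurrence within occupied sites
  mu         = c(12, 4, 2),        # mean count
  size       = c(NA, 1.5, 0.8),    # negative binomial dispersion (unused for Poisson)
  aggregated = c(FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

simulate_species <- function(p, n_sites, n_samp) {
  n <- n_sites * n_samp
  # Stage 1: Bernoulli trial per site (is the species present at the site?)
  at_site <- rbinom(n_sites, size = 1, prob = p$f_sites)
  # Stage 2: Bernoulli trial per sample; probability is zero at unoccupied sites
  pa <- rbinom(n, size = 1, prob = rep(at_site, each = n_samp) * p$f_within)
  # Stage 3: draw counts from negative binomial (aggregated) or Poisson
  counts <- if (p$aggregated) {
    rnbinom(n, size = p$size, mu = p$mu)
  } else {
    rpois(n, lambda = p$mu)
  }
  # Keep counts only where the species was simulated as present.
  # A drawn count of zero turns a presence into an absence; this sketch
  # does not correct for that.
  pa * counts
}

abund <- sapply(seq_len(nrow(pars)), function(i) {
  simulate_species(pars[i, ], n_sites, n_samp)
})
colnames(abund) <- pars$species

# One simulated data set: a site label plus one abundance column per species
sim <- data.frame(site = rep(seq_len(n_sites), each = n_samp), abund)
head(sim)
```

Wrapping the last three statements in a loop or `lapply` over the number of cases would yield a list of simulated data sets, which is the general shape of output described for `simdata`.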

`simulated.data`	A list containing all the simulated data sets. This object is used by `sampsd` and `datquality`.

This approach is not free from assumptions. The simulations consider neither environmental constraints nor the co-occurrence structure of species. Potential differences in species composition and abundance among samples and sites are assumed to be due mainly to the spatial aggregation of species, as estimated from the pilot data. Hence, any ecological property of the assemblage that was not captured by the pilot data will not be reflected in the simulated data. Associations among species can be modeled using copulas, as suggested by Anderson et al. (2019), which could be included in an upcoming version of SSP.

Edlin Guerra-Castro (edlinguerra@gmail.com), Juan Carlos Cajas, Juan Jose Cruz-Motta, Nuno Simoes and Maite Mascaro (mmm@ciencias.unam.mx).

Anderson, M. J., & Walsh, D. C. I. (2013). PERMANOVA, ANOSIM, and the Mantel test in the face of heterogeneous dispersions: What null hypothesis are you testing? Ecological Monographs, 83(4), 557-574.

Anderson, M. J., de Valpine, P., Punnett, A., & Miller, A. E. (2019). A pathway for multivariate analysis of ecological communities using copulas. Ecology and Evolution, 9, 3276-3294.

Guerra-Castro, E. J., Cajas, J. C., Dias Marques Simoes, F. N., Cruz-Motta, J. J., & Mascaro, M. (2020). SSP: An R package to estimate sampling effort in studies of ecological communities. bioRxiv, 2020.03.19.996991.

McArdle, B. H., & Anderson, M. J. (2004). Variance heterogeneity, transformations, and models of species abundance: a cautionary tale. Canadian Journal of Fisheries and Aquatic Sciences, 61, 1294-1302.

sampsd, datquality

###To speed up the simulation of these examples, the cases, sites and N were set small.

##Single site: micromollusk from Cayo Nuevo (Yucatan, Mexico)
data(micromollusk)

#Estimation of parameters of pilot data
par.mic <- assempar(data = micromollusk,
                    type = "P/A",
                    Sest.method = "average")

#Simulation of 3 data sets, each one with 10 potential sampling units from a single site
sim.mic<-simdata(par.mic, cases = 3, N = 10, sites = 1)

##Multiple sites: Sponges from Alacranes National Park (Yucatan, Mexico).
data(sponges)

#Estimation of parameters of pilot data
par.spo <- assempar(data = sponges,
                    type = "counts",
                    Sest.method = "average")

#Simulation of 3 data sets, each one with 10 potential sampling units in 3 sites.
sim.spo<-simdata(par.spo, cases = 3, N = 10, sites = 3)