sim_eDNA_lmer: Simulate eDNA data
In artemis: Analysis and Simulation of Environmental DNA Experiments

Simulate eDNA data

sim_eDNA_lm(
  formula,
  variable_list,
  betas,
  sigma_ln_eDNA,
  std_curve_alpha,
  std_curve_beta,
  n_sim = 1L,
  upper_Cq = 40,
  prob_zero = 0.08,
  X = expand.grid(variable_list),
  verbose = FALSE
)

sim_eDNA_lmer(
  formula,
  variable_list,
  betas,
  sigma_ln_eDNA,
  sigma_rand,
  std_curve_alpha,
  std_curve_beta,
  n_sim = 1L,
  upper_Cq = 40,
  prob_zero = 0.08,
  X = expand.grid(variable_list),
  verbose = FALSE
)

`formula`	a model formula, e.g. `y ~ x1 + x2`. For `sim_eDNA_lmer`, random intercepts can also be provided, e.g. `(1 | rep)`.
`variable_list`	a named list, with the levels that each variable can take. Please note that the variables listed in the formula, including the response variable, must be present in the variable_list or in the X design matrix. Extra variables, i.e. variables which do not occur in the formula, are ignored.
`betas`	numeric vector, the beta for each variable in the design matrix
`sigma_ln_eDNA`	numeric, the measurement error on ln[eDNA].
`std_curve_alpha`	numeric, the alpha value for the formula for converting between log(eDNA concentration) and Cq value
`std_curve_beta`	numeric, the beta value for the formula for converting between log(eDNA concentration) and Cq value
`n_sim`	integer, the number of cases to simulate
`upper_Cq`	numeric, the upper limit on Cq detection. Any value of log(concentration) which would result in a Cq value greater than this limit is instead recorded as the limit.
`prob_zero`	numeric, between 0 and 1. The probability of seeing a non-detection (i.e., a "zero") via the zero-inflated mechanism. Defaults to 0.08.
`X`	optional, a design matrix. By default, this is created from the variable_list using `expand.grid()`, which creates a balanced design matrix. However, the user can provide their own `X` as well, in which case the variable_list is ignored. This allows users to provide an unbalanced design matrix.
`verbose`	logical, when TRUE, output from `rstan::sampling` is written to the console.
`sigma_rand`	numeric vector, the standard deviations of the random effects. There must be one sigma per random effect specified.
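The difference between the default balanced design and a user-supplied unbalanced one can be sketched in base R. This is an illustrative sketch only: the variable names and the choice of which rows to drop are hypothetical, and the resulting data frame is simply the kind of object that could be passed as `X`.

```r
## Hypothetical variable levels; Cq is included because the response
## variable must be present in the design matrix.
vars <- list(distance = c(0, 15, 50),
             volume = c(25, 50),
             tech_rep = 1:3,
             Cq = 1)

## Default behaviour: every combination of levels (balanced design)
X_balanced <- expand.grid(vars)

## Unbalanced design: pretend the 50 ml samples at 50 m were never collected
X_unbalanced <- subset(X_balanced, !(distance == 50 & volume == 50))

nrow(X_balanced)    # 3 * 2 * 3 = 18 rows
nrow(X_unbalanced)  # 18 - 3 = 15 rows

## Cell counts show which combination is missing
table(X_unbalanced$distance, X_unbalanced$volume)
```

A data frame such as `X_unbalanced` could then be supplied through the `X` argument in place of the default.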

These functions allow for computationally efficient simulation of Cq values from a hypothetical eDNA sampling experiment, given a set of effect sizes (`betas`) and the levels of each predictor variable (`variable_list`). The mechanism for this model is described in detail in the artemis "Getting Started" vignette.

The simulation functions call specialized functions which are written in Stan and compiled for speed. This also allows the simulation functions and the modeling functions to reflect the same process at the code level.
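To make the roles of the arguments concrete, the following base R sketch walks through one plausible version of the data-generating process. It is not the Stan code used by artemis. It assumes a linear standard curve, Cq = `std_curve_alpha` + `std_curve_beta` * ln[eDNA], with measurement error added on the ln[eDNA] scale, and it assumes that zero-inflated non-detections are recorded at `upper_Cq`. The vignette remains the authoritative description.

```r
## Illustrative sketch only; assumptions are noted inline.
set.seed(1)

## Design matrix with an intercept column, built from hypothetical levels
X <- expand.grid(distance = c(0, 15, 50), volume = c(25, 50))
mm <- model.matrix(~ distance + volume, data = X)

betas <- c(-10.6, -0.05, 0.1)   # effects on ln[eDNA], not on Cq
sigma_ln_eDNA <- 1
std_curve_alpha <- 21.2
std_curve_beta <- -1.5
upper_Cq <- 40
prob_zero <- 0.08

## Expected ln[eDNA] for each row, plus measurement error (assumed placement)
ln_conc <- as.vector(mm %*% betas) + rnorm(nrow(mm), mean = 0, sd = sigma_ln_eDNA)

## Assumed linear standard curve from ln[eDNA] to Cq
Cq <- std_curve_alpha + std_curve_beta * ln_conc

## Values beyond the detection limit are recorded as the limit
Cq_star <- pmin(Cq, upper_Cq)

## Zero-inflation: some samples are non-detections regardless of concentration
## (assumed here to be recorded at upper_Cq)
is_zero <- rbinom(length(Cq_star), size = 1, prob = prob_zero) == 1
Cq_star[is_zero] <- upper_Cq

cbind(X, ln_conc = round(ln_conc, 2), Cq_star = round(Cq_star, 2))
```

Because `std_curve_beta` is negative, lower concentrations map to higher Cq values, which is why censoring applies at the upper end of the Cq scale.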

S4 object of class "eDNA_simulation_lm/lmer" with the following slots:

ln_conc: matrix, the simulated log(concentration)
Cq_star: matrix, the simulated Cq values, including the measurement error
formula: the formula for the simulation
variable_levels: named list, the variable levels used for the simulation
betas: numeric vector, the betas for the simulation
x: data.frame, the design matrix
std_curve_alpha: numeric, the alpha for the std curve conversion
std_curve_beta: numeric, the beta for the std curve conversion
upper_Cq: numeric, the upper limit for Cq

Sometimes the simulated response (i.e., the Cq values) produced by these functions does not resemble the data expected from a real sampling experiment. This suggests a mismatch between the assumptions of the model and the data-generating process in the field. In these circumstances, we suggest:

Check that the betas provided are the effect sizes of the predictors on log[eDNA concentration], not on the Cq values.
Check that the variable levels provided are representative of real-world circumstances. For example, a sample volume of 0 ml is not possible.
Verify the values for the standard curve alpha and beta. These are specific to each calibration for the lab, so it is important that you use the same conversion between Cq values and log[eDNA concentration] as the comparison data.
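A quick way to apply these checks is to convert candidate values by hand before simulating. The sketch below assumes a linear standard curve of the form Cq = `std_curve_alpha` + `std_curve_beta` * ln[eDNA]; the helper functions `expected_Cq` and `ln_conc_at` are hypothetical and are not part of artemis.

```r
## Standard curve values taken from the examples on this page
std_curve_alpha <- 21.2
std_curve_beta <- -1.5
upper_Cq <- 40

## Hypothetical helper: expected Cq for a given ln[eDNA] (assumed linear curve)
expected_Cq <- function(ln_conc) std_curve_alpha + std_curve_beta * ln_conc

## Hypothetical helper: ln[eDNA] that corresponds to a given Cq
ln_conc_at <- function(Cq) (Cq - std_curve_alpha) / std_curve_beta

expected_Cq(-10.6)   # about 37.1, below the detection limit
expected_Cq(-15)     # about 43.7, above upper_Cq, so recorded as 40

## ln[eDNA] values below this threshold would all be recorded at upper_Cq
ln_conc_at(upper_Cq) # about -12.53
```

If most of the expected Cq values land above `upper_Cq`, the betas or the standard curve values are likely on the wrong scale for the scenario being simulated.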

Matt Espe

library(artemis)

## Includes extra variables
vars = list(Intercept = -10.6,
            distance = c(0, 15, 50),
            volume = c(25, 50),
            biomass = 100,
            alive = 1,
            tech_rep = 1:10,
            rep = 1:3, Cq = 1)

## Intercept only
ans = sim_eDNA_lm(Cq ~ 1, vars,
                  betas = c(intercept = -15),
                  sigma_ln_eDNA = 1e-5,
                  std_curve_alpha = 21.2, std_curve_beta = -1.5)

print(ans)

## Two predictors; betas are effects on ln[eDNA], not on Cq
ans = sim_eDNA_lm(Cq ~ distance + volume, vars,
                  betas = c(intercept = -10.6, distance = -0.05, volume = 0.1),
                  sigma_ln_eDNA = 1,
                  std_curve_alpha = 21.2, std_curve_beta = -1.5)