In ku-awdc/eggSim: Simulating Datasets Representing Egg Reduction Rate Data

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

eggSim package

The eggSim pacakge is intended to facilitate simulation-based assessment of survey design strategies for testing anthelmintic efficacy via faecal egg count reduction (FECRT) tests. The software is currently a work in progress, and comments/suggestions/contributions are very welcome to the package maintainers!

The package is designed to work in a tidyverse-like way, so loading the tidyverse meta-package along with the eggSim package is a good idea:

library("eggSim")
library("tidyverse")

Parameter values

Before running a simulation we need to decide on the parameters to use. For example, we could use a single community with a mean count of 3.16 EPG:

community_mean <- 3.16

And an expected efficacy of 80%:

reduction <- 1 - 0.8

Then the coefficients of variation at the four available levels need to be chosen:

# Between individuals:
cv_between <- 1.5
# Within an individual over different days:
cv_within <- 0.75
# Within the same individual and the same day:
cv_slide <- 0.25
# Variation in efficacy between individuals:
cv_reduction <- 0

The overall results are highly dependent on these values, but unfortunately it is currently difficult to decide on reasonable values to use due to the absence of applicable data. But we can assess the overall over-dispersion (which is easier to estimate based on available data) using the following function:

combined_k(cv_between, cv_within, cv_slide, cv_reduction)

Finally the budget and relative cost of a second slide must be chosen:

budget <- 1200
second_slide_cost <- 0.621

Running a simulation

The main interface for running simulations is via the eggSim function. We can run 100 simulated FECRT as follows:

efficacies <- eggSim(R=1e2, summarise=FALSE, reduction=reduction, budget=budget, second_slide_cost=second_slide_cost, community_mean=community_mean, cv_between=cv_between, cv_within=cv_within, cv_slide=cv_slide, cv_reduction=cv_reduction)

This gives us a data frame of 1000 observed arithmetic (and geometric) mean efficacies for each of the default survey designs, along with other parameter values that do not vary between iterations:

str(efficacies)

We can summarise these using e.g. a boxplot:

ggplot(efficacies, aes(x=Design, y=ArithmeticEfficacy)) +
  geom_boxplot()

Running multiple simulations

To facilitate looking at multiple scenarios, the expected efficacy (reduction), community mean, budget, and second slide cost (for relevant design types) can be vectorised on input to eggSim. For example we can look at vectorised reduction and community mean values:

community_mean <- c(3.16, 23.5)
reduction <- 1 - c(0.95, 0.8)

And then run the simulations on 2 cores to improve speed of computation:

biases <- eggSim(R=1e2, summarise=TRUE, parallelise=2, reduction=reduction, budget=budget, second_slide_cost=second_slide_cost, community_mean=community_mean, cv_between=cv_between, cv_within=cv_within, cv_slide=cv_slide, cv_reduction=cv_reduction)

This time we asked eggSim to summarise the iterations, so we end up with a smaller data frame with summary statistics over the 1000 iterations per parameter value set. We can extract the most relevant columns like so:

biases %>%
  select(OverallMean, ObsPrev, Target, Design, MeanN, Bias, MedianBias, Variance) %>%
  arrange(OverallMean, Target, Design) %>%
  print(n=Inf)

Obviously the relationship between mean, efficacy and bias/variance of the estimator is easiest to visualise graphically - we are working on an autoplot method for the package to facilitate doing this.

Contributing

We would highly value any contributions (either comments, suggestions or code patches) on this package! You can contribute by either working with the GitHub repository or by emailing the package maintainers using the email addresses available from: