In ahasverus/SampleSim: Sample Size Effects in Stable Isotope Mixing Solutions

knitr::opts_chunk$set(
  collapse   = TRUE,
  comment    = "#>",
  fig.width  = 7,
  fig.height = 7,
  out.width  = "100%",
  dpi        = 300
)

library("samplesim")
library("ggplot2")

This package allows the investigation of the effect of sample size on estimates and precision of stable isotope mixing solutions calculated with the package siar or the package MixSIAR. Samples sizes are modified assuming a normal distribution with a user defined mean and standard deviation. Samples of different sizes are created from this distribution, and mixing proportions are estimated for several replicates of each sample size using the function siar::siarmcmcdirchletv4() or the function MixSIAR::run_model().

Data description

Let's take a look at the data requirements. samplesim needs three types of data:

the isotopic plasma values of the consumers;
the isotopic plasma values of the preys (also called sources);
the trophic discrimination factors.

Datasets provided in this package come from Killengreen et al. (2011)[^1]

[^1]: Killengreen S., Lecomte N., Ehrich D., Schott T., Yoccoz N.G. and Ims R.A. (2011) The importance of marine vs. human-induced subsidies in the maintenance of an expanding mesocarnivore in the arctic tundra. Journal of Animal Ecology, 80, 1049-1060.

To use these datasets with samplesim we will use the functions MixSIAR::load_mix_data(), MixSIAR::load_source_data() and MixSIAR::load_discr_data() to import and format the data.

Consumers data {#consumers}

This dataset contains the isotopic plasma values of the consumers over two (or more) isotopes. It must be structured in a data frame with two (or more) variables. Each column contains the isotopic plasma values of one isotope. In the example provided, this dataset contains two columns:

d13C, describing the plasma values of the d13C isotope
d15N, describing the plasma values of the d15N isotope

Each row of the data frame corresponds to one consumer. Let's take a look at the dataset available in the samplesim package by importing data with the MixSIAR::load_mix_data() function.

## Get the location of the dataset ----
file_path <- system.file("extdata", "foxes_consumer.csv", package = "samplesim")

## Import isotopic plasma values of the consumers ----
consumer <- MixSIAR::load_mix_data(filename     = file_path,
                                   iso_names    = c("d13C", "d15N"),
                                   factors      = NULL,
                                   fac_random   = NULL,
                                   fac_nested   = NULL,
                                   cont_effects = NULL)

Important: currently samplesim is not develop to deal with random factors or continuous covariate.

Let's explore this dataset.

## Get the class of the object ----
class(consumer)

## Name of the list elements ----
names(consumer)

## Print the ten first rows of the isotopic values ----
head(consumer$"data_iso", 10)

## Get the name of the isotopes ----
consumer$"iso_names"

## How many consumers are there in the dataset? ----
consumer$"N"

For further details on the function MixSIAR::load_mix_data() the reader is referred to the help page of this function.

Sources data {#sources}

This dataset contains the isotopic plasma values of the sources over two (or more) isotopes. It must be structured in one of two following formats:

Format #1

A data frame with repeated measurements of plasma isotope (columns) for each source individual (row). In the example provided in samplesim, this dataset contains three columns:

Source: the name of the sources;
d13C: the plasma values of the d13C isotope;
d15N: the plasma values of the d15N isotope.

Format #2

A data frame expressing isotopic plasma values as means and standard deviation. This dataset also contains the sample size of each source. In the example provided in samplesim, this dataset contains six columns:

Sources: the name of the sources;
Meand13C: the mean values of the d13C isotope;
SDd13C: the standard deviation of the d13C isotope;
Meand15N: the mean values of the d15N isotope;
SDd15N: the standard deviation of the d15N isotope;
n: the sample size of each source.

Let's take a look at the dataset available in the samplesim package by importing data with the function MixSIAR::load_source_data().

## Get the location of the dataset ----
file_path <- system.file("extdata", "foxes_sources.csv", package = "samplesim")

## Import mean isotopic plasma values of the sources (Format #1) ----
sources <- MixSIAR::load_source_data(filename       = file_path,
                                     source_factors = NULL,
                                     conc_dep       = FALSE,
                                     data_type      = "means",
                                     mix            = consumer)

Note: If your data follow the format #2, replace the value of the argument data_type by raw.

Let's explore this dataset.

## Get the class of the object ----
class(sources)

## Name of the list elements ----
names(sources)

## Print the mean values of isotopic plasma values ----
sources$"S_MU"

## Print the SD values of isotopic plasma values ----
sources$"S_SIG"

## Get the name of the sources ----
sources$"source_names"

## How many sources are there in the dataset? ----
sources$"n_array"

For further details on the function MixSIAR::load_source_data() the reader is referred to the help page of this function.

Discrimation data {#discrim}

This dataset contains the trophic discrimination factor (TDF) of two (or more) isotopes for each source. It must be structured following the format #2 of the sources dataset. In the example provided in samplesim, this dataset contains five columns:

Sources: the name of the sources;
Meand13C: the mean values of the TDF for the d13C isotope;
SDd13C: the standard deviation of the TDF for the d13C isotope;
Meand15N: the mean values of the TDF for the d15N isotope;
SDd15N: the standard deviation of the TDF for the d15N isotope.

Let's take a look at the dataset available in the samplesim package by importing data with the function MixSIAR::load_discr_data().

## Get the location of the dataset ----
file_path <- system.file("extdata", "foxes_discrimination.csv", package = "samplesim")

## Import TDF values ----
discr <- MixSIAR::load_discr_data(filename = file_path, mix = consumer)

Let's explore this dataset.

## Get the class of the object ----
class(discr)

## Name of the list elements ----
names(discr)

## Print the mean values of TDF ----
discr$"mu"

## Print the SD values of TDF ----
discr$"sig2"

For further details on the function MixSIAR::load_discr_data() the reader is referred to the help page of this function.

Data visualization

The function samplesim::plot_isospace() represents consumer and sources data in the isotopic space. This function is an adaptation of MixSIAR::plot_data() function and add color to source data. If you want you can export plot in PDF and/or PNG format.

samplesim::plot_isospace(mix = consumer, source = sources, discr = discr)

If you prefer you can also used the MixSIAR::plot_data().

If your data make sense, let's move forward!

Running samplesim

The core function of the package is samplesim::samplesim(). It allows investigating the effect of sample size on estimates and precision of stable isotope mixing solutions. More specifically samplesim::samplesim() assesses the sensitivity of isotopes mixing models to variation in numbers of samples from source tissues. This tool can be used prior to full-blown studies in a similar manner than power analyses. It used the function siar::siarmcmcdirichletv4() developed by Andrew Parnell and available in the package siar. samplesim can also be used with the package MixSIAR developed by Brian Stock et al.. User can choose to sample one particular source, or all the sources. User can also choose to modify consumer data. Sample sizes are modified assuming a normal distribution with a user defined mean and standard deviation. Samples of different sizes are created from this distribution, and mixing proportions are estimated for several replicates of each sample size.

The general writing of samplesim::samplesim() is:

samplesim::samplesim(package     = "siar",
                     mix         = consumer,
                     source      = sources,
                     discr       = discr,
                     type        = NULL,
                     nsamples    = NULL,
                     modify      = NULL,
                     nrep        = 100,
                     interval    = 90,
                     name        = NULL,
                     resid_err   = TRUE,
                     process_err = FALSE,
                     run         = "test",
                     alpha.prior = 1,
                     path        = ".")

with:

package, the package name to be used to estimate mixing proportions. Must be one of 'siar' or 'mixsiar'.
mix, the output returned by the MixSIAR::load_mix_data() function and containing consumer isotope values (see Consumers data section).
source, the output returned by the MixSIAR::load_source_data() function and containing mean and standard deviation isotope values of sources (and in some case raw values; see Preys data section).
discr, the output returned by the MixSIAR::load_discr_data() function and containing TDF values (see Discrimation data section).
type, the type of analysis to be run. Must be one of 'one source', 'all sources' or 'consumer'.
nsamples, a vector with the sample sizes to simulate.
modify, the name of the source to modify (case sensitive). This argument has to be specified when the argument type = 'one source'. Otherwise it will be ignored.
nrep, an integer specifying the number of replicates for each sample sizes. Default is 100.
interval, an integer indicating the width of credible interval to use for precision estimation. Default is 90.
name, a character string giving the name of the simulation. If NULL the simulation will be named by the time of the simulation. This name will be used to create a directory (in path) in which results will be stored.
resid_err, a boolean indicating if residual error is included in the model. See ?MixSIAR::run_model for further information. Only necessary if package = 'mixsiar'.
process_err, a boolean indicating if process error is included in the model. See ?MixSIAR::run_model for further information. Only necessary if package = 'mixsiar'.
run, a string or a list specifying MCMC parameters. See ?MixSIAR::run_model for further information. Only necessary if package = 'mixsiar'.
alpha.prior, a numeric giving the Dirichlet prior on p.global. See ?MixSIAR::run_model for further information. Only necessary if package = 'mixsiar'.
path, the directory in which the directory name will be created.

Let's take an example. We will assess the impact of the sample size of one source: the source #6 (Voles) with several sample sizes (from 2 to 500). The analysis will be repeated 999 times.

## samplesim run for one source ----
samplesim::samplesim(package     = "siar",
                     mix         = consumer,
                     source      = sources,
                     discr       = discr,
                     type        = "one source",
                     nsamples    = c(2:10, 15, 25, 50, 75, 100, 150, 250, 500),
                     modify      = "Voles",
                     nrep        = 999,
                     interval    = 90,
                     name        = "test_siar",
                     resid_err   = TRUE,
                     process_err = FALSE,
                     run         = "test",
                     alpha.prior = 1,
                     path        = ".")

This function does not return any object in the R console. Instead, results are stored in the directory ./test_siar/ and they will need to be imported for results visualization (see below).

Note 1: if you want to estimate sample size impacts of all sources, you need to set type = "all source". If you want to assess the impact of the sample size of consumers, you have to set type = "consumer".

Note 2: the use of samplesim with the package MixSIAR requires the installation of the software JAGS. See the MixSIAR documentation for further details.

Results extraction {#results}

The samplesim::samplesim() function has stored four objects results.

## Objects created by samplesim ----
list.files("./test_siar")

list.files("docs/test_siar")

intervals, a four dimensions array with the upper and lower bounds of the credible interval for each sample size, replicate and source. First dimension represents lower and upper bounds; second dimension corresponds to the number of sources; third dimension is the number of replicates; and fourth dimension is the number of sample size.
widths, a three dimensions array with the width (precision) of credible intervals for each source, each replicate and each sample size. First dimension corresponds to the number of replicates; second dimension is the number of sources; and third dimension represents the number of sample size.
medians, a three dimensions array with the median (estimate) of credible intervals for each source, each replicate and each sample size. Dimensions are the same as for widths object.
datasets, a four dimensions array with all resampled datasets.

A logfile.txt is also written and contains all parameters of the simulation. Here's what it looks like.

cat(paste0(readLines("docs/test_siar/logfile.txt"), collapse = "\n"))

Now, let's take a look at the results by importing the medians dataset using the R function readRDS().

## Import medians of credible intervals ----
medians <- readRDS("./test_siar/medians.rds")

# Import medians of credible intervals
medians <- readRDS("docs/test_siar/medians.rds")

## Structure of the object ----
class(medians)

## Names of the dimensions ----
names(dimnames(medians))

## Names of the content of the second dimension ----
dimnames(medians)[[2]]

## Names of the content of the third dimension ----
dimnames(medians)[[3]]

These data are structured in a three dimensions array.

## Extract results of the first replicate ----
medians[1, , ]

## Compute mean over replicates ----
apply(medians, 3:2, mean)

The structure of the widths dataset is the same as for medians.

If you are uncomfortable to deal with three dimensions arrays, you can use the function samplesim::get_output(). This function converts these two arrays (i.e. medians and widths) into an unique data frame. For instance:

## Import medians and widths of credible intervals ----
datas <- samplesim::get_output("test_siar")

datas <- samplesim::get_output("test_siar", path = "docs")

## Structure of the data frame ----
str(datas)

## Print the first ten and last ten rows ----
rbind(head(datas, 10), tail(datas, 10))

If you want to select only the widths data,

## Extract widths of credible intervals ----
widths <- datas[datas$type == "Width of credible intervals", ]

## Check ----
rbind(head(widths, 10), tail(widths, 10))

The argument change of the function samplesim::get_output() allows to compute the percentage of change of medians and widths respectively, based on a reference value of sample size (argument reference; default is the minimum sample size). For instance,

## Import medians and widths of credible intervals expressed as percentage of change ----
datas <- samplesim::get_output(name = "test_siar", change = TRUE, reference = 2)

datas <- samplesim::get_output(name = "test_siar", change = TRUE, reference = 2, path = "docs")

## Structure of the data frame ----
str(datas)

## Print the first ten and last ten rows ----
rbind(head(datas, 10), tail(datas, 10))

Here we notice that the column replicate is omitted and percentages of change are aggregated over replicates.

Results visualization

The samplesim package also offers the possibility of visualizing the effect of sample size on the width of the credible interval and on the median of the posterior distribution of the mixing models. This is possible with the function samplesim::plot_samplesim().

## Visualize results ----
samplesim::plot_samplesim(name = "test_siar")

# Visualize results
samplesim::plot_samplesim(name = "test_siar", path = "docs")

If you prefer you can represent the percentages of change:

# Visualize results expressed as percentages of change
samplesim::plot_samplesim(name = "test_siar", change = TRUE, reference = 2)

samplesim::plot_samplesim(name = "test_siar", change = TRUE, reference = 2, path = "docs")

Finally, if you are not satisfied with the quality of the graph, you can customize it by 1) importing data and 2) programming your own graph. Let's take an example with the ggplot2 package (used in samplesim).

## Import medians and widths of credible intervals ----
datas <- get_output("test_siar")

datas <- get_output("test_siar", path = "docs")

ggplot(aes(x = size, y = value), data = datas) +
  geom_boxplot(aes(fill = source), width = 0.8, outlier.shape = NA) +
  labs(x = "Sample size", y = "Values", fill = "Sources") +
  theme(legend.position = "bottom", legend.title = element_blank()) +
  facet_grid(. ~ type)

Or your can only represent the medians values. For instance (with percentages of change):

## Import medians and widths of credible intervals ----
datas <- samplesim::get_output(name = "test_siar", change = TRUE)

datas <- samplesim::get_output(name = "test_siar", change = TRUE, path = "docs")

## Select only medians of credible intervals ----
medians <- datas[datas$"type" == "Median of posterior distribution", ]

ggplot(aes(x = size, y = value, group = source), data = medians) +
  geom_point(aes(color = source)) +
  geom_line(aes(color = source)) +
  labs(x = "Sample size", y = "Change in medians (%)", color = "Sources") +
  theme_light() +
  theme(legend.position = "bottom", legend.title = element_blank())

Reference {-}

Lecomte N., Ehrich D., Casajus N., Berteaux D., Cameron C., and Yoccoz N.G. How many is enough? An R package for evaluating the effect of sample size on estimates and precision of stable isotope mixing solutions. Submitted to Methods in Ecology and Evolution.

ahasverus/SampleSim documentation built on July 7, 2021, 1:17 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com