knitr::opts_chunk$set(
  collapse   = TRUE,
  comment    = "#>",
  fig.width  = 7,
  fig.height = 7,
  out.width  = "100%",
  dpi        = 300
)
library("samplesim")
library("ggplot2")

This package allows the investigation of the effect of sample size on estimates and precision of stable isotope mixing solutions calculated with the package siar or the package MixSIAR. Samples sizes are modified assuming a normal distribution with a user defined mean and standard deviation. Samples of different sizes are created from this distribution, and mixing proportions are estimated for several replicates of each sample size using the function siar::siarmcmcdirchletv4() or the function MixSIAR::run_model().

Data description

Let's take a look at the data requirements. samplesim needs three types of data:

Datasets provided in this package come from Killengreen et al. (2011)[^1]

[^1]: Killengreen S., Lecomte N., Ehrich D., Schott T., Yoccoz N.G. and Ims R.A. (2011) The importance of marine vs. human-induced subsidies in the maintenance of an expanding mesocarnivore in the arctic tundra. Journal of Animal Ecology, 80, 1049-1060.

To use these datasets with samplesim we will use the functions MixSIAR::load_mix_data(), MixSIAR::load_source_data() and MixSIAR::load_discr_data() to import and format the data.

Consumers data {#consumers}

This dataset contains the isotopic plasma values of the consumers over two (or more) isotopes. It must be structured in a data frame with two (or more) variables. Each column contains the isotopic plasma values of one isotope. In the example provided, this dataset contains two columns:

Each row of the data frame corresponds to one consumer. Let's take a look at the dataset available in the samplesim package by importing data with the MixSIAR::load_mix_data() function.

## Get the location of the dataset ----
file_path <- system.file("extdata", "foxes_consumer.csv", package = "samplesim")

## Import isotopic plasma values of the consumers ----
consumer <- MixSIAR::load_mix_data(filename     = file_path,
                                   iso_names    = c("d13C", "d15N"),
                                   factors      = NULL,
                                   fac_random   = NULL,
                                   fac_nested   = NULL,
                                   cont_effects = NULL)

Important: currently samplesim is not develop to deal with random factors or continuous covariate.


Let's explore this dataset.

## Get the class of the object ----
class(consumer)

## Name of the list elements ----
names(consumer)

## Print the ten first rows of the isotopic values ----
head(consumer$"data_iso", 10)

## Get the name of the isotopes ----
consumer$"iso_names"

## How many consumers are there in the dataset? ----
consumer$"N"


For further details on the function MixSIAR::load_mix_data() the reader is referred to the help page of this function.

Sources data {#sources}

This dataset contains the isotopic plasma values of the sources over two (or more) isotopes. It must be structured in one of two following formats:

Format #1

A data frame with repeated measurements of plasma isotope (columns) for each source individual (row). In the example provided in samplesim, this dataset contains three columns:

Format #2

A data frame expressing isotopic plasma values as means and standard deviation. This dataset also contains the sample size of each source. In the example provided in samplesim, this dataset contains six columns:


Let's take a look at the dataset available in the samplesim package by importing data with the function MixSIAR::load_source_data().

## Get the location of the dataset ----
file_path <- system.file("extdata", "foxes_sources.csv", package = "samplesim")

## Import mean isotopic plasma values of the sources (Format #1) ----
sources <- MixSIAR::load_source_data(filename       = file_path,
                                     source_factors = NULL,
                                     conc_dep       = FALSE,
                                     data_type      = "means",
                                     mix            = consumer)

Note: If your data follow the format #2, replace the value of the argument data_type by raw.


Let's explore this dataset.

## Get the class of the object ----
class(sources)

## Name of the list elements ----
names(sources)

## Print the mean values of isotopic plasma values ----
sources$"S_MU"

## Print the SD values of isotopic plasma values ----
sources$"S_SIG"

## Get the name of the sources ----
sources$"source_names"

## How many sources are there in the dataset? ----
sources$"n_array"

For further details on the function MixSIAR::load_source_data() the reader is referred to the help page of this function.

Discrimation data {#discrim}

This dataset contains the trophic discrimination factor (TDF) of two (or more) isotopes for each source. It must be structured following the format #2 of the sources dataset. In the example provided in samplesim, this dataset contains five columns:

Let's take a look at the dataset available in the samplesim package by importing data with the function MixSIAR::load_discr_data().

## Get the location of the dataset ----
file_path <- system.file("extdata", "foxes_discrimination.csv", package = "samplesim")

## Import TDF values ----
discr <- MixSIAR::load_discr_data(filename = file_path, mix = consumer)


Let's explore this dataset.

## Get the class of the object ----
class(discr)

## Name of the list elements ----
names(discr)

## Print the mean values of TDF ----
discr$"mu"

## Print the SD values of TDF ----
discr$"sig2"

For further details on the function MixSIAR::load_discr_data() the reader is referred to the help page of this function.

Data visualization

The function samplesim::plot_isospace() represents consumer and sources data in the isotopic space. This function is an adaptation of MixSIAR::plot_data() function and add color to source data. If you want you can export plot in PDF and/or PNG format.

samplesim::plot_isospace(mix = consumer, source = sources, discr = discr)

If you prefer you can also used the MixSIAR::plot_data().

If your data make sense, let's move forward!

Running samplesim

The core function of the package is samplesim::samplesim(). It allows investigating the effect of sample size on estimates and precision of stable isotope mixing solutions. More specifically samplesim::samplesim() assesses the sensitivity of isotopes mixing models to variation in numbers of samples from source tissues. This tool can be used prior to full-blown studies in a similar manner than power analyses. It used the function siar::siarmcmcdirichletv4() developed by Andrew Parnell and available in the package siar. samplesim can also be used with the package MixSIAR developed by Brian Stock et al.. User can choose to sample one particular source, or all the sources. User can also choose to modify consumer data. Sample sizes are modified assuming a normal distribution with a user defined mean and standard deviation. Samples of different sizes are created from this distribution, and mixing proportions are estimated for several replicates of each sample size.


The general writing of samplesim::samplesim() is:

samplesim::samplesim(package     = "siar",
                     mix         = consumer,
                     source      = sources,
                     discr       = discr,
                     type        = NULL,
                     nsamples    = NULL,
                     modify      = NULL,
                     nrep        = 100,
                     interval    = 90,
                     name        = NULL,
                     resid_err   = TRUE,
                     process_err = FALSE,
                     run         = "test",
                     alpha.prior = 1,
                     path        = ".")

with:

Let's take an example. We will assess the impact of the sample size of one source: the source #6 (Voles) with several sample sizes (from 2 to 500). The analysis will be repeated 999 times.

## samplesim run for one source ----
samplesim::samplesim(package     = "siar",
                     mix         = consumer,
                     source      = sources,
                     discr       = discr,
                     type        = "one source",
                     nsamples    = c(2:10, 15, 25, 50, 75, 100, 150, 250, 500),
                     modify      = "Voles",
                     nrep        = 999,
                     interval    = 90,
                     name        = "test_siar",
                     resid_err   = TRUE,
                     process_err = FALSE,
                     run         = "test",
                     alpha.prior = 1,
                     path        = ".")

This function does not return any object in the R console. Instead, results are stored in the directory ./test_siar/ and they will need to be imported for results visualization (see below).

Note 1: if you want to estimate sample size impacts of all sources, you need to set type = "all source". If you want to assess the impact of the sample size of consumers, you have to set type = "consumer".

Note 2: the use of samplesim with the package MixSIAR requires the installation of the software JAGS. See the MixSIAR documentation for further details.

Results extraction {#results}

The samplesim::samplesim() function has stored four objects results.

## Objects created by samplesim ----
list.files("./test_siar")
list.files("docs/test_siar")

A logfile.txt is also written and contains all parameters of the simulation. Here's what it looks like.

cat(paste0(readLines("docs/test_siar/logfile.txt"), collapse = "\n"))


Now, let's take a look at the results by importing the medians dataset using the R function readRDS().

## Import medians of credible intervals ----
medians <- readRDS("./test_siar/medians.rds")
# Import medians of credible intervals
medians <- readRDS("docs/test_siar/medians.rds")
## Structure of the object ----
class(medians)

## Names of the dimensions ----
names(dimnames(medians))

## Names of the content of the second dimension ----
dimnames(medians)[[2]]

## Names of the content of the third dimension ----
dimnames(medians)[[3]]

These data are structured in a three dimensions array.

## Extract results of the first replicate ----
medians[1, , ]

## Compute mean over replicates ----
apply(medians, 3:2, mean)

The structure of the widths dataset is the same as for medians.


If you are uncomfortable to deal with three dimensions arrays, you can use the function samplesim::get_output(). This function converts these two arrays (i.e. medians and widths) into an unique data frame. For instance:

## Import medians and widths of credible intervals ----
datas <- samplesim::get_output("test_siar")
datas <- samplesim::get_output("test_siar", path = "docs")
## Structure of the data frame ----
str(datas)

## Print the first ten and last ten rows ----
rbind(head(datas, 10), tail(datas, 10))

If you want to select only the widths data,

## Extract widths of credible intervals ----
widths <- datas[datas$type == "Width of credible intervals", ]

## Check ----
rbind(head(widths, 10), tail(widths, 10))

The argument change of the function samplesim::get_output() allows to compute the percentage of change of medians and widths respectively, based on a reference value of sample size (argument reference; default is the minimum sample size). For instance,

## Import medians and widths of credible intervals expressed as percentage of change ----
datas <- samplesim::get_output(name = "test_siar", change = TRUE, reference = 2)
datas <- samplesim::get_output(name = "test_siar", change = TRUE, reference = 2, path = "docs")
## Structure of the data frame ----
str(datas)

## Print the first ten and last ten rows ----
rbind(head(datas, 10), tail(datas, 10))

Here we notice that the column replicate is omitted and percentages of change are aggregated over replicates.

Results visualization

The samplesim package also offers the possibility of visualizing the effect of sample size on the width of the credible interval and on the median of the posterior distribution of the mixing models. This is possible with the function samplesim::plot_samplesim().

## Visualize results ----
samplesim::plot_samplesim(name = "test_siar")
# Visualize results
samplesim::plot_samplesim(name = "test_siar", path = "docs")


If you prefer you can represent the percentages of change:

# Visualize results expressed as percentages of change
samplesim::plot_samplesim(name = "test_siar", change = TRUE, reference = 2)
samplesim::plot_samplesim(name = "test_siar", change = TRUE, reference = 2, path = "docs")


Finally, if you are not satisfied with the quality of the graph, you can customize it by 1) importing data and 2) programming your own graph. Let's take an example with the ggplot2 package (used in samplesim).

## Import medians and widths of credible intervals ----
datas <- get_output("test_siar")
datas <- get_output("test_siar", path = "docs")
ggplot(aes(x = size, y = value), data = datas) +
  geom_boxplot(aes(fill = source), width = 0.8, outlier.shape = NA) +
  labs(x = "Sample size", y = "Values", fill = "Sources") +
  theme(legend.position = "bottom", legend.title = element_blank()) +
  facet_grid(. ~ type)


Or your can only represent the medians values. For instance (with percentages of change):

## Import medians and widths of credible intervals ----
datas <- samplesim::get_output(name = "test_siar", change = TRUE)
datas <- samplesim::get_output(name = "test_siar", change = TRUE, path = "docs")
## Select only medians of credible intervals ----
medians <- datas[datas$"type" == "Median of posterior distribution", ]

ggplot(aes(x = size, y = value, group = source), data = medians) +
  geom_point(aes(color = source)) +
  geom_line(aes(color = source)) +
  labs(x = "Sample size", y = "Change in medians (%)", color = "Sources") +
  theme_light() +
  theme(legend.position = "bottom", legend.title = element_blank())

Reference {-}

Lecomte N., Ehrich D., Casajus N., Berteaux D., Cameron C., and Yoccoz N.G. How many is enough? An R package for evaluating the effect of sample size on estimates and precision of stable isotope mixing solutions. Submitted to Methods in Ecology and Evolution.



ahasverus/samplesim documentation built on July 8, 2021, 11:03 a.m.