knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 7, out.width = "100%", dpi = 300 )
library("samplesim") library("ggplot2")
This package allows the investigation of the effect of sample size on estimates
and precision of stable isotope mixing solutions calculated with the package
siar or the
package MixSIAR.
Samples sizes are modified assuming a normal distribution with a user defined
mean and standard deviation. Samples of different sizes are created from this
distribution, and mixing proportions are estimated for several replicates of
each sample size using the function siar::siarmcmcdirchletv4()
or the
function MixSIAR::run_model()
.
Let's take a look at the data requirements. samplesim
needs three types of data:
Datasets provided in this package come from Killengreen et al. (2011)[^1]
[^1]: Killengreen S., Lecomte N., Ehrich D., Schott T., Yoccoz N.G. and Ims R.A. (2011) The importance of marine vs. human-induced subsidies in the maintenance of an expanding mesocarnivore in the arctic tundra. Journal of Animal Ecology, 80, 1049-1060.
To use these datasets with samplesim
we will use the functions
MixSIAR::load_mix_data()
, MixSIAR::load_source_data()
and
MixSIAR::load_discr_data()
to import and format the data.
This dataset contains the isotopic plasma values of the consumers over two (or more) isotopes. It must be structured in a data frame with two (or more) variables. Each column contains the isotopic plasma values of one isotope. In the example provided, this dataset contains two columns:
d13C
, describing the plasma values of the d13C isotoped15N
, describing the plasma values of the d15N isotopeEach row of the data frame corresponds to one consumer. Let's take a look at the
dataset available in the samplesim
package by importing data with the
MixSIAR::load_mix_data()
function.
## Get the location of the dataset ---- file_path <- system.file("extdata", "foxes_consumer.csv", package = "samplesim") ## Import isotopic plasma values of the consumers ---- consumer <- MixSIAR::load_mix_data(filename = file_path, iso_names = c("d13C", "d15N"), factors = NULL, fac_random = NULL, fac_nested = NULL, cont_effects = NULL)
Important: currently samplesim
is not develop to deal with random factors
or continuous covariate.
Let's explore this dataset.
## Get the class of the object ---- class(consumer) ## Name of the list elements ---- names(consumer) ## Print the ten first rows of the isotopic values ---- head(consumer$"data_iso", 10) ## Get the name of the isotopes ---- consumer$"iso_names" ## How many consumers are there in the dataset? ---- consumer$"N"
For further details on the function MixSIAR::load_mix_data()
the reader is
referred to the help page of this function.
This dataset contains the isotopic plasma values of the sources over two (or more) isotopes. It must be structured in one of two following formats:
Format #1
A data frame with repeated measurements of plasma isotope (columns) for each
source individual (row). In the example provided in samplesim
, this dataset
contains three columns:
Source
: the name of the sources;d13C
: the plasma values of the d13C isotope;d15N
: the plasma values of the d15N isotope.Format #2
A data frame expressing isotopic plasma values as means and standard deviation.
This dataset also contains the sample size of each source. In the example
provided in samplesim
, this dataset contains six columns:
Sources
: the name of the sources;Meand13C
: the mean values of the d13C isotope;SDd13C
: the standard deviation of the d13C isotope;Meand15N
: the mean values of the d15N isotope;SDd15N
: the standard deviation of the d15N isotope;n
: the sample size of each source.Let's take a look at the dataset available in the samplesim
package by
importing data with the function MixSIAR::load_source_data()
.
## Get the location of the dataset ---- file_path <- system.file("extdata", "foxes_sources.csv", package = "samplesim") ## Import mean isotopic plasma values of the sources (Format #1) ---- sources <- MixSIAR::load_source_data(filename = file_path, source_factors = NULL, conc_dep = FALSE, data_type = "means", mix = consumer)
Note: If your data follow the format #2, replace the value of the argument
data_type
by raw
.
Let's explore this dataset.
## Get the class of the object ---- class(sources) ## Name of the list elements ---- names(sources) ## Print the mean values of isotopic plasma values ---- sources$"S_MU" ## Print the SD values of isotopic plasma values ---- sources$"S_SIG" ## Get the name of the sources ---- sources$"source_names" ## How many sources are there in the dataset? ---- sources$"n_array"
For further details on the function MixSIAR::load_source_data()
the reader is
referred to the help page of this function.
This dataset contains the trophic discrimination factor (TDF) of two (or more)
isotopes for each source. It must be structured following the format #2 of the
sources dataset. In the example provided in samplesim
, this dataset contains
five columns:
Sources
: the name of the sources;Meand13C
: the mean values of the TDF for the d13C isotope;SDd13C
: the standard deviation of the TDF for the d13C isotope;Meand15N
: the mean values of the TDF for the d15N isotope;SDd15N
: the standard deviation of the TDF for the d15N isotope.Let's take a look at the dataset available in the samplesim
package by
importing data with the function MixSIAR::load_discr_data()
.
## Get the location of the dataset ---- file_path <- system.file("extdata", "foxes_discrimination.csv", package = "samplesim") ## Import TDF values ---- discr <- MixSIAR::load_discr_data(filename = file_path, mix = consumer)
Let's explore this dataset.
## Get the class of the object ---- class(discr) ## Name of the list elements ---- names(discr) ## Print the mean values of TDF ---- discr$"mu" ## Print the SD values of TDF ---- discr$"sig2"
For further details on the function MixSIAR::load_discr_data()
the reader is
referred to the help page of this function.
The function samplesim::plot_isospace()
represents consumer and sources data
in the isotopic space. This function is an adaptation of MixSIAR::plot_data()
function and add color to source data. If you want you can export plot in PDF
and/or PNG format.
samplesim::plot_isospace(mix = consumer, source = sources, discr = discr)
If you prefer you can also used the MixSIAR::plot_data()
.
If your data make sense, let's move forward!
The core function of the package is samplesim::samplesim()
. It allows
investigating the effect of sample size on estimates and precision of stable
isotope mixing solutions. More specifically samplesim::samplesim()
assesses
the sensitivity of isotopes mixing models to variation in numbers of samples
from source tissues. This tool can be used prior to full-blown studies in a
similar manner than power analyses. It used the function
siar::siarmcmcdirichletv4()
developed by Andrew Parnell and available in the
package siar. samplesim
can also be used with the package MixSIAR
developed by Brian Stock et al.. User can choose to sample one particular
source, or all the sources. User can also choose to modify consumer data.
Sample sizes are modified assuming a normal distribution with a user defined
mean and standard deviation. Samples of different sizes are created from this
distribution, and mixing proportions are estimated for several replicates of
each sample size.
The general writing of samplesim::samplesim()
is:
samplesim::samplesim(package = "siar", mix = consumer, source = sources, discr = discr, type = NULL, nsamples = NULL, modify = NULL, nrep = 100, interval = 90, name = NULL, resid_err = TRUE, process_err = FALSE, run = "test", alpha.prior = 1, path = ".")
with:
package
, the package name to be used to estimate mixing proportions.
Must be one of 'siar'
or 'mixsiar'
.mix
, the output returned by the MixSIAR::load_mix_data()
function and
containing consumer isotope values (see Consumers data section).source
, the output returned by the MixSIAR::load_source_data()
function
and containing mean and standard deviation isotope values of sources (and in
some case raw values; see Preys data section).discr
, the output returned by the MixSIAR::load_discr_data()
function and
containing TDF values (see Discrimation data section).type
, the type of analysis to be run. Must be one of 'one source'
,
'all sources'
or 'consumer'
.nsamples
, a vector with the sample sizes to simulate.modify
, the name of the source to modify (case sensitive). This argument
has to be specified when the argument type = 'one source'
. Otherwise
it will be ignored.nrep
, an integer specifying the number of replicates for each sample sizes.
Default is 100
.interval
, an integer indicating the width of credible interval to use for
precision estimation. Default is 90
.name
, a character string giving the name of the simulation. If NULL
the
simulation will be named by the time of the simulation. This name will be used
to create a directory (in path
) in which results will be stored.resid_err
, a boolean indicating if residual error is included in the model.
See ?MixSIAR::run_model
for further information.
Only necessary if package = 'mixsiar'
.process_err
, a boolean indicating if process error is included in the model.
See ?MixSIAR::run_model
for further information.
Only necessary if package = 'mixsiar'
.run
, a string or a list specifying MCMC parameters. See ?MixSIAR::run_model
for further information. Only necessary if package = 'mixsiar'
.alpha.prior
, a numeric giving the Dirichlet prior on p.global. See
?MixSIAR::run_model
for further information. Only necessary
if package = 'mixsiar'
.path
, the directory in which the directory name
will be created.Let's take an example. We will assess the impact of the sample size of one source: the source #6 (Voles) with several sample sizes (from 2 to 500). The analysis will be repeated 999 times.
## samplesim run for one source ---- samplesim::samplesim(package = "siar", mix = consumer, source = sources, discr = discr, type = "one source", nsamples = c(2:10, 15, 25, 50, 75, 100, 150, 250, 500), modify = "Voles", nrep = 999, interval = 90, name = "test_siar", resid_err = TRUE, process_err = FALSE, run = "test", alpha.prior = 1, path = ".")
This function does not return any object in the R console. Instead, results are
stored in the directory ./test_siar/
and they will need to be imported for
results visualization (see below).
Note 1: if you want to estimate sample size impacts of all sources, you
need to set type = "all source"
. If you want to assess the impact of the
sample size of consumers, you have to set type = "consumer"
.
Note 2: the use of samplesim
with the package MixSIAR
requires the
installation of the software JAGS. See the MixSIAR
documentation for
further details.
The samplesim::samplesim()
function has stored four objects results.
## Objects created by samplesim ---- list.files("./test_siar")
list.files("docs/test_siar")
intervals
, a four dimensions array with the upper and lower bounds of the
credible interval for each sample size, replicate and source. First dimension
represents lower and upper bounds; second dimension corresponds to the number
of sources; third dimension is the number of replicates; and fourth dimension
is the number of sample size.
widths
, a three dimensions array with the width (precision) of credible
intervals for each source, each replicate and each sample size. First dimension
corresponds to the number of replicates; second dimension is the number of
sources; and third dimension represents the number of sample size.
medians
, a three dimensions array with the median (estimate) of credible
intervals for each source, each replicate and each sample size. Dimensions are
the same as for widths
object.
datasets
, a four dimensions array with all resampled datasets.
A logfile.txt
is also written and contains all parameters of the simulation.
Here's what it looks like.
cat(paste0(readLines("docs/test_siar/logfile.txt"), collapse = "\n"))
Now, let's take a look at the results by importing the medians
dataset using
the R function readRDS()
.
## Import medians of credible intervals ---- medians <- readRDS("./test_siar/medians.rds")
# Import medians of credible intervals medians <- readRDS("docs/test_siar/medians.rds")
## Structure of the object ---- class(medians) ## Names of the dimensions ---- names(dimnames(medians)) ## Names of the content of the second dimension ---- dimnames(medians)[[2]] ## Names of the content of the third dimension ---- dimnames(medians)[[3]]
These data are structured in a three dimensions array.
## Extract results of the first replicate ---- medians[1, , ] ## Compute mean over replicates ---- apply(medians, 3:2, mean)
The structure of the widths
dataset is the same as for medians
.
If you are uncomfortable to deal with three dimensions arrays, you can use the
function samplesim::get_output()
. This function converts these two arrays
(i.e. medians
and widths
) into an unique data frame. For instance:
## Import medians and widths of credible intervals ---- datas <- samplesim::get_output("test_siar")
datas <- samplesim::get_output("test_siar", path = "docs")
## Structure of the data frame ---- str(datas) ## Print the first ten and last ten rows ---- rbind(head(datas, 10), tail(datas, 10))
If you want to select only the widths
data,
## Extract widths of credible intervals ---- widths <- datas[datas$type == "Width of credible intervals", ] ## Check ---- rbind(head(widths, 10), tail(widths, 10))
The argument change
of the function samplesim::get_output()
allows to
compute the percentage of change of medians and widths respectively, based on a
reference value of sample size (argument reference
; default is the minimum
sample size). For instance,
## Import medians and widths of credible intervals expressed as percentage of change ---- datas <- samplesim::get_output(name = "test_siar", change = TRUE, reference = 2)
datas <- samplesim::get_output(name = "test_siar", change = TRUE, reference = 2, path = "docs")
## Structure of the data frame ---- str(datas) ## Print the first ten and last ten rows ---- rbind(head(datas, 10), tail(datas, 10))
Here we notice that the column replicate is omitted and percentages of change are aggregated over replicates.
The samplesim
package also offers the possibility of visualizing the effect
of sample size on the width of the credible interval and on the median of the
posterior distribution of the mixing models. This is possible with the function
samplesim::plot_samplesim()
.
## Visualize results ---- samplesim::plot_samplesim(name = "test_siar")
# Visualize results samplesim::plot_samplesim(name = "test_siar", path = "docs")
If you prefer you can represent the percentages of change:
# Visualize results expressed as percentages of change samplesim::plot_samplesim(name = "test_siar", change = TRUE, reference = 2)
samplesim::plot_samplesim(name = "test_siar", change = TRUE, reference = 2, path = "docs")
Finally, if you are not satisfied with the quality of the graph, you can
customize it by 1) importing data and 2) programming your own graph. Let's take
an example with the
ggplot2
package (used in samplesim
).
## Import medians and widths of credible intervals ---- datas <- get_output("test_siar")
datas <- get_output("test_siar", path = "docs")
ggplot(aes(x = size, y = value), data = datas) + geom_boxplot(aes(fill = source), width = 0.8, outlier.shape = NA) + labs(x = "Sample size", y = "Values", fill = "Sources") + theme(legend.position = "bottom", legend.title = element_blank()) + facet_grid(. ~ type)
Or your can only represent the medians values. For instance (with percentages of change):
## Import medians and widths of credible intervals ---- datas <- samplesim::get_output(name = "test_siar", change = TRUE)
datas <- samplesim::get_output(name = "test_siar", change = TRUE, path = "docs")
## Select only medians of credible intervals ---- medians <- datas[datas$"type" == "Median of posterior distribution", ] ggplot(aes(x = size, y = value, group = source), data = medians) + geom_point(aes(color = source)) + geom_line(aes(color = source)) + labs(x = "Sample size", y = "Change in medians (%)", color = "Sources") + theme_light() + theme(legend.position = "bottom", legend.title = element_blank())
Lecomte N., Ehrich D., Casajus N., Berteaux D., Cameron C., and Yoccoz N.G. How many is enough? An R package for evaluating the effect of sample size on estimates and precision of stable isotope mixing solutions. Submitted to Methods in Ecology and Evolution.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.