knitr::opts_chunk$set( collapse = TRUE, fig.path = "../doc/figures/manual-", comment = "#>" )
library(cvasi)
The cvasi package is a software library that extends the features of the programming language R by routines for ecotox effect modeling. The routines are intended to be used for scientific purposes as well as in the context of regulatory risk assessments. cvasi implements model equations and default parameters to simulate a set of toxicokinetic-toxicodynamic (TKTD) models. The models are intended to assess the effects of substance exposure on effect endpoints, such as growth and reproduction, for a range of standard test species.
The package offers a standardized framework which can be extended by (almost) any TKTD model that can be implemented in R. Its modeling workflows are designed to be self-explanatory and as intuitive as possible to make modeling approaches more accessible. In addition, the framework's routines are optimized to reduce the computing time required for extensive risk assessments.
The cvasi package and its routines can be used by anyone with basic to intermediate knowledge of the R programming language. The routines can be used in a procedural fashion, commonly referred to as base R, in which function statements are executed, assigned to a variable, and then passed on to the next function:
library(cvasi) ## Sample base R workflow ## # Create a scenario object of model GUTS-RED-IT my_it <- GUTS_RED_IT() # Set model parameters my_it <- set_param(my_it, c(kd=1.2, hb=0, alpha=9.2, beta=4.3)) # Print scenario details my_it
A more advanced approach is to use the tidyr syntax (cf. Wickham & Grolemund 2017) which uses verb-like statements to modify data. tidyr syntax can be used to chain functions calls which pass the result of one function on to the next:
## Example 'tidyr' workflow ## # the pipeline (%>%) symbol passes results to the next statement GUTS_RED_IT() %>% set_param(c(kd=1.2, hb=0, alpha=9.2, beta=4.3))
Both workflow styles achieve the same result, but tidyr is terser, especially if several operations are performed on the same object. tidyr style is the recommended way to interact with the cvasi package.
The latest development version can be installed from GitHub:
install.packages("remotes", dependencies=TRUE) remotes::install_github("cvasi-tktd/cvasi", dependencies=TRUE)
The package and its source code is also available on GitHub.
The central concept within the cvasi package is the scenario. It encapsulates all key properties to run a simulation such as initial state and model parameters. The scenario can be thought of as the collection of data and parameters necessary to reproduce a certain study condition or situation. The set of available scenario properties depends on the model type but may include:
Scenario properties can be modified using intuitively named tidyr verbs: in
general, each verb accepts a scenario as an argument and returns a modified
scenario as the result. For instance, the set_param()
function will update
a scenario's parameters to new values:
library(cvasi) # Create a new GUTS-RED-IT scenario and set its model parameters GUTS_RED_IT() %>% set_param(c(kd=1.2, hb=0, alpha=9.2, beta=4.3))
If a scenario is properly defined, it can be passed to dedicated functions that,
for instance, run a simulation or derive assessment endpoints. These higher-level functions
do not return a scenario object but a table of e.g. simulation results for that
particular scenario. As an example, simulating
the sample scenario minnow_it
will run a GUTS-RED-IT model using
substance-specific parameters:
# Example GUTS-RED-IT scenario derived from an acute fish toxicity study # of the fathead minnow and Chlorpyrifos (Geiger et al. 1988) minnow_it %>% simulate()
The table returned by simulate()
contains the value of each state variable at
the requested output time points. In case of the GUTS-RED-IT model, this
comprises the state variables D
(scaled damage) and H
(cumulative hazard).
Effect levels, effect profiles, and dose-response graphs
can be derived in a similar manner. Please refer to EFSA PPR Panel (2018) or the
package help for more information on GUTS-RED type models:
# Access the package help on GUTS-RED type models ?"GUTS-RED-models"
A scenario is created by first calling a constructor function that is dedicated to a particular model. The cvasi package supports multiple TKTD models, for instance:
GUTS_RED_SD()
GUTS_RED_IT()
Lemna_Schmitt()
, published by Schmitt et al. (2013)Lemna_SETAC()
, published by Klein et al. (2021)Myrio()
, an adaption of Klein et al. (2021) to Myriophyllum at Tier 2CDEB_abj()
, the abj model with type M accelerationThe listed functions create objects that will carry all data and settings to fully describe a scenario. In case of a GUTS-RED-IT scenario, one needs to provide at least the four model parameters and an exposure time-series to fully define a scenario:
# Define an exposure time-series myexposure <- data.frame(time=c(0, 1, 1.01, 5), conc=c(10, 10, 0, 0)) # Create and parameterize a scenario GUTS_RED_IT() %>% set_param(c(kd=1.2, hb=0, alpha=9.2, beta=4.3)) %>% set_exposure(myexposure) -> myscenario
One can examine the main scenario properties by passing the scenario to the console like in the next example. The console will print the model name, model parameters (param), initial state (init), enabled effect endpoints (endpt), output times (times), environmental forcings (forcs), and exposure series (expsr). Depending on model type, the scenario may carry additional properties that are not necessarily displayed in the console output.
# Print information about the scenario
myscenario
Several set_*()
verbs are available to modify scenario properties, such as:
set_tag()
, to assign a custom tag to identify the scenarioset_init()
, to set the initial stateset_param()
, to modify model parametersset_times()
, to define output time pointsset_forcings()
, to set environmental factors (i.e. forcings)set_exposure()
, to set an exposure time-seriesset_endpoints()
, to enable/disable assessment endpointsset_transfer()
, to define transfer intervals of Lemna frondsset_moa()
, to select a model's mode of actionset_window()
, to control moving exposure windowsPlease note that set_exposure()
will, by default, use the exposure series' time points
also as output time points. If this behavior is undesired, it can be disabled
by setting the argument reset_times=FALSE
:
# Update the exposure time-series but keep former output time points myscenario %>% set_exposure(no_exposure(), reset_times=FALSE)
Note that times
still has the original four elements although the assigned exposure
series has only a single entry. Output time points can be
explicitly set with set_times()
. For deriving effect endpoints, the simulated
time frame can be split up into moving exposure windows of a certain length using
set_window()
. A more complex scenario setup may look like the following workflow:
# Selected model parameters myparam <- c(p_M=3211, v=0.023, k_J=0.63) # Initial body length L myinit <- c(L=0.02) # Constant non-zero exposure myexposure <- data.frame(time=0, conc=1.72) DEB_abj() %>% set_param(myparam) %>% set_init(myinit) %>% set_exposure(myexposure) %>% set_times(0:10) %>% # Output times 0,1,2,...,10 set_mode_of_action(4) %>% # Method of Action #4 to be activated set_window(length=3) # Using moving exposure windows of length 3 days
After a scenario is fully set up, it can be passed to simulate()
to return
model results:
# Example scenario of the Lemna TKTD model metsulfuron %>% set_times(0:7) %>% simulate()
simulate()
returns the model state for each output time point. In this example,
it returns results for the period from zero to seven with an equi-distant time
step length of one.
The temporal resolution of results can be increased by specifying additional
output times:
# Same simulation period, but with a smaller step length of 0.1 metsulfuron %>% set_times(seq(0, 7, 0.1)) %>% simulate() %>% tail()
The resulting table now contains ten times as many rows because we decreased the step length by a factor of ten but simulated the same period. The temporal resolution of simulations can have an influence on results, i.e. the numerical values of the returned variables. These differences usually originate from small numerical errors introduced by the solver of the model’s Ordinary Differential Equations. Generally, parameters such as the step-length in time can have influence on the precision of simulation results.
There are several ways to increase the precision of simulation results without
modifying a scenario's output times. One option is to decrease
the solver’s maximum step length in time by setting the optional argument hmax
.
The smaller hmax
, the more precise the results:
# Original simulation period, but with a maximum solver step length of hmax=0.01 metsulfuron %>% set_times(0:7) %>% simulate(hmax=0.01)
Although very effective, reducing hmax
to small values may lead to inefficiently
long runtimes of simulations. If numerical precision is an issue,
it is advisable to also test other solver settings such as absolute and relative
error tolerances (atol
and rtol
) or to switch to another solver method.
Numerical precision and stability is a complex and advanced topic, please
refer to the manual of simulate()
for additional information:
?cvasi::simulate
Simulation results are the basis for deriving assessment endpoints such as effect levels and effect profiles. The cvasi package implements routines to derive these endpoints efficiently independent of the actual model that is being assessed. Hence, if a model is available within the package framework, it can be used to derive assessment endpoints. As an example, the following statement derives effect levels for a sample GUTS-RED-IT scenario:
# GUTS-RED-IT scenario of the fathead minnow and chlorpyrifos minnow_it %>% effect()
# make sure that value in text are still up to date testthat::expect_equal(minnow_it %>% effect() %>% dplyr::pull(L), 6.297e-5, tolerance=0.001)
The lethality endpoint L
has a value of 6.3e-5
which means that lethality
has only marginally increased by 0.0063% compared to a control scenario without
exposure to the contaminant.
The columns L.dat.start
and L.dat.end
denote the period for which the
effect level was observed. In case of GUTS-RED models, this usually refers to the
whole simulation period. For certain assessments it may be necessary to evaluate
all exposure periods of a certain length within the original series separately.
This approach is referred to as a moving exposure window:
# Setting up a custom scenario with a total simulation period of 14 days and # an exposure window length of 7 to assess a trapezoidal exposure pattern americamysis %>% set_window(7) %>% set_exposure(data.frame(t=c(0,3,4,7,8), c=c(0,0,3,3,0))) %>% set_times(0:14) -> mydeb # Derive maximum effect level of all exposure windows mydeb %>% effect()
# make sure that value in text are still up to date testthat::expect_equal(mydeb %>% effect() %>% dplyr::pull(L), 0.0521, tolerance=0.001)
The americamysis
scenario provides two effect endpoints: structural length (L
)
and reproduction (R
). Exposure to the simulated toxicant results at maximum in a 5.21%
decrease in body length compared to control scenarios. The
maximum effect was observed in the exposure window starting at time 2.0
and
ending at 9.0
, cf. columns L.dat.start
and L.dat.end
. The toxicant had no effect
on reproduction due to the simulation being too short for reproduction to occur.
Scenarios can also be assessed regarding to so-called effect profiles, or EPx values for
short. An EPx value quantifies the multiplication factor that is necessary to
produce an effect level of x% for that particular scenario. As an example,
an EP50 of 17.0
would mean that the scenario's exposure series needs to be
multiplied by a factor of 17 to observe a 50% effect. The function
epx()
can derive effect profiles efficiently with arbitrary precision:
# Restrict assessed endpoints to structural length (L) mydeb %>% set_endpoints("L") %>% epx()
# make sure that value in text are still up to date testthat::expect_equal(mydeb %>% set_endpoints("L") %>% epx(ep_only=TRUE) %>% unlist(use.names=FALSE), c(1.162598, 1.711914), ignore_attr=TRUE, tolerance=0.01)
In this example, a factor of 1.16
resulted in 10% effect on structural
length and a factor of 1.71
in 50% effect. epx()
provides means to control the precision
of derived EPx values, the range of tested multiplication factors, factor
cutoff thresholds, and debugging capabilities. To examine how a particular
EPx value was derived, simply set the argument verbose=TRUE
:
# Examine how the EP20 value is derived minnow_it %>% epx(level=20, verbose=TRUE)
In this example, the algorithm of epx()
first tests a multiplication factor of
10
and then continues testing additional factors using a binary search algorithm
until a factor is found that causes 20% effect.
Please note: Depending on model and chosen parameters, it may be impossible to calculate EPx values (for selected effect levels), because the model or organism cannot reach the required state. As an example: If the simulated period of a DEB_abj scenario is too short, reproduction cannot occur. Therefore, effects on reproduction will always be zero independent of the exposure level. Consequently, EPx values for the reproduction endpoint will not be available.
Input data for the cvasi package relies on common R data structures and imposes only minimal requirements to set up scenarios:
data.frame
with exactly
two numerical columns: The first column for time, the second
for the series' value. The column names are irrelevant but sensible names
may help documenting the data. Constant values can be represented by
time-series with only a single row or entry.Therefore, it should be possible to apply the package's routines on existing datasets with minimal effort. Please refer to Wickham and Grolemund (2017) on how to implement plain and consistent processes to import and transform data using tidyr syntax.
cvasi does support loading data from common sources, such as the exposure model TOXSWA, to ease data processing. Notable file formats and data structures that are supported are exposure time-series from the model FOCUS TOXSWA, as well as fitted GUTS-RED model parameter sets from morse.
In general, all models require parameter values specific to the simulated species and substance. Some scenario types contain default values for selected model parameters as recommended by model authors. Scenarios providing default values include:
Lemna_Schmitt()
and derived models are parameterized according to Schmitt
et al. (2013)Lemna_SETAC()
, Myrio()
, and derived models are parameterized as recommended
by Klein et al. (2021)In addition, some models require values for environmental factors such as temperature and nutrient concentrations. These values or time-series are specific to a certain scenario and must be chosen accordingly.
Parameters of TKTD sub-models are substance-specific and are usually obtained
by fitting model parameters to observed data. The cvasi package supports this process with
the function calibrate()
which uses a frequentist approach for parameter
fitting. However, parameters can also be derived by other means. Generally,
the user is responsible for choosing model parameters and the package does
not impose any restrictions in this regard.
The package function calibrate()
supports the following operation modes:
Parameter fitting builds upon the R function stats::optim()
and it exposes full
control over the fitting process. In this context, a dataset is represented by a
calibration set object which consists of a scenario, observed data, and an
optional weighting term.
Please refer to the Modeling Howto for a description of the
fitting process using calibrate()
.
As an alternative for GUTS-RED type models, model parameters can also be obtained using the morse package. It uses a Bayesian approach for parameter fitting.
The cvasi package does not use or define any custom file formats for model outputs
but strictly relies on common R data structures. Simulation results and
assessment endpoints are returned in tabular formats which can easily be
serialized to common file formats such as .csv
or Microsoft Excel files
for further processing and data storage.
The following subsections will describe in detail the return values of common package routines for simulating and assessing scenarios.
The return value of simulate()
is a data.frame
containing the value of each
state variable at each requested output time. The first column represents time,
followed by columns for each state variable:
## Display selected scenario properties metsulfuron@times # Output times metsulfuron@init # Initial state ## Simulate the sample scenario metsulfuron %>% simulate()
The model state in the first row is, by definition, equal to the initial state
of the scenario. The simulation result may contain additional columns that
are derived variables or intermediate model variables. In case of the
metsulfuron
scenario, the internal toxicant concentration (C_int
) was derived
from BM
, M_int
, and model parameters. The number of fronds (FrondNo
)
was derived from BM
and model parameters. Additional outputs such as C_int
and FrondNo
are included in results for user convenience and to support model
understanding. Additional outputs can be disabled by setting the optional
argument nout=0
:
metsulfuron %>% simulate(nout=0) %>% head(5)
Intermediate model variables can be enabled in the same manner:
metsulfuron %>% simulate(nout=8) %>% head(5)
The return value of effect()
is a data.frame
containing the effect level of
each selected endpoint. The first column, scenario
, contains the original
scenario which was the basis of the assessment. Followed by the effect level and
the exposure window in which the maximum effect was observed:
metsulfuron %>% set_times(0:7) %>% # restrict scenario to the period [0,7] effect()
Effect levels are reported as a fraction relative to a control scenario. Effects
can be converted to percentages by multiplying the value with 100
(%). For
scenarios with moving exposure windows disabled, the reported window will cover
the entire simulation period. If multiple windows result in identical maximum
effect levels, the first matching window will be reported.
The table's content can be reduced to endpoints only by setting the
argument ep_only=TRUE
.
effect()
can also be instructed to return the effect levels of all assessed
exposure windows by setting the argument max_only=FALSE
:
metsulfuron %>% set_window(length=7, interval=1) %>% # enable moving exposure windows effect(max_only=FALSE) # return effects of all windows
In few cases, effect()
may report spurious non-zero effect levels that are
very close to zero. These can either be avoided by increasing the numerical
precision of simulations or by setting a marginal effect threshold which instructs
effect()
to zero any effect below that threshold. A warning will be shown
if a marginal effect threshold greater than 1% is chosen. Applying a threshold
of 1% to the previous example results in zeroing the effect in the last and
second to last exposure window but leaving the remaining results unchanged:
metsulfuron %>% set_window(length=7, interval=1) %>% # enable moving exposure windows effect(max_only=FALSE, marginal_effect=0.01) # return effects of all windows
The return value of epx()
is a data.frame
containing the effect profile (EPx) of
each selected endpoint and effect level. The first column, scenario
, contains
the original scenario which was the basis of the assessment. Followed by the
derived effect profiles, i.e. the multiplication factors to achieve x% effect:
metsulfuron %>% epx()
For the sample metsulfuron
scenario, a factor of 0.395
would need to be applied
to the exposure series to achieve an effect level of 10% on biomass (BM). An
effect level of 50% on biomass is unattainable with the tested factors ranging
from 1e-30
to 1e30
which is demonstrated by the warning message and the
NA
value for BM.EP50
.
The table's content can be reduced to endpoints only by setting the
argument ep_only=TRUE
. The range of tested factors can be controlled by
setting the arguments min_factor
and max_factor
. Multiplication factors
exceeding this range will raise a warning and NA
will be returned for
that EPx. Finding multiplication factors can be cut short if a
certain margin of safety is reached by using the argument factor_cutoff
:
if the algorithm identifies that an EPx greater or equal to factor_cutoff
is required,
then factor_cutoff
is reported as the result. This approach may
be used to save computational resources for long running models if the
exact EPx value above the cutoff is not of interest.
The precision of EPx values is controlled by the argument effect_tolerance
and is given as the upper absolute error threshold of effects that
is deemed acceptable. The default value of 0.001
ensures that a derived
EPx will result in an effect of x% ± 0.1. Decreasing the effect_tolerance
will result in additional model iterations and longer runtime.
The algorithm's inner workings can be examined by setting the argument verbose=TRUE
:
metsulfuron %>% set_endpoints("BM") %>% epx(verbose=TRUE)
In this example, the binary search finds a BM.EP10
with acceptable precision in
less than ten iterations. For the BM.EP50
, the algorithm re-uses the
knowledge gained from previously performed calculations but is ultimately unable
to find a suitable multiplication factor within the allowed range of multiplication
factors.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.