User Manual"

knitr::opts_chunk$set(
  collapse = TRUE,
  fig.path = "../doc/figures/manual-",
  comment = "#>"
)
library(cvasi)

The cvasi package is a software library that extends the features of the programming language R by routines for ecotox effect modeling. The routines are intended to be used for scientific purposes as well as in the context of regulatory risk assessments. cvasi implements model equations and default parameters to simulate a set of toxicokinetic-toxicodynamic (TKTD) models. The models are intended to assess the effects of substance exposure on effect endpoints, such as growth and reproduction, for a range of standard test species.

The package offers a standardized framework which can be extended by (almost) any TKTD model that can be implemented in R. Its modeling workflows are designed to be self-explanatory and as intuitive as possible to make modeling approaches more accessible. In addition, the framework's routines are optimized to reduce the computing time required for extensive risk assessments.

Required skills

The cvasi package and its routines can be used by anyone with basic to intermediate knowledge of the R programming language. The routines can be used in a procedural fashion, commonly referred to as base R, in which function statements are executed, assigned to a variable, and then passed on to the next function:

library(cvasi)

## Sample base R workflow ##
# Create a scenario object of model GUTS-RED-IT
my_it <- GUTS_RED_IT()
# Set model parameters
my_it <- set_param(my_it, c(kd=1.2, hb=0, alpha=9.2, beta=4.3))
# Print scenario details
my_it

A more advanced approach is to use the tidyr syntax (cf. Wickham & Grolemund 2017) which uses verb-like statements to modify data. tidyr syntax can be used to chain functions calls which pass the result of one function on to the next:

## Example 'tidyr' workflow ##
# the pipeline (%>%) symbol passes results to the next statement
GUTS_RED_IT() %>%
  set_param(c(kd=1.2, hb=0, alpha=9.2, beta=4.3))

Both workflow styles achieve the same result, but tidyr is terser, especially if several operations are performed on the same object. tidyr style is the recommended way to interact with the cvasi package.

How to install

The latest development version can be installed from GitHub:

install.packages("remotes", dependencies=TRUE)
remotes::install_github("cvasi-tktd/cvasi", dependencies=TRUE)

The package and its source code is also available on GitHub.

Modeling and assessment workflow

Overview

The central concept within the cvasi package is the scenario. It encapsulates all key properties to run a simulation such as initial state and model parameters. The scenario can be thought of as the collection of data and parameters necessary to reproduce a certain study condition or situation. The set of available scenario properties depends on the model type but may include:

Scenario properties can be modified using intuitively named tidyr verbs: in general, each verb accepts a scenario as an argument and returns a modified scenario as the result. For instance, the set_param() function will update a scenario's parameters to new values:

library(cvasi)

# Create a new GUTS-RED-IT scenario and set its model parameters
GUTS_RED_IT() %>%
  set_param(c(kd=1.2, hb=0, alpha=9.2, beta=4.3))

If a scenario is properly defined, it can be passed to dedicated functions that, for instance, run a simulation or derive assessment endpoints. These higher-level functions do not return a scenario object but a table of e.g. simulation results for that particular scenario. As an example, simulating the sample scenario minnow_it will run a GUTS-RED-IT model using substance-specific parameters:

# Example GUTS-RED-IT scenario derived from an acute fish toxicity study
# of the fathead minnow and Chlorpyrifos (Geiger et al. 1988)
minnow_it %>%
  simulate()

The table returned by simulate() contains the value of each state variable at the requested output time points. In case of the GUTS-RED-IT model, this comprises the state variables D (scaled damage) and H (cumulative hazard). Effect levels, effect profiles, and dose-response graphs can be derived in a similar manner. Please refer to EFSA PPR Panel (2018) or the package help for more information on GUTS-RED type models:

# Access the package help on GUTS-RED type models
?"GUTS-RED-models"

Setting up a scenario

A scenario is created by first calling a constructor function that is dedicated to a particular model. The cvasi package supports multiple TKTD models, for instance:

The listed functions create objects that will carry all data and settings to fully describe a scenario. In case of a GUTS-RED-IT scenario, one needs to provide at least the four model parameters and an exposure time-series to fully define a scenario:

# Define an exposure time-series
myexposure <- data.frame(time=c(0, 1, 1.01, 5),
                         conc=c(10, 10, 0, 0))

# Create and parameterize a scenario
GUTS_RED_IT() %>%
  set_param(c(kd=1.2, hb=0, alpha=9.2, beta=4.3)) %>%
  set_exposure(myexposure) -> myscenario

One can examine the main scenario properties by passing the scenario to the console like in the next example. The console will print the model name, model parameters (param), initial state (init), enabled effect endpoints (endpt), output times (times), environmental forcings (forcs), and exposure series (expsr). Depending on model type, the scenario may carry additional properties that are not necessarily displayed in the console output.

# Print information about the scenario
myscenario

Several set_*() verbs are available to modify scenario properties, such as:

Please note that set_exposure() will, by default, use the exposure series' time points also as output time points. If this behavior is undesired, it can be disabled by setting the argument reset_times=FALSE:

# Update the exposure time-series but keep former output time points
myscenario %>%
  set_exposure(no_exposure(), reset_times=FALSE)

Note that times still has the original four elements although the assigned exposure series has only a single entry. Output time points can be explicitly set with set_times(). For deriving effect endpoints, the simulated time frame can be split up into moving exposure windows of a certain length using set_window(). A more complex scenario setup may look like the following workflow:

# Selected model parameters
myparam <- c(p_M=3211, v=0.023, k_J=0.63)
# Initial body length L
myinit <- c(L=0.02)
# Constant non-zero exposure
myexposure <- data.frame(time=0, conc=1.72)

DEB_abj() %>%
  set_param(myparam) %>%
  set_init(myinit) %>%
  set_exposure(myexposure) %>%
  set_times(0:10) %>%             # Output times 0,1,2,...,10
  set_mode_of_action(4) %>%       # Method of Action #4 to be activated
  set_window(length=3)            # Using moving exposure windows of length 3 days

Running a simulation

After a scenario is fully set up, it can be passed to simulate() to return model results:

# Example scenario of the Lemna TKTD model
metsulfuron %>%
  set_times(0:7) %>%
  simulate()

simulate() returns the model state for each output time point. In this example, it returns results for the period from zero to seven with an equi-distant time step length of one. The temporal resolution of results can be increased by specifying additional output times:

# Same simulation period, but with a smaller step length of 0.1
metsulfuron %>%
  set_times(seq(0, 7, 0.1)) %>%
  simulate() %>%
  tail()

The resulting table now contains ten times as many rows because we decreased the step length by a factor of ten but simulated the same period. The temporal resolution of simulations can have an influence on results, i.e. the numerical values of the returned variables. These differences usually originate from small numerical errors introduced by the solver of the model’s Ordinary Differential Equations. Generally, parameters such as the step-length in time can have influence on the precision of simulation results.

There are several ways to increase the precision of simulation results without modifying a scenario's output times. One option is to decrease the solver’s maximum step length in time by setting the optional argument hmax. The smaller hmax, the more precise the results:

# Original simulation period, but with a maximum solver step length of hmax=0.01
metsulfuron %>%
  set_times(0:7) %>%
  simulate(hmax=0.01)

Although very effective, reducing hmax to small values may lead to inefficiently long runtimes of simulations. If numerical precision is an issue, it is advisable to also test other solver settings such as absolute and relative error tolerances (atol and rtol) or to switch to another solver method. Numerical precision and stability is a complex and advanced topic, please refer to the manual of simulate() for additional information:

?cvasi::simulate

Deriving assessment endpoints

Simulation results are the basis for deriving assessment endpoints such as effect levels and effect profiles. The cvasi package implements routines to derive these endpoints efficiently independent of the actual model that is being assessed. Hence, if a model is available within the package framework, it can be used to derive assessment endpoints. As an example, the following statement derives effect levels for a sample GUTS-RED-IT scenario:

# GUTS-RED-IT scenario of the fathead minnow and chlorpyrifos
minnow_it %>% effect()
# make sure that value in text are still up to date
testthat::expect_equal(minnow_it %>% effect() %>% dplyr::pull(L), 6.297e-5, tolerance=0.001)

The lethality endpoint L has a value of 6.3e-5 which means that lethality has only marginally increased by 0.0063% compared to a control scenario without exposure to the contaminant. The columns L.dat.start and L.dat.end denote the period for which the effect level was observed. In case of GUTS-RED models, this usually refers to the whole simulation period. For certain assessments it may be necessary to evaluate all exposure periods of a certain length within the original series separately. This approach is referred to as a moving exposure window:

# Setting up a custom scenario with a total simulation period of 14 days and
# an exposure window length of 7 to assess a trapezoidal exposure pattern
americamysis %>%
  set_window(7) %>%
  set_exposure(data.frame(t=c(0,3,4,7,8), c=c(0,0,3,3,0))) %>%
  set_times(0:14) -> mydeb

# Derive maximum effect level of all exposure windows
mydeb %>% effect()
# make sure that value in text are still up to date
testthat::expect_equal(mydeb %>% effect() %>% dplyr::pull(L), 0.0521, tolerance=0.001)

The americamysis scenario provides two effect endpoints: structural length (L) and reproduction (R). Exposure to the simulated toxicant results at maximum in a 5.21% decrease in body length compared to control scenarios. The maximum effect was observed in the exposure window starting at time 2.0 and ending at 9.0, cf. columns L.dat.start and L.dat.end. The toxicant had no effect on reproduction due to the simulation being too short for reproduction to occur.

Scenarios can also be assessed regarding to so-called effect profiles, or EPx values for short. An EPx value quantifies the multiplication factor that is necessary to produce an effect level of x% for that particular scenario. As an example, an EP50 of 17.0 would mean that the scenario's exposure series needs to be multiplied by a factor of 17 to observe a 50% effect. The function epx() can derive effect profiles efficiently with arbitrary precision:

# Restrict assessed endpoints to structural length (L)
mydeb %>% 
  set_endpoints("L") %>%
  epx()
# make sure that value in text are still up to date
testthat::expect_equal(mydeb %>%  set_endpoints("L") %>% epx(ep_only=TRUE) %>% unlist(use.names=FALSE), c(1.162598, 1.711914), ignore_attr=TRUE, tolerance=0.01)

In this example, a factor of 1.16 resulted in 10% effect on structural length and a factor of 1.71 in 50% effect. epx() provides means to control the precision of derived EPx values, the range of tested multiplication factors, factor cutoff thresholds, and debugging capabilities. To examine how a particular EPx value was derived, simply set the argument verbose=TRUE:

# Examine how the EP20 value is derived
minnow_it %>% epx(level=20, verbose=TRUE)

In this example, the algorithm of epx() first tests a multiplication factor of 10 and then continues testing additional factors using a binary search algorithm until a factor is found that causes 20% effect.

Please note: Depending on model and chosen parameters, it may be impossible to calculate EPx values (for selected effect levels), because the model or organism cannot reach the required state. As an example: If the simulated period of a DEB_abj scenario is too short, reproduction cannot occur. Therefore, effects on reproduction will always be zero independent of the exposure level. Consequently, EPx values for the reproduction endpoint will not be available.

Data format of model inputs

Input data for the cvasi package relies on common R data structures and imposes only minimal requirements to set up scenarios:

  1. Data representing numerical values should be represented by numerical types. Practically, this means that numerical scenario properties such as output time points must not be passed as string literals even if the string may be representing a number. Due to R being a dynamically typed language, parameter values and their types may have to be checked on import by the user.
  2. Time-series data must be represented by a data.frame with exactly two numerical columns: The first column for time, the second for the series' value. The column names are irrelevant but sensible names may help documenting the data. Constant values can be represented by time-series with only a single row or entry.

Therefore, it should be possible to apply the package's routines on existing datasets with minimal effort. Please refer to Wickham and Grolemund (2017) on how to implement plain and consistent processes to import and transform data using tidyr syntax.

cvasi does support loading data from common sources, such as the exposure model TOXSWA, to ease data processing. Notable file formats and data structures that are supported are exposure time-series from the model FOCUS TOXSWA, as well as fitted GUTS-RED model parameter sets from morse.

How to derive model inputs

In general, all models require parameter values specific to the simulated species and substance. Some scenario types contain default values for selected model parameters as recommended by model authors. Scenarios providing default values include:

In addition, some models require values for environmental factors such as temperature and nutrient concentrations. These values or time-series are specific to a certain scenario and must be chosen accordingly.

Parameters of TKTD sub-models are substance-specific and are usually obtained by fitting model parameters to observed data. The cvasi package supports this process with the function calibrate() which uses a frequentist approach for parameter fitting. However, parameters can also be derived by other means. Generally, the user is responsible for choosing model parameters and the package does not impose any restrictions in this regard.

The package function calibrate() supports the following operation modes:

Parameter fitting builds upon the R function stats::optim() and it exposes full control over the fitting process. In this context, a dataset is represented by a calibration set object which consists of a scenario, observed data, and an optional weighting term. Please refer to the Modeling Howto for a description of the fitting process using calibrate().

As an alternative for GUTS-RED type models, model parameters can also be obtained using the morse package. It uses a Bayesian approach for parameter fitting.

Description of model outputs

The cvasi package does not use or define any custom file formats for model outputs but strictly relies on common R data structures. Simulation results and assessment endpoints are returned in tabular formats which can easily be serialized to common file formats such as .csv or Microsoft Excel files for further processing and data storage.

The following subsections will describe in detail the return values of common package routines for simulating and assessing scenarios.

Simulation results

The return value of simulate() is a data.frame containing the value of each state variable at each requested output time. The first column represents time, followed by columns for each state variable:

## Display selected scenario properties
metsulfuron@times  # Output times
metsulfuron@init   # Initial state

## Simulate the sample scenario
metsulfuron %>% simulate()

The model state in the first row is, by definition, equal to the initial state of the scenario. The simulation result may contain additional columns that are derived variables or intermediate model variables. In case of the metsulfuron scenario, the internal toxicant concentration (C_int) was derived from BM, M_int, and model parameters. The number of fronds (FrondNo) was derived from BM and model parameters. Additional outputs such as C_int and FrondNo are included in results for user convenience and to support model understanding. Additional outputs can be disabled by setting the optional argument nout=0:

metsulfuron %>%
  simulate(nout=0) %>%
  head(5)

Intermediate model variables can be enabled in the same manner:

metsulfuron %>%
  simulate(nout=8) %>%
  head(5)

Effect levels

The return value of effect() is a data.frame containing the effect level of each selected endpoint. The first column, scenario, contains the original scenario which was the basis of the assessment. Followed by the effect level and the exposure window in which the maximum effect was observed:

metsulfuron %>%
  set_times(0:7) %>%  # restrict scenario to the period [0,7]
  effect()

Effect levels are reported as a fraction relative to a control scenario. Effects can be converted to percentages by multiplying the value with 100 (%). For scenarios with moving exposure windows disabled, the reported window will cover the entire simulation period. If multiple windows result in identical maximum effect levels, the first matching window will be reported. The table's content can be reduced to endpoints only by setting the argument ep_only=TRUE.

effect() can also be instructed to return the effect levels of all assessed exposure windows by setting the argument max_only=FALSE:

metsulfuron %>%
  set_window(length=7, interval=1) %>%   # enable moving exposure windows
  effect(max_only=FALSE)                 # return effects of all windows

In few cases, effect() may report spurious non-zero effect levels that are very close to zero. These can either be avoided by increasing the numerical precision of simulations or by setting a marginal effect threshold which instructs effect() to zero any effect below that threshold. A warning will be shown if a marginal effect threshold greater than 1% is chosen. Applying a threshold of 1% to the previous example results in zeroing the effect in the last and second to last exposure window but leaving the remaining results unchanged:

metsulfuron %>%
  set_window(length=7, interval=1) %>%         # enable moving exposure windows
  effect(max_only=FALSE, marginal_effect=0.01) # return effects of all windows

Effect profiles

The return value of epx() is a data.frame containing the effect profile (EPx) of each selected endpoint and effect level. The first column, scenario, contains the original scenario which was the basis of the assessment. Followed by the derived effect profiles, i.e. the multiplication factors to achieve x% effect:

metsulfuron %>% epx()

For the sample metsulfuron scenario, a factor of 0.395 would need to be applied to the exposure series to achieve an effect level of 10% on biomass (BM). An effect level of 50% on biomass is unattainable with the tested factors ranging from 1e-30 to 1e30 which is demonstrated by the warning message and the NA value for BM.EP50.

The table's content can be reduced to endpoints only by setting the argument ep_only=TRUE. The range of tested factors can be controlled by setting the arguments min_factor and max_factor. Multiplication factors exceeding this range will raise a warning and NA will be returned for that EPx. Finding multiplication factors can be cut short if a certain margin of safety is reached by using the argument factor_cutoff: if the algorithm identifies that an EPx greater or equal to factor_cutoff is required, then factor_cutoff is reported as the result. This approach may be used to save computational resources for long running models if the exact EPx value above the cutoff is not of interest.

The precision of EPx values is controlled by the argument effect_tolerance and is given as the upper absolute error threshold of effects that is deemed acceptable. The default value of 0.001 ensures that a derived EPx will result in an effect of x% ± 0.1. Decreasing the effect_tolerance will result in additional model iterations and longer runtime.

The algorithm's inner workings can be examined by setting the argument verbose=TRUE:

metsulfuron %>%
  set_endpoints("BM") %>%
  epx(verbose=TRUE)

In this example, the binary search finds a BM.EP10 with acceptable precision in less than ten iterations. For the BM.EP50, the algorithm re-uses the knowledge gained from previously performed calculations but is ultimately unable to find a suitable multiplication factor within the allowed range of multiplication factors.

References



Try the cvasi package in your browser

Any scripts or data that you put into this service are public.

cvasi documentation built on Sept. 23, 2024, 9:08 a.m.