In daniel-gerhard/standrc: Dose-response curves with stan

The package standrc provides wrapper functions to fit dose-response models using STAN by a convenient formula interface. Of course, this results in a reduction of flexibility in setting up the model; hence, encouraging everyone to directly use rstan to fit a specific model of interest.

standrm

The function standrm() can be used similar as function drm() in package drc or function medrm() in package medrc.

The input is structured as

formula: A formula specifying the dose response relationship with the name of the response on the left and a covariate with dose levels on the right hand side (response ~ dose)
data: a data.frame object with containing all variables defined in the model
fct: a model function of package drc. The parameterization of the drc model functions is changed, modelling the slope and the asymmetry parameter on a logarithmic scale and allowing the lower and upper asymptotes to swap parameters.

Ryegrass Example

The ryegrass dataset of package drc provides a simple example, fitting a four parameter log-logistic model. The asymmetry parametr is fixed at 0, resulting in a symmetric curve, as it is defined on the log-scale, in contrast to the drc parameterization, where it is modelled without any tranformation.

rgm <- standrm(rootl ~ conc, data=ryegrass,
               fct=LL.5(fixed=c(NA, NA, NA, NA, 0)),
               iter=5000)

library(standrc)
rgm <- standrm(rootl ~ conc, data=ryegrass,
               fct=LL.5(fixed=c(NA, NA, NA, NA, 0)),
               iter=5000)

The resulting object is a list containing the data, stan code, and the rstan results with the samples from the posterior distribution, which can be summarized using the print() method.

print(rgm)

Predictions for the dose-response curve can be obtained by the predict method, returning a matrix with the samples from the posterior distribution. These samples can be summarized e.g. by arithmetic means, standard deviations or quantiles.

newd <- expand.grid(conc=exp(seq(log(0.5), log(30), length=100)))
pm <- predict(rgm, newdata=newd)
newd$p <- apply(pm, 2, function(x) mean(x, na.rm=TRUE))
newd$pmin <- apply(pm, 2, function(x) quantile(x, 0.025, na.rm=TRUE))
newd$pmax <- apply(pm, 2, function(x) quantile(x, 0.975, na.rm=TRUE))

The predicted curves can be displayed by using the plot method.

plot(rgm, ndose=50, logx=TRUE)

Using 50 support points on the dose scale results in a nicely smooth curve, but increasing this number increases computation time for the predictions.

standrc Methods

The print method returns a summary of the posterior samples of the fixed effects parameters. When random effects are assumed, the fixed effect samples are obtained conditional on a new, unknown random effect cluster. The fixef method returns a summary of the fixed effects parameters conditional on a random effect of 0. If no random effects are present, the print and fixef methods return similar results. The ranef method returns a summary of posterior samples of the random effects and the VarCorr method provides an overview of the samples of the corresponding variance components.

A fitted and a residual method is available to extract a matrix with samples from the posterior of the residuals and fitted values (different parameters in columns, samples in rows). Predictions (posterior samples) for a new dataset can be obtained using the predict method; when random effects are defined in the model, the predictons are generated for an unknown random effect cluster.

Deriving Effective Dose Parameters

Similar to package drc a function ED() is available to calculate the (relative) effective dose at specified response levels (between 0 and 100) given a standrc object. The output contains the median of the samples of the ED posterior distribution as well as 95\% credible intervals separately for each given response level and curveid. When random effects are included in the model (see the following Section) the ED is computed conditional on an unobserved random effect level.

ED(rgm, respLev=c(5, 50, 95))

Model Extensions

The model can be extended by

curveid: estimating nonlinear model parameters based on several clusters, fitting a curve for each group of observations. Using a formula interface, on the left-hand side the parameters can be specified, which are assumed to be different between curves, otherwise a single parameter is chosen. On the right-hand side the name of a factor containing the cluster levels needs to be provided. E.g. b + d + e ~ cluster, where the letters b to f are the model parameters as defined in the LL.5() function.
random: similar to the curveid argument, but modelling random effects additive on each of the specified parameters, summarizing the cluster effects by variance components (see the model definition for more information).

The ... argument can be used to supply further arguments to the function stan(), like the number of iterations iter, burn-in samples warmup, or thinning thin.

Spinach Example

Streibig et al. (1999) investigated the inhibition of photosynthesis in response to two synthetic photosystem II inhibitors, the herbicides diuron and bentazon. In an experiment, the effect of oxygen consumption of thylakoid membranes (chloroplasts) from spinach was measured after incubation with the synthetic inhibitors. Five assays, three treated with bentazon and two with diuron, were used. For each assay six increasing herbicide concentrations were applied together with a negative control, using different dose ranges on a logarithmic scale for the two treatments to encompass the whole dose-response range based on preliminary experiments.

A three-parameter log-logistic model is assumed, fixing the asymmetry parameter to obtain a symmetric curve and forcing the lower asymptote to 0 (the asymmetry parameter is modelled on a logarithmic scale). In contrast to the drc package, the slope parameter is also defined on a log-scale, allowing only positive values on the original scale; hence, the interpretation of the two asymptote parameters change from lower to upper and upper to lower, respectively, with a change from increasing to decreasing dose-response shapes.

Two separate curves are modelled for the two different herbicide treatments by assuming different upper asymptotes and ED50 parameters, defined by the curveid argument. The assay effects are modelled as random effects on the slope, ED50, and on both of the asymptotes; note that the random effects are also available for the lower asymptotes, which is fixed at 0.

Especially the random effects require a high number of parameters, which can be highly correlated; therefore, a higher number of iterations is needed for convergence.

spm <- standrm(formula=SLOPE ~ DOSE, data=spinach, 
               fct=LL.5(fixed=c(NA, NA, 0, NA, 0)),
               curveid=c + e ~ HERBICIDE,
               random=c + d + e ~ CURVE,
               iter=10000)

spm <- standrm(formula=SLOPE ~ DOSE, data=spinach, 
               fct=LL.5(fixed=c(NA, NA, 0, NA, 0)),
               curveid=c + e ~ HERBICIDE,
               random=c + d + e ~ CURVE,
               iter=10000)

plot(spm, ndose=50, logx=TRUE)

Generalized Dose-Response Models

For non-normal distributed responses the standrc package has several alternatives. Some parameters of the nonlinear model are estimated on a transformed scale, but the results are presented on the original scale of the data.

count data: standrm_count() assumes the observed counts to follow a Poisson distribution. The two asymptotes are modelled on the log scale.
binary data: standrm_binary() takes a vector of success/failure or true/false as the response, assuming a Bernoulli distribution. The asymptotes are modelled on the logit scale $\log(\frac{p}{1-p})$; predictions are made for the probability of observing a success.
binomial model: standrm_binomial() assumes a Binomial distribution for the number of successess given the total number of observed events. The input is similar to the glm function using a matrix with success and failure as a response (cbind(success, failure)); the model is then constructed similar to the binary case.

earthworms: Binomial example

The dataset is obtained from a toxicity test using earthworms, and it contains the number of earthworms remaining in a container that is contaminated with a toxic substance (not disclosed) and not migrating to the neighbouring uncontaminated container).

A four parameter Weibull model is assumed for the probability of observing an earthworm in the contaminated container. The number of remaining earthworms is assumed to follow a Binomial distribution, providing a matrix with remaining and migrated earthworms as a response. The lower and upper asymptotes are fixed at 1 and 0, assuming that all earthworms are in the contaminated container at dose 0 and all of them will be migrated at a high dose level.

ew <- standrm_binomial(formula=cbind(number, total-number) ~ dose, 
                       data=earthworms, 
                       fct=W1.4(fixed=c(NA, 1, 0, NA)), iter=5000)

ew <- standrm_binomial(formula=cbind(number, total-number) ~ dose, 
                       data=earthworms, 
                       fct=W1.4(fixed=c(NA, 1, 0, NA)), iter=5000)

plot(ew, logx=TRUE)

Watanabe-Akaike Information Criterion

In order to compare two different model fits the WAIC is implemented in the function waic() according to Vehtari and Gelman (2014). For more information see WAIC and cross-validation in STAN at http://www.stat.columbia.edu/~gelman/research/unpublished/.

Model averaging

Given a set of models the WAICs can be transformed into model weights $\omega_{i}$ by $$ \omega_{i} = \frac{\exp(-0.5 \Delta_{i})}{\sum_{i}\exp(-0.5 \Delta_{i})} \qquad \Delta_{i} = WAIC_{i} - \min_{i}(WAIC_{i})$$ By treating these model weights as model probabilities they can be used for model averaging. Given $B$ samples from the posterior distribution of the parameter of interest, the samples from the posterior of the model-averaged parameter consist of a random draw of $\omega_{i} \times B$ samples from the $i$th model in the set, treating $\omega_{i}$ as the probability that the $b$th posterior sample is obtained from the $i$th model.

The functions maED() and mapredict() can be used to calculate model-averaged effective dose parameters and model-averaged predictions, which can be directly plotted by function maplot(). All model-averaging functions use several standrc model objects as input together with the arguments, that are available in the single-model functions.

Sinapis alba example

Data are from an experiment, comparing the potency of the two herbicides glyphosate and bentazone in white mustard Sinapis alba.

Thre models are fitted:

a four-parameter log-logistic model with different asymptotes, slopes, and ED50 parameters for each herbicide
a four-parameter log-logistic model assuming the same asymptotes for both herbicides
a four-parameter Weibull model with the same asymptotes for both herbicides

m1 <- standrm(DryMatter ~ Dose, data=S.alba, curveid=b + c + d + e ~ Herbicide, fct=LL.4(), iter=5000)
m2 <- standrm(DryMatter ~ Dose, data=S.alba, curveid=b + e ~ Herbicide, fct=LL.4(), iter=5000)
m3 <- standrm(DryMatter ~ Dose, data=S.alba, curveid=b + e ~ Herbicide, fct=W1.4(), iter=5000)

m1 <- standrm(DryMatter ~ Dose, data=S.alba, curveid=b + c + d + e ~ Herbicide, fct=LL.4(), iter=5000)
m2 <- standrm(DryMatter ~ Dose, data=S.alba, curveid=b + e ~ Herbicide, fct=LL.4(), iter=5000)
m3 <- standrm(DryMatter ~ Dose, data=S.alba, curveid=b + e ~ Herbicide, fct=W1.4(), iter=5000)

The WAIC weights can be calculated by

modellist <- list(m1, m2, m3)
inc <- sapply(modellist, function(x) waic(x)$waic)
d <- inc-min(inc)
round(exp(-0.5*d)/sum(exp(-0.5*d)), 3)

which are used to present the following plot of the predicted, model-averaged curves.

maplot(m1, m2, m3, logx=TRUE, ndose=100)

E.g. the ED10 can be calculated by

maED(m1, m2, m3, respLev=10)

Informative prior distributions

Choosing prior distributions in an automated fashion has its limits, especially for nonlinear models. Prior means for the asymptotes are found empirically based on the data, assuming weakly informative prior distributions with very long tails.

In order to include reasonable informative priors in the model, a named list with stan code entries can be supplied with the argument priors=. This list is automatically constructed by function standrc_priors() with default values for each parameter in the model.

Let us assume we have some prior information about parameters in the ryegrass example that was introduced at the beginning of the vignette. Like for the default we assume normal distributions for the parameters, but specifying a mean of 0 and standard deviation of 2 (on the logarithmic scale) for the asymmetry parameter and means 0 and 10 and std.dev. of 1 for the asymptotes. The definition of numeric entries for the mean allows also to provide vectors of means for several curves. The priors for the variance parameters are defined accordingly.

rgp <- standrm(rootl ~ conc, data=ryegrass,
               fct=LL.5(fixed=c(NA, NA, NA, NA, NA)),
               priors=standrc_priors(f="normal(pf, 2)", pf=0,
                                     c="normal(pc, 1)", pc=10,
                                     d="normal(pd, 1)", pd=0),
               iter=5000)

library(standrc)
rgp <- standrm(rootl ~ conc, data=ryegrass,
               fct=LL.5(fixed=c(NA, NA, NA, NA, NA)),
               priors=standrc_priors(f="normal(pf, 2)", pf=0,
                                     c="normal(pc, 1)", pc=10,
                                     d="normal(pd, 1)", pd=0),
               iter=5000)

plot(rgp, logx=TRUE, ndose=100)

Model Definition

Dose-response functions

The five-parameter logistic model L.5() is defined as $$ f(dose, b, c, d, e, f) = c + \frac{d - c}{(1 + exp(-exp(b) * (dose - e)))^{exp(f)}} $$
or as log-logistic model LL.5() with the dose on a logarithmic scale $$ f(dose, b, c, d, e, f) = c + \frac{d - c}{1 + exp(-exp(b)(log(dose / e)))^{exp(f)}} $$
Two Weibull functions W1.4() and W2.4() are available $$ f(dose, b, c, d, e) = c + (d - c) exp(-exp(-exp(b) (log(dose) - log(e)))) $$ $$ f(dose, b, c, d, e) = c + (d - c) (1 - exp(-exp(-exp(b) (log(dose) - log(e))))) $$
The log-Normal function is defined as $$ f(dose, b, c, d, e) = c + (d - c) \Phi (-exp(b) (log(dose) - log(e))) $$ with $\Phi()$ denoting the standard normal cdf.

Fixed and random effects

Multiple fixed effects ($j=1,...,J$) and random effects $\tilde{b}{k}, \tilde{c}{k}, \tilde{d}{k}, \tilde{e}{k}, \tilde{f}{k}$ with $k=1,...,K$ are introduced as $$ b = b{j} + \tilde{b}{k}, \quad c = c{j} + \tilde{c}{k}, \quad d = d{j} + \tilde{d}{k}, \quad e = e{j} + \tilde{e}{k}, \quad f = f{j} + \tilde{f}_{k} $$

For each fixed effects parameter a weakly informative Normal prior is assumed: $$ b_{j} \sim N(0, 100), \quad c_{j} \sim N(\mu_{c}, 100), \quad d_{j} \sim N(\mu_{d}, 100), \quad e_{j} \sim N(\mu_{e}, 100), \quad f_{j} \sim N(0, 100) $$ where the some of the prior means are obtained from the sample of observation: $\mu_{c}$ and $\mu_{d}$ are defined as $\min(y_{i})$ and $\max(y_{i})$ at the lowest/highest dose level, $\mu_{e}=median(dose_{i})$.

The random effects are modelled by their marginal distributions as $$ \tilde{b}{k} \sim N(0, \sigma{b}), \quad \sigma_{b} \sim IG(0.001, 0.001) $$ $$ ... $$

For a Gaussian response the residuals in the model $$ y_{i} = f(dose_{i}, b, c, d, e, f) + \epsilon_{i} $$ are distributed as $$ \epsilon_{i} \sim N(0, \sigma_{\epsilon}), \quad \sigma_{\epsilon} \sim IG(0.001, 0.001) $$

For a binary response the model $$ \mu_{i} = f(dose_{i}, b, \mbox{inv_logit}(c), \mbox{inv_logit}(d), e, f) $$ with $$ y_{i} \sim Bernoulli(\mu_{i})$$ is assumed. The Binomial model is defined as $$ y_{i} \sim Binomial(N, \mu_{i}) $$ with $N$ denoting the total number of events, including the Bernoulli model with $N=1$.

When count data is observed, the model $$ \mu_{i} = f(dose_{i}, b, \exp(c), \exp(d), e, f) \quad \mbox{with} \quad y_{i} \sim Poisson(\mu_{i})$$ can be chosen.

standrc objects

The standrm function returns a list with following slots:

data: a list with data used to fit the stan model
model: a character string containing the model definition as stan code
stan: the fitted results as an object of S4 class stanfit

Hence, by using the data and model slots, the model can be refitted using the function stan. With the stanfit object in the stan slot, the samples from the posterior distribution of all parameters in the model are available. To extract the samples, the extract() function of rstan can be used. There is also a lot of functionality for model diagnostics available in the rstan package, which can be directly applied on the stanfit object.

daniel-gerhard/standrc documentation built on May 14, 2019, 3:39 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

daniel-gerhard/standrc
Dose-response curves with stan

In daniel-gerhard/standrc: Dose-response curves with stan

standrm

Ryegrass Example

standrc Methods

Deriving Effective Dose Parameters

Model Extensions

Spinach Example

Generalized Dose-Response Models

earthworms: Binomial example

Watanabe-Akaike Information Criterion

Model averaging

Sinapis alba example

Informative prior distributions

Model Definition

Dose-response functions

Fixed and random effects

standrc objects

R Package Documentation

Browse R Packages

We want your feedback!

daniel-gerhard/standrc Dose-response curves with stan

In daniel-gerhard/standrc: Dose-response curves with stan

standrm

Ryegrass Example

standrc Methods

Deriving Effective Dose Parameters

Model Extensions

Spinach Example

Generalized Dose-Response Models

earthworms: Binomial example

Watanabe-Akaike Information Criterion

Model averaging

Sinapis alba example

Informative prior distributions

Model Definition

Dose-response functions

Fixed and random effects

standrc objects

R Package Documentation

Browse R Packages

We want your feedback!

daniel-gerhard/standrc
Dose-response curves with stan