The package standrc provides wrapper functions to fit dose-response models using STAN by a convenient formula interface. Of course, this results in a reduction of flexibility in setting up the model; hence, encouraging everyone to directly use rstan to fit a specific model of interest.
The function standrm() can be used similar as function drm() in package drc or function medrm() in package medrc.
The input is structured as
response ~ dose
)The ryegrass dataset of package drc provides a simple example, fitting a four parameter log-logistic model. The asymmetry parametr is fixed at 0, resulting in a symmetric curve, as it is defined on the log-scale, in contrast to the drc parameterization, where it is modelled without any tranformation.
rgm <- standrm(rootl ~ conc, data=ryegrass, fct=LL.5(fixed=c(NA, NA, NA, NA, 0)), iter=5000)
library(standrc) rgm <- standrm(rootl ~ conc, data=ryegrass, fct=LL.5(fixed=c(NA, NA, NA, NA, 0)), iter=5000)
The resulting object is a list containing the data, stan code, and the rstan results with the samples from the posterior distribution, which can be summarized using the print() method.
print(rgm)
Predictions for the dose-response curve can be obtained by the predict method, returning a matrix with the samples from the posterior distribution. These samples can be summarized e.g. by arithmetic means, standard deviations or quantiles.
newd <- expand.grid(conc=exp(seq(log(0.5), log(30), length=100))) pm <- predict(rgm, newdata=newd) newd$p <- apply(pm, 2, function(x) mean(x, na.rm=TRUE)) newd$pmin <- apply(pm, 2, function(x) quantile(x, 0.025, na.rm=TRUE)) newd$pmax <- apply(pm, 2, function(x) quantile(x, 0.975, na.rm=TRUE))
The predicted curves can be displayed by using the plot method.
plot(rgm, ndose=50, logx=TRUE)
Using 50 support points on the dose scale results in a nicely smooth curve, but increasing this number increases computation time for the predictions.
The print method returns a summary of the posterior samples of the fixed effects parameters. When random effects are assumed, the fixed effect samples are obtained conditional on a new, unknown random effect cluster. The fixef method returns a summary of the fixed effects parameters conditional on a random effect of 0. If no random effects are present, the print and fixef methods return similar results. The ranef method returns a summary of posterior samples of the random effects and the VarCorr method provides an overview of the samples of the corresponding variance components.
A fitted and a residual method is available to extract a matrix with samples from the posterior of the residuals and fitted values (different parameters in columns, samples in rows). Predictions (posterior samples) for a new dataset can be obtained using the predict method; when random effects are defined in the model, the predictons are generated for an unknown random effect cluster.
Similar to package drc a function ED()
is available to calculate the (relative) effective dose at specified response levels (between 0 and 100) given a standrc object. The output contains the median of the samples of the ED posterior distribution as well as 95\% credible intervals separately for each given response level and curveid. When random effects are included in the model (see the following Section) the ED is computed conditional on an unobserved random effect level.
ED(rgm, respLev=c(5, 50, 95))
The model can be extended by
b + d + e ~ cluster
, where the letters b to f are the model parameters as defined in the LL.5() function.The ...
argument can be used to supply further arguments to the function stan(), like the number of iterations iter
, burn-in samples warmup
, or thinning thin
.
Streibig et al. (1999) investigated the inhibition of photosynthesis in response to two synthetic photosystem II inhibitors, the herbicides diuron and bentazon. In an experiment, the effect of oxygen consumption of thylakoid membranes (chloroplasts) from spinach was measured after incubation with the synthetic inhibitors. Five assays, three treated with bentazon and two with diuron, were used. For each assay six increasing herbicide concentrations were applied together with a negative control, using different dose ranges on a logarithmic scale for the two treatments to encompass the whole dose-response range based on preliminary experiments.
A three-parameter log-logistic model is assumed, fixing the asymmetry parameter to obtain a symmetric curve and forcing the lower asymptote to 0 (the asymmetry parameter is modelled on a logarithmic scale). In contrast to the drc package, the slope parameter is also defined on a log-scale, allowing only positive values on the original scale; hence, the interpretation of the two asymptote parameters change from lower to upper and upper to lower, respectively, with a change from increasing to decreasing dose-response shapes.
Two separate curves are modelled for the two different herbicide treatments by assuming different upper asymptotes and ED50 parameters, defined by the curveid
argument. The assay effects are modelled as random effects on the slope, ED50, and on both of the asymptotes; note that the random effects are also available for the lower asymptotes, which is fixed at 0.
Especially the random effects require a high number of parameters, which can be highly correlated; therefore, a higher number of iterations is needed for convergence.
spm <- standrm(formula=SLOPE ~ DOSE, data=spinach, fct=LL.5(fixed=c(NA, NA, 0, NA, 0)), curveid=c + e ~ HERBICIDE, random=c + d + e ~ CURVE, iter=10000)
spm <- standrm(formula=SLOPE ~ DOSE, data=spinach, fct=LL.5(fixed=c(NA, NA, 0, NA, 0)), curveid=c + e ~ HERBICIDE, random=c + d + e ~ CURVE, iter=10000)
plot(spm, ndose=50, logx=TRUE)
For non-normal distributed responses the standrc package has several alternatives. Some parameters of the nonlinear model are estimated on a transformed scale, but the results are presented on the original scale of the data.
standrm_count()
assumes the observed counts to follow a Poisson distribution. The two asymptotes are modelled on the log scale. standrm_binary()
takes a vector of success/failure or true/false as the response, assuming a Bernoulli distribution. The asymptotes are modelled on the logit scale $\log(\frac{p}{1-p})$; predictions are made for the probability of observing a success. standrm_binomial()
assumes a Binomial distribution for the number of successess given the total number of observed events. The input is similar to the glm
function using a matrix with success and failure as a response (cbind(success, failure)
); the model is then constructed similar to the binary case.The dataset is obtained from a toxicity test using earthworms, and it contains the number of earthworms remaining in a container that is contaminated with a toxic substance (not disclosed) and not migrating to the neighbouring uncontaminated container).
A four parameter Weibull model is assumed for the probability of observing an earthworm in the contaminated container. The number of remaining earthworms is assumed to follow a Binomial distribution, providing a matrix with remaining and migrated earthworms as a response. The lower and upper asymptotes are fixed at 1 and 0, assuming that all earthworms are in the contaminated container at dose 0 and all of them will be migrated at a high dose level.
ew <- standrm_binomial(formula=cbind(number, total-number) ~ dose, data=earthworms, fct=W1.4(fixed=c(NA, 1, 0, NA)), iter=5000)
ew <- standrm_binomial(formula=cbind(number, total-number) ~ dose, data=earthworms, fct=W1.4(fixed=c(NA, 1, 0, NA)), iter=5000)
plot(ew, logx=TRUE)
In order to compare two different model fits the WAIC is implemented in the function waic()
according to Vehtari and Gelman (2014). For more information see WAIC and cross-validation in STAN at http://www.stat.columbia.edu/~gelman/research/unpublished/.
Given a set of models the WAICs can be transformed into model weights $\omega_{i}$ by $$ \omega_{i} = \frac{\exp(-0.5 \Delta_{i})}{\sum_{i}\exp(-0.5 \Delta_{i})} \qquad \Delta_{i} = WAIC_{i} - \min_{i}(WAIC_{i})$$ By treating these model weights as model probabilities they can be used for model averaging. Given $B$ samples from the posterior distribution of the parameter of interest, the samples from the posterior of the model-averaged parameter consist of a random draw of $\omega_{i} \times B$ samples from the $i$th model in the set, treating $\omega_{i}$ as the probability that the $b$th posterior sample is obtained from the $i$th model.
The functions maED()
and mapredict()
can be used to calculate model-averaged effective dose parameters and model-averaged predictions, which can be directly plotted by function maplot()
. All model-averaging functions use several standrc model objects as input together with the arguments, that are available in the single-model functions.
Data are from an experiment, comparing the potency of the two herbicides glyphosate and bentazone in white mustard Sinapis alba.
Thre models are fitted:
m1 <- standrm(DryMatter ~ Dose, data=S.alba, curveid=b + c + d + e ~ Herbicide, fct=LL.4(), iter=5000) m2 <- standrm(DryMatter ~ Dose, data=S.alba, curveid=b + e ~ Herbicide, fct=LL.4(), iter=5000) m3 <- standrm(DryMatter ~ Dose, data=S.alba, curveid=b + e ~ Herbicide, fct=W1.4(), iter=5000)
m1 <- standrm(DryMatter ~ Dose, data=S.alba, curveid=b + c + d + e ~ Herbicide, fct=LL.4(), iter=5000) m2 <- standrm(DryMatter ~ Dose, data=S.alba, curveid=b + e ~ Herbicide, fct=LL.4(), iter=5000) m3 <- standrm(DryMatter ~ Dose, data=S.alba, curveid=b + e ~ Herbicide, fct=W1.4(), iter=5000)
The WAIC weights can be calculated by
modellist <- list(m1, m2, m3) inc <- sapply(modellist, function(x) waic(x)$waic) d <- inc-min(inc) round(exp(-0.5*d)/sum(exp(-0.5*d)), 3)
which are used to present the following plot of the predicted, model-averaged curves.
maplot(m1, m2, m3, logx=TRUE, ndose=100)
E.g. the ED10 can be calculated by
maED(m1, m2, m3, respLev=10)
Choosing prior distributions in an automated fashion has its limits, especially for nonlinear models. Prior means for the asymptotes are found empirically based on the data, assuming weakly informative prior distributions with very long tails.
In order to include reasonable informative priors in the model, a named list with stan code entries can be supplied with the argument priors=
. This list is automatically constructed by function standrc_priors()
with default values for each parameter in the model.
Let us assume we have some prior information about parameters in the ryegrass example that was introduced at the beginning of the vignette. Like for the default we assume normal distributions for the parameters, but specifying a mean of 0 and standard deviation of 2 (on the logarithmic scale) for the asymmetry parameter and means 0 and 10 and std.dev. of 1 for the asymptotes. The definition of numeric entries for the mean allows also to provide vectors of means for several curves. The priors for the variance parameters are defined accordingly.
rgp <- standrm(rootl ~ conc, data=ryegrass, fct=LL.5(fixed=c(NA, NA, NA, NA, NA)), priors=standrc_priors(f="normal(pf, 2)", pf=0, c="normal(pc, 1)", pc=10, d="normal(pd, 1)", pd=0), iter=5000)
library(standrc) rgp <- standrm(rootl ~ conc, data=ryegrass, fct=LL.5(fixed=c(NA, NA, NA, NA, NA)), priors=standrc_priors(f="normal(pf, 2)", pf=0, c="normal(pc, 1)", pc=10, d="normal(pd, 1)", pd=0), iter=5000)
plot(rgp, logx=TRUE, ndose=100)
L.5()
is defined as
$$ f(dose, b, c, d, e, f) = c + \frac{d - c}{(1 + exp(-exp(b) * (dose - e)))^{exp(f)}} $$LL.5()
with the dose on a logarithmic scale
$$ f(dose, b, c, d, e, f) = c + \frac{d - c}{1 + exp(-exp(b)(log(dose / e)))^{exp(f)}} $$W1.4()
and W2.4()
are available
$$ f(dose, b, c, d, e) = c + (d - c) exp(-exp(-exp(b) (log(dose) - log(e)))) $$
$$ f(dose, b, c, d, e) = c + (d - c) (1 - exp(-exp(-exp(b) (log(dose) - log(e))))) $$Multiple fixed effects ($j=1,...,J$) and random effects $\tilde{b}{k}, \tilde{c}{k}, \tilde{d}{k}, \tilde{e}{k}, \tilde{f}{k}$ with $k=1,...,K$ are introduced as $$ b = b{j} + \tilde{b}{k}, \quad c = c{j} + \tilde{c}{k}, \quad d = d{j} + \tilde{d}{k}, \quad e = e{j} + \tilde{e}{k}, \quad f = f{j} + \tilde{f}_{k} $$
For each fixed effects parameter a weakly informative Normal prior is assumed: $$ b_{j} \sim N(0, 100), \quad c_{j} \sim N(\mu_{c}, 100), \quad d_{j} \sim N(\mu_{d}, 100), \quad e_{j} \sim N(\mu_{e}, 100), \quad f_{j} \sim N(0, 100) $$ where the some of the prior means are obtained from the sample of observation: $\mu_{c}$ and $\mu_{d}$ are defined as $\min(y_{i})$ and $\max(y_{i})$ at the lowest/highest dose level, $\mu_{e}=median(dose_{i})$.
The random effects are modelled by their marginal distributions as $$ \tilde{b}{k} \sim N(0, \sigma{b}), \quad \sigma_{b} \sim IG(0.001, 0.001) $$ $$ ... $$
For a Gaussian response the residuals in the model $$ y_{i} = f(dose_{i}, b, c, d, e, f) + \epsilon_{i} $$ are distributed as $$ \epsilon_{i} \sim N(0, \sigma_{\epsilon}), \quad \sigma_{\epsilon} \sim IG(0.001, 0.001) $$
For a binary response the model $$ \mu_{i} = f(dose_{i}, b, \mbox{inv_logit}(c), \mbox{inv_logit}(d), e, f) $$ with $$ y_{i} \sim Bernoulli(\mu_{i})$$ is assumed. The Binomial model is defined as $$ y_{i} \sim Binomial(N, \mu_{i}) $$ with $N$ denoting the total number of events, including the Bernoulli model with $N=1$.
When count data is observed, the model $$ \mu_{i} = f(dose_{i}, b, \exp(c), \exp(d), e, f) \quad \mbox{with} \quad y_{i} \sim Poisson(\mu_{i})$$ can be chosen.
The standrm function returns a list with following slots:
stanfit
Hence, by using the data and model slots, the model can be refitted using the function stan. With the stanfit object in the stan slot, the samples from the posterior distribution of all parameters in the model are available. To extract the samples, the extract() function of rstan can be used. There is also a lot of functionality for model diagnostics available in the rstan package, which can be directly applied on the stanfit object.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.