R/stan_guts.R

#' @title Fits a GUTS model for survival analysis using Bayesian inference with \code{Stan}
#' 
#' @description This function estimates the parameters of a GUTS ('SD', 'IT' or 'PROPER')
#' model for survival analysis using Bayesian inference. In this model,
#' the survival rate of individuals is modeled as a function of the chemical compound
#' concentration with a mechanistic description of the effects on survival over
#' time.
#' 
#'   
#' @param data \code{data.frame} containing the following four columns:
#' \itemize{
#'   \item \code{replicate}: a vector of class \code{integer} or \code{factor} for replicate
#'     identification. A given replicate value should identify the same group of
#'     individuals followed in time
#'   \item \code{conc}: a vector of class \code{numeric} with tested concentrations
#'     (positive values, may contain NAs)
#'   \item \code{time}: a vector of class \code{integer} with time points, minimal value must be 0
#'   \item \code{Nsurv}: a vector of class \code{integer} providing the number of
#'     alive individuals at each time point for each concentration and each replicate
#'     (may contain NAs)
#' }
#' @param model_type The type of GUTS model: \code{'SD'}, \code{'IT'}
#'   or  \code{'PROPER'}.
#' @param distribution The type of distribution used for model \code{'IT'} or
#'   \code{'PROPER'}: either \code{"loglogistic"} or \code{"lognormal"}. The
#'   default is \code{"loglogistic"}.
#' @param ode_integrator A string \code{"rk45"} or \code{"bdf"} specifying
#'   the integrator for solving systems of differential equations (ODEs).
#'   The default is \code{"rk45"}.
#'   \code{"rk45"} is a fourth and fifth order Runge-Kutta method for non-stiff
#'   systems. \code{"bdf"}  is a variable-step, variable-order,
#'   backward-differentiation formula implementation for stiff systems.
#'   For further information, see the manual of language \code{Stan}.
#' @param rel_tol A double, inferior to 1, controlling the relative tolerance
#'   of the accuracy of the solutions generated by the integrator. This
#'   tolerance is relative to the solution value. The default is 1e-8.
#'   Smaller tolerance produces more accurante solutions, but may increase
#'   the computation time.
#' @param abs_tol A double, inferior to 1, controlling the absolute tolerance
#'   of the accuracy of the solutions generated by the integrator. This
#'   tolerance is the maximum absolute error allowed in a solution. The
#'   default is 1e-8. Smaller tolerance produces more accurante solutions,
#'   but may increase the computation time.
#' @param max_num_steps A double providing the maximum number of steps that
#'   can be used to stop a runaway simulation. This can arise in MCMC when
#'   a bad jump is taken, particularly during warmup.
#' @param priors_list A list of priors to change the priors automatically
#'   generated.
#' @param adapt_delta A double, between 0 and 1, controlling part of sampling 
#'   algorithms. See \code{control} in function \code{stan} of package \code{rstan}.
#'   The default is 0.95.
#' @param chains A positive integer specifying the number of Markov chains.
#'   The default is 3.
#' @param iter A positive integer specifying the number of iterations for each
#'   chain (including warmup). The default is 2000.
#' @param warmup A positive integer specifying the number of warmup (aka
#'   burnin) iterations per chain. If step-size adaptation is on (which it is
#'   by default), this also controls the number of iterations for which
#'   adaptation is run (and hence these warmup samples should not be used for
#'   inference). The number of warmup iterations should not be larger than iter
#'   and the default is 1000.
#' @param thin A positive integer specifying the period for saving samples.
#'   The default is 1, which is usually the recommended value.
#' @param \dots Further arguments to be passed in function \code{sampling} 
#'   of package \code{rstan}.  
#' 
#' 
#' @return The function returns an object of class \code{stanguts}, which is
#' a list with the following information:
#' \item{stanfit}{An object of S4 class \code{stanfit}. See the documentation in
#'   \code{rstan} package for further information. However, if \code{cores > 1}
#'   and there is an error for any of the chains, then the error(s) are printed. If
#'   all chains have errors and an error occurs before or during sampling, the returned
#'   object does not contain samples. But the compiled binary object for the 
#'   model is still included, so we can reuse the returned object for another
#'   sampling.}
#' \item{data}{the data.frame provided by the argument \code{data}.}
#' \item{dataStan}{a list of data passed in the Stan model object.}
#' \item{mcmcInfo}{a table with the number of iterations, chains, warmup and
#'   the thinning interval.} 
#' \item{model_type}{the type of GUTS model: \code{SD}, \code{IT} or \code{PROPER}.}
#' \item{distribution}{the type of distribution used for models \code{IT} or
#'   \code{PROPER}: either \code{"loglogistic"} or \code{"lognormal"}.}
#'
#' @references Jager, T., Albert, C., Preuss, T. G. and Ashauer, R. (2011) 
#' General unified threshold model of survival-a toxicokinetic-toxicodynamic
#'  framework for ecotoxicology, \emph{Environmental Science & Technology}, 45, 2529-2540.
#' 303-314.
#' 
#' Jager, T. and Ashauer, R. (2018) Modelling survival under chemical stress.
#'  A comprehensive guide to the GUTS framework. Version 1.0., \emph{Leanpub}.
#' 
#' @examples
#'
#' # (1) Load the survival data
#' data("data_Diazinon")
#'
#' \dontrun{
#' # (2) Run the stan_guts function with TK-TD model 'SD', 'IT', or 'PROPER' (and distribution) 
#' fit_SD_diaz <- stan_guts(data_Diazinon, model_type = "SD")
#'
#' # (3) Plot the fitted curve
#' plot_stanguts(fit_SD_diaz)
#'
#' # (4) Convert  'stanguts' object into 'stanfit' or 'survFit' to explore the fit with
#' # package 'rstan' or 'morse'.
#' stanfit_SD_diaz <- stanguts_to_stanfit(fit_SD_diaz)
#' survFit_SD_diaz <- stanguts_to_survFit(fit_SD_diaz) 
#' }
#' 
#' 
#' @export
#' 
stan_guts <- function(data,
                      # specific to stan_guts
                      model_type = NULL,
                      distribution = "loglogistic",
                      ode_integrator = "rk45", # other is "bdf" not yet implemented
                      rel_tol = 1e-8,
                      abs_tol = 1e-8,
                      max_num_steps = 1e3,
                      priors_list = NULL,
                      # rstan::sampling
                      adapt_delta = 0.95,
                      chains = 3,
                      iter = 2000,
                      warmup = 1000,
                      thin = 1,
                      ...){
  
  ### ensures model_type is one of "SD" and "IT"
  if(is.null(model_type) || ! (model_type %in% c("SD","IT", "PROPER"))) {
    stop("You need to specify a 'model_type' among 'SD', 'IT' or 'PROPER'.")
  }
  if(!(ode_integrator %in% c("rk45", "bdf"))){
    stop("You need to specify an 'ode_integrator' among 'rk45' and 'bdf'.")
  }
  if(!(distribution %in% c("loglogistic", "lognormal"))){
    stop("You need to specify a 'distribution' among 'loglogistic'
         or 'lognormal'.")
  }
  if(!(model_type %in% c("SD", "IT", "PROPER"))){
    stop("'model_type' must be 'SD', 'IT' or 'PROPER'. For 'IT' and 'PROPER' models,
                please add the distribution 'loglogistic' or 'lognormal'.")
  }
  
  ode_control <- list(rel_tol = rel_tol, abs_tol = abs_tol, max_num_steps = max_num_steps)

  dataStan_withReplicate <- modelDataStan(data, model_type, ode_control, priors_list)
  dataStan <- dataStan_withReplicate
  dataStan$replicate_conc <- NULL
  dataStan$replicate_Nsurv <- NULL
  dataStan$Ninit <- NULL
  
  if(distribution == "loglogistic"){
    dataStan$proper_distribution = 1
  }
  if(distribution == "lognormal"){
    dataStan$proper_distribution = 2
  }
  
  if(ode_integrator == "rk45"){
    if(model_type == "SD"){
      dataStan$proper_distribution <- NULL
      model_object <- stanmodels$ode_TKTD_varSD
    } 
    if(model_type == "IT"){
      model_object <- stanmodels$ode_TKTD_varIT
    } 
    if(model_type == "PROPER"){
      model_object <- stanmodels$ode_TKTD_varPROPER
    }
  }
  if(ode_integrator == "bdf"){
    stop("ode_integrator 'bdf' is not supported in the present version.")
  }
  
  fit <- rstan::sampling(
    object = model_object,
    data = dataStan,
    control = list(adapt_delta = adapt_delta),
    chains = chains,
    iter = iter,
    warmup = warmup,
    thin = thin,
    ...)
  
  ##
  ## MCMC information
  ## 
  mcmcInfo <- data.frame(n.iter = iter,
                        n.chains = chains,
                        thin.interval = thin,
                        n.warmup = warmup)
  
  ## OBJECT TO RETURN
  
  ls_out <- list(stanfit = fit,
                 data = data,
                 dataStan = dataStan_withReplicate,
                 mcmcInfo = mcmcInfo,
                 model_type = model_type,
                 distribution = distribution)
  
  class(ls_out) <- "stanguts"
  
  ## ------ WARNINGS
  
  out_rhat <- summary(ls_out$stanfit)$summary[, "Rhat"]
  
  if (!all(out_rhat < 1.1, na.rm = TRUE)){
    ##possibility to store warning in a 'warnings table'
    msg <- "*** Markov chains did not converge! Do not analyze results! ***. 
Plot MCMC chains and try the following options:
 (1) if one or more chain are a simple stable line, increase 'adapt_delta' (default is 0.95).
 (2) if the variabbility between chain is great, you can increase the number of iteration (default is 2000 iteration).
 (3) if 'Conditional_Psurv_hat' is greater than 1, the ODE integration is wrong. So you can reduce the tolerance of the ODE integrator."
    # warnings <- msgTableAdd(warnings, "rhat", msg)
    ## print the message
    warning(msg, call. = FALSE)
  }
  
  return(ls_out)
}
virgile-baudrot/rstanTKTD documentation built on May 29, 2019, 9:57 a.m.