R/bootstrap.R

Defines functions get_jackknife_values bootstrap_f2

Documented in bootstrap_f2 get_jackknife_values

#' Bootstrap f2
#'
#' The function \code{bootstrap_f2()} generates \code{R} bootstrap replicates
#' of the similarity factor \eqn{f_2} based on resampling of complete profiles
#' (nonparametric bootstrap) or on resampling per time point the values
#' between profiles (parametric bootstrap). Estimates of \dQuote{normal},
#' \dQuote{basic}, \dQuote{student}, \dQuote{percent} and of
#' \dQuote{bias-corrected, accelerated} (BCa) percentile intervals are returned.
#'
#' @param data A data frame with the dissolution profile data in wide format.
#' @param tcol A vector of indices specifying the columns in \code{data} that
#'   contain the \% release values. The length of \code{tcol} must be three or
#'   longer.
#' @param grouping A character string specifying the column in \code{data} that
#'   contains the group names (i.e. a factorial variable, e.g., for the
#'   differentiation of batches or formulations of a drug product).
#' @param rand_mode A character string indicating if complete profiles shall be
#'   randomised (\code{"complete"}, the default) or individual data points
#'   (\code{"individual"}).
#' @param R An integer specifying the number of bootstrap replicates. The
#'   default is \code{999}.
#' @param each An integer specifying the number of dissolution profiles to be
#'   selected per group per randomisation round. The default is \code{12}.
#' @param new_seed An integer for setting the seed for random number generation.
#'   The default is \code{100}.
#' @param confid A numeric value between 0 and 1 specifying the confidence limit
#'   for the calculation of the bootstrap confidence intervals. The default is
#'   \code{0.9}.
#' @param use_EMA A character string indicating if the similarity factor
#'   \eqn{f_2} should be calculated according to the EMA guideline \dQuote{On
#'   the investigation of bioequivalence} (\code{"yes"}) or not (\code{"no"},
#'   the default). The default is \code{"no"} because the bootstrap \eqn{f_2}
#'   method is one of the possible solutions if the condition concerning the
#'   variability between the profiles does not allow the evaluation of \eqn{f_2}
#'   according to the EMA guideline. A third option is \code{"ignore"}. If
#'   \code{use_EMA} is \code{"yes"} or \code{"no"} the appropriate profile
#'   portion is determined on the basis of the values of the parameter
#'   \code{bounds}. If it is \code{"ignore"}, the complete profiles are used as
#'   specified by the parameter \code{tcol}.
#' @param bounds A numeric vector of the form \code{c(lower, upper)} specifying
#'   the \dQuote{lower} and \dQuote{upper} limits, respectively, for the \%
#'   drug release given that \code{use_EMA} is \code{"no"}. The default is
#'   \code{c(1, 85)}. Mean \% release values of any of the two groups being
#'   compared that are smaller than or equal to the lower bound are ignored and
#'   only the first mean \% release value that is greater than or equal to the
#'   upper bound is included while all the subsequent values are ignored. If
#'   \code{use_EMA} is \code{"yes"} the \code{bounds} are \code{c(1, 85)} per
#'   definition.
#' @param ... Named parameters of the functions \code{stat.fun()},
#'   \code{ran.fun()} and \code{boot()}.
#'
#' @details Information on \eqn{f_2} can be found in at least three FDA
#' guidances and in the guideline of the European Medicines Agency (EMA)
#' \dQuote{On the investigation of bioequivalence} (EMA 2010). For the
#' assessment of the similarity of dissolution profiles using the similarity
#' factor \eqn{f_2} according to the EMA guideline the following constraints
#' do apply:
#' \enumerate{
#' \item A minimum of three time points (without zero) are necessary.
#' \item The time points should be the same for the two formulations.
#' \item For every time point and for each formulation at least 12 data points
#'   are required.
#' \item A maximum of one mean value per formulation may be > 85\% dissolved.
#' \item The coefficient of variation (\%CV) should be < 20\% for the first
#'   time point and < 10\% from the second to the last time point for any
#'   formulation.
#' }
#'
#' Dissolution profiles are regarded as similar if the \eqn{f_2} value is
#' between 50 and 100. \cr
#'
#' One often encountered problem is that the \%CV constraint cannot be
#' fulfilled. One possibility in this situation is the use of the bootstrap
#' \eqn{f_2} method (Shah 1998) by which the distribution of \eqn{f_2} is
#' simulated to obtain an unbiased estimate of the expected value of \eqn{f_2}
#' and the variability of the underlying distribution. For the \eqn{f_2}
#' calculation only those parts of the profiles are taken into account where
#' the means (per formulation) are \eqn{> d}\% dissolved (e.g., \eqn{d = 1})
#' and a maximum of one mean value per formulation is \eqn{> 85}\% dissolved.
#' In the literature it is suggested to make use of the lower 90\% bias
#' corrected and accelerated (BCa) confidence interval (CI) limit to come to
#' a decision in terms of similarity (Stevens (2015)).
#'
#' @return An object of class \sQuote{\code{bootstrap_f2}} is returned,
#' containing the following list elements:
#' \item{Boot}{An object of class \sQuote{\code{boot}} with the corresponding
#'   components.}
#' \item{Profile.TP}{A named numeric vector of the columns in \code{data}
#'   specified by \code{tcol} and depending on the selection of \code{use_EMA}.
#'   Given that the column names contain extractable numeric information,
#'   e.g., specifying the testing time points of the dissolution profile, it
#'   contains the corresponding values. Elements where no numeric information
#'   could be extracted are \code{NA}.}
#' \item{L}{A vector of the Jackknife leave-one-out-values.}
#' \item{CI}{An object of class \sQuote{\code{bootci}} which contains the
#'   intervals.}
#' \item{BCa_CI}{The lower and upper limits of the BCa interval calculated
#'   by the \code{boot.ci()} function from the \sQuote{\code{boot}} package.}
#' \item{ShahBCa_CI}{The lower and upper limits of the BCa interval calculated
#'   according to Shah (Shah 1998).}
#'
#' @references
#' United States Food and Drug Administration (FDA). Guidance for industry:
#' dissolution testing of immediate release solid oral dosage forms. 1997.\cr
#' \url{https://www.fda.gov/media/70936/download}
#'
#' United States Food and Drug Administration (FDA). Guidance for industry:
#' immediate release solid oral dosage form: scale-up and post-approval
#' changes, chemistry, manufacturing and controls, \emph{in vitro} dissolution
#' testing, and \emph{in vivo} bioequivalence documentation (SUPAC-IR). 1995.\cr
#' \url{https://www.fda.gov/media/70949/download}
#'
#' European Medicines Agency (EMA), Committee for Medicinal Products for
#' Human Use (CHMP). Guideline on the Investigation of Bioequivalence. 2010;
#' CPMP/EWP/QWP/1401/98 Rev. 1.\cr
#' \url{https://www.ema.europa.eu/en/documents/scientific-guideline/
#' guideline-investigation-bioequivalence-rev1_en.pdf}
#'
#' Stevens, R. E., Gray, V., Dorantes, A., Gold, L., and Pham, L. Scientific
#' and regulatory standards for assessing product performance using the
#' similarity factor, \eqn{f_2}. \emph{AAPS Journal}. 2015; \strong{17}(2):
#' 301-306.\cr
#' \doi{10.1208/s12248-015-9723-y}
#'
#' Shah, V. P., Tsong, Y., Sathe, P., and Liu, J. P. \emph{In vitro} dissolution
#' profile comparison - statistics and analysis of the similarity factor,
#' \eqn{f_2}. \emph{Pharm Res}. 1998; \strong{15}(6): 889-896.\cr
#' \doi{10.1023/A:1011976615750}
#'
#' @seealso \code{\link{get_jackknife_values}}, \code{\link[boot]{boot}},
#'   \code{\link[boot]{boot.ci}}, \code{\link{mimcr}}, \code{\link{mztia}}.
#'
#' @importFrom stats qnorm
#' @importFrom stats pnorm
#' @importFrom stats quantile
#' @importFrom boot boot
#' @importFrom boot boot.ci
#'
#' @export

bootstrap_f2 <- function(data, tcol, grouping, rand_mode = "complete",
                   R = 999, each = 12, new_seed = 100, confid = 0.9,
                   use_EMA = "no", bounds = c(1, 85), ...) {
  if (!is.data.frame(data)) {
    stop("The data must be provided as data frame.")
  }
  if (!is.numeric(tcol) | length(tcol) < 3) {
    stop("The parameter tcol must be an integer vector of at least length 3.")
  }
  if (!isTRUE(all.equal(tcol, as.integer(tcol)))) {
    stop("The parameter tcol must be an integer vector.")
  }
  if (min(tcol) < 1 | max(tcol) > ncol(data)) {
    stop("Some columns specified by tcol were not found in data frame.")
  }
  if (sum(vapply(data[, tcol], is.numeric, logical(1))) != length(tcol)) {
    stop("Some columns specified by tcol are not numeric.")
  }
  if (!is.character(grouping)) {
    stop("The parameter grouping must be string.")
  }
  if (!(grouping %in% colnames(data))) {
    stop("The grouping variable was not found in the provided data frame.")
  }
  if (!is.factor(data[, grouping])) {
    stop("The grouping variable's column in data must be a factor.")
  }
  if (!(rand_mode %in% c("complete", "individual"))) {
    stop("Please specify rand_mode either as \"complete\" or \"individual\".")
  }
  if (!is.numeric(R) | length(R) > 1) {
    stop("The parameter R must be an integer of length 1.")
  }
  if (!isTRUE(all.equal(R, as.integer(R)))) {
    stop("The parameter R must be an integer of length 1.")
  }
  if (!is.numeric(each) | length(each) > 1) {
    stop("The parameter each must be an integer of length 1.")
  }
  if (!isTRUE(all.equal(each, as.integer(each)))) {
    stop("The parameter each must be an integer of length 1.")
  }
  if (!is.numeric(new_seed) | length(new_seed) > 1) {
    stop("The parameter new_seed must be an integer of length 1.")
  }
  if (!isTRUE(all.equal(new_seed, as.integer(new_seed)))) {
    stop("The parameter new_seed must be an integer of length 1.")
  }
  if (confid <= 0 | confid > 1) {
    stop("Please specify confid as (0, 1]")
  }
  if (!(use_EMA %in% c("yes", "no", "ignore"))) {
    stop("Please specify use_EMA either as \"yes\" or \"no\" or \"ignore\".")
  }
  if (!is.numeric(bounds) | length(bounds) != 2) {
    stop("The paramter bounds must be a numeric vector of length 2.")
  }
  if (bounds[1] > bounds[2]) {
    stop("Please specify bounds in the form c(lower limit, upper limit).")
  }
  if (bounds[1] < 0 | bounds[2] > 100) {
    stop("Please specify bounds in the range [0, 100].")
  }

  # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  # Data preparation

  data <- droplevels(data)

  if (nlevels(data[, grouping]) != 2) {
    stop("The number of levels in column ", grouping, " differs from 2.")
  }

  # <-><-><-><->
  # Extraction of information on the time points
  time_points <- get_time_points(svec = colnames(data)[tcol])

  # <-><-><-><->
  # Generation of logical vector representing the reference and test group
  b1 <- make_grouping(data = data, grouping = grouping)

  # Check if the two groups have the same number of observations and if the
  # parameter "each" is a multiple of the number of observations per group.
  # If the parameter 'each' is not a mulitple of the number of dissolution
  # profiles of the two groups or if the two groups do not have the same number
  # of observations, adjust the data frame. Do the adjustment in such a way that
  # both groups to be compared will have the same number of observations and
  # that the number of observations per group corresponds to the largest common
  # value (lcv) between 'each' and the number of observations per group.
  # Note: an alternative would be to use the least common multiple between the
  #       three numbers.
  # Note that together with the data adjustment the b1 vector must be reset.

  if (2 * each > nrow(data) | sum(b1) != sum(!b1)) {
    warning("The two groups to be compared do not have the same number of ",
            "observations. Thus, the number of rows is adjusted according ",
            "to the largest common value between the parameter each and ",
            "the number of observations per group.")

    data <- balance_observations(data = data, groups = b1, n_obs = each)
    b1 <- make_grouping(data = data, grouping = grouping)
  }

  # <-><-><-><->
  # Determination of dissolution profile ranges to be compared
  ok <- get_profile_portion(data = data, tcol = tcol, groups = b1,
                            use_EMA = use_EMA, bounds = bounds)
  time_points <- time_points[ok]

  if (use_EMA == "yes" & sum(ok) < 3) {
    stop("According to EMA the profiles must comprise a minimum of 3 time ",
         "points. The actual profiles comprise ", sum(ok), " points only.")
  }
  if (sum(ok) < 3) {
    warning("The profiles should comprise a minimum of 3 time points. ",
            "The actual profiles comprise ", sum(ok), " points only.")
  }

  # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  # Bootstrap assessment

  set.seed(new_seed)
  switch(rand_mode, "complete" = {
    t_boot <- boot(data = data, statistic = get_f2, R = R,
                   strata = data[, grouping], grouping = grouping,
                   tcol = tcol[ok], ...)
  }, "individual" = {
    mle <- vector(mode = "list", length = 2)
    mle[[1]] <- nrow(data) / 2
    mle[[2]] <- tcol[ok]
    t_boot <- boot(data = data, statistic = get_f2, R = R,
                   sim = "parametric", ran.gen = rand_indiv_points,
                   mle = mle, grouping = grouping, tcol = tcol[ok],
                   ins = seq_along(b1), ...)
  })

  # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  # Bootstrap confidence intervals

  jack <- get_jackknife_values(grouping = grouping, stat_fun = get_f2,
                               data = data, tcol = tcol[ok])

  cis <- suppressWarnings(boot.ci(t_boot, conf = confid, type = "all",
                                        L = jack$loo.values))

  # Shah's BCa corrected confid * 100% CI (see Shah et al., Pharm Res, 1998)
  a <- (sum((jack$theta.jack - jack$loo.values)^3)) /
    (sum(b1) / 2 *
       sum((jack$theta.jack - jack$loo.values)^2)^(3 / 2))
  z0 <- qnorm(sum(t_boot$t < t_boot$t0) / R)

  BCa_LL <- pnorm(z0 + (z0 + qnorm((1 - confid) / 2)) /
                    (1 - a * (z0 + qnorm((1 - confid) / 2))))
  BCa_UL <- pnorm(z0 + (z0 + qnorm(1 - (1 - confid) / 2)) /
                    (1 - a * (z0 + qnorm(1 - (1 - confid) / 2))))
  ShahBCa_CI <- as.numeric(c(quantile(t_boot$t, BCa_LL),
                             quantile(t_boot$t, BCa_UL)))

  # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  # Compilation of results

  structure(list(Boot = t_boot,
                 Profile.TP = time_points,
                 L = jack$loo.values,
                 CI = cis,
                 BCa_CI = as.numeric(cis$bca[4:5]),
                 ShahBCa_CI = ShahBCa_CI),
            class = "bootstrap_f2")
}

#' Jackknife values
#'
#' The function \code{get_jackknife_values()} is required for generation of
#' vector \eqn{L} for the BCa confidence interval estimation by the
#' \code{boot.ci()} function from the \sQuote{\code{boot}} package.
#'
#' @param stat_fun The name of the function calculating the statistic of
#'   interest. The statistic function must return a single number.
#' @param data A data frame with the dissolution profile data in wide format
#'   and a column for the distinction of the groups to be compared.
#' @param ... Further arguments that may be passed down to the statistics
#'   function.
#' @inheritParams bootstrap_f2
#'
#' @details For the calculation of BCa bootstrap confidence intervals empirical
#' influence values of the statistic of interest for the observed data are
#' required. If \eqn{L} is not provided to the \code{boot.ci()} function from
#' the \sQuote{\code{boot}} package, the values are calculated, if needed,
#' using the \code{empinf()} function (also from the \sQuote{\code{boot}}
#' package). Since the results returned by \code{empinf()} are not appropriate
#' in this case, the \eqn{L} values are determined using
#' \code{get_jackknife_values()}. The jackknife calculations are done by
#' calculating the leave-one-out estimates.
#'
#' @return A list of the jackknife assessment with the following elements is
#' returned:
#' \item{theta.hat}{The original value of the statistic.}
#' \item{theta.jack}{The jackknife value of the statistic.}
#' \item{jack.se}{The jackknife standard error of the statistic.}
#' \item{jack.bias}{The bias of the statistic.}
#' \item{loo.values}{The leave-one-out values.}
#' \item{pseudo.values}{The pseudo values.}
#'
#' @references
#' Chen, H. Bootstrap and Jackknife Calculations in R. 2004.
#'
#' @seealso \code{\link{bootstrap_f2}}, \code{\link[boot]{boot}},
#'   \code{\link[boot]{boot.ci}}.
#'
#' @keywords internal

get_jackknife_values <- function(grouping, stat_fun, data, ...) {
  if (!is.data.frame(data)) {
    stop("The data must be provided as data frame.")
  }
  if (!(grouping %in% colnames(data))) {
    stop("The grouping variable was not found in the provided data frame.")
  }

  # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  n <- nrow(data) / 2

  l_index <- lapply(1:n, function(x) (1:(2 * n))[-c(x, (x + n))])
  u <- vapply(l_index, function(x)
    stat_fun(data = data, ins = x, grouping = grouping, ...),
    numeric(1))

  theta_hat <- stat_fun(data = data, ins = 1:(2 * n), grouping = grouping, ...)
  pseudo_values <- n * theta_hat - (n - 1) * u
  theta_jack <- mean(pseudo_values)
  jack_se <- sqrt(sum((pseudo_values - theta_jack)^2) / (n * (n - 1)))
  jack_bias <- (n - 1) * (theta_hat - mean(u))

  return(list(theta.hat = theta_hat,
              theta.jack = theta_jack,
              jack.se = jack_se,
              jack.bias = jack_bias,
              loo.values = u,
              pseudo.values = pseudo_values))
}

Try the disprofas package in your browser

Any scripts or data that you put into this service are public.

disprofas documentation built on Dec. 8, 2021, 5:10 p.m.