R/bc.mean.sd.R

Defines functions bc.mean.sd

Documented in bc.mean.sd

#' Box-Cox method for estimating the sample mean and standard deviation
#'
#' This function applies the Box-Cox (BC) method to estimate the sample mean and standard deviation from a study that presents one of the following sets of summary statistics: \itemize{
#' \item S1: median, minimum and maximum values, and sample size
#' \item S2: median, first and third quartiles, and sample size
#' \item S3: median, minimum and maximum values, first and third quartiles, and sample size
#'  }
#'
#' The BC method incorporates the Box-Cox power transformation into the sample mean estimators of Luo et al. (2016) and the sample standard deviation estimators of Wan et al. (2014) so that their assumption of normality is more tenable. The BC method consists of the following steps, outlined below.
#'
#' First, an optimal value of the power parameter \eqn{\lambda} is found so that the distribution of the Box-Cox transformed data is approximately normal. Then, the methods of Luo et al. and Wan et al. are applied to estimate the mean and standard deviation of the distribution of the transformed data. Finally, the inverse transformation is applied to estimate the sample mean and standard deviation of the original, untransformed data.
#'
#' To perform the inverse transformation, either numerical integration or Monte Carlo simulation can be applied, which is controlled by the \code{avoid.mc} argument. When the estimated mean of the Box-Cox transformed data is negative or close to zero (i.e., below 0.01), numerical integration often does not converge. Therefore, Monte Carlo simulation is automatically used in this case.
#'
#' @param min.val numeric value giving the sample minimum.
#' @param q1.val numeric value giving the sample first quartile.
#' @param med.val numeric value giving the sample median.
#' @param q3.val numeric value giving the sample third quartile.
#' @param max.val numeric value giving the sample maximum.
#' @param n numeric value giving the sample size.
#' @param preserve.tail logical scalar indicating whether to preserve or remove (if applicable) the negative-domain left suppport (and the corresponding right support to maintain the symmetry of the underlying normal distribution) of the Box-Cox cumulative distribution function. The classical Box-Cox transformation only takes positive numbers as input, so this parameter has a default value of \code{FALSE}. It is not possible to avoid Monte Carlo simulation when this parameter is set to \code{TRUE}. When this parameter is set to \code{TRUE}, the data-modeling distribution corresponding to the inverse Box-Cox transformation of the underlying normal distribution can have a value of infinity for its mean and/or variance. In this case, the average of the corresponding mean and/or variance produced by this function does not converge.
#' @param avoid.mc logical scalar indicating whether to avoid Monte Carlo simulation (if possible) when performing the inverse Box-Cox transformation (the default is \code{FALSE}). See 'Details'.
#'
#' @return A object of class \code{bc.mean.sd}. The object is a list with the following components:
#' \item{est.mean}{Estimated sample mean.}
#' \item{est.sd}{Estimated sample standard deviation.}
#' \item{location}{Estimated mean of the Box-Cox transformed data.}
#' \item{scale}{Estimated standard deviation of the Box-Cox transformed data.}
#' \item{shape}{Estimated transformation parameter \eqn{\lambda}.}
#' \item{bc.norm.rvs}{The random variables generated by the Box-Cox (or, equivalently, power-normal) distribution during the Monte Carlo simulation. If Monte Carlo simulation is not used, a value of \code{NA} is given.}
#' \item{...}{Some additional elements.}
#'
#' The results are printed with the \code{\link{print.bc.mean.sd}} function.
#'
#' @examples
#' ## Generate S2 summary data
#' set.seed(1)
#' n <- 100
#' x <- stats::rlnorm(n, 2.5, 1)
#' quants <- stats::quantile(x, probs = c(0.25, 0.5, 0.75))
#' obs.mean <- mean(x)
#' obs.sd <- stats::sd(x)
#'
#' ## Estimate the sample mean and standard deviation using the BC method
#' bc.mean.sd(q1.val = quants[1], med.val = quants[2], q3.val = quants[3],
#'     n = n)
#'
#' @references McGrath S., Zhao X., Steele R., Thombs B.D., Benedetti A., and the DEPRESsion Screening Data (DEPRESSD) Collaboration. (2020). Estimating the sample mean and standard deviation from commonly reported quantiles in meta-analysis. \emph{Statistical Methods in Medical Research}. \strong{29}(9):2520-2537.
#' @references Box G.E.P., and D.R. Cox. (1964). An analysis of transformations. \emph{Journal of the Royal Statistical Society Series B}. \strong{26}(2):211-52.
#' @references Luo D., Wan X., Liu J., and Tong T. (2016). Optimally estimating the sample mean from the sample size, median, mid-range, and/or mid-quartile range. \emph{Statistical Methods in Medical Research}. \strong{27}(6):1785-805
#' @references Wan X., Wang W., Liu J., and Tong T. (2014). Estimating the sample mean and standard deviation from the sample size, median, range and/or interquartile range. \emph{BMC Medical Research Methodology}. \strong{14}:135.
#' @export

bc.mean.sd <- function(min.val, q1.val, med.val, q3.val, max.val, n,
                       preserve.tail = FALSE, avoid.mc = FALSE) {

  args <- as.list(environment())

  scenario <- get.scenario(min.val, q1.val, med.val, q3.val, max.val)
  check_errors(min.val = min.val, q1.val = q1.val, med.val = med.val,
               q3.val = q3.val, max.val = max.val, n = n, scenario = scenario)

  location.scale.shape <- get.location.scale.shape(min.val, q1.val, med.val,
                                                   q3.val, max.val, n, scenario)
  location <- location.scale.shape[1]
  scale    <- location.scale.shape[2]
  shape    <- location.scale.shape[3]

  if (shape > -1e-2 & !preserve.tail & avoid.mc) {
    if (shape > 1e-2) {
      lower <- min(-1 / shape, 2 * location + 1 / shape)
      upper <- max(-1 / shape, 2 * location + 1 / shape)
      normalizer <- 1 / (stats::pnorm(upper, location, scale) -
                           stats::pnorm(lower, location, scale))
      exp.val.integrand <- function(x) {
        normalizer * stats::dnorm(x, location, scale) *
          inv.smooth.bc.transform(x, shape)
      }
      exp.val.result <- stats::integrate(exp.val.integrand, lower, upper)
      exp.val <- exp.val.result$value
      var.val.integrand <- function(x) {
        normalizer * stats::dnorm(x, location, scale) *
          (inv.smooth.bc.transform(x, shape) - exp.val)^2
      }
      var.val.result <- stats::integrate(var.val.integrand, lower, upper)
      var.val <- var.val.result$value
    } else {
      exp.val <- exp(location + scale^2 / 2)
      var.val <- (exp(scale^2) - 1) * exp(2 * location + scale^2)
    }
    output <- list(est.mean = as.numeric(exp.val),
                   est.sd = as.numeric(sqrt(var.val)),
                   location = as.numeric(location), scale = as.numeric(scale),
                   shape = as.numeric(shape), bc.norm.rvs = NA)
  } else {
    trans.norm.rvs <- stats::rnorm(n+1e3, location, scale)
    if (!preserve.tail) {
      if (shape >= 1e-8) {
        sym.truncated.norm.rvs <- trans.norm.rvs[trans.norm.rvs >= -1 / shape &
                                                   trans.norm.rvs <= 2 *
                                                   location + 1 / shape]
      } else if (shape <= -1e-8) {
        sym.truncated.norm.rvs <- trans.norm.rvs[trans.norm.rvs <= -1 / shape &
                                                   trans.norm.rvs >= 2 *
                                                   location + 1 / shape]
      } else {
        sym.truncated.norm.rvs <- trans.norm.rvs
      }
      bc.norm.rvs <- inv.smooth.bc.transform(sym.truncated.norm.rvs, shape)
    } else {
      bc.norm.rvs <- inv.smooth.bc.transform(trans.norm.rvs, shape)
    }

    output <- list(est.mean = as.numeric(mean(bc.norm.rvs)),
                   est.sd = as.numeric(stats::sd(bc.norm.rvs)),
                   location = as.numeric(location), scale = as.numeric(scale),
                   shape = as.numeric(shape), bc.norm.rvs = bc.norm.rvs,
                   args = args, scenario = scenario)
  }
  class(output) <- "bc.mean.sd"
  return(output)
}

Try the estmeansd package in your browser

Any scripts or data that you put into this service are public.

estmeansd documentation built on June 19, 2022, 1:05 a.m.