R/find_boot_ss.R
In qfasar: Quantitative Fatty Acid Signature Analysis in R

Documented in find_boot_ss

#' Find realistic bootstrap sample sizes
#'
#' In QFASA simulation studies, predator signatures are often simulated by
#' bootstrap resampling prey signature data. \code{\link{find_boot_ss}} finds
#' bootstrap sample sizes for each prey type that produce simulated predator
#' signatures with realistic levels of variation.
#'
#' @param pred_sigs A vector or matrix of predator signatures, intended to be
#'   the object \code{pred_sigs} returned by a call to the function
#'   \code{\link{est_diet}}.
#' @param pred_diets A numeric matrix of the estimated diet compositions of
#'   individual predators, intended to be the object \code{est_ind} returned by
#'   a call to \code{\link{est_diet}}.
#' @param prey_sigs A matrix of prey signatures, intended to be the object
#'   \code{prey_sigs} returned by a call to the function \code{\link{est_diet}}.
#' @param prey_loc A matrix giving the first and last locations of the
#'   signatures of each prey type within \code{prey_sigs}, intended to be the
#'   object \code{loc} returned by a call to the function
#'   \code{\link{prep_sig}} with the prey data frame, or by a call to the
#'   function \code{\link{make_prey_part}} if a partitioned prey library was
#'   used for diet estimation.
#' @param n_pred_boot An integer designating the number of predator signatures
#'   to bootstrap. See Details. Default value 1000.
#'
#' @return A list containing the following elements: \describe{
#'   \item{var_diet}{A numeric vector of the variance between the estimated
#'   diets of all possible pairs of predators, sorted in increasing order.}
#'   \item{var_sig}{A numeric vector of the variance between the signatures
#'   of all possible pairs of predators, sorted consistently with
#'   \code{var_diet}}.
#'   \item{var_sig_smooth}{A loess-smoothed version of var_sig.}
#'   \item{mod_sig_var}{A numeric vector of the modeled variance between
#'   bootstrapped predator signatures at each iteration of the algorithm.}
#'   \item{var_target}{The target level of variance between bootstrapped
#'   predator signatures.}
#'   \item{boot_ss}{An integer vector of bootstrap sample sizes for each prey
#'   type.}
#'   \item{err_code}{An integer error code (0 if no error is detected).}
#'   \item{err_message}{A string containing a brief summary of the results.}
#' }
#'
#' @section Details:
#' QFASA simulation studies may require the generation of predator signatures
#' given a specified diet, against which estimates of diet composition can then
#' be compared (e.g., Bromaghin et al. 2015). Given a specified diet, a
#' bootstrap sample of each prey type is drawn and mean prey-type signatures
#' are computed.  A predator signature is then generated by multiplying the mean
#' bootstrapped prey signatures by the diet proportions.
#'
#' Although authors often fail to report the bootstrap sample sizes used for
#' each prey type when describing simulations (e.g., Haynes et al. 2015;
#' Thiemann et al. 2008; Wang et al. 2010), they are subjectively selected
#' (e.g., Iverson et al. 2004; Bromaghin et al. 2015). However, Bromaghin et al.
#' (2016) found that bootstrap sample sizes strongly influence both the bias and
#' the variance of diet composition estimates in simulation studies.
#' Consequently, the selection of bootstrap sample sizes is an important aspect
#' of simulation design.
#'
#' Bromaghin (2015) presented an objective method of establishing a bootstrap
#' sample size for each prey type that produces simulated predator signatures
#' having a realistic level of between-signature variation.  The function
#' \code{\link{find_boot_ss}} implements the algorithm of Bromaghin (2015). A
#' brief summary of the algorithm, sufficient to understand the objects returned
#' by \code{find_boot_ss}, follows below. Please refer to Bromaghin (2015) for
#' additional details.
#'
#' The concept underlying the algorithm is that the variation in predator
#' signatures can be partitioned into variation due to differences in diet and
#' variation due to prey animals consumed given diet. Consequently, a realistic
#' level of variation between signatures for predators sharing the same diet is
#' approximated from the empirical relationship between a measure of variation
#' between pairs of predator signatures \code{var_sig} and a measure of
#' variation between the estimated diets \code{var_diets} of the same predator
#' pairs. As the variance in diets approaches zero, the predators are
#' effectively eating the same diet and variation in their signatures therefore
#' approaches the level of variation due to prey selection only.
#' \code{\link{find_boot_ss}} models the relationship between \code{var_diet}
#' and \code{var_sig} using a loess smooth, and the modeled signature variance
#' \code{var_sig_smooth} for the pair of predators whose value of
#' \code{var_diet} is smallest is taken as the target level of variation for
#' predator signatures.
#'
#' The algorithm is initialized with a sample size of 1 from each prey type.
#' A sample of \code{n_pred_boot} predator signatures is generated using those
#' sample sizes and the measure of variance among the signatures is computed.
#' If the variance measure exceeds the target level, the prey type contributing
#' most to the variance measure is identified and its sample size is increased
#' by 1. The algorithm then iterates, increasing the sample size by one at each
#' iteration, until the measure of variation is less than the target level. The
#' level of variation at each iteration and the target level of variation are
#' returned in the objects \code{mod_sig_var} and \code{var_terget},
#' respectively.
#'
#' The argument \code{n_boot_pred} should be large enough to return an estimate
#' of the variance measure that itself has low variance, so that the algorithm
#' returns numerically stable results. We suspect that the default value of 1000
#' errs on the side of caution.
#'
#' NOTE: Because \code{find_boot_ss} is intended to operate on the predator and
#' prey signatures returned by a call to the function \code{\link{est_diet}},
#' \code{find_boot_ss} can be based on diet estimates obtained in either the
#' predator or prey space, using an original or partitioned prey library.
#' However, it is imperative that the arguments are compatible.
#'
#' @section References:
#' Bromaghin, J.F. 2015. Simulating realistic predator signatures in
#'   quantitative fatty acid signature analysis. \emph{Ecological Informatics}
#'   30:68-71.
#'
#' Bromaghin, J.F., S.M. Budge, and G.W. Thiemann. 2016. Should fatty acid
#'   signature proportions sum to 1 for diet estimation? \emph{Ecological
#'   Research} 31:597-606.
#'
#' Bromaghin, J.F., K.D. Rode, S.M. Budge, and G.W. Thiemann. 2015. Distance
#'   measures and optimization spaces in quantitative fatty acid signature
#'   analysis. \emph{Ecology and Evolution} 5:1249-1262.
#'
#' Haynes, T.B., J.A. Schmutz, J.F. Bromaghin, S.J. Iverson, V.M. Padula, and
#'   A.E. Rosenberger. 2015. Diet of yellow-billed loons (\emph{Gavia adamsii})
#'   in Arctic lakes during the nesting season inferred from fatty acid
#'   analysis. \emph{Polar Biology} 38:1239-1247.
#'
#' Iverson, S.J., C. Field, W.D. Bowen, and W. Blanchard. 2004.
#'   Quantitative fatty acid signature analysis: A new method of
#'   estimating predator diets. \emph{Ecological Monographs} 74:211-235.
#'
#' Thiemann, G.W., S.J. Iverson, and I. Stirling. 2008. Polar bear diets and
#'   Arctic marine food webs: insights from fatty acid analysis.
#'   \emph{Ecological Monographs} 78:591-613.
#'
#' Wang, S.W., T.E. Hollmen, and S.J. Iverson. 2010. Validating quantitative
#'   fatty acid signature analysis to estimate diets of spectacled and Stellers
#'   eiders (\emph{Somateria fischeri} and \emph{Polysticta stelleri}).
#'   \emph{Journal of Comparative Physiology B} 180:125-139.
#'
#' @examples
#' find_boot_ss(pred_sigs = matrix(c(0.05, 0.10, 0.30, 0.55,
#'                                   0.04, 0.11, 0.29, 0.56,
#'                                   0.10, 0.05, 0.35, 0.50,
#'                                   0.12, 0.03, 0.37, 0.48,
#'                                   0.10, 0.06, 0.35, 0.49,
#'                                   0.05, 0.15, 0.35, 0.45), ncol = 6),
#'              pred_diets = matrix(c(0.33, 0.34, 0.33,
#'                                    0.10, 0.80, 0.10,
#'                                    0.35, 0.50, 0.15,
#'                                    0.20, 0.35, 0.45,
#'                                    0.20, 0.45, 0.35,
#'                                    0.15, 0.65, 0.20), ncol = 6),
#'              prey_sigs = matrix(c(0.06, 0.09, 0.31, 0.54,
#'                                   0.05, 0.09, 0.30, 0.56,
#'                                   0.03, 0.10, 0.30, 0.57,
#'                                   0.08, 0.07, 0.30, 0.55,
#'                                   0.09, 0.05, 0.33, 0.53,
#'                                   0.09, 0.06, 0.34, 0.51,
#'                                   0.09, 0.07, 0.34, 0.50,
#'                                   0.08, 0.11, 0.35, 0.46,
#'                                   0.06, 0.14, 0.36, 0.44), ncol = 9),
#'              prey_loc = matrix(c(1, 4, 7, 3, 6, 9), ncol=2),
#'              n_pred_boot = 500)
#'
#' find_boot_ss(pred_sigs = matrix(c(0.05, 0.10, 0.30, 0.55,
#'                                   0.04, 0.11, 0.29, 0.56,
#'                                   0.10, 0.05, 0.35, 0.50,
#'                                   0.12, 0.03, 0.37, 0.48,
#'                                   0.10, 0.06, 0.35, 0.49,
#'                                   0.05, 0.15, 0.35, 0.45), ncol = 6),
#'              pred_diets = matrix(c(0.33, 0.34, 0.33,
#'                                    0.10, 0.80, 0.10,
#'                                    0.35, 0.50, 0.15,
#'                                    0.20, 0.35, 0.45,
#'                                    0.20, 0.45, 0.35,
#'                                    0.15, 0.65, 0.20), ncol = 6),
#'              prey_sigs = matrix(c(0.06, 0.09, 0.31, 0.54,
#'                                   0.05, 0.09, 0.30, 0.56,
#'                                   0.03, 0.10, 0.30, 0.57,
#'                                   0.08, 0.07, 0.30, 0.55,
#'                                   0.09, 0.05, 0.33, 0.53,
#'                                   0.09, 0.06, 0.34, 0.51,
#'                                   0.09, 0.07, 0.34, 0.50,
#'                                   0.08, 0.11, 0.35, 0.46,
#'                                   0.06, 0.14, 0.36, 0.44), ncol = 9),
#'              prey_loc = matrix(c(1, 4, 7, 3, 6, 9), ncol=2))
#'
#' @export
#'
################################################################################

find_boot_ss <- function(pred_sigs, pred_diets, prey_sigs, prey_loc,
                         n_pred_boot = 1000){

  # Check inputs ---------------------------------------------------------------

  # Initialize object to be returned.
  var_diet <- NA
  var_sig <- NA
  smooth_pred <- NA
  mod_sig_var <- NA
  var_target <- NA
  boot_ss <- NA


  # Check that the number of predators is equal.
  if(ncol(pred_sigs) != ncol(pred_diets)){
    err_code <- 1
    err_message <- "The number of predators in pred_sigs and pred_diets differ!"

    return(list(var_diet = var_diet,
                var_sig = var_sig,
                var_sig_smooth = smooth_pred,
                mod_sig_var = mod_sig_var,
                var_target = var_target,
                boot_ss = boot_ss,
                err_code = err_code,
                err_message = err_message))
  }


  # Check that the number of prey is equal.
  if(ncol(prey_sigs) != prey_loc[length(prey_loc)]){
    err_code <- 2
    err_message <- "The number of prey in prey_sigs and prey_loc differ!"

    return(list(var_diet = var_diet,
                var_sig = var_sig,
                var_sig_smooth = smooth_pred,
                mod_sig_var = mod_sig_var,
                var_target = var_target,
                boot_ss = boot_ss,
                err_code = err_code,
                err_message = err_message))
  }


  # Check that the number of fatty acids is equal.
  if(nrow(pred_sigs) != nrow(prey_sigs)){
    err_code <- 3
    err_message <- "The number of fatty acids in pred_sigs and prey_sigs differ!"

    return(list(var_diet = var_diet,
                var_sig = var_sig,
                var_sig_smooth = smooth_pred,
                mod_sig_var = mod_sig_var,
                var_target = var_target,
                boot_ss = boot_ss,
                err_code = err_code,
                err_message = err_message))
  }


  # Check that the number of prey types is equal.
  if(nrow(pred_diets) != nrow(prey_loc)){
    err_code <- 4
    err_message <- "The number of prey types in pred_diets and prey_loc differ!"

    return(list(var_diet = var_diet,
                var_sig = var_sig,
                var_sig_smooth = smooth_pred,
                mod_sig_var = mod_sig_var,
                var_target = var_target,
                boot_ss = boot_ss,
                err_code = err_code,
                err_message = err_message))
  }



  # Compute variance between diet estimates and signatures ---------------------

  # Allocate memory.
  n_pred <- ncol(pred_sigs)
  n_pairs <- n_pred*(n_pred - 1)/2

  var_diet <- vector("numeric", n_pairs)
  var_sig <- vector("numeric", n_pairs)

  pair_ind <- 0
  for(li1 in 1:(n_pred - 1)){
    for(li2 in (li1 + 1):n_pred){

      # Increment the counter.
      pair_ind <- pair_ind + 1

      # Compute variances.
      var_diet[pair_ind] <- sum(0.5*(pred_diets[,li1] + pred_diets[,li2])^2 -
                                2*pred_diets[,li1]*pred_diets[,li2])
      var_sig[pair_ind] <- sum(0.5*(pred_sigs[,li1] + pred_sigs[,li2])^2 -
                               2*pred_sigs[,li1]*pred_sigs[,li2])
    }
  }



  # Smooth relationship between variances --------------------------------------

  # Order by the variance in diet.
  pair_order <- order(var_diet)
  var_diet <- var_diet[pair_order]
  var_sig <- var_sig[pair_order]

  # Smooth relationship.
  smooth_mod <- stats::loess(var_sig ~ var_diet)
  smooth_pred <- stats::fitted(smooth_mod)

  var_target <- smooth_pred[1]




  # Prepare for the algorithm --------------------------------------------------

  # Compute the mean diet for use in constructing predator signatures.
  n_prey_types <- nrow(prey_loc)
  n_fa <- nrow(prey_sigs)

  mean_diet <- apply(X = pred_diets, MARGIN = 1, FUN = mean)
  diet_mat <- matrix(data = mean_diet, nrow = n_fa, ncol = n_prey_types,
                     byrow = TRUE)


  # Initialize bootstrap sample sizes.
  boot_ss <- rep(1, n_prey_types)
  names(boot_ss) <- rownames(prey_loc)


  # Compute variances of prey signatures by fatty acid and prey type.
  prey_sig_var <- matrix(data = 0, nrow = n_fa, ncol = n_prey_types)
  for(li1 in 1:n_prey_types){
    for(li2 in 1:n_fa){
      prey_sig_var[li2,li1] <-
                        stats::var(prey_sigs[li2,
                                             prey_loc[li1,1]:prey_loc[li1,2]])
    }
  }

  # Compute fpc for each prey type.
  prey_ss <- prey_loc[,2] - prey_loc[,1] + 1
  fpc <- (prey_ss - boot_ss)/prey_ss
  fpc_mat <- matrix(data = rep(fpc, each=n_fa), nrow = n_fa,
                    ncol = n_prey_types)

  # Initialize weighted variance function.
  prey_var_diet <- prey_sig_var*fpc_mat*diet_mat
  prey_loc_max <- apply(X = prey_var_diet, MARGIN=1, FUN=which.max)


  # Allocate memory for bootstrap predator signatures.
  boot_pred_sig <- matrix(data = 0, nrow = n_fa, ncol = n_pred_boot)


  # Allocate memory for mean prey signatures.
  mean_prey_sig <- matrix(data = 0, nrow = n_fa, ncol = n_prey_types)
  mod_sig_var <- vector("numeric", ncol(pred_sigs))



  # Start the algorithm --------------------------------------------------------
  cell_ind <- 0
  keep_going <- TRUE

  while(keep_going){

    # Increment counter.
    cell_ind <- cell_ind + 1


    # Loop over predators.
    for(li1 in 1:n_pred_boot){

      # Sample prey for this predator and compute mean prey signatures.
      for(li2 in 1:n_prey_types)
        {
        prey_samp <- sample(x = prey_loc[li2,1]:prey_loc[li2,2],
                            size = boot_ss[li2], replace = TRUE)
        mean_prey_sig[,li2] <- apply(X = as.matrix(prey_sigs[,prey_samp]),
                                     MARGIN = 1, FUN = mean)
        }


      # Construct signature for this predator.
      boot_pred_sig[,li1] <- rowSums(diet_mat*mean_prey_sig)
    }


    # Compute the variance of the bootstrapped predator signatures.
    boot_var_fa <- apply(X = boot_pred_sig, MARGIN = 1, FUN = stats::var)
    boot_var_pred <- sum(boot_var_fa)
    mod_sig_var[cell_ind] <- boot_var_pred


    # Check termination criterion.
    if(boot_var_pred > var_target){

      # Find the fatty acid contributing most to the variance of the modeled
      # predator signatures.
      fa_loc <- which.max(boot_var_fa)


      # Find the prey group with the largest variance for this fatty acid.
      prey_to_add <- prey_loc_max[fa_loc]


      # Increment the sample size by 1 and update fpc*variance matrix.
      boot_ss[prey_to_add] <- boot_ss[prey_to_add] + 1

      fpc_mat[,prey_to_add] <- (prey_ss[prey_to_add] -
                                boot_ss[prey_to_add])/
                               prey_ss[prey_to_add]
      prey_var_diet <- prey_sig_var*fpc_mat*diet_mat
      prey_loc_max <- apply(X = prey_var_diet, MARGIN = 1, FUN = which.max)


      # Check that all prey groups are not fully sampled.
      if(sum(boot_ss) >= prey_loc[n_prey_types,2]){
        keep_going <- FALSE
      }
    } else{
      keep_going <- FALSE
    }
  }



  # Clean up and return --------------------------------------------------------

  # Save results from iterations that occurred.
  mod_sig_var <- mod_sig_var[1:cell_ind]
  names(mod_sig_var) <- paste("Iteration", 1:cell_ind, sep = "_")


  # Add prey-type names to bootstrap sample sizes.
  names(boot_ss) <- rownames(pred_diets)


  # Return.
  err_code <- 0
  err_message <- "Success!"

  return(list(var_diet = var_diet,
              var_sig = var_sig,
              var_sig_smooth = smooth_pred,
              mod_sig_var = mod_sig_var,
              var_target = var_target,
              boot_ss = as.integer(boot_ss),
              err_code = err_code,
              err_message = err_message))
}