R/abundEstim.R

Defines functions abundEstim

Documented in abundEstim

#' @title Distance Sampling Abundance Estimates 
#'   
#' @description Estimate abundance (or density) from an estimated detection
#'   function and supplemental information on observed group sizes, transect
#'   lengths, area surveyed, etc.  Computes confidence intervals on
#'   abundance (or density) using a the bias corrected bootstrap method.
#'   
#' @inheritParams dfuncEstim 
#' 
#' @inheritParams predict.dfunc 
#' 
#' @param area A scalar containing the total area of inference. Usually, this is 
#' study area size.  If \code{area} is NULL (the default), 
#' \code{area} will be set to 1 square unit of the output units and density estimates
#' will be produced. 
#' If \code{area} is not NULL, it must have measurement units 
#' assigned by the \code{units} package. 
#' The units on \code{area} must be convertible
#' to squared output units. Units 
#' on \code{area} must be two-dimensional. 
#' For example, if output units are "foo", 
#' units on area must be convertible to "foo^2" by the \code{units}
#' package. Units of "km^2", "cm^2", "ha", "m^2", "acre", "mi^2", and several
#' others are acceptable.  
#'   
#' @param ci A scalar indicating the confidence level of confidence intervals. 
#'   Confidence intervals are computed using a bias corrected bootstrap
#'   method. If \code{ci = NULL} or \code{ci == NA}, confidence intervals 
#'   are not computed.
#'   
#' @param R The number of bootstrap iterations to conduct when \code{ci} is not
#'   NULL.
#'   
#' @param plot.bs A logical scalar indicating whether to plot individual
#'   bootstrap iterations.
#'   Ignored unless \code{parallel = FALSE}.
#'   
#' @param propUnitSurveyed A scalar or vector of real numbers between 0 and 1.  
#' The proportion of the default sampling unit that 
#'   was surveyed.  If both sides of line transects were observed, 
#'   \code{propUnitSurveyed}
#'   = 1.  If only a single side of line transects were observed, set 
#'   \code{propUnitSurveyed} = 0.5. For point transects, this should be set to 
#'   the proportion of each circle that was observed. Length must either be
#'   1 or the total number of transects in \code{x}.
#'   
#' @param showProgress A logical indicating whether to show a text-based
#'   progress bar during bootstrapping. Default is \code{TRUE}. 
#'   It is handy to shut off the 
#'   progress bar if running this within another function. 
#'   Ignored unless \code{parallel = FALSE}.
#'   
#' @param parallel A logical scalar, or a positive integer; ignored unless 
#'   confidence intervals are requested (i.e., \code{!is.null(ci)}).
#'   If TRUE, bootstrap iterations are 
#'   run in parallel using the maximum number of CPU cores minus 1. 
#'   The maximum number of CPU cores is reported by \code{parallel::detectCores()}.
#'   If a positive integer (1 <= \code{parallel} <= maximum cores), bootstrap 
#'   iterations are performed in parallel on that many cores. 
#'   If FALSE, bootstrap iterations are performed in series, and progress 
#'   will be shown if \code{showProgress == TRUE}. Parameters 
#'   \code{showProgress} and \code{plot.bs} are ignored when operating
#'   in parallel. 
#'   
#' @details The abundance estimate for line-transect surveys (if no covariates
#'    are included in the detection function and both sides of the transect 
#'    are observed) is 
#'    \deqn{N =\frac{n(A)}{2(ESW)(L)}}{%
#'          N = n*A / (2*ESW*L)} 
#'    where \emph{n} is total number of sighted individuals 
#'   (i.e., \code{sum(groupSizes(dfunc))}), \emph{L} is the total length of 
#'   surveyed transect (i.e., \code{sum(effort(dfunc))}),
#'   and \emph{ESW} is effective strip width
#'   computed from the estimated distance function (i.e., \code{ESW(dfunc)}).
#'   If only one side of transects were observed, the "2" in the denominator 
#'   is not present (or, replaced with a "1"). 
#'   
#'   The abundance estimate for point transect surveys (if no covariates are
#'   included) is 
#'    \deqn{N =\frac{n(A)}{\pi(ESR^2)(P)}}{%
#'          N = n*A / ((3.1415)*ESR^2*(P))} 
#'    where \emph{n} is total number of sighted individuals (i.e., \code{sum(groupSizes(dfunc))}),
#'    \emph{P} is the total number of surveyed points (i.e., \code{sum(effort(dfunc))}), 
#'    and \emph{ESR} is effective search radius 
#'    computed from the estimated distance function (i.e., \code{ESR(dfunc)}).
#'
#'  This routine, \code{abundEstim}, estimates abundance on the 
#'  entire study area.  Site-specific density estimates 
#'  are computed by 
#'  \code{predict(x, type = "density")}, which returns a 
#'  tibble containing density and abundance on the area surveyed by every
#'  transect. 
#'   
#' @section Bootstrap Confidence Intervals:
#' 
#'   Rdistance's nested data frames (produced by \code{\link{RdistDf}})
#'   contain all information required to estimate bootstrap CIs. 
#'   To compute bootstrap CIs, Rdistance resamples, with replacement,
#'   the rows of the \code{$data} component contained in Rdistance 
#'   fitted models. Rdistance assumes each row of \code{$data} 
#'   contains information on one transect.
#'   The \code{$data} component also contains 
#'   information on which observations inform the 
#'   detection function, which observations should be counted as 
#'   detected targets, 
#'   and which transects count toward transect length. 
#'   After resampling rows of \code{$data}, Rdistance 
#'   refits the distance function using non-missing distances, 
#'   recomputes the detected number of targets using non-missing 
#'   group sizes on transects with non-missing length, 
#'   and re-computes total transect length from transects 
#'   with non-missing lengths. 
#'   By default, \code{R} = 500 bootstrap iterations are 
#'   performed, after which bias
#'   corrected confidence intervals are computed (Manly, 1997, section 3.4).
#'   
#'   The distance function is not re-selected during bootstrap resampling. The 
#'   model of the input object is re-fitted every iteration.  
#'   
#'   During bootstrap iterations, the distance function can fail. 
#'   An iteration can fail for a two reasons:
#'   (1) no detections on the iteration, and (2) a bad configuration 
#'   of distances that push the distance function's parameters to their 
#'   limits. When an iteration fails, Rdistance 
#'   skips the iteration and effectively ignores the 
#'   failed iterations. 
#'   If the proportion of failed iterations is small 
#'   (less than 20\% by default), the resulting abundance confidence interval 
#'   is probably valid and no warning is issued.  If the proportion of 
#'   non-convergent iterations 
#'   is not small (exceeds 20\% by default), a warning is issued.  
#'   The warning can be modified  
#'   by re-setting option \code{"Rdistance_maxBSFailPropForWarning"} to
#'   the acceptable proportion of failures.. 
#'   Setting \code{options(Rdistance_masBSFailPropForWarning = 1.0)} will turn 
#'   suppress the warning. 
#'   Setting \code{options(Rdistance_masBSFailPropForWarning = 0.0)} will 
#'   warn if any iteration failed.  Results (density and effective 
#'   sampling distance) 
#'   from all successful iterations are contained in the  
#'   non-NA rows of data frame 'B' in the output object.  
#'   
#' @section Missing Transect Lengths:
#' 
#'   Transect lengths can be missing in the RdistDf object. 
#'   Missing length transects are equivalent
#'   to 0 [m] transects and do not count toward total surveyed units,
#'   nor do group sizes on these transects count toward total 
#'   detected individuals.  
#'   Use NA-length transects to include their associated distances 
#'   during distance function estimation, but not when estimating abundance. 
#'   For example, this allows estimation of abundance on one 
#'   study area using off-transect distances from another.  This allows 
#'   sightability to be estimated using two or more similar targets (e.g., 
#'   two similar species), but abundance to be estimated separate for each 
#'   target type.
#'   Include NA-length transects by including the "extra" distance observations 
#'   in the detection data frame, with valid
#'   site IDs, but set the length of those site IDs to NA in the site data frame. 
#'
#' @section Point Transect Lengths:
#'   Point transects do not have a physical measurement for length. 
#'   The "length" of point transects is the number of points on the transect. 
#'   Point transects can contain only one point.  
#'   Rdistance treats transects of points as independent 
#'   and bootstrap resamples them to estimate variance. The number of points
#'   on each point transect must exist in the RdistDf and cannot have 
#'   physical measurement units (it is a count, not a distance).
#'   
#'   
#' @return An Rdistance 'abundance estimate' object, which is a list of
#'   class \code{c("abund", "dfunc")}, containing all the components of a "dfunc"
#'   object (see \code{\link{dfuncEstim}}), plus the following: 
#'   
#'   \item{estimates}{A tibble containing fitted coefficients in the
#'   distance function, density in the area(s) surveyed, 
#'   abundance on the study area, the number of groups seen 
#'   between w.lo and w.hi, the number 
#'   of individuals seen between w.lo and w.hi, 
#'   study area size, surveyed area, average group size, and 
#'   average effective detection distance.
#'   }
#'   
#'   \item{B}{If confidence intervals were requested, a tibble 
#'   containing all bootstrap values of coefficients, 
#'   density, abundance, groups seen, individuals seen, 
#'   study area size, surveyed area size, average group size, 
#'   and average effective detection distance.  The number of rows is always 
#'   \code{R}, the requested number of bootstrap 
#'   iterations.  If an iteration fails, the
#'   corresponding row in \code{B} is \code{NA} (hence, use 'na.rm = TRUE' 
#'   when computing summaries). Columns 1 through \code{length(coef(dfunc))}
#'   contain bootstrap realizations of the distance function's coefficients. 
#'   }
#'   
#'   \item{ci}{Confidence level of the confidence intervals}
#'   
#' @references Manly, B.F.J. (1997) \emph{Randomization, bootstrap, and 
#'   Monte-Carlo methods in biology}, London: Chapman and Hall.
#'   
#'   Buckland, S.T., D.R. Anderson, K.P. Burnham, J.L. Laake, D.L. Borchers,
#'    and L. Thomas. (2001) \emph{Introduction to distance sampling: estimating
#'    abundance of biological populations}. Oxford University Press, Oxford, UK.
#'   
#' @seealso \code{\link{dfuncEstim}}, \code{\link{autoDistSamp}}, 
#'  \code{\link{predict.dfunc}} with 'type = "density"'.
#' 
#' @examples
#' # Load example sparrow data (line transect survey type)
#' # sparrowDf <- RdistDf(sparrowSiteData, sparrowDetectionData)
#' data(sparrowDf)
#' 
#' # Fit half-normal detection function
#' dfunc <- sparrowDf |> 
#'   dfuncEstim(formula=dist ~ groupsize(groupsize)
#'            , likelihood="halfnorm"
#'            , w.hi=150 %m%.
#'   )
#' 
#' # Estimate abundance - Convenient for programming 
#' abundDf <- estimateN(dfunc
#'                    , area = 4105 %km^2%.
#'            )
#' 
#' # Same - Nicer output 
#' # Set ci=0.95 (or another value) to estimate bootstrap CI's 
#' fit <- abundEstim(dfunc
#'                 , area = 4105 %km^2%.
#'                 , ci = NULL
#'                 )
#'          
#' @export
abundEstim <- function(object
                     , area = NULL
                     , propUnitSurveyed = 1.0
                     , ci = 0.95
                     , R = 500
                     , plot.bs = FALSE
                     , showProgress = TRUE
                     , parallel = TRUE
                     ){


  # It is okay to have NA in effort, distance, and group size columns. See documentation.
  
  bootstrapping <- !is.null(ci) && !is.na(ci)
  
  # Initial setup for parallel session ----
  parallelRequest <- getNCores( parallel )
  parallel <- parallelRequest$parallel  # T or F
  
  if( parallel ){
    plot.bs <- FALSE
    showProgress <- FALSE
  }
  
  # Initial setup for plotting ----
  if (bootstrapping && plot.bs) {
    graphics::par( xpd=TRUE )
    plotObj <- plot(object)
  } else {
    plotObj <- NULL
  }
  
  # ---- Set Area ----
  if( is.null(area) ){
    # doing this here saves a tiny sliver of time in estimateN
    area <- setUnits(1, object$outputUnits)^2
  }
  
  # ---- Construct Original estimates frame ----
  ests <- estimateN(object = object
                  , area = area
                  , propUnitSurveyed = propUnitSurveyed
                    )
  ests$id <- "Original"
  
  # ---- Prelims: constants for bootstrapping ----

  # pb <- list(tick = function(){}) # NULL tick function for not bootstrapping

  # ---- Add bootstrap indices if called for ----
  # This is essentially what rsample::bootstraps does, but rsample stores the 
  # entire data frame in each 'rsample'.  This saves space. But, entire 
  # (R*nrow(x$data)+1) X 2 data frame must be constructable in memory.
  if ( bootstrapping ) {
    
    B <- bootstrap(
                    object = object
                  , area = area
                  , propUnitSurveyed = propUnitSurveyed
                  , R = R
                  , plot.bs = plot.bs
                  , plotCovValues = plotObj$predCovValues
                  , showProgress = showProgress
                  , parallel = parallel
                  , cores = parallelRequest$cores
                ) 
    
    # Replace varcovar with bootstrap varcovar
    bsCoefs <- B |> 
      dplyr::ungroup() |> 
      dplyr::select(dplyr::all_of(names(stats::coef(object))))
    object$varcovar <- stats::var(bsCoefs)
    object$asymptoticSE <- FALSE
    

  } else {
    ci <- NA
    B <- NULL
  }
  
  # ---- Construct output object ----
  ans <- c(object
          , estimates = list(ests)
          , B = list(B)
          )
  
  # ---- Plot original fit again (over bs lines) ----
  if (bootstrapping && plot.bs) {
    graphics::lines(object
          , newdata = plotObj$predCovValues
          , col = "red"
          , lwd = 3)
  } 

  # ---- Compute confidence intervals ----      
  if( bootstrapping ){
    vec2df <- function(x, nm){
      # only for vectors of length 2, not general but fast
      xx <- data.frame(lo = x[1], hi = x[2])
      names(xx) <- paste0(nm, "_", names(xx))
      xx
    }
    abCI <- Rdistance::bcCI(B$abundance, ests$abundance, ci)
    dnCI <- Rdistance::bcCI(B$density, ests$density, ci)
    efCI <- Rdistance::bcCI(B$avgEffDistance, ests$avgEffDistance, ci) 
    abCI <- vec2df(abCI, "abundance") 
    dnCI <- vec2df(dnCI, "density") 
    efCI <- vec2df(efCI, "avgEffDistance") 
    
    ans$estimates <- ans$estimates |> 
      dplyr::bind_cols( abCI ) |> 
      dplyr::bind_cols( dnCI ) |> 
      dplyr::bind_cols( efCI ) 

    # rearrange columns    
    ans$estimates <- ans$estimates |> 
      dplyr::select("id"
                  , dplyr::all_of(names(stats::coef(object)))  
                  , dplyr::starts_with("density")
                  , dplyr::starts_with("abundance")
                  , dplyr::starts_with("avgEffDistance")
                  , dplyr::everything())
    B <- B |> 
      dplyr::select("id"
                  , dplyr::all_of(names(stats::coef(object)))  
                  , dplyr::starts_with("density")
                  , dplyr::starts_with("abundance")
                  , dplyr::starts_with("avgEffDistance")
                  , dplyr::everything())
    
    if ((object$LhoodType == "parametric") && 
        (any(is.na(B$density))) && 
        showProgress){
      cat(paste( sum(is.na(B$density)), "of", R
                 , "iterations did not converge.\n"))
    }
  }

  # Output
  ans$ci <- ci
  class(ans) <- c("abund", class(object))
    
  return(ans)
} 

Try the Rdistance package in your browser

Any scripts or data that you put into this service are public.

Rdistance documentation built on Jan. 10, 2026, 1:07 a.m.