#' @title abundEstim - Distance Sampling Abundance Estimates
#'
#' @description Estimate abundance (or density) from an estimated detection
#' function and supplemental information on observed group sizes, transect
#' lengths, area surveyed, etc. Computes confidence intervals on
#' abundance (or density) using a the bias corrected bootstrap method.
#'
#' @inheritParams dfuncEstim
#'
#' @inheritParams predict.dfunc
#'
#' @param area A scalar containing the total area of inference. Usually, this is
#' study area size. If \code{area} is NULL (the default),
#' \code{area} will be set to 1 square unit of the output units and density estimates
#' will be produced.
#' If \code{area} is not NULL, it must have measurement units
#' assigned by the \code{units} package.
#' The units on \code{area} must be convertible
#' to squared output units. Units
#' on \code{area} must be two-dimensional.
#' For example, if output units are "foo",
#' units on area must be convertible to "foo^2" by the \code{units}
#' package. Units of "km^2", "cm^2", "ha", "m^2", "acre", "mi^2", and several
#' others are acceptable.
#'
#' @param ci A scalar indicating the confidence level of confidence intervals.
#' Confidence intervals are computed using a bias corrected bootstrap
#' method. If \code{ci = NULL} or \code{ci == NA}, confidence intervals
#' are not computed.
#'
#' @param R The number of bootstrap iterations to conduct when \code{ci} is not
#' NULL.
#'
#' @param plot.bs A logical scalar indicating whether to plot individual
#' bootstrap iterations.
#'
#' @param propUnitSurveyed A scalar or vector of real numbers between 0 and 1.
#' The proportion of the default sampling unit that
#' was surveyed. If both sides of line transects were observed,
#' \code{propUnitSurveyed}
#' = 1. If only a single side of line transects were observed, set
#' \code{propUnitSurveyed} = 0.5. For point transects, this should be set to
#' the proportion of each circle that was observed. Length must either be
#' 1 or the total number of transects in \code{x}.
#'
#' @param showProgress A logical indicating whether to show a text-based
#' progress bar during bootstrapping. Default is \code{TRUE}.
#' It is handy to shut off the
#' progress bar if running this within another function. Otherwise,
#' it is handy to see progress of the bootstrap iterations.
#'
#'
#' @details The abundance estimate for line-transect surveys (if no covariates
#' are included in the detection function and both sides of the transect
#' are observed) is
#' \deqn{N =\frac{n(A)}{2(ESW)(L)}}{%
#' N = n*A / (2*ESW*L)}
#' where \emph{n} is total number of sighted individuals
#' (i.e., \code{sum(groupSizes(dfunc))}), \emph{L} is the total length of
#' surveyed transect (i.e., \code{sum(effort(dfunc))}),
#' and \emph{ESW} is effective strip width
#' computed from the estimated distance function (i.e., \code{ESW(dfunc)}).
#' If only one side of transects were observed, the "2" in the denominator
#' is not present (or, replaced with a "1").
#'
#' The abundance estimate for point transect surveys (if no covariates are
#' included) is
#' \deqn{N =\frac{n(A)}{\pi(ESR^2)(P)}}{%
#' N = n*A / ((3.1415)*ESR^2*(P))}
#' where \emph{n} is total number of sighted individuals (i.e., \code{sum(groupSizes(dfunc))}),
#' \emph{P} is the total number of surveyed points (i.e., \code{sum(effort(dfunc))}),
#' and \emph{ESR} is effective search radius
#' computed from the estimated distance function (i.e., \code{ESR(dfunc)}).
#'
#' Setting \code{plot.bs=FALSE} and \code{showProgress=FALSE}
#' suppresses all intermediate output.
#'
#' Estimation of site-specific density (e.g., on every transect) is accomplished by
#' \code{predict(x, type = "density")}, which returns a
#' tibble containing density and abundance on the area surveyed by every
#' transect.
#'
#' @section Bootstrap Confidence Intervals:
#'
#' Rdistance's nested data frames (produced by \code{\link{RdistDf}})
#' contain all information required to estimate bootstrap CIs.
#' To compute bootstrap CIs, Rdistance resamples, with replacement,
#' the rows of the \code{$data} component contained in Rdistance
#' fitted models. Rdistance assumes each row of \code{$data}
#' contains one information on on transect.
#' The \code{$data} component also contains
#' information on which observations go into the
#' detection functions, which should be counted as detected targets,
#' and which count toward transect length.
#' After resampling rows of \code{$data}, Rdistance
#' refits the distance function using non-missing distances,
#' recomputes the detected number of targets using non-missing
#' group sizes on transects with non-missing length,
#' and re-computes total transect length from transects
#' with non-missing lengths.
#' By default, \code{R} = 500 bootstrap iterations are
#' performed, after which bias
#' corrected confidence intervals are computed (Manly, 1997, section 3.4).
#'
#' The distance function is not re-selected during bootstrap resampling. The
#' model of the input object is re-fitted every iteration.
#'
#' During bootstrap iterations, the distance function can fail.
#' An iteration can fail for a two reasons:
#' (1) no detections on the iteration, and (2) a bad configuration
#' of distances that push the distance function's parameters to their
#' limits. When an iteration fails, Rdistance
#' skips the iteration and effectively ignores the
#' failed iterations.
#' If the proportion of failed iterations is small
#' (less than 20% by default), the resulting abundance confidence interval
#' is probably valid and no warning is issued. If the proportion of
#' non-convergent iterations
#' is not small (exceeds 20% by default), a warning is issued.
#' The warning can be modified
#' by re-setting the \code{Rdistance_maxBSFailPropForWarning} option.
#' Setting \code{options(Rdistance_masBSFailPropForWarning = 1.0)} will turn
#' off the warning.
#' Setting \code{options(Rdistance_masBSFailPropForWarning = 0.0)} will
#' warn if any iteration failed. Results (density and effective
#' sampling distance)
#' from all successful iterations are contained in the
#' non-NA rows of data frame 'B' in the output object.
#'
#' @section Missing Transect Lengths:
#'
#' Transect lengths can be missing in the RdistDf object.
#' Missing length transects are equivalent
#' to 0 [m] transects and do not count toward total surveyed units
#' nor to group sizes on these transects count toward total
#' detected individuals.
#' Use NA-length transects to include their associated distances
#' when estimating the distance function, but not when estimating abundance.
#' For example, this allows estimation of abundance on one
#' study area using off-transect distances from another. This allows
#' sightability to be estimated using two or more similar targets (e.g.,
#' two similar species), but abundance to be estimated separate for each
#' target type.
#' Include NA-length transects by including the "extra" distance observations
#' in the detection data frame, with valid
#' site IDs, but set the length of those site IDs to NA in the site data frame.
#'
#' @section Point Transect Lengths:
#' Point transects do not have a physical measurement for length.
#' The "length" of point transects is the number of points on the transect.
#' Point transects can contain only one point.
#' Rdistance treats transects of points as independent
#' and bootstrap resamples them to estimate variance. The number of points
#' on each point transect must exist in the RdistDf and cannot have
#' physical measurement units (it is a count, not a distance).
#'
#'
#' @return An Rdistance 'abundance estimate' object, which is a list of
#' class \code{c("abund", "dfunc")}, containing all the components of a "dfunc"
#' object (see \code{\link{dfuncEstim}}), plus the following:
#'
#' \item{estimates}{A tibble containing fitted coefficients in the
#' distance function, density in the area(s) surveyed,
#' abundance on the study area, the number of groups seen
#' between w.lo and w.hi, the number
#' of individuals seen between w.lo and w.hi,
#' study area size, surveyed area, average group size, and
#' average effective detection distance.
#' }
#'
#' \item{B}{If confidence intervals were requested, a tibble
#' containing all bootstrap values of coefficients,
#' density, abundance, groups seen, individuals seen,
#' study area size, surveyed area size, average group size,
#' and average effective detection distance. The number of rows is always
#' \code{R}, the requested number of bootstrap
#' iterations. If an iteration fails, the
#' corresponding row in \code{B} is \code{NA} (hence, use 'na.rm = TRUE'
#' when computing summaries). Columns 1 through \code{length(coef(dfunc))}
#' contain bootstrap realizations of the distance function's coefficients.
#' }
#'
#' \item{ci}{Confidence level of the confidence intervals}
#'
#' @references Manly, B.F.J. (1997) \emph{Randomization, bootstrap, and
#' Monte-Carlo methods in biology}, London: Chapman and Hall.
#'
#' Buckland, S.T., D.R. Anderson, K.P. Burnham, J.L. Laake, D.L. Borchers,
#' and L. Thomas. (2001) \emph{Introduction to distance sampling: estimating
#' abundance of biological populations}. Oxford University Press, Oxford, UK.
#'
#' @seealso \code{\link{dfuncEstim}}, \code{\link{autoDistSamp}},
#' \code{\link{predict.dfunc}} with 'type = "density"'.
#'
#' @examples
#' # Load example sparrow data (line transect survey type)
#' # sparrowDf <- RdistDf(sparrowSiteData, sparrowDetectionData)
#' data(sparrowDf)
#'
#' # Fit half-normal detection function
#' dfunc <- sparrowDf |>
#' dfuncEstim(formula=dist ~ groupsize(groupsize)
#' , likelihood="halfnorm"
#' , w.hi=units::set_units(150, "m")
#' )
#'
#' # Estimate abundance - Convenient for programming
#' abundDf <- estimateN(dfunc
#' , area = units::set_units(4105, "km^2")
#' )
#'
#' # Same - Nicer output
#' # Set ci=0.95 (or another value) to estimate bootstrap CI's on ESW, density, and abundance.
#' fit <- abundEstim(dfunc
#' , area = units::set_units(4105, "km^2")
#' , ci = NULL
#' )
#'
#' @export
#' @importFrom graphics lines par
abundEstim <- function(object
, area = NULL
, propUnitSurveyed = 1.0
, ci = 0.95
, R = 500
, plot.bs = FALSE
, showProgress = TRUE
){
# It is okay to have NA in effort, distance, and group size columns. See documentation.
bootstrapping <- !is.null(ci) && !is.na(ci)
# Initial setup for plotting ----
if (bootstrapping && plot.bs) {
graphics::par( xpd=TRUE )
plotObj <- plot(object)
}
# ---- Set Area ----
if( is.null(area) ){
# doing this here saves a tiny sliver of time in estimateN
area <- units::set_units(1, object$outputUnits, mode = "standard")^2
}
# ---- Construct Indices to Original data frame ----
nDataRows <- nrow(object$data)
bsData <- data.frame(id = "Original",
rowIndex = 1:nDataRows)
pb <- list(tick = function(){}) # NULL tick function for not bootstrapping
# ---- Add bootstrap indices if called for ----
# This is essentially what rsample::bootstraps does, but rsample stores the
# entire data frame in each 'rsample'. This saves space. But, entire
# (R*nrow(x$data)+1) X 2 data frame must be constructable in memory.
if ( bootstrapping ) {
nDigits <- ceiling(log10(R + 0.1))
id <- rep(1:R, each = nDataRows)
repsDf <- data.frame(
id = paste0("Bootstrap_",
formatC(id
, format = "f"
, digits = 0
, width = nDigits
, flag = "0"))
, rowIndex = sample( 1:nDataRows
, size = R*nDataRows
, replace = TRUE
))
bsData <- bsData |>
dplyr::bind_rows( repsDf )
# set up progress bar if called for, only if bootstrapping
if(showProgress){
pb <- progress::progress_bar$new(
format = "Bootstrapping: [:bar] Elapsed Time: :elapsedfull "
, total = R+1
, show_after = 1
, clear = FALSE
)
}
} else {
ci <- NA
}
# --- Apply estimation to each ID group ----
bsEsts <- bsData |>
dplyr::group_by(id) |>
dplyr::group_modify(.f = oneBsIter # In Rdistance, not exported
, data = object$data
, formula = object$formula
, likelihood = object$likelihood
, w.lo = object$w.lo
, w.hi = object$w.hi
, expansions = object$expansions
, series = object$series
, x.scl = object$x.scl
, g.x.scl = object$g.x.scl
, outputUnits = object$outputUnits
, warn = FALSE
, area = area
, propUnitSurveyed = propUnitSurveyed
, pb = pb
, plot.bs = plot.bs
, plotCovValues = plotObj$predCovValues
)
# ---- Construct output object ----
ests <- bsEsts |>
dplyr::filter(id == "Original")
if( bootstrapping ){
B <- bsEsts |>
dplyr::filter( id != "Original" )
} else {
B <- NULL
}
ans <- c(object
, estimates = list(ests)
, B = list(B)
)
# ---- Plot original fit again (over bs lines) ----
if (bootstrapping && plot.bs) {
graphics::lines(object
, newdata = plotObj$predCovValues
, col = "red"
, lwd = 3)
}
# ---- Compute confidence intervals ----
if( bootstrapping ){
vec2df <- function(x, nm){
# only for vectors of length 2, not general but fast
xx <- data.frame(lo = x[1], hi = x[2])
names(xx) <- paste0(nm, "_", names(xx))
xx
}
abCI <- Rdistance::bcCI(bsEsts$abundance, ests$abundance, ci)
dnCI <- Rdistance::bcCI(bsEsts$density, ests$density, ci)
efCI <- Rdistance::bcCI(bsEsts$avgEffDistance, ests$avgEffDistance, ci)
abCI <- vec2df(abCI, "abundance")
dnCI <- vec2df(dnCI, "density")
efCI <- vec2df(efCI, "avgEffDistance")
ans$estimates <- ans$estimates |>
dplyr::bind_cols( abCI ) |>
dplyr::bind_cols( dnCI ) |>
dplyr::bind_cols( efCI )
# rearrange columns
ans$estimates <- ans$estimates |>
dplyr::select(id, dplyr::starts_with("density"), dplyr::starts_with("abundance"), dplyr::starts_with("avgEffDistance"), dplyr::everything())
B <- B |>
dplyr::select(id, dplyr::starts_with("density"), dplyr::starts_with("abundance"), dplyr::starts_with("avgEffDistance"), dplyr::everything())
if ((object$LhoodType == "parametric") && (any(is.na(bsEsts$density))) && showProgress){
cat(paste( sum(is.na(bsEsts$density)), "of", R, "iterations did not converge.\n"))
}
}
# Output
ans$ci <- ci
class(ans) <- c("abund", class(object))
return(ans)
}
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.