#' @title Estimate a non-parametric smooth detection function
#' from distance-sampling data
#'
#' @description Estimates a smooth detection function for
#' line-transect perpendicular distances or point-transect
#' radial distances.
#'
#' @param formula A formula object (e.g., dist ~ 1).
#' The left-hand side (before ~)
#' is the name of the vector containing distances (perpendicular or
#' radial). The right-hand side (after ~)
#' must be the intercept-only model as \code{Rdistance} does not
#' currently allow covariates in smoothed distance functions.
#' If names in \code{formula} do not appear in \code{detectionData},
#' the normal scoping rules for model fitting routines (e.g.,
#' \code{lm} and \code{glm}) apply.
#'
#' @param detectionData A data frame containing detection distances
#' (either perpendicular for line-transect or radial for point-transect
#' designs), with one row per detected object or group.
#' This data frame must contain at least the following
#' information:
#' \itemize{
#' \item Detection Distances: A single column containing
#' detection distances must be specified on the left-hand
#' side of \code{formula}.
#' \item Site IDs: The ID of the transect or point
#' (i.e., the 'site') where each object or group was detected.
#' The site ID column(s) (see argument \code{siteID}) must
#' specify the site (transect or point) so that this
#' data frame can be merged with \code{siteData}.
#' }
#' Optionally, this data frame can contain the following
#' variables:
#' \itemize{
#' \item Group Sizes: The number of individuals in the group
#' associated with each detection. If unspecified, \code{Rdistance}
#' assumes all detections are of single individuals (i.e.,
#' all group sizes are 1).
#'
#' \item When \code{Rdistance} allows detection-level
#' covariates in some version after 2.1.1, detection-level
#' covariates will appear in this data frame.
#'
#' }
#' See example data set \code{\link{sparrowDetectionData}}).
#' See also \bold{Input data frames} below
#' for information on when \code{detectionData} and
#' \code{siteData} are required inputs.
#'
#' @inheritParams dfuncEstim
#'
#' @param siteData A data.frame containing site (transect or point)
#' IDs and any
#' \emph{site level} covariates to include in the detection function.
#' Every unique surveyed site (transect or point) is represented on
#' one row of this data set, whether or not targets were sighted
#' at the site. See arguments \code{transectID} and
#' \code{pointID} for an explanation of site and transect ID's.
#'
#' If sites are transects,
#' this data frame must also contain transect length. By
#' default, transect length is assumed to be in column 'length'
#' but can be specified using argument \code{length}.
#'
#' The total number of sites surveyed is \code{nrow(siteData)}.
#' Duplicate site-level IDs are not allowed in \code{siteData}.
#'
#' See \bold{Input data frames}
#' for when \code{detectionData} and \code{siteData} are required inputs.
#'
#'
#' @param bw Bandwidth of the smooth, which controls
#' smoothness. Smoothing is done by \code{stats::density}, and
#' \code{bw} is
#' passed straight to it's \code{bw} argument. \code{bw} can be
#' numeric, in which case it is the standard deviation of the
#' Gaussian smoothing kernel. Or, \code{bw} can be
#' a character string specifying the
#' bandwidth selection rule. Valid character string values
#' of \code{bw} are the following:
#' \itemize{
#' \item "nrd0" : Silverman's 'rule-of-thumb' equal to
#' \eqn{\frac{0.9s}{1.34n^{-0.2}}}{0.9(s)/(1.34n^(-1/5))}, where
#' \eqn{s} is the minimum of standard deviation of the distances
#' and the interquartile range. See \code{\link[stats]{bw.nrd0}}.
#' \item "nrd" : The more common 'rule-of-thumb' variation given by
#' Scott (1992). This rule uses 1.06 in the denominator of the
#' "nrd0" bandwidth. See \code{\link[stats]{bw.nrd}}
#' \item "bcv" : The biased cross-validation method. See \code{\link[MASS]{bcv}}.
#' \item "ucv" : The unbiased cross-validation method. See \code{\link[MASS]{ucv}}.
#' \item "SJ" or "SJ-ste" : The 'solve-the-equation' bandwidth of Sheather &
#' Jones (1991). See \code{\link[stats]{bw.SJ}} or \code{\link[MASS]{width.SJ}}.
#' \item "SJ-dpi" (default) : The 'direct-plug-in' bandwidth of Sheather &
#' Jones (1991). See \code{\link[stats]{bw.SJ}} or \code{\link[MASS]{width.SJ}}.
#' }
#'
#' @param adjust Bandwidth adjustment for the amount of smooth.
#' Smoothing is done by \code{\link[stats]{density}}, and
#' this parameter is
#' passed straight to it's \code{adjust} argument.
#' In \code{stats::density}, the bandwidth used is
#' actually \code{adjust*bw}, and inclusion of this parameters makes
#' it easier to specify values like 'half the default' bandwidth.
#'
#' @param kernel Character string specifying the smoothing kernel function.
#' This parameters is passed unmodified to \code{stats::density}. Valid
#' values are:
#' \itemize{
#' \item "gaussian" : Gaussian (normal) kernel, the default
#' \item "rectangular" : Uniform or flat kernel
#' \item "triangular" : Equilateral triangular kernel
#' \item "epanechnikov" : the Epanechnikov kernel
#' \item "biweight" : the biweight kernel
#' \item "cosine" : the S version of the cosine kernel
#' \item "optcosine" : the optimal cosine kernel which is the usual
#' one reported in the literature
#' }
#' Values of \code{kernel} may be abbreviated to the first letter of
#' each string. The numeric value of \code{bw} used in the smooth
#' is stored in the \code{$fit} component of the returned object
#' (i.e., in \code{returned$fit$bw}).
#'
#' @param pointSurvey A logical scalar specifying whether input data come
#' from point-transect surveys (TRUE),
#' or line-transect surveys (FALSE). Point surveys (TRUE) have not been
#' implemented yet.
#'
#' @param w.lo Lower or left-truncation limit of the distances in distance data.
#' This is the minimum possible off-transect distance. Default is 0.
#'
#' @param w.hi Upper or right-truncation limit of the distances
#' in \code{dist}. This is the maximum off-transect distance that
#' could be observed. If left unspecified (i.e., at the default of
#' NULL), right-truncation is set to the maximum of the
#' observed distances.
#'
#'
#' @param x.scl This parameter is passed to \code{F.gx.estim}.
#' See \code{F.gx.estim} documentation for definition.
#'
#' @param g.x.scl This parameter is passed to \code{F.gx.estim}.
#' See \code{F.gx.estim} documentation for definition.
#'
#' @param observer This parameter is passed to \code{F.gx.estim}.
#' See \code{F.gx.estim} documentation for definition.
#'
#' @param warn A logical scalar specifying whether to issue
#' an R warning if the estimation did not converge or if one
#' or more parameter estimates are at their boundaries.
#' For estimation, \code{warn} should generally be left at
#' its default value of \code{TRUE}. When computing bootstrap
#' confidence intervals, setting \code{warn = FALSE}
#' turns off annoying warnings when an iteration does
#' not converge. Regardless of \code{warn}, messages about
#' convergence and boundary conditions are printed
#' by \code{print.dfunc}, \code{print.abund}, and
#' \code{plot.dfunc}, so there should be little harm in
#' setting \code{warn = FALSE}.
#'
#' @param transectID A character vector naming the transect ID column(s) in
#' \code{detectionData} and \code{siteData}. Transects can be the
#' basic sampling unit (when \code{pointSurvey}=FALSE) or
#' contain multiple sampling units (e.g., when \code{pointSurvey}=TRUE).
#' For line-transects, the \code{transectID} column(s) alone is
#' sufficient to specify unique sample sites.
#' For point-transects, the amalgamation of \code{transectID} and
#' \code{pointID} specify unique sampling sites.
#' See \bold{Input data frames}.
#'
#' @param pointID When point-transects are used, this is the
#' ID of points on a transect. When \code{pointSurvey}=TRUE,
#' the amalgamation of \code{transectID} and
#' \code{pointID} specify unique sampling sites.
#' See \bold{Input data frames}.
#'
#' If single points are surveyed,
#' meaning surveyed points were not grouped into transects, each 'transect' consists
#' of one point. In this case, set \code{transectID} equal to
#' the point's ID and set \code{pointID} equal to 1 for all points.
#'
#' @param length Character string specifying the (single) column in
#' \code{siteData} that contains transect length. This is ignored if
#' \code{pointSurvey} = TRUE.
#'
#' @section Input data frames:
#' To save space and to easily specify
#' sites without detections,
#' all site ID's, regardless whether a detection occurred there,
#' and \emph{site level} covariates are stored in
#' the \code{siteData} data frame. Detection distances and group
#' sizes are measured at the \emph{detection level} and
#' are stored in the
#' \code{detectionData} data frame.
#'
#' \subsection{Data frame requirements}{The following explains
#' conditions under which various combinations of the input data frames
#' are required.
#'
#' \enumerate{
#' \item \bold{Detection data and site data both required:}\cr
#' Both \code{detectionData} and \code{siteData}
#' are required if \emph{site level} covariates are
#' specified on the right-hand side of \code{formula}.
#' \emph{Detection level} covariates are not currently allowed.
#'
#' \item \bold{Detection data only required:}\cr
#' The \code{detectionData} data frame alone can be
#' specified if no covariates
#' are included in the distance function (i.e., right-hand side of
#' \code{formula} is "~1"). Note that this routine (\code{dfuncEstim})
#' does not need to know about sites where zero targets were detected, hence
#' \code{siteData} can be missing when no covariates are involved.
#'
#' \item \bold{Neither detection data nor site data required}\cr
#' Neither \code{detectionData} nor \code{siteData}
#' are required if all variables specified in \code{formula}
#' are within the scope of this routine (e.g., in the global working
#' environment). Scoping rules here work the same as for other modeling
#' routines in R such as \code{lm} and \code{glm}. Like other modeling
#' routines, it is possible to mix and match the location of variables in
#' the model. Some variables can be in the \code{.GlobalEnv} while others
#' are in either \code{detectionData} or \code{siteData}.
#' }
#'
#' }
#'
#' \subsection{Relationship between data frames (transect and point ID's)}{
#' The input data frames, \code{detectionData} and \code{siteData},
#' must be merge-able on unique sites. For line-transects,
#' site ID's (i.e., transect ID's) are unique values of
#' the \code{transectID} column in \code{siteData}. In this case,
#' the following merge must work:
#' \code{merge(detectionData,siteData,by=transectID)}.
#' For point-transects,
#' site ID's (i.e., point ID's) are unique values
#' of the combination \code{paste(transectID,pointID)}.
#' In this case, the following merge must work:
#' \code{merge(detectionData,siteData,by=c(transectID, pointID)}.
#'
#' By default,\code{transectID} and \code{pointID} are NULL and
#' the merge is done on all common columns.
#' That is, when \code{transectID} is NULL, this routine assumes unique
#' \emph{transects} are specified by unique combinations of the
#' common variables (i.e., unique values of
#' \code{intersect(names(detectionData), names(siteData))}).
#'
#' An error occurs if there are no common column names between
#' \code{detectionData} and \code{siteData}.
#' Duplicate site IDs are not allowed in \code{siteData}.
#' If the same site is surveyed in
#' multiple years, specify another transect ID column (e.g., \code{transectID =
#' c("year","transectID")}). Duplicate site ID's are allowed in
#' \code{detectionData}.
#'
#' To help explain the relationship between data frames, bear in
#' mind that during bootstrap estimation of variance
#' in \code{\link{abundEstim}},
#' unique \emph{transects} (i.e., unique values of
#' the transect ID column(s)), not \emph{detections} or
#' \emph{points}, are resampled with replacement.
#' }
#'
#' @details Distances are reflected about \code{w.lo} before being passed
#' to \code{density}. Distances exactly equal to \code{w.lo} are not
#' reflected. Reflection around \code{w.lo} greatly improves
#' performance of the kernel methods near the \code{w.lo} boundary
#' where substantial non-zero probability of sighting typically exists.
#'
#'
#' @return An object of class 'dfunc'. Objects of class 'dfunc'
#' are lists containing the following components:
#' \item{parameters}{A data frame containing the $x and $y
#' components of the smooth. $x is a vector of length
#' 512 (default for \code{density}) evenly spaced points
#' between \code{w.lo} and \code{w.hi}.}
#' \item{loglik}{The value of the log likelihood. Specifically,
#' the sum of the negative log heights of the smooth at observed
#' distances, after the smoothed function has been scaled to integrate
#' to one. }
#' \item{w.lo}{Left-truncation value used during the fit.}
#' \item{w.hi}{Right-truncation value used during the fit.}
#' \item{dist}{The input vector of observed distances.}
#' \item{covars}{NULL. Covariates are not allowed in the
#' smoothed distance function (yet). }
#' \item{call}{The original call of this function.}
#' \item{call.x.scl}{The distance at which the distance function
#' is scaled. This is the x at which g(x) = \code{g.x.scl}.
#' Normally, \code{call.x.scl} = 0.}
#' \item{call.g.x.scl}{The value of the distance function at distance
#' \code{call.x.scl}. Normally, \code{call.g.x.scl} = 1.}
#' \item{call.observer}{The value of input parameter \code{observer}.}
#' \item{fit}{The smoothed object returned by \code{stats::density}. All
#' information returned by \code{stats::density} is preserved, and
#' in particular the numeric value of the bandwidth used during the
#' smooth is returned in \code{fit$bw}}
#' \item{pointSurvey}{The input value of \code{pointSurvey}.
#' This is TRUE if distances are radial from a point. FALSE
#' if distances are perpendicular off-transect. }
#' \item{formula}{The formula specified for the detection function.}
#'
#' @references
#'
#' Buckland, S.T., D.R. Anderson, K.P. Burnham, J.L. Laake, D.L. Borchers,
#' and L. Thomas. (2001) \emph{Introduction to distance sampling: estimating
#' abundance of biological populations}. Oxford University Press, Oxford, UK.
#'
#' Scott, D. W. (1992) \emph{Multivariate Density Estimation: Theory,
#' Practice, and Visualization.} Wiley.
#'
#' Sheather, S. J. and Jones, M. C. (1991) A reliable data-based
#' bandwidth selection method for kernel density estimation. \emph{Journal of
#' the Royal Statistical Society series B}, 53, 683-690.
#'
#' Silverman, B. W. (1986) \emph{Density Estimation}. London: Chapman and Hall.
#'
#' @seealso \code{\link{abundEstim}}, \code{\link{autoDistSamp}},
#' \code{\link{dfuncEstim}} for the parametric version.
#'
#' @examples
#' # Load example sparrow data (line transect survey type)
#' data(sparrowDetectionData)
#' data(sparrowSiteData)
#'
#' # Compare smoothed and half-normal detection function
#' dfuncSmu <- dfuncSmu(dist~1, sparrowDetectionData, w.hi=units::set_units(150, "m"))
#' dfuncHn <- dfuncEstim(formula=dist~1,sparrowDetectionData,w.hi=units::set_units(150, "m"))
#'
#' # Print and plot results
#' dfuncSmu
#' dfuncHn
#' plot(dfuncSmu,main="",nbins=50)
#'
#' x <- seq(0,150,length=200)
#' y <- dnorm(x, 0, predict(dfuncHn)[1])
#' y <- y/y[1]
#' lines(x,y, col="orange", lwd=2)
#' legend("topright", legend=c("Smooth","Halfnorm"),
#' col=c("red","orange"), lwd=2)
#'
#' @keywords model
#' @export
#' @importFrom stats model.response is.empty.model model.matrix contrasts
dfuncSmu <- function (formula
, detectionData
, siteData
, bw = "SJ-dpi"
, adjust = 1
, kernel = "gaussian"
, pointSurvey = FALSE
, w.lo = units::set_units(0,"m")
, w.hi = NULL
, x.scl = "max"
, g.x.scl = 1
, observer = "both"
, warn = TRUE
, transectID = NULL
, pointID = "point"
, outputUnits = NULL
, length = "length"
, control = RdistanceControls()
){
cl <- match.call()
if(!missing(detectionData) & !missing(siteData)){
if( is.null(transectID) ){
transectID <- intersect( names(detectionData), names(siteData) )
if( pointSurvey ){
pointID <- NULL # multiple pts per transect not allowed when no transectID specified
}
}
if( pointSurvey ){
siteID.cols <- c(transectID, pointID)
} else {
siteID.cols <- c(transectID)
}
data <- merge(detectionData, siteData, by=siteID.cols)
} else if(!missing(detectionData)){
data <- detectionData
} else{
data <- NULL
}
# (jdc) The double-observer method hasn't been checked since updating to version 2.x
# It's likely that an error would be raised if a user did try the double-observer option,
# but I'm adding a warning here, just in case
if(observer != "both") {
stop("The double-observer routines were not been tested when updating to
version 2.x, so they have been disables for the time being.")
}
mf <- getDfuncModelFrame(formula, data)
mt <- attr(mf, "terms")
dist <- model.response(mf,"any")
groupSize <- stats::model.offset(mf)
if( is.null(groupSize) ){
# no groupsize specified, assume 1
groupSize <- rep(1, length(dist) )
}
# Check for measurement units
if( !inherits(dist, "units") & control$requireUnits ){
dfName <- deparse(substitute(detectionData))
respName <- as.character(attr(mt, "variables"))[attr(mt, "response") + 1]
stop(paste("Distance measurement units are required.",
"Assign units by attaching 'units' package then:\n",
paste0("units(",dfName,"$", respName, ")"), "<- '<units of measurment>',\n",
"for example 'm' (meters) or 'ft' (feet). See units::valid_udunits()"))
} else if( control$requireUnits ){
# if we are here, dist has units
# set units for output by converting dist units; w.lo, w.hi, and x.scl will all be converted later
if( !is.null(outputUnits) ){
dist <- units::set_units(dist, outputUnits, mode = "standard")
}
outUnits <- units(dist)
}
if( !inherits(w.lo, "units") & control$requireUnits ){
if( w.lo[1] != 0 ){
stop(paste("Units of minimum distance are required.",
"Assign units by attaching 'units' package then:\n",
"units(w.lo) <- '<units>' or",
paste0("units::as_units(", w.lo,", <units>) in function call\n"),
"See units::valid_udunits() for valid symbolic units."))
}
# if we are here, w.lo is 0, it has no units, and we require units
w.lo <- units::set_units(w.lo, outUnits, mode = "standard") # assign units to 0
} else if( control$requireUnits ){
# if we are here, w.lo has units and we require units, convert to the output units
w.lo <- units::set_units(w.lo, outUnits, mode = "standard")
}
if(is.null(w.hi)){
w.hi <- max(dist, na.rm=TRUE) # units flow through max automatically
} else if( !inherits(w.hi, "units") & control$requireUnits ){
stop(paste("Max distance measurement units are required.",
"Assign units by attaching 'units' package then:\n",
"units(w.hi) <- '<units>' or",
paste0("units::as_units(", w.hi,", <units>) in function call\n"),
"See units::valid_udunits() for valid symbolic units."))
} else if( control$requireUnits ){
# if we are here, w.hi has units and we require them, convert to output units
w.hi <- units::set_units(w.hi, outUnits, mode = "standard")
}
# (tlm) Add this back in when we allow strata (factors) in the smoothed dfuncs
# covars <- if (!is.empty.model(mt)){
# model.matrix(mt, mf, contrasts)
# }
ncovars <- 0
# (tlm) Add this back in when we allow strata (factors) in the smoothed dfuncs
# ncovars <- ncol(covars)
# assgn <- attr(covars,"assign")
# Truncate for w.lo and w.hi
ind <- (w.lo <= dist) & (dist <= w.hi)
dist <- dist[ind]
groupSize <- groupSize[ind]
# (tlm) Add this back in when we allow strata (factors) in the smoothed dfuncs
#covars <- covars[ind,,drop=FALSE]
# factor.names <- NULL
# if(ncovars==1){
# if( assgn==0 ){
# # constant; intercept only model
# covars <- NULL
# }
# }
# (tlm) Add this back in when we allow strata (factors) in the smoothed dfuncs
# # Find which columns are factors.
# # This works when covars is NULL and must be called
# # even when ncovars == 1 to cover case like dist ~ -1+x (no intercept)
# for(i in 1:ncol(mf)){
# if(class(mf[,i]) == "factor"){
# factor.names <- c(factor.names, names(mf)[i])
# }
# }
# vnames<-dimnames(covars)[[2]]
# Stop if dist vector contains NAs
if(any(is.na(dist))) stop("Please remove detections for which dist is NA.")
# Do the smoothing:
# Reflect distances about w.lo
# Only reflect values > w.lo (not == w.lo) because we don't want
# to double the number of points at exactly w.lo
reflectedDist <- c(-dist[dist>w.lo]+2*w.lo,dist)
dsmu <- stats::density(reflectedDist
, bw=bw
, adjust=adjust
, kernel=kernel
, from=units::drop_units(w.lo)
, to=units::drop_units(w.hi)
)
# Make sure none of the y-values are < 0
dsmu$y[dsmu$y < 0] <- 0
# Internal sample size for smooth is 2X too big because of reflection
dsmu$n <- dsmu$n / 2
# Store some smoothing stuff in density smooth
dsmu$call[["x"]] <- as.character(mt[[2]])
dsmu$call[["bw"]] <- bw
dsmu$call[["adjust"]] <- adjust
dsmu$call[["kernel"]] <- kernel
dsmu$call[["from"]] <- w.lo
dsmu$call[["to"]] <- w.hi
param.df <- data.frame(x=dsmu$x, y=dsmu$y)
ans <- list(parameters = param.df,
like.form = "smu",
expansions = 0,
covars = NULL,
w.lo = w.lo,
w.hi = w.hi,
detections = data.frame(dist, groupSize),
fit = dsmu,
covars = NULL,
#factor.names = factor.names,
call = cl,
call.x.scl = x.scl,
call.g.x.scl = g.x.scl,
call.observer = observer,
pointSurvey = pointSurvey,
outputUnits = outUnits,
formula = formula)
L <- smu.like(ans$parameters, dist,
w.lo=w.lo, w.hi=w.hi,
pointSurvey=pointSurvey,
scale=TRUE)
# shouldn't have any negatives here
# could have values outside w.lo-w.hi
ans$loglik <- -sum(log(L),na.rm=TRUE)
# Assemble results
class(ans) <- "dfunc"
gx <- F.gx.estim(ans)
ans$x.scl <- gx$x.scl
ans$g.x.scl <- gx$g.x.scl
ans
} # end function
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.