dfuncSmu: Estimate a non-parametric smooth detection function from...

dfuncSmu    R Documentation

Estimate a non-parametric smooth detection function from distance-sampling data

Description

Estimates a smooth detection function for line-transect perpendicular distances or point-transect radial distances.

Usage

dfuncSmu(
  formula,
  detectionData,
  siteData,
  bw = "SJ-dpi",
  adjust = 1,
  kernel = "gaussian",
  pointSurvey = FALSE,
  w.lo = units::set_units(0, "m"),
  w.hi = NULL,
  x.scl = "max",
  g.x.scl = 1,
  observer = "both",
  warn = TRUE,
  transectID = NULL,
  pointID = "point",
  outputUnits = NULL,
  length = "length",
  control = RdistanceControls()
)

Arguments

formula

A formula object (e.g., dist ~ 1). The left-hand side (before ~) is the name of the vector containing distances (perpendicular or radial). The right-hand side (after ~) must be the intercept-only model as Rdistance does not currently allow covariates in smoothed distance functions. If names in formula do not appear in detectionData, the normal scoping rules for model fitting routines (e.g., lm and glm) apply.

detectionData

A data frame containing detection distances (either perpendicular for line-transect or radial for point-transect designs), with one row per detected object or group. This data frame must contain at least the following information:

  • Detection Distances: A single column containing detection distances must be specified on the left-hand side of formula.

  • Site IDs: The ID of the transect or point (i.e., the 'site') where each object or group was detected. The site ID column(s) (see arguments transectID and pointID) must specify the site (transect or point) so that this data frame can be merged with siteData.

Optionally, this data frame can contain the following variables:

  • Group Sizes: The number of individuals in the group associated with each detection. If unspecified, Rdistance assumes all detections are of single individuals (i.e., all group sizes are 1).

  • When Rdistance allows detection-level covariates in some version after 2.1.1, detection-level covariates will appear in this data frame.

See the example data set sparrowDetectionData. See also Input data frames below for information on when detectionData and siteData are required inputs.

siteData

A data.frame containing site (transect or point) IDs and any site-level covariates to include in the detection function. Every unique surveyed site (transect or point) is represented on one row of this data set, whether or not targets were sighted at the site. See arguments transectID and pointID for an explanation of site and transect IDs.

If sites are transects, this data frame must also contain transect length. By default, transect length is assumed to be in column 'length' but can be specified using argument length.

The total number of sites surveyed is nrow(siteData). Duplicate site-level IDs are not allowed in siteData.

See Input data frames for when detectionData and siteData are required inputs.

bw

Bandwidth of the smooth, which controls smoothness. Smoothing is done by stats::density, and bw is passed straight to its bw argument. bw can be numeric, in which case it is the standard deviation of the Gaussian smoothing kernel. Alternatively, bw can be a character string specifying the bandwidth selection rule. Valid character string values of bw are the following:

  • "nrd0" : Silverman's 'rule-of-thumb' equal to 0.9 \min(s, IQR/1.34) n^{-0.2}, where s is the standard deviation of the distances, IQR is their interquartile range, and n is the number of distances. See bw.nrd0.

  • "nrd" : The more common 'rule-of-thumb' variation given by Scott (1992). This rule replaces the factor 0.9 in the "nrd0" bandwidth with 1.06. See bw.nrd.

  • "bcv" : The biased cross-validation method. See bcv.

  • "ucv" : The unbiased cross-validation method. See ucv.

  • "SJ" or "SJ-ste" : The 'solve-the-equation' bandwidth of Sheather & Jones (1991). See bw.SJ or width.SJ.

  • "SJ-dpi" (default) : The 'direct-plug-in' bandwidth of Sheather & Jones (1991). See bw.SJ or width.SJ.
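
The bandwidth rules listed above can be inspected outside of dfuncSmu by calling the base R selectors that stats::density uses. The following is a minimal sketch on simulated distances; the vector d and the object names are made up for illustration and are not part of Rdistance.

```r
# Simulated perpendicular distances (hypothetical data, not from Rdistance)
set.seed(1)
d <- abs(rnorm(200, mean = 0, sd = 50))

# Silverman's rule of thumb, computed by hand and by stats::bw.nrd0
byHand <- 0.9 * min(sd(d), IQR(d) / 1.34) * length(d)^(-0.2)
nrd0   <- stats::bw.nrd0(d)

# Scott's variation uses the factor 1.06 in place of 0.9
nrd <- stats::bw.nrd(d)

# The two Sheather-Jones bandwidths
sjDpi <- stats::bw.SJ(d, method = "dpi")
sjSte <- stats::bw.SJ(d, method = "ste")

print(c(byHand = byHand, nrd0 = nrd0, nrd = nrd, SJ.dpi = sjDpi, SJ.ste = sjSte))
```

Comparing these values on your own distances is a quick way to see how much the choice of rule matters before fitting the smooth.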

adjust

Bandwidth adjustment for the amount of smoothing. Smoothing is done by stats::density, and this parameter is passed straight to its adjust argument. In stats::density, the bandwidth actually used is adjust*bw; including this parameter makes it easy to specify values such as 'half the default' bandwidth.
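
As an illustration of how adjust interacts with bw, the sketch below calls stats::density directly on simulated distances (hypothetical data) and compares the bandwidth stored in the returned object for adjust = 1 and adjust = 0.5.

```r
# Simulated distances (hypothetical data)
set.seed(1)
d <- abs(rnorm(200, mean = 0, sd = 50))

# Same selection rule, two values of adjust
full <- stats::density(d, bw = "SJ-dpi", adjust = 1)
half <- stats::density(d, bw = "SJ-dpi", adjust = 0.5)

# The $bw component holds the bandwidth actually used (adjust * bw)
print(c(full = full$bw, half = half$bw))
```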

kernel

Character string specifying the smoothing kernel function. This parameter is passed unmodified to stats::density. Valid values are:

  • "gaussian" : Gaussian (normal) kernel, the default

  • "rectangular" : Uniform or flat kernel

  • "triangular" : Equilateral triangular kernel

  • "epanechnikov" : the Epanechnikov kernel

  • "biweight" : the biweight kernel

  • "cosine" : the S version of the cosine kernel

  • "optcosine" : the optimal cosine kernel which is the usual one reported in the literature

Values of kernel may be abbreviated to the first letter of each string. The numeric value of bw used in the smooth is stored in the $fit component of the returned object (i.e., in returned$fit$bw).

pointSurvey

A logical scalar specifying whether input data come from point-transect surveys (TRUE), or line-transect surveys (FALSE). Point surveys (TRUE) have not been implemented yet.

w.lo

Lower or left-truncation limit of the distances in distance data. This is the minimum possible off-transect distance. Default is 0.

w.hi

Upper or right-truncation limit of the distances in dist. This is the maximum off-transect distance that could be observed. If left unspecified (i.e., at the default of NULL), right-truncation is set to the maximum of the observed distances.

x.scl

This parameter is passed to F.gx.estim. See F.gx.estim documentation for definition.

g.x.scl

This parameter is passed to F.gx.estim. See F.gx.estim documentation for definition.

observer

This parameter is passed to F.gx.estim. See F.gx.estim documentation for definition.

warn

A logical scalar specifying whether to issue an R warning if the estimation did not converge or if one or more parameter estimates are at their boundaries. For estimation, warn should generally be left at its default value of TRUE. When computing bootstrap confidence intervals, setting warn = FALSE turns off annoying warnings when an iteration does not converge. Regardless of warn, messages about convergence and boundary conditions are printed by print.dfunc, print.abund, and plot.dfunc, so there should be little harm in setting warn = FALSE.

transectID

A character vector naming the transect ID column(s) in detectionData and siteData. Transects can be the basic sampling unit (when pointSurvey=FALSE) or contain multiple sampling units (e.g., when pointSurvey=TRUE). For line-transects, the transectID column(s) alone is sufficient to specify unique sample sites. For point-transects, the amalgamation of transectID and pointID specifies unique sampling sites. See Input data frames.

pointID

When point-transects are used, this is the ID of points on a transect. When pointSurvey=TRUE, the amalgamation of transectID and pointID specifies unique sampling sites. See Input data frames.

If single points are surveyed, meaning surveyed points were not grouped into transects, each 'transect' consists of one point. In this case, set transectID equal to the point's ID and set pointID equal to 1 for all points.
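
A minimal sketch of the single-point layout described above, using base R only. The column names transectID and pointID and the point labels are hypothetical choices for illustration.

```r
# Three points surveyed individually, not grouped into transects
pointLabels <- c("A", "B", "C")

# Each 'transect' is one point: the transect ID column carries the
# point's ID and the point ID column is 1 everywhere
siteData <- data.frame(
  transectID = pointLabels,
  pointID    = 1
)

# The combination of the two ID columns still identifies unique sites
siteKey <- paste(siteData$transectID, siteData$pointID)
print(siteData)
```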

outputUnits

A string giving the symbolic measurement units that results should be reported in. Any distance measurement unit in units::valid_udunits() will work. The strings for common distance symbolic units are: "m" for meters, "ft" for feet, "cm" for centimeters, "mm" for millimeters, "mi" for miles, "nmile" for nautical miles ("nm" is nanometers), "in" for inches, "yd" for yards, "km" for kilometers, "fathom" for fathoms, "chains" for chains, and "furlong" for furlongs. If outputUnits is unspecified (NULL), output units are the same as the distance measurement units in the data.
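
The unit strings above are the ones understood by the units package. As a small sketch, assuming the units package is installed, a truncation distance given in meters can be converted to feet as follows. The object names are illustrative only, and this shows the units package on its own rather than what dfuncSmu does internally.

```r
# Only run the sketch when the units package is available
if (requireNamespace("units", quietly = TRUE)) {
  # A right-truncation distance of 150 meters
  w.hi <- units::set_units(150, "m")

  # The same distance expressed in feet
  w.hi.ft <- units::set_units(w.hi, "ft")
  print(w.hi.ft)
}
```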

length

Character string specifying the (single) column in siteData that contains transect length. This is ignored if pointSurvey = TRUE.

control

A list containing optimization control parameters such as the maximum number of iterations, tolerance, the optimizer to use, etc. See the RdistanceControls function for explanation of each value, the defaults, and the requirements for this list. See examples below for how to change controls.

Details

Distances are reflected about w.lo before being passed to density. Distances exactly equal to w.lo are not reflected. Reflection around w.lo greatly improves performance of the kernel methods near the w.lo boundary where substantial non-zero probability of sighting typically exists.
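
The effect of reflection can be sketched with stats::density alone. The code below is an illustration under stated assumptions, not the code used inside dfuncSmu: the distances are simulated, the bandwidth is fixed at an arbitrary value so the two fits are comparable, and the doubling step is one simple way to put the reflected fit back on the scale of the original sample.

```r
# Simulated distances truncated at w.hi (hypothetical data)
set.seed(1)
w.lo <- 0
w.hi <- 150
d <- abs(rnorm(300, mean = 0, sd = 50))
d <- d[d <= w.hi]

# Mirror every distance strictly greater than w.lo to the other side of w.lo;
# distances exactly equal to w.lo are not duplicated
reflected <- c(d, 2 * w.lo - d[d > w.lo])

# Fixed, arbitrary bandwidth so the two fits differ only by the reflection
bw <- 15
plain  <- stats::density(d,         bw = bw, from = w.lo, to = w.hi)
mirror <- stats::density(reflected, bw = bw, from = w.lo, to = w.hi)

# About half of the reflected sample lies below w.lo, so double the
# heights before comparing with the unreflected estimate
mirrorY <- 2 * mirror$y

# Ratio of the two estimates at the w.lo boundary
ratioAtBoundary <- mirrorY[1] / plain$y[1]
print(ratioAtBoundary)
```

Without reflection, the kernel estimate spreads roughly half of the probability mass near w.lo onto distances below w.lo, which pulls the curve down at the boundary. In this sketch the reflected estimate is close to twice as high at w.lo as the unreflected one.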

Value

An object of class 'dfunc'. Objects of class 'dfunc' are lists containing the following components:

parameters

A data frame containing the $x and $y components of the smooth. $x is a vector of 512 (the default for density) evenly spaced points between w.lo and w.hi.

loglik

The value of the log likelihood. Specifically, the sum of the negative log heights of the smooth at observed distances, after the smoothed function has been scaled to integrate to one.
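
One way to reproduce a quantity of this kind by hand is sketched below. It is a reconstruction under assumptions, not the Rdistance source: the distances are simulated, the area is computed with the trapezoid rule, and heights at the observed distances are obtained by linear interpolation.

```r
# Simulated distances (hypothetical data)
set.seed(42)
d <- abs(rnorm(150, mean = 0, sd = 40))
w.lo <- 0
w.hi <- max(d)

# Smooth of the reflected distances, evaluated on [w.lo, w.hi]
fit <- stats::density(c(d, 2 * w.lo - d[d > w.lo]), from = w.lo, to = w.hi)

# Scale the smooth so that it integrates to one (trapezoid rule)
area <- sum(diff(fit$x) * (head(fit$y, -1) + tail(fit$y, -1)) / 2)
f <- fit$y / area

# Heights of the scaled smooth at the observed distances
heights <- stats::approx(fit$x, f, xout = d, rule = 2)$y

# Sum of the negative log heights
negLogLik <- -sum(log(heights))
print(negLogLik)
```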

w.lo

Left-truncation value used during the fit.

w.hi

Right-truncation value used during the fit.

dist

The input vector of observed distances.

covars

NULL. Covariates are not allowed in the smoothed distance function (yet).

call

The original call of this function.

call.x.scl

The distance at which the distance function is scaled. This is the x at which g(x) = g.x.scl. Normally, call.x.scl = 0.

call.g.x.scl

The value of the distance function at distance call.x.scl. Normally, call.g.x.scl = 1.
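
The role of call.x.scl and call.g.x.scl can be illustrated with a short sketch. The curve f below is a made-up unscaled smooth, and the rescaling shown is a generic way to force a curve through a chosen point; it is not taken from F.gx.estim.

```r
# A hypothetical unscaled smooth on a grid of distances
x <- seq(0, 150, length.out = 512)
f <- 0.7 * exp(-x^2 / (2 * 50^2))

# Scale so that g(x.scl) = g.x.scl
x.scl   <- 0
g.x.scl <- 1
fAtScl  <- stats::approx(x, f, xout = x.scl)$y
g <- f * g.x.scl / fAtScl

# Height of the scaled curve at x.scl
gAtScl <- stats::approx(x, g, xout = x.scl)$y
print(gAtScl)
```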

call.observer

The value of input parameter observer.

fit

The smoothed object returned by stats::density. All information returned by stats::density is preserved, and in particular the numeric value of the bandwidth used during the smooth is returned in fit$bw.

pointSurvey

The input value of pointSurvey. This is TRUE if distances are radial from a point and FALSE if distances are perpendicular off-transect.

formula

The formula specified for the detection function.

Input data frames

To save space and to easily specify sites without detections, all site IDs, regardless of whether a detection occurred there, and site-level covariates are stored in the siteData data frame. Detection distances and group sizes are measured at the detection level and are stored in the detectionData data frame.

Data frame requirements

The following explains conditions under which various combinations of the input data frames are required.

  1. Detection data and site data both required:
    Both detectionData and siteData are required if site level covariates are specified on the right-hand side of formula. Detection level covariates are not currently allowed.

  2. Detection data only required:
    The detectionData data frame alone can be specified if no covariates are included in the distance function (i.e., right-hand side of formula is "~1"). Note that this routine (dfuncSmu) does not need to know about sites where zero targets were detected, hence siteData can be missing when no covariates are involved.

  3. Neither detection data nor site data required:
    Neither detectionData nor siteData is required if all variables specified in formula are within the scope of this routine (e.g., in the global working environment). Scoping rules here work the same as for other modeling routines in R such as lm and glm. Like other modeling routines, it is possible to mix and match the location of variables in the model. Some variables can be in the .GlobalEnv while others are in either detectionData or siteData.

Relationship between data frames (transect and point ID's)

The input data frames, detectionData and siteData, must be merge-able on unique sites. For line-transects, site IDs (i.e., transect IDs) are unique values of the transectID column in siteData. In this case, the following merge must work: merge(detectionData, siteData, by=transectID). For point-transects, site IDs (i.e., point IDs) are unique values of the combination paste(transectID, pointID). In this case, the following merge must work: merge(detectionData, siteData, by=c(transectID, pointID)).

By default, transectID and pointID are NULL and the merge is done on all common columns. That is, when transectID is NULL, this routine assumes unique transects are specified by unique combinations of the common variables (i.e., unique values of intersect(names(detectionData), names(siteData))).

An error occurs if there are no common column names between detectionData and siteData. Duplicate site IDs are not allowed in siteData. If the same site is surveyed in multiple years, specify another transect ID column (e.g., transectID = c("year","transectID")). Duplicate site ID's are allowed in detectionData.
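
The merge requirements above can be checked on your own data before fitting. The sketch below uses two small made-up data frames for a line-transect survey repeated over two years; the column names year, transectID, dist, and length are hypothetical choices for this illustration.

```r
# One row per detection; site IDs may repeat here
detectionData <- data.frame(
  year       = c(2020, 2020, 2021, 2021, 2021),
  transectID = c("T1", "T1", "T1", "T2", "T2"),
  dist       = c(12, 40, 5, 33, 71)
)

# One row per surveyed site, including T2 in 2020 where nothing was detected
siteData <- data.frame(
  year       = c(2020, 2020, 2021, 2021),
  transectID = c("T1", "T2", "T1", "T2"),
  length     = c(500, 500, 500, 500)
)

# Two columns are needed to identify a site surveyed in multiple years
idCols <- c("year", "transectID")

# Number of duplicated site IDs in siteData (must be zero)
nDupSites <- sum(duplicated(siteData[, idCols]))

# The merge that must work; every detection should find its site
merged <- merge(detectionData, siteData, by = idCols)
print(merged)
```

If the merged data frame has fewer rows than detectionData, some detections refer to sites that are missing from siteData.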

To help explain the relationship between data frames, bear in mind that during bootstrap estimation of variance in abundEstim, unique transects (i.e., unique values of the transect ID column(s)), not detections or points, are resampled with replacement.
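
The idea of resampling transects rather than detections can be sketched in base R as follows. This is an illustration of the concept only, under the assumption of a single transect ID column and made-up data; it is not the code used by abundEstim.

```r
# Made-up site and detection data; T2 has no detections
set.seed(7)
siteData <- data.frame(
  transectID = c("T1", "T2", "T3"),
  length     = c(500, 400, 600)
)
detectionData <- data.frame(
  transectID = c("T1", "T1", "T3"),
  dist       = c(12, 40, 5)
)

# Resample transect IDs with replacement
bsIDs <- sample(siteData$transectID, replace = TRUE)

# Bootstrap site data: one row per resampled transect
bsSites <- siteData[match(bsIDs, siteData$transectID), ]

# Bootstrap detections: all detections from each resampled transect,
# repeated as many times as the transect was drawn
bsDetections <- do.call(rbind, lapply(bsIDs, function(id) {
  detectionData[detectionData$transectID == id, ]
}))

print(bsIDs)
print(bsDetections)
```

Because whole transects are drawn, detections from the same transect always enter or leave a bootstrap sample together, which is why site IDs must be unique in siteData.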

References

Buckland, S.T., D.R. Anderson, K.P. Burnham, J.L. Laake, D.L. Borchers, and L. Thomas. (2001) Introduction to distance sampling: estimating abundance of biological populations. Oxford University Press, Oxford, UK.

Scott, D. W. (1992) Multivariate Density Estimation: Theory, Practice, and Visualization. Wiley.

Sheather, S. J. and Jones, M. C. (1991) A reliable data-based bandwidth selection method for kernel density estimation. Journal of the Royal Statistical Society series B, 53, 683-690.

Silverman, B. W. (1986) Density Estimation. London: Chapman and Hall.

See Also

abundEstim, autoDistSamp, dfuncEstim for the parametric version.

Examples

# Load example sparrow data (line transect survey type)
data(sparrowDetectionData)
data(sparrowSiteData)

# Compare smoothed and half-normal detection functions.
# Results are stored under names that do not mask the dfuncSmu function.
smuFit <- dfuncSmu(dist ~ 1, sparrowDetectionData, w.hi = units::set_units(150, "m"))
hnFit  <- dfuncEstim(formula = dist ~ 1, sparrowDetectionData, w.hi = units::set_units(150, "m"))

# Print and plot results
smuFit
hnFit
plot(smuFit, main = "", nbins = 50)

# Overlay the fitted half-normal curve, scaled so that g(0) = 1
x <- seq(0, 150, length.out = 200)
y <- dnorm(x, 0, predict(hnFit)[1])
y <- y / y[1]
lines(x, y, col = "orange", lwd = 2)
legend("topright", legend = c("Smooth", "Halfnorm"),
  col = c("red", "orange"), lwd = 2)


tmcd82070/Rdistance documentation built on April 10, 2024, 10:20 p.m.