dfuncEstim | R Documentation |
Fit a specific detection function to off-transect or off-point (radial) distances using maximum likelihood. Distance functions are fitted to individual distance observations, not histogram bin heights, despite plot methods that draw histogram bars.
dfuncEstim(
formula,
detectionData,
siteData,
likelihood = "halfnorm",
pointSurvey = FALSE,
w.lo = units::set_units(0, "m"),
w.hi = NULL,
expansions = 0,
series = "cosine",
x.scl = units::set_units(0, "m"),
g.x.scl = 1,
observer = "both",
warn = TRUE,
transectID = NULL,
pointID = "point",
outputUnits = NULL,
control = RdistanceControls()
)
formula |
A standard formula object (e.g., Group Sizes: Non-unity group sizes are specified using |
detectionData |
A data frame containing detection distances (either perpendicular for line-transect or radial for point-transect designs), with one row per detected object or group. This data frame must contain at least the following information:
See example data set |
siteData |
A data.frame containing site (transect or point)
IDs and any
site level covariates to include in the detection function.
Every unique surveyed site (transect or point) is represented on
one row of this data set, whether or not targets were sighted
at the site. See arguments See Data frame requirements for situations in which
|
likelihood |
String specifying the likelihood to fit. Built-in likelihoods at present are "uniform", "halfnorm", "hazrate", "negexp", and "Gamma". See vignette for a way to use user-defined likelihoods. |
pointSurvey |
A logical scalar specifying whether input data come from point-transect surveys (TRUE), or line-transect surveys (FALSE). |
w.lo |
Lower or left-truncation limit of the distances in distance data.
This is the minimum possible off-transect distance. Default is 0. If
|
w.hi |
Upper or right-truncation limit of the distances
in |
expansions |
A scalar specifying the number of terms
in |
series |
If |
x.scl |
The x coordinate (a distance) at which to scale the
sightability function to |
g.x.scl |
Height of the distance function at coordinate x.
The distance function
will be scaled so that g( |
observer |
A numeric scalar or text string specifying whether observer 1
or observer 2 or both were full-time observers.
This parameter dictates which set of observations form the denominator
of a double observer system.
If, for example, observer 2 was a data recorder and part-time observer,
or if observer 2 was the pilot, set |
warn |
A logical scalar specifying whether to issue
an R warning if the estimation did not converge or if one
or more parameter estimates are at their boundaries.
For estimation, |
transectID |
A character vector naming the transect ID column(s) in
|
pointID |
When point-transects are used, this is the
ID of points on a transect. If single points are surveyed,
meaning surveyed points were not grouped into transects, each 'transect' consists
of one point. In this case, set |
outputUnits |
A string giving the symbolic measurement
units that results should be reported in. Any
distance measurement unit in |
control |
A list containing optimization control parameters such
as the maximum number of iterations, tolerance, the optimizer to use,
etc. See the
|
An object of class 'dfunc'. Objects of class 'dfunc' are lists containing the following components:
parameters |
The vector of estimated parameter values. Length of this vector for built-in likelihoods is one (for the function's parameter) plus the number of expansion terms plus one if the likelihood is either 'hazrate' or 'uniform' (hazrate and uniform have two parameters). |
varcovar |
The variance-covariance matrix for coefficients of the distance function, estimated by the inverse of the Hessian of the fit evaluated at the estimates. There is no guarantee that this matrix is positive-definite, and it should be viewed with caution. Error estimates derived from bootstrapping are generally more reliable. |
loglik |
The maximized value of the log likelihood (more specifically, the minimized value of the negative log likelihood). |
convergence |
The convergence code. This code
is returned by |
like.form |
The name of the likelihood. This is
the value of the argument |
w.lo |
Left-truncation value used during the fit. |
w.hi |
Right-truncation value used during the fit. |
detections |
A data frame of detections within the strip
or circle used in the fit. Column 'dist' contains the
observed distances.
Column 'groupSize' contains group sizes associated with
the values of 'dist'. Group
sizes are only used in |
covars |
Either NULL if no covariates are included in the
detection function, or a |
model.frame |
A |
siteID.cols |
A vector containing the transect ID column names in |
expansions |
The number of expansion terms used during estimation. |
series |
The type of expansion used during estimation. |
call |
The original call of this function. |
call.x.scl |
The input or user requested distance at which the distance function is scaled. |
call.g.x.scl |
The |
call.observer |
The value of input parameter |
fit |
The fitted object returned by |
factor.names |
The names of any factors in |
pointSurvey |
The input value of |
formula |
The formula specified for the detection function. |
control |
A list containing values of the 'control' parameters
set by |
outputUnits |
The measurement units used for output. All distance measurements are converted to these units internally. |
x.scl |
The actual distance at which
the distance function is scaled to some value.
i.e., this is the actual x at
which g(x) = |
g.x.scl |
The actual height of the distance function
at a distance of |
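The caution attached to the varcovar component can be checked by hand. The following is a minimal sketch on a made-up 2 x 2 matrix (the values are illustrative assumptions, not output from any fit); with a real fit, the matrix would be taken from the varcovar component of the returned object.

```r
# Hypothetical variance-covariance matrix for two coefficients
vc <- matrix(c(0.040, 0.006,
               0.006, 0.090), nrow = 2, byrow = TRUE)

# Standard errors are the square roots of the diagonal elements
se <- sqrt(diag(vc))

# The matrix is positive-definite if all eigenvalues are strictly positive
isPosDef <- all(eigen(vc, symmetric = TRUE, only.values = TRUE)$values > 0)

se
isPosDef
```

If the check fails, or if a diagonal element is negative, the Hessian-based standard errors are not usable and bootstrap error estimates are the safer choice.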
Rdistance accommodates two kinds of transects: continuous and point. On continuous transects detections can occur at any point along the route, and these are line-transects. On point transects detections can only occur at a series of stops (points), and these are point-transects. Transects are the basic sampling unit in both cases. Columns named in transectID are sufficient to specify unique line-transects. The combination of transectID and pointID specifies unique sampling locations along point-transects. See Input data frames below for more detail.
To save space and to easily specify sites without detections, all site ID's, regardless of whether a detection occurred there, and site level covariates are stored in the siteData data frame. Detection distances and group sizes are measured at the detection level and are stored in the detectionData data frame.
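A minimal sketch of how the two data frames divide the information, using small made-up values. The column names siteID, length, shrub, and groupsize and all numbers here are illustrative assumptions, not a shipped data set; units are attached with the units package.

```r
# Hypothetical site data: one row per surveyed transect, detections or not
siteData <- data.frame(
  siteID = c("A1", "A2", "A3"),
  length = units::set_units(c(500, 500, 450), "m"),  # transect lengths
  shrub  = c(12.1, 30.4, 8.7)                        # site-level covariate
)

# Hypothetical detection data: one row per detected group
detectionData <- data.frame(
  siteID    = c("A1", "A1", "A3"),
  groupsize = c(1, 3, 2),
  dist      = units::set_units(c(14.2, 55.0, 31.6), "m")
)

# Sites that were surveyed but produced no detections appear only in siteData
noDetections <- setdiff(siteData$siteID, detectionData$siteID)
noDetections
```

Transect "A2" was surveyed but has no rows in detectionData; keeping it in siteData is what records that the transect was surveyed.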
The following explains conditions under which various combinations of the input data frames are required.
Detection data and site data both required:
Both detectionData and siteData are required if site level covariates are specified on the right-hand side of formula. Detection level covariates are not currently allowed. Both detectionData and siteData data frames are required to estimate abundance later in abundEstim.
Detection data only required:
Only detectionData is required when covariates are not included in the distance function (i.e., the right-hand side of formula is "~1" or "~groupsize(groupSize)"). Note that dfuncEstim does not need to know transect IDs (or group sizes) in order to estimate a distance function, but group sizes and transect IDs are stored and used to estimate abundance in function abundEstim. Both the detectionData and siteData data frames are required in abundEstim.
Neither detection data nor site data required:
Neither detectionData nor siteData is required if all variables specified in formula are within the scope of dfuncEstim (e.g., in the global working environment) and abundance estimates are not required. Regular R scoping rules apply when the call to dfuncEstim is embedded in a function. This case produces distance functions only. Abundance cannot later be estimated because transects and transect lengths cannot be specified outside of a data frame. If abundance will be estimated, use one of the first two cases.
The input data frames, detectionData and siteData, must be merge-able on unique sites. For line-transects, site ID's specify transects or routes and are unique values of the transectID column in siteData. In this case, the following merge must work: merge(detectionData, siteData, by = transectID). For point-transects, site ID's specify individual points and are unique values of the combination paste(transectID, pointID). In this case, the following merge must work: merge(detectionData, siteData, by = c(transectID, pointID)).
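The point-transect merge can be tried on toy data before fitting. This is a sketch only: the column names transect, point, canopy, and groupsize and all values are made up for illustration, and the distance column is left out for brevity.

```r
# Hypothetical point-transect design: 2 transects with 2 points each
siteData <- data.frame(
  transect = c("T1", "T1", "T2", "T2"),
  point    = c(1, 2, 1, 2),
  canopy   = c(0.35, 0.60, 0.10, 0.45)   # illustrative site covariate
)
detectionData <- data.frame(
  transect  = c("T1", "T1", "T2"),
  point     = c(1, 1, 2),
  groupsize = c(2, 1, 4)                 # distance column omitted for brevity
)
transectID <- "transect"
pointID    <- "point"

# The merge that must succeed for point-transects
merged <- merge(detectionData, siteData, by = c(transectID, pointID))

# Every detection should match exactly one site row
stopifnot(nrow(merged) == nrow(detectionData))
merged
```

A merged result with fewer rows than detectionData would indicate detections whose site ID has no match in siteData.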
By default, transects are unique combinations of the common variables in the detectionData and siteData data frames if both data frames are specified (i.e., unique values of intersect(names(detectionData), names(siteData))). If siteData is not specified and transectID is not given, transects are assumed to be identified in a column named siteID in detectionData. Either way (i.e., either transectID = "siteID" or specified as something else), the column(s) containing transect ID's must be correct here if abundance is to be estimated later. Routine abundEstim requires transect ID's for bootstrapping because it resamples unique values of the composite transect ID column(s). abundEstim uses the value of transectID specified here. Hence, users cannot change transect ID's between calls to dfuncEstim and abundEstim, and all transectID columns must be present in both data frames even though they may not be used until later.
An error occurs if both detectionData and siteData are specified but no common columns exist. Duplicate transectID values are not allowed in siteData but are allowed in detectionData because multiple detections can occur on a single transect or at a single site. If the same site is surveyed in multiple years, specify another level of transect ID; for example, transectID = c("year","transectID").
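The default ID columns and the duplicate rule can be checked in base R. The sketch below assumes a made-up two-year survey of the same two transects; the data values are hypothetical, and length and distance columns are omitted.

```r
# Hypothetical two-year survey of the same two transects
siteData <- data.frame(
  year       = c(2021, 2021, 2022, 2022),
  transectID = c("T1", "T2", "T1", "T2")
)
detectionData <- data.frame(
  year       = c(2021, 2022, 2022),
  transectID = c("T1", "T1", "T2"),
  groupsize  = c(1, 2, 1)
)

# Columns shared by both data frames form the default composite transect ID
idCols <- intersect(names(detectionData), names(siteData))

# transectID alone repeats across years ...
dupSingle <- anyDuplicated(siteData$transectID) > 0
# ... but the composite (year, transectID) identifies each site row uniquely
dupComposite <- anyDuplicated(siteData[, idCols]) > 0

idCols
dupSingle
dupComposite
```

Here the single-column ID is duplicated in siteData, which is not allowed, while the two-column ID is unique, which is why the extra year level is needed.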
As of Rdistance version 3.0.0, measurement units are required on all distances. This includes off-transect distances, radial distances, truncation distances (w.lo and w.hi), transect lengths, and study area size. In dfuncEstim, units are required on the following: detectionData$dist; w.lo (unless it is zero); w.hi (unless it is NULL); and x.scl. In abundEstim, units are required on siteData$length and area. All units are 1-dimensional except those on area, which are 2-dimensional. Requiring units ensures that internal calculations and results (e.g., ESW and abundance) are correct and that output units are clear.
Input distances can have variable units. For example, input distances can be specified in "m", w.hi in "in", and w.lo in "km". Internally, all distances are converted to the units specified by outputUnits (or the units of input distances if outputUnits is NULL), and all output is reported in units of outputUnits.
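The effect of mixed input units can be illustrated with the units package alone. This sketch imitates the conversion to a common unit by hand; it is not the package's internal code, and the distances and truncation limits are made-up values.

```r
# Hypothetical inputs recorded in three different units
d    <- units::set_units(c(12.5, 40, 87, 150), "m")   # detection distances
w.hi <- units::set_units(4000, "in")                  # right-truncation limit
w.lo <- units::set_units(0.001, "km")                 # left-truncation limit

# Convert both truncation limits to meters, the unit of the distances
w.hi.m <- units::set_units(w.hi, "m")
w.lo.m <- units::set_units(w.lo, "m")

# Distances inside the strip, compared on a common scale
kept <- d[d >= w.lo.m & d <= w.hi.m]

w.hi.m
w.lo.m
kept
```

Because every quantity carries its unit, the comparison is made in meters even though the limits were entered in inches and kilometers.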
Measurement units can be assigned using units()<- after attaching the units package or with x <- units::set_units(x, "<units>"). See units::valid_udunits() for a list of valid symbolic units.
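Both assignment styles are sketched below on a made-up numeric vector; the variable names are illustrative only.

```r
library(units)

# Style 1: replacement function, after attaching the units package
x <- c(10, 20, 35)
units(x) <- "m"

# Style 2: set_units() called with the namespace prefix
y <- units::set_units(c(10, 20, 35), "m")

# Both vectors carry the same values and the same unit
sameValues <- all(x == y)
unitOfX    <- units::deparse_unit(x)

sameValues
unitOfX
```

The second style does not strictly need the package to be attached, which can be convenient inside functions or scripts that avoid library() calls.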
If measurements are truly unit-less, or measurement units are unknown, set RdistanceControls(requireUnits = FALSE). This suppresses all unit checks and conversions. Users are on their own to make sure inputs are scaled correctly and that output units are known.
Buckland, S.T., D.R. Anderson, K.P. Burnham, J.L. Laake, D.L. Borchers, and L. Thomas. (2001) Introduction to distance sampling: estimating abundance of biological populations. Oxford University Press, Oxford, UK.
abundEstim, autoDistSamp. Likelihood-specific help files (e.g., halfnorm.like). See package vignettes for additional options.
# Load example sparrow data (line transect survey type)
data(sparrowDetectionData)
# Fit the default half-normal distance function without covariates
dfunc <- dfuncEstim(formula = dist ~ 1,
                    detectionData = sparrowDetectionData)
dfunc
plot(dfunc)