RdistDf: RdistDf - Construct Rdistance nested data frames
In tmcd82070/Rdistance: Density and Abundance from Distance-Sampling Surveys

RdistDf

R Documentation

RdistDf - Construct Rdistance nested data frames

Description

Makes an Rdistance data frame from separate transect and detection data frames. Rdistance data frames are nested data frames with one row per transect. Detection information for each transect appears in a list-based column that itself contains a data frame. See Rdistance Data Frames.

Rdistance data frames can be constructed using calls to dplyr::nest_by and dplyr::right_join, with subsequent attribute assignment (see Examples). This routine is a convenient wrapper for those calls.

Usage

RdistDf(
  transectDf,
  detectionDf,
  by = NULL,
  pointSurvey = FALSE,
  observer = "single",
  .detectionCol = "detections",
  .effortCol = NULL
)

Arguments

`transectDf`	A data frame with one row per transect and columns containing information about the entire transect. At a minimum, this data frame must contain the transect's ID, so it can be merged with `detectionDf` (see parameter `by`), and the amount of effort the transect represents (see parameter `.effortCol`). All detections are made on a transect, but not all transects require detections. That is, `transectDf` should contain rows, and hence IDs and lengths, of all surveyed transects, even those on which no targets were detected (so-called "zero transects"). Transect-level covariates, such as habitat type, elevation, or observer IDs, should appear as variables in this data frame.
`detectionDf`	A data frame containing detection information associated with each transect. At a minimum, each row of this data frame must contain the following: Transect IDs: The ID of the transect on which a target group was detected, so that the detection data frame can be merged with `transectDf` (see parameter `by`). Multiple detections on the same transect are possible, and hence multiple rows in `detectionDf` can contain the same transect ID. Detection Distances: The distance at which each detection was made. The distance column will eventually be specified on the left-hand side of `formula` in a call to `dfuncEstim`. As of Rdistance version 3.0.0, detection distances must have physical measurement units assigned. See Measurement Units. Optional columns in `detectionDf`: Group sizes: If sighted targets vary in size, or group sizes are not all 1, `detectionDf` must also contain a column specifying group sizes. Non-unity group sizes are specified using `+groupsize(columnName)` on the right-hand side of `formula` in an eventual call to `dfuncEstim`. Detection-Level Covariates: Such as sex, color, habitat, etc.
`by`	A character vector of variables to use in the join. The right-hand side of this join identifies unique transects (unique rows) in both `transectDf` and the output (see warning in Details). If NULL, the join will be 'natural', using all common variables in `transectDf` and `detectionDf`. To join on specific variables, specify a character vector of the variables. For example, by = c("a", "b") joins `transectDf$a` to `detectionDf$a` and `transectDf$b` to `detectionDf$b`. If join variable names differ between `transectDf` and `detectionDf`, use a named character vector like by = c("a" = "b", "c" = "d") which joins `transectDf$a` to `detectionDf$b` and `transectDf$c` to `detectionDf$d`.
`pointSurvey`	If TRUE, observations were made from discrete points (i.e., during a point-transect survey) and distances are radial from observation point to target. If FALSE, observations were made along a continuous transect (i.e., during a line-transect survey) and distances are from target to nearest point on the transect (i.e., perpendicular to transect).
`observer`	Type of observer system. Legal values are "single" for single observer systems, "1given2" for a double observer system wherein observations made by observer 1 are tested against observations made by observer 2, "2given1" for a double observer system wherein observations made by observer 2 are tested against observations made by observer 1, and "both" for a double observer system wherein observations made by both observers are tested against the other and combined.
`.detectionCol`	Name of the list column that will contain detection data frames. Default name is "detections". Detection distances (LHS of 'dfuncEstim' formula) and group sizes are normally columns in the nested detection data frames embedded in '.detectionCol'.
`.effortCol`	For continuous line transects, this specifies the name of a column in `transectDf` containing transect lengths, which must have measurement units. For point transects, this specifies the name of a column containing the number of points on each transect. The effort column for point transects cannot contain measurement units. Default is "length" for line-transects, "numPoints" for point-transects. If those names are not found, the first column in the merged data frame whose name contains 'point' (for point transects) or 'length' (for line transects) is used and a message is printed. Matching is case insensitive, so for example, 'nPoints' and 'N_point' and 'numberOfPoints' will all be matched. If two or more column names match the effort column search terms, a warning is issued. See Transect Lengths for a description of point and line transects.
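
The fallback search described under `.effortCol` can be sketched in base R. The helper `findEffortCol` below is hypothetical and is not Rdistance's internal code; it only mirrors the documented rules (default name first, then a case-insensitive search on 'point' or 'length', a message when a fallback is used, and a warning when several columns match).

```r
# Hypothetical helper illustrating the documented search rules.
# This is NOT the Rdistance implementation.
findEffortCol <- function(colNames, pointSurvey = FALSE) {
  # Documented default names
  default <- if (pointSurvey) "numPoints" else "length"
  if (default %in% colNames) {
    return(default)
  }

  # Case-insensitive search for the documented search term
  pattern <- if (pointSurvey) "point" else "length"
  hits <- colNames[grepl(pattern, colNames, ignore.case = TRUE)]

  if (length(hits) == 0) {
    stop("No effort column found")
  }
  if (length(hits) > 1) {
    warning("Multiple candidate effort columns: ",
            paste(hits, collapse = ", "))
  }
  message("Using '", hits[1], "' as the effort column")
  hits[1]
}

# Made-up column names for illustration
findEffortCol(c("siteID", "nPoints", "habitat"), pointSurvey = TRUE)
findEffortCol(c("siteID", "transectLength"))
```

Naming the effort column explicitly through `.effortCol` avoids relying on the search altogether.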

Details

For valid bootstrap estimates of confidence intervals (computed in abundEstim), each row of the nested data frame must represent one transect (more generally, one sampling unit), and none should be duplicated. The combination of transect columns in by (i.e., the LHS of the merge, or "a" and "b" of by = c("a" = "d", "b" = "c") for example) should specify unique transects and unique rows of transectDf. Warning: If by does not specify unique rows of transectDf, dplyr::left_join, which is called internally, will perform a many-to-many merge without warning, and this normally duplicates both transects and detections.
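
Because the many-to-many merge happens silently, it can help to check the join keys before calling RdistDf. The following is a minimal base R sketch with made-up data; it assumes an unnamed `by` vector (with a named vector, the transect-side column names would be checked instead).

```r
# Hypothetical transect data containing an accidentally duplicated transect
transectDf <- data.frame(
  siteID = c("A1", "A2", "A2", "B1"),
  length = c(500, 500, 500, 250)
)
by <- "siteID"

# anyDuplicated() returns 0 when every key combination is unique,
# otherwise the index of the first duplicated row
dupIndex <- anyDuplicated(transectDf[, by, drop = FALSE])

if (dupIndex > 0) {
  dupRows <- duplicated(transectDf[, by, drop = FALSE])
  message("Duplicated transect keys: ",
          paste(unique(transectDf[dupRows, by]), collapse = ", "))
}
```

A result of zero from `anyDuplicated` indicates that `by` identifies unique rows of `transectDf`.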

Value

A nested tibble (a generalization of base data frames) with one row per transect, and detection information in a list column. Technically, the return is a grouped tibble from the tibble package with one row per group, and a list column containing detection information. Survey type, observer system, and name of the effort column are recorded as attributes (transType, obsType, and effortColumn, respectively). The return prints nicely using methods in package tibble. If returned objects print strangely, attach the tibble package (i.e., library(tibble)). A summary method tailored to distance sampling is available (i.e., summary(return)).

Rdistance Data Frames

RdistDf data frames contain the following information:

Transect Information: Each row of the data frame contains transect id and effort. Effort is transect length for line-transects, and number of points for point-transects. Optionally, transect-level covariates (such as habitat or observer id) appear on each row.
Detection Information: Observation distances (either perpendicular off-transect or radial off-point) appear in a data frame stored in a list column. If detected groups occasionally included more than one target, a group size column must be present in the list-column data frame. Optionally, detection-level covariates (such as sex or size) can appear in the data frame of the list column.
Distance Type: The type of observation distances, either perpendicular off-transect (for line-transect studies) or radial off-point (for point-transect studies), must appear as an attribute of RdistDf objects.
Observer Type: The type of observation system used, either single observer or one of three types of multiple observer systems, must appear as an attribute of RdistDf objects.
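
The nested layout can be illustrated with base R alone. This sketch uses made-up data and a plain data frame with a list column; it is not the grouped tibble that RdistDf returns, and it omits the attributes. It only shows the one-row-per-transect idea, with an empty (NULL) entry for a zero transect.

```r
# Made-up transect and detection data; transect "A2" has no detections
transects <- data.frame(
  siteID = c("A1", "A2", "B1"),
  length = c(500, 500, 250)
)
detections <- data.frame(
  siteID = c("A1", "A1", "B1"),
  dist   = c(10, 35, 60)
)

# One data frame of detections per transect ID
detList <- split(detections[, "dist", drop = FALSE], detections$siteID)

# Indexing by all transect IDs leaves NULL where a transect has no detections
transects$detections <- unname(detList[transects$siteID])

# Number of detections per transect; zero transects count as 0
nDetections <- vapply(
  transects$detections,
  function(d) if (is.null(d)) 0L else nrow(d),
  integer(1)
)
```

Here `nDetections` has one entry per row of `transects`, so zero transects remain visible rather than being dropped.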

Transect Lengths

Line-transects are continuous paths with targets detectable at any point. Point transects consist of one or more discrete points along a path from which observers search for targets. The length of a line-transect is its physical length (e.g., km or miles). The 'length' of a point transect is the number of points along the transect. Single points are considered transects of length one. The length of line-transects must have a physical measurement unit (e.g., 'm' or 'ft'). The length of point-transects must be a unit-less integer (i.e., number of points).

Measurement Units

As of Rdistance version 3.0.0, measurement units are required on all physical distances. Requiring units ensures that internal calculations and results (e.g., ESW and abundance) are correct and that output units are clear. Units are required on off-transect distances, radial distances, truncation distances (w.lo, unless it is zero; and w.hi, unless it is NULL), scale locations (x.scl, unless it is zero), line-transect lengths, and study area size. All units are 1-dimensional except those on study area, which are 2-dimensional.

Physical measurement units can vary. For example, off-transect distances can be meters ("m"), w.hi can be inches ("in"), and w.lo can be kilometers ("km"). Internally, all distances are converted to the units specified by outputUnits (or the units of input distances if outputUnits is NULL), and all output is reported in units of outputUnits. Valid conversions must exist between units or an error is thrown. For example, meters cannot be converted into hectares.

Measurement units can be assigned using units()<- after attaching the units package or with x <- units::set_units(x, "<units>"). See units::valid_udunits() for a list of valid symbolic units.

If measurements are truly unit-less, or measurement units are unknown, set options(Rdist_requireUnits = FALSE). This suppresses all unit checks and conversions. Users are on their own to make sure inputs are scaled correctly and that output units are known.
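
As a brief illustration of unit assignment and conversion, the sketch below uses units::set_units on made-up values. The variable names are hypothetical, and the block is guarded so that it only runs when the units package is installed.

```r
# Sketch only: runs when the 'units' package is available
if (requireNamespace("units", quietly = TRUE)) {
  # Made-up detection distances, recorded in meters
  dist <- units::set_units(c(12.5, 40, 87), "m")

  # A truncation distance supplied in a different but compatible unit
  w.hi <- units::set_units(0.1, "km")

  # Compatible units convert: kilometers to meters
  w.hi.m <- units::set_units(w.hi, "m")

  # Incompatible units (length to area) raise an error, caught here
  areaAttempt <- tryCatch(
    units::set_units(dist, "ha"),
    error = function(e) "conversion error"
  )
}
```

Mixing compatible units in this way is acceptable because valid conversions exist between them; the attempted conversion from meters to hectares fails.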

Examples

data(sparrowSiteData)
data(sparrowDetectionData)

sparrowDf <- RdistDf( sparrowSiteData, sparrowDetectionData )
is.RdistDf(sparrowDf, verbose = TRUE)
summary(sparrowDf)
summary(sparrowDf
      , formula = dist ~ groupsize(groupsize)
      , w.hi = units::set_units(100, "m"))

# Equivalent to above: 
sparrowDf <- sparrowDetectionData |> 
  dplyr::nest_by( siteID
               , .key = "detections") |> 
  dplyr::right_join(sparrowSiteData, by = "siteID") 
attr(sparrowDf, "detectionColumn") <- "detections"
attr(sparrowDf, "effortColumn") <- "length"
attr(sparrowDf, "obsType") <- "single"
attr(sparrowDf, "transType") <- "line"
is.RdistDf(sparrowDf, verbose = TRUE)
summary(sparrowDf, formula = dist ~ groupsize(groupsize))

# Condensed view: 1 row per transect (make sure tibble is attached)
sparrowDf

# Expansion methods:
# (1) use Rdistance::unnest (includes zero transects)
df1 <- unnest(sparrowDf)
any( df1$siteID == "B2" )  # TRUE

# (2) use tidyr::unnest() (excludes zero transects)
df2 <- tidyr::unnest(sparrowDf, cols = "detections")
any( df2$siteID == "B2" )  # FALSE

# Use dplyr::reframe for specific transects (e.g., for transect "B3")
sparrowDf |> 
  dplyr::filter(siteID == "B3") |>
  dplyr::reframe(detections)
  
# Count detections per transect (can't use dplyr::if_else)
df3 <- sparrowDf |> 
  dplyr::reframe(nDetections = ifelse(is.null(detections), 0, nrow(detections)))
sum(df3$nDetections) # Number of detections
sum(df3$nDetections == 0) # Number of zero transects
    
# Point transects
data(thrasherDetectionData)
data(thrasherSiteData)
thrasherDf <- RdistDf( thrasherSiteData
               , thrasherDetectionData
               , pointSurvey = TRUE
               , by = "siteID"
               , .detectionCol = "detections")
summary(thrasherDf, formula = dist ~ groupsize(groupsize))

tmcd82070/Rdistance documentation built on April 13, 2025, 1:38 p.m.