
ds R Documentation

Fit detection functions and calculate abundance from line or point transect data


Description

This function fits detection functions to line or point transect data and then (provided that survey information is supplied) calculates abundance and density estimates. The examples below illustrate some basic types of analysis using ds().


Usage

ds(
  data,
  truncation = ifelse(is.null(cutpoints), ifelse(is.null(data$distend),
    max(data$distance), max(data$distend)), max(cutpoints)),
  transect = "line",
  formula = ~1,
  key = c("hn", "hr", "unif"),
  adjustment = c("cos", "herm", "poly"),
  nadj = NULL,
  order = NULL,
  scale = c("width", "scale"),
  cutpoints = NULL,
  dht_group = FALSE,
  monotonicity = ifelse(formula == ~1, "strict", "none"),
  region_table = NULL,
  sample_table = NULL,
  obs_table = NULL,
  convert_units = 1,
  er_var = ifelse(transect == "line", "R2", "P3"),
  method = "nlminb",
  quiet = FALSE,
  debug_level = 0,
  initial_values = NULL,
  max_adjustments = 5,
  er_method = 2,
  dht_se = TRUE,
  optimizer = "both",
  winebin = NULL,



Arguments

data: a data.frame containing at least a column called distance, or a numeric vector containing the distances. NOTE! If there is a column called size in the data then it will be interpreted as group/cluster size; see the section "Clusters/groups", below. One can supply data as a "flat file" and not supply region_table, sample_table and obs_table; see "Data format", below, and flatfile.


truncation: either truncation distance (numeric, e.g. 5) or percentage (as a string, e.g. "15%"). Can be supplied as a list with elements left and right if left truncation is required (e.g. list(left=1,right=20) or list(left="1%",right="15%") or even list(left="1",right="15%")). By default for exact distances the maximum observed distance is used as the right truncation. When the data are binned, the right truncation is the largest bin end point. Default left truncation is set to zero.


transect: indicates transect type: "line" (default) or "point".


formula: formula for the scale parameter. For a CDS analysis leave this as its default ~1.


key: key function to use; "hn" gives half-normal (default), "hr" gives hazard-rate and "unif" gives uniform. Note that if the uniform key is used, covariates cannot be included in the model.


adjustment: adjustment terms to use; "cos" gives cosine (default), "herm" gives Hermite polynomial and "poly" gives simple polynomial. A value of NULL indicates that no adjustments are to be fitted.


nadj: the number of adjustment terms to fit. The default value (NULL) will select via AIC (using a sequential forward selection algorithm) up to max_adjustments adjustments (unless order is specified). A non-negative integer value will cause the specified number of adjustments to be fitted. The order of adjustment terms used will depend on the key and adjustment. For key="unif", adjustments of order 1, 2, 3, ... are fitted when adjustment = "cos" and order 2, 4, 6, ... otherwise. For key="hn" or "hr", adjustments of order 2, 3, 4, ... are fitted when adjustment = "cos" and order 4, 6, 8, ... otherwise. See Buckland et al. (2001, p. 47) for details.
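The default orders described above can be tabulated with a small helper. This is only a sketch of the stated rule: default_adj_orders() is a hypothetical function written for illustration and is not part of Distance.

```r
# Hypothetical helper (not part of Distance): returns the adjustment
# orders that the text above says are used for a given key/adjustment
# combination and number of adjustment terms.
default_adj_orders <- function(key = c("hn", "hr", "unif"),
                               adjustment = c("cos", "herm", "poly"),
                               nadj) {
  key <- match.arg(key)
  adjustment <- match.arg(adjustment)
  if (nadj == 0) return(integer(0))
  if (key == "unif") {
    # uniform key: 1, 2, 3, ... for cosine; 2, 4, 6, ... otherwise
    if (adjustment == "cos") seq_len(nadj) else 2L * seq_len(nadj)
  } else {
    # half-normal or hazard-rate key: 2, 3, 4, ... for cosine;
    # 4, 6, 8, ... otherwise
    if (adjustment == "cos") seq_len(nadj) + 1L else 2L * seq_len(nadj) + 2L
  }
}

default_adj_orders("unif", "cos", 3)   # 1 2 3
default_adj_orders("hn", "cos", 3)     # 2 3 4
default_adj_orders("hr", "herm", 3)    # 4 6 8
```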


order: order of adjustment terms to fit. The default value (NULL) results in ds choosing the orders to use; see nadj. Otherwise a scalar positive integer value can be used to fit a single adjustment term of the specified order, and a vector of positive integers to fit multiple adjustment terms of the specified orders. For simple and Hermite polynomial adjustments, only even orders are allowed. The number of adjustment terms specified here must match nadj (or nadj can be the default NULL value).


scale: the scale by which the distances in the adjustment terms are divided. Defaults to "width", scaling by the truncation distance. If the key is uniform only "width" will be used. The other option is "scale": the scale parameter of the detection function.


cutpoints: if the data are binned, this vector gives the cutpoints of the bins. Ensure that the first element is 0 (or the left truncation distance) and the last is the distance to the end of the furthest bin. (Default NULL, no binning.) Note that if data has columns distbegin and distend then these will be used as bins if cutpoints is not specified. If both are specified, cutpoints has precedence.


dht_group: should density/abundance estimates consider all groups to be size 1 (abundance of groups; dht_group=TRUE) or should group size be taken into account (abundance of individuals; dht_group=FALSE)? Default is FALSE (abundance of individuals is calculated).


monotonicity: should the detection function be constrained for monotonicity weakly ("weak"), strictly ("strict") or not at all ("none" or FALSE). See Monotonicity, below. By default the constraint is "strict" for models without covariates in the detection function and off ("none") when covariates are present.


region_table: data.frame with two columns:

  • Region.Label label for the region

  • Area area of the region

region_table has one row for each stratum. If there is no stratification then region_table has one entry with Area corresponding to the total survey area. If Area is omitted, only density estimates are produced.


sample_table: data.frame mapping the regions to the samples (i.e. transects). There are three columns:

  • Sample.Label label for the sample

  • Region.Label label for the region that the sample belongs to.

  • Effort the effort expended in that sample (e.g. transect length).


obs_table: data.frame mapping the individual observations (objects) to regions and samples. There should be three columns:

  • object unique numeric identifier for the observation

  • Region.Label label for the region that the sample belongs to

  • Sample.Label label for the sample
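As a minimal sketch of how region_table, sample_table and obs_table relate to one another, the following builds the three data.frames with the column names listed above. All labels and numbers are made up for illustration.

```r
# Illustrative values only; the column names are those listed above.
region_table <- data.frame(Region.Label = c("North", "South"),
                           Area         = c(100, 250))

sample_table <- data.frame(Sample.Label = c("T1", "T2", "T3"),
                           Region.Label = c("North", "North", "South"),
                           Effort       = c(5, 4, 6))

obs_table <- data.frame(object       = 1:4,
                        Region.Label = c("North", "North", "North", "South"),
                        Sample.Label = c("T1", "T1", "T2", "T3"))

# Simple consistency checks: every observation should refer to a known
# sample, and every sample to a known region.
all(obs_table$Sample.Label %in% sample_table$Sample.Label)
all(sample_table$Region.Label %in% region_table$Region.Label)
```

These tables would then be passed to ds() through the arguments of the same names.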


convert_units: conversion between units for abundance estimation, see "Units", below. (Defaults to 1, implying all of the units are "correct" already.)


er_var: encounter rate variance estimator to use when abundance estimates are required. Defaults to "R2" for line transects and "P3" for point transects. See dht2 for more information and if more complex options are required.


method: optimization method to use (any method usable by optim or optimx). Defaults to "nlminb".


quiet: suppress non-essential messages (useful for bootstraps etc.). Default value FALSE.


debug_level: print debugging output. 0=none, 1-3 increasing levels of debugging output.


initial_values: a list of named starting values, see mrds-opt. Only allowed when AIC term selection is not used.


max_adjustments: maximum number of adjustments to try (default 5); only used when order=NULL.


er_method: encounter rate variance calculation: default = 2 gives the method of Innes et al., using expected counts in the encounter rate. Setting to 1 gives observed counts (which matches Distance for Windows) and 0 uses binomial variance (only useful in the rare situation where study area = surveyed area). See dht.se for more details.


dht_se: should uncertainty be calculated when using dht? Safe to leave as TRUE, used in bootdht.


optimizer: by default this is set to 'both'. In this case the R optimizer will be used and, if present, the MCDS optimizer will also be used. The result with the best likelihood value will be selected. To run only a specified optimizer set this value to either 'R' or 'MCDS'. See mcds_dot_exe for setup instructions.


winebin: if you are trying to use our MCDS.exe optimizer on a non-Windows system then you may need to specify the winebin. Please see mcds_dot_exe for more details.


deprecated arguments (nine in total): for each, see the same argument with underscore, above.


Value

a list with elements:

  • ddf a detection function model object.

  • dht abundance/density information (if survey region data was supplied, else NULL)


Details

If abundance estimates are required then the data.frames region_table and sample_table must be supplied. If data does not contain the columns Region.Label and Sample.Label then the data.frame obs_table must also be supplied. Note that stratification only applies to abundance estimates and not at the detection function level. Density and abundance estimates, and corresponding estimates of variance and confidence intervals, are calculated using the methods described in Buckland et al. (2001) sections 3.6.1 and 3.7.1 (further details can be found in the documentation for dht).

For more advanced abundance/density estimation please see the dht and dht2 functions.

Examples of distance sampling analyses are available at http://examples.distancesampling.org/.

Hints and tips on fitting (particularly optimisation issues) are on the mrds-opt manual page.


Clusters/groups

Note that if the data contains a column named size, cluster size will be estimated and density/abundance will be based on a clustered analysis of the data. Setting this column to be NULL will perform a non-clustered analysis (for example if "size" means something else in your dataset).
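A minimal sketch of the situation where size means something other than cluster size: the values are kept under another name and the size column is set to NULL before the data are passed to ds(). The data and the name body_size are made up for illustration.

```r
# Made-up data in which "size" records body size, not cluster size
obs <- data.frame(distance = c(1.2, 0.4, 2.7),
                  size     = c(31, 28, 35))

# Keep the values under a different (hypothetical) column name ...
obs$body_size <- obs$size
# ... and drop the column called size, so that a non-clustered analysis
# would be performed
obs$size <- NULL

names(obs)  # "distance" "body_size"
```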


Truncation

The right truncation point is by default set to be the largest observed distance or bin end point. This default will not be appropriate for all data and can often be the cause of model convergence failures. It is recommended that one plots a histogram of the observed distances prior to model fitting so as to get a feel for an appropriate truncation distance. (Similar arguments go for left truncation, if appropriate.) Buckland et al. (2001) provide guidelines on truncation.

When specified as a percentage, the largest right and smallest left percent distances are discarded. Percentages cannot be supplied when using binned data.
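To get a feel for what a percentage implies, the corresponding distance can be approximated from the observed distances. The sketch below assumes that discarding the largest 15% corresponds to cutting at the 85th percentile; how ds computes the cut-off internally is not stated here, and the distances are simulated.

```r
# Simulated distances, for illustration only
set.seed(1)
distance <- abs(rnorm(200, sd = 10))

# Assumption: "15%" right truncation corresponds to the 85th percentile
# of the observed distances
right_pct <- 15
w <- quantile(distance, probs = 1 - right_pct / 100, names = FALSE)

# Observations that would be retained under this truncation distance
kept <- distance[distance <= w]
length(kept) / length(distance)  # proportion retained, close to 0.85
```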

For left truncation, there are two options: (1) fit a detection function to the truncated data as is (this is what happens when you set left). This does not assume that g(x)=1 at the truncation point. (2) manually remove data with distances less than the left truncation distance – effectively move the centre line out to be the truncation distance (this needs to be done before calling ds). This then assumes that detection is certain at the left truncation distance. The former strategy has a weaker assumption, but will give higher variance as the detection function close to the line has no data to tell it where to fit – it will be relying on the data from after the left truncation point and the assumed shape of the detection function. The latter is most appropriate in the case of aerial surveys, where some area under the plane is not visible to the observers, but their probability of detection is certain at the smallest distance.
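Option (2) has to be carried out by hand before calling ds. One possible reading of "move the centre line out" is sketched below: observations closer than the left truncation distance are removed and the remaining distances are measured from that distance. The data and the value of left are made up, and the subtraction step is an interpretation of the text rather than a documented procedure.

```r
# Made-up exact distances and a left truncation distance
left <- 1
obs <- data.frame(distance = c(0.3, 1.0, 1.8, 4.2))

# Remove observations closer than the left truncation distance
obs2 <- obs[obs$distance >= left, , drop = FALSE]

# Interpretation: measure the remaining distances from the truncation
# distance, so that it plays the role of the centre line
obs2$distance <- obs2$distance - left

nrow(obs2)  # 3 observations remain
```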


Binning

Note that binning is performed such that bin 1 is all distances greater than or equal to cutpoint 1 (>=0 or left truncation distance) and less than cutpoint 2. Bin 2 is then distances greater than or equal to cutpoint 2 and less than cutpoint 3, and so on.
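The same left-closed, right-open rule can be reproduced in base R with findInterval(). This is a sketch of the rule as described, with made-up cutpoints and distances, and not the code that ds itself runs.

```r
# Made-up cutpoints and exact distances
cutpoints <- c(0, 1, 2, 3, 4)
distance  <- c(0, 0.5, 1, 1.99, 2, 3.7)

# findInterval() assigns x to bin i when cutpoints[i] <= x < cutpoints[i + 1]
bin <- findInterval(distance, cutpoints)
bin  # 1 1 2 2 3 4

# Bin limits for each observation, named after the distbegin/distend
# columns mentioned under the cutpoints argument
binned <- data.frame(distbegin = cutpoints[bin],
                     distend   = cutpoints[bin + 1])
```

A distance lying exactly on a cutpoint (1 or 2 here) falls in the bin that starts at that cutpoint.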


Monotonicity

When adjustment terms are used, it is possible for the detection function to not always decrease with increasing distance. This is unrealistic and can lead to bias. To avoid this, the detection function can be constrained for monotonicity (and is by default for detection functions without covariates).

Monotonicity constraints are supported in a similar way to that described in Buckland et al. (2001). The detection function is evaluated at 20 equally spaced points over its range (left to right truncation) at each round of the optimisation and is constrained to be either always less than its value at zero ("weak") or such that each value is less than or equal to the previous point (monotonically decreasing; "strict"). See also check.mono.

Even with no monotonicity constraints, checks are still made that the detection function is monotonic, see check.mono.
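The two conditions can be illustrated on a grid of 20 equally spaced points. The sketch below is not the check.mono implementation: mono_check(), hn() and bumpy() are hypothetical functions written for illustration, and the "weak" condition is checked against the value at the first grid point (the left truncation distance, zero here).

```r
# Hypothetical helper, not the check.mono() implementation: evaluates a
# function g at n_pts equally spaced points between left and right and
# reports whether the weak and strict conditions hold on that grid.
mono_check <- function(g, left = 0, right, n_pts = 20) {
  x  <- seq(left, right, length.out = n_pts)
  gx <- g(x)
  c(weak   = all(gx[-1] <= gx[1]),  # never above the value at the first point
    strict = all(diff(gx) <= 0))    # each value <= the previous one
}

# Half-normal shape: decreasing everywhere
hn <- function(x, sigma = 2) exp(-x^2 / (2 * sigma^2))
mono_check(hn, right = 4)      # weak TRUE, strict TRUE

# Half-normal shape multiplied by a cosine term that makes the curve
# rise again towards the truncation distance
bumpy <- function(x) hn(x) * (1 + 0.9 * cos(2 * pi * x / 4))
mono_check(bumpy, right = 4)   # weak TRUE, strict FALSE
```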


Units

In extrapolating to the entire survey region it is important that the units of measurement be consistent or converted for consistency. A conversion factor can be specified with the convert_units argument. The values of Area in region_table must be made consistent with the units for Effort in sample_table and the units of distance in the data.frame that was analyzed. It is easiest if the units of Area are the square of the units of Effort; then it is only necessary to convert the units of distance to the units of Effort. For example, if Effort was entered in kilometres, Area in square kilometres and distance in metres, then using convert_units=0.001 would convert metres to kilometres, and density would be expressed per square kilometre, which would then be consistent with the units for Area.

However, they can all be in different units as long as the appropriate composite value for convert_units is chosen. Abundance for a survey region can be expressed as A*N/a, where A is Area for the survey region, N is the abundance in the covered (sampled) region, and a is the area of the sampled region, in units of Effort * distance. The sampled region a is multiplied by convert_units, so it should be chosen such that the result is in the same units as Area. For example, if Effort was entered in kilometres, Area in hectares (100m x 100m) and distance in metres, then using convert_units=0.1 will convert a to units of hectares (0.01 to convert metres to 100m units for distance and 10 to convert kilometres to 100m units).
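The arithmetic behind a composite conversion factor can be sketched as follows. convert_units_factor() is a hypothetical helper written for illustration (it is not a Distance function); it takes the size of each unit in metres (square metres for area) and returns the factor by which Effort * distance must be multiplied to be in the units of Area. All survey numbers are made up.

```r
# Hypothetical helper: unit sizes are given in metres (distance, effort)
# and square metres (area)
convert_units_factor <- function(distance_m, effort_m, area_m2) {
  distance_m * effort_m / area_m2
}

# distance in metres, Effort in kilometres, Area in square kilometres
cu <- convert_units_factor(distance_m = 1, effort_m = 1000, area_m2 = 1000^2)
cu  # 0.001

# Abundance in the survey region as A * N / a, with made-up numbers
A         <- 50        # Area, square kilometres
N_covered <- 30        # abundance in the covered region
a_raw     <- 400       # covered area in Effort * distance units (km * m)
a         <- a_raw * cu  # covered area in square kilometres
A * N_covered / a      # 3750
```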

Data format

One can supply data only to simply fit a detection function. However, if abundance/density estimates are necessary further information is required. Either the region_table, sample_table and obs_table data.frames can be supplied or all data can be supplied as a "flat file" in the data argument. In this format each row in data has additional information that would ordinarily be in the other tables. This usually means that there are additional columns named: Sample.Label, Region.Label, Effort and Area for each observation. See flatfile for an example.
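A flat file of this kind can be assembled from separate tables with merge(). This is a minimal sketch with made-up labels and values; see flatfile for the requirements that ds actually places on such data.

```r
# Made-up observations, samples (transects) and regions
obs <- data.frame(distance     = c(0.5, 1.2, 2.1),
                  Sample.Label = c("T1", "T2", "T3"))
samples <- data.frame(Sample.Label = c("T1", "T2", "T3"),
                      Region.Label = c("North", "North", "South"),
                      Effort       = c(5, 4, 6))
regions <- data.frame(Region.Label = c("North", "South"),
                      Area         = c(100, 250))

# Attach region information to each sample, then sample information to
# each observation, giving one row per observation
flat <- merge(samples, regions, by = "Region.Label")
flat <- merge(obs, flat, by = "Sample.Label")
flat

# flat now carries distance, Sample.Label, Region.Label, Effort and Area
# and could be supplied as the data argument of ds() on its own, e.g.
# ds(flat, truncation = 4)   # not run here: requires the Distance package
```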

Density estimation

If column Area is omitted, a density estimate is generated but note that the degrees of freedom/standard errors/confidence intervals will not match density estimates made with the Area column present.


Author(s)

David L. Miller


References

Buckland, S.T., Anderson, D.R., Burnham, K.P., Laake, J.L., Borchers, D.L., and Thomas, L. (2001). Distance Sampling. Oxford University Press. Oxford, UK.

Buckland, S.T., Anderson, D.R., Burnham, K.P., Laake, J.L., Borchers, D.L., and Thomas, L. (2004). Advanced Distance Sampling. Oxford University Press. Oxford, UK.

See Also

flatfile, AIC.ds, ds.gof, p_dist_table, plot.ds, add_df_covar_line


Examples

# An example from mrds, the golf tee data.
library(Distance)
data(book.tee.data)
tee.data <- subset(book.tee.data$book.tee.dataframe, observer==1)
ds.model <- ds(tee.data, 4)

## Not run: 
# same model, but calculating abundance
# need to supply the region, sample and observation tables
region <- book.tee.data$book.tee.region
samples <- book.tee.data$book.tee.samples
obs <- book.tee.data$book.tee.obs

ds.dht.model <- ds(tee.data, 4, region_table=region,
                   sample_table=samples, obs_table=obs)

# specify order 2 cosine adjustments
ds.model.cos2 <- ds(tee.data, 4, adjustment="cos", order=2)

# specify order 2 and 3 cosine adjustments, turning monotonicity
# constraints off
ds.model.cos23 <- ds(tee.data, 4, adjustment="cos", order=c(2, 3),
                     monotonicity=FALSE)
# check for non-monotonicity -- actually no problems
check.mono(ds.model.cos23$ddf, plot=TRUE, n.pts=100)

# include both a covariate and adjustment terms in the model
ds.model.cos2.sex <- ds(tee.data, 4, adjustment="cos", order=2,
                        monotonicity=FALSE, formula=~as.factor(sex))
# check for non-monotonicity -- actually no problems
check.mono(ds.model.cos2.sex$ddf, plot=TRUE, n.pts=100)

# truncate the largest 10% of the data and fit only a hazard-rate
# detection function
ds.model.hr.trunc <- ds(tee.data, truncation="10%", key="hr",
                        adjustment=NULL)

# compare AICs between these models:
AIC(ds.model)
AIC(ds.model.cos2)
AIC(ds.model.cos23)

## End(Not run)

Distance documentation built on July 26, 2023, 5:47 p.m.