auk_unique: Remove duplicate group checklists

Description

eBird checklists can be shared among a group of multiple observers, in which case observations will be duplicated in the database. This function removes these duplicates from the eBird Basic Dataset (EBD) or the EBD sampling event data (with checklists_only = TRUE), creating a set of unique bird observations. This function is called automatically by read_ebd() and read_sampling().

Usage

auk_unique(x, group_id = "group_identifier",
  checklist_id = "sampling_event_identifier",
  species_id = "scientific_name", checklists_only = FALSE)

Arguments

x

data.frame; the EBD data frame, typically as imported by read_ebd().

group_id

character; the name of the group ID column.

checklist_id

character; the name of the checklist ID column. Each checklist within a group has a unique value for this field. The record with the lowest checklist_id will be picked as the unique record within each group.

species_id

character; the name of the column identifying species uniquely. This is required to ensure that duplicates are removed independently for each species. Note that this will not treat subspecies independently; if that behavior is desired, the user will have to generate a column uniquely identifying species and subspecies and pass that column's name to this argument.

checklists_only

logical; whether the dataset provided contains only checklist information, as with the sampling event data file. If this argument is TRUE, the species_id argument is ignored and duplicates are removed at the checklist level rather than the species level.
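
The species_id argument can point at a user-built column when subspecies should be kept separate. The following is a minimal sketch on a toy data frame, not taken from the package: the column name taxon_id is hypothetical, and it assumes the data carry a subspecies_scientific_name column that is NA for records identified only to species level.

```r
# Toy data standing in for an EBD data frame (values are illustrative only).
ebd_toy <- data.frame(
  scientific_name = c("Setophaga coronata", "Setophaga coronata",
                      "Corvus corax"),
  # Assumed column: NA where no subspecies was reported.
  subspecies_scientific_name = c("Setophaga coronata coronata", NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical column: the subspecies name when present, else the species name.
ebd_toy$taxon_id <- ifelse(
  is.na(ebd_toy$subspecies_scientific_name),
  ebd_toy$scientific_name,
  ebd_toy$subspecies_scientific_name
)

ebd_toy$taxon_id
```

With such a column in place, its name would be passed as species_id, for example auk_unique(ebd, species_id = "taxon_id").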

Details

This function chooses the checklist within each group that has the lowest value for the field specified by checklist_id. A new column, checklist_id, is also created; its value is taken from the field specified by the checklist_id parameter for non-group checklists and from the field specified by the group_id parameter for grouped checklists.
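
The selection rule described above can be illustrated in base R. This is a rough sketch of the described behavior, not the package's implementation: dedupe_sketch() is a hypothetical helper, the data are made up, non-group checklists are assumed to have NA in the group column, and IDs are compared as character strings.

```r
# Toy data: S2 and S3 are shared copies of the same group checklist G1.
obs <- data.frame(
  sampling_event_identifier = c("S1", "S2", "S3", "S2", "S3"),
  group_identifier = c(NA, "G1", "G1", "G1", "G1"),
  scientific_name = c("Corvus corax", "Corvus corax", "Corvus corax",
                      "Pica pica", "Pica pica"),
  stringsAsFactors = FALSE
)

# Hypothetical helper, not part of auk: keep one record per group and species.
dedupe_sketch <- function(x) {
  grouped <- !is.na(x$group_identifier)
  # New ID: group ID for grouped checklists, sampling event ID otherwise.
  x$checklist_id <- ifelse(grouped, x$group_identifier,
                           x$sampling_event_identifier)
  # Sort so the lowest sampling event ID comes first within each checklist_id.
  x <- x[order(x$checklist_id, x$sampling_event_identifier), ]
  # Drop later rows that repeat a checklist_id and species combination.
  keep <- !duplicated(x[, c("checklist_id", "scientific_name")])
  x[keep, ]
}

obs_unique <- dedupe_sketch(obs)
nrow(obs)
nrow(obs_unique)
```

In this toy case the two species on the group checklist are each kept once, from the copy with the lowest sampling event ID, and the ungrouped checklist S1 is left untouched.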

Value

A data.frame with unique observations, and an additional field, checklist_id, which is a combination of the sampling event and group IDs.

Examples

library(auk)
# read in an EBD file without automatically removing duplicates
ebd <- system.file("extdata/ebd-sample.txt", package = "auk") %>%
  read_ebd(unique = FALSE)
# remove duplicate group checklists
ebd_unique <- auk_unique(ebd)
# compare row counts before and after removing duplicates
nrow(ebd)
nrow(ebd_unique)

mstrimas/auk documentation built on May 20, 2019, 5:26 p.m.