format_zero_fill: Zero-fill data

View source: R/helper.R

format_zero_fill    R Documentation

Zero-fill data

Description

Zero-fill species presence data by adding zero observation counts (absences) to an existing naturecounts dataset.

Usage

format_zero_fill(
  df_db,
  by = "SamplingEventIdentifier",
  species = "all",
  fill = "ObservationCount",
  extra_species = NULL,
  extra_event = NULL,
  warn = TRUE,
  verbose = TRUE
)

Arguments

df_db

Either a data frame or a connection to a database with a naturecounts table (a data frame is returned in either case).

by

Character vector. Column name(s) to fill by: "SamplingEventIdentifier" by default, or a vector of specific column names (see Details).

species

Character vector. Either "all", for species in the data, or a vector of species ID codes to fill in.

fill

Character. The column name to fill in. Defaults to "ObservationCount". The Examples also pass fill = NA to return only event information.

extra_species

Character vector. Extra columns/fields uniquely associated with species_id to keep in the data (all columns not in by, species, fill, extra_species, or extra_event will be omitted from the result).

extra_event

Character vector. Extra columns/fields uniquely associated with the Sampling Event (the field defined by by) to keep in the data (all columns not in by, species, fill, extra_species, or extra_event will be omitted from the result).

warn

Logical. If TRUE, stop zero-filling when the data contain more than 100 species and more than 1000 unique sampling events. If FALSE, skip this check and proceed.

verbose

Logical. Show messages?

Details

by is the combination of columns used to detect missing values, i.e. the columns that define a sampling event. By default, SamplingEventIdentifier is used; otherwise, users can specify their own combination of columns.
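
As a rough illustration of what zero-filling by sampling event means, the base R sketch below builds every event-by-species combination for a small made-up data frame and sets the count to 0 wherever no record exists. It is not the package's implementation: the toy values are hypothetical, and expand.grid() with merge() are used here only to show the idea.

```r
# Toy data (hypothetical values): two sampling events, three species overall
obs <- data.frame(
  SamplingEventIdentifier = c("E1", "E1", "E2"),
  species_id              = c(230, 880, 990),
  ObservationCount        = c(4, 1, 2),
  stringsAsFactors        = FALSE
)

# Every event x species combination that could have been recorded
grid <- expand.grid(
  SamplingEventIdentifier = unique(obs$SamplingEventIdentifier),
  species_id              = unique(obs$species_id),
  stringsAsFactors        = FALSE
)

# Join the observations onto the full grid; combinations with no record get NA
filled <- merge(grid, obs, all.x = TRUE)

# A combination with no record is treated as an absence, so its count becomes 0
filled$ObservationCount[is.na(filled$ObservationCount)] <- 0

# Order by event, then species, for easier reading
filled <- filled[order(filled$SamplingEventIdentifier, filled$species_id), ]
print(filled)
```

Passing a different set of columns as by would correspond to building the grid from that combination of columns instead of SamplingEventIdentifier alone.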

If species is supplied, all records will be used to determine observation events, but only records (zero-filled or otherwise) which correspond to a species in species will be returned (all others will be omitted). Note that records where species_id is NA (generally used to record 0 counts in presence/absence data) will be converted to a list of 0's for the individual species.
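
The behaviour described above can be sketched on toy data in base R. This is an illustration under assumptions, not the package's code: the values are made up, event E3 stands in for a sampling event recorded with species_id = NA and a count of 0, and 230 is used as the species of interest.

```r
# Toy data (hypothetical values); E3 is an event where nothing was observed
obs <- data.frame(
  SamplingEventIdentifier = c("E1", "E1", "E2", "E3"),
  species_id              = c(230, 880, 880, NA),
  ObservationCount        = c(4, 1, 2, 0),
  stringsAsFactors        = FALSE
)

# Events are taken from ALL records, including the NA-species row for E3
events <- unique(obs$SamplingEventIdentifier)

# Only the requested species is kept in the result
wanted <- 230

grid <- expand.grid(
  SamplingEventIdentifier = events,
  species_id              = wanted,
  stringsAsFactors        = FALSE
)

# The NA-species row carries no species information; its event is already in `events`
known <- obs[!is.na(obs$species_id), ]

# Join and fill: events without a record for the wanted species get a count of 0
goose <- merge(grid, known, all.x = TRUE)
goose$ObservationCount[is.na(goose$ObservationCount)] <- 0
print(goose)
```

Here E2 and E3 both end up with a 0 for species 230, even though E3 never had a species-level record, because the set of events is determined before the species filter is applied.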

Value

Data frame

Examples

# Download data (with "core" fields to include 'CommonName')
sample <- nc_data_dl(collection = c("SAMPLE1", "SAMPLE2"), fields_set = "core",
                     username = "sample", info = "nc_example")

# Remove casual observations (i.e. 'AllSpeciesReported' = "No")
library(dplyr) # For filter function
sample <- filter(sample, AllSpeciesReported == "Yes")

# Remove data with "X" ObservationCount (only keep numeric obs)
sample <- filter(sample, ObservationCount != "X")

# Zero fill by all species present
sample_all_zeros <- format_zero_fill(sample)

# Zero fill only for Canada Goose
goose <- format_zero_fill(sample, species = "230")

# Keep species-specific variables
goose <- format_zero_fill(sample, species = "230", extra_species = "CommonName")

# Keep sampling-event-specific variables
coords <- format_zero_fill(sample, extra_event = c("latitude", "longitude"))

# By species, keeping extra species variables and event variables
goose_coords <- format_zero_fill(sample, species = "230",
                                 extra_species = "CommonName",
                                 extra_event = c("latitude", "longitude"))

# Only return event information
events <- format_zero_fill(sample, fill = NA,
                           extra_event = c("latitude", "longitude"))



BirdStudiesCanada/naturecounts documentation built on June 30, 2023, 1:59 a.m.