GreenLight: QA/QC checks for standardized aerial survey data

View source: R/greenlight.R

GreenLight R Documentation

QA/QC checks for standardized aerial survey data

Description

GreenLight will take input data, check for errors, create a summary report, and optionally output a new data file.

Usage

GreenLight(
  path.name,
  area,
  report = TRUE,
  raw2analysis = FALSE,
  archive.dir = "default"
)

Arguments

path.name

The path to the raw data file to be checked.

area

The area code for dedicated MBM Alaska region surveys. Acceptable values include:

  • YKD - Yukon Kuskokwim Delta MBM duck stratification

  • WBPHS - Waterfowl Breeding Population Habitat Survey ("North American")

  • YKG - Yukon Kuskokwim Delta MBM goose stratification

  • BLSC - Black Scoter

  • ACP - Arctic Coastal Plain

  • CRD - Copper River Delta

  • VIS - Aircraft Visibility

report

TRUE or FALSE, should a preliminary report be generated?

raw2analysis

TRUE or FALSE, should the archive version of the raw data file be generated (if possible)?

archive.dir

Path to the desired archive directory for the new data file. Defaults to 3 directory levels above the input file to follow current MBM practices.

Details

GreenLight is designed to automate common data QA/QC functions that are normally done live by an observer. In the future, these checks will use a specific set of fields derived from a generic aerial survey protocol. No such protocol currently exists, so the checks provided here simply streamline the creation of population estimates with this R package and provide some consistency in the data fields before a data set is shared with others. The specific tests performed are:

  1. Column Names - Are all required columns present and named correctly? Necessary columns are:

    Year

    4 digit integer representing the year of the observation

    Month

    2 digit integer representing the month of the observation

    Day

    1 or 2 digit integer representing the day of the observation

    Seat

    2 character representation of seat assignment; RF (right front), LF (left front), RR (right rear), or LR (left rear)

    Observer

    3 character initials of the observer (such as CJF, all capitalized, or C_F)

    Stratum

    character string (or numeric, treated as string) representing the stratum the observation was in (if known by the observer)

    Transect

    character string (or numeric, treated as string) of the DESIGN FILE transect number

    Segment

    character string (or numeric, treated as string) of the transect segment (if known)

    Flight_Dir

    character string of flight direction (numeric as degrees, character cardinal or intercardinal directions)

    A_G_Name

    character string of air to ground segment ID

    Wind_Dir

    character string of wind direction based on 8 point cardinal/intercardinal directions

    Wind_Vel

    integer representing wind speed in knots

    Sky

    character string representing sky condition (clear, scattered, etc.)

    Filename

    character string representing .wav file recording for the associated observation

    Lat

    floating decimal representing decimal degrees of latitude in WGS84 datum

    Lon

    floating decimal representing decimal degrees of longitude in WGS84 datum

    Time

    floating decimal representing computer clock seconds past midnight

    Delay

    floating decimal representing the delay between clock time and GPS system time, in seconds

    Species

    4 character string representing the acceptable species code for an observation. See sppntable.

    Num

    up to 5 digit integer representing the number seen

    Obs_Type

    character string representing observation type:

    1. single - one lone drake (dimorphic species) or lone bird (monomorphic species)

    2. pair - hen and drake in close association (dimorphic species) or 2 birds in close association (monomorphic species)

    3. open - a mixed sex flock that can't be classified as single, pair, or flkdrake

    4. flkdrake - 2 or more drakes in close association

    Behavior

    character string representing observed behavior: diving, flying, swimming, or NA

    Distance

    character string representing distance from the observer: near, far, NA

    Code

    integer representing the use of the data in analysis:

    1. use in standard index estimate

    2. use as double observer only

    3. additional data collected but not used in analysis

    Notes

    character string reserved for additional comments

  2. Area - Does the area specified have a defined QAQC process? Acceptable values include:

    • YKD - Yukon Kuskokwim Delta MBM duck stratification

    • WBPHS - Waterfowl Breeding Population Habitat Survey ("North American")

    • YKG - Yukon Kuskokwim Delta MBM goose stratification

    • BLSC - Black Scoter

    • ACP - Arctic Coastal Plain

    • CRD - Copper River Delta

    • VIS - Aircraft Visibility

  3. Swans - Are swans and swan nests appropriately recorded? Swans should be recorded separately from their nests, and each nest should be recorded as open 1 (Obs_Type open, Num 1). This applies to tundra swans (TUSW) and trumpeter swans (TRSW), and their associated nests.

  4. Obs_Type - Are all Obs_Type values recorded as one of the 4 accepted codes (single, pair, open, flkdrake)?

  5. Seat - Are the observer seat codes recorded as one of the 4 acceptable upper case codes (RF, LF, RR, LR)?

  6. Species - Are the species codes correct? See sppntable.

  7. Observer - Are all observer initials the same (1 observer per data file) and upper case?

  8. Numeric columns - The columns Year, Month, Day, Wind_Vel, Lat, Lon, Time, Delay, Num, and Code must contain only numeric values.
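
As a rough illustration of checks 1 and 8, the sketch below tests a data frame for missing required columns and for non-numeric entries in the numeric columns. The helper names missing_columns and non_numeric_columns are hypothetical and are not part of AKaerial. The column names come from the list above. Treating NA as an acceptable value is an assumption.

```r
# Hypothetical helpers for illustration only; not the AKaerial implementation.
required_cols <- c("Year", "Month", "Day", "Seat", "Observer", "Stratum",
                   "Transect", "Segment", "Flight_Dir", "A_G_Name", "Wind_Dir",
                   "Wind_Vel", "Sky", "Filename", "Lat", "Lon", "Time", "Delay",
                   "Species", "Num", "Obs_Type", "Behavior", "Distance", "Code",
                   "Notes")
numeric_cols <- c("Year", "Month", "Day", "Wind_Vel", "Lat", "Lon", "Time",
                  "Delay", "Num", "Code")

# Check 1: which required columns are absent from the data?
missing_columns <- function(data) setdiff(required_cols, names(data))

# Check 8: which numeric columns hold values that cannot be read as numbers?
# Assumption: NA entries are allowed and are ignored here.
non_numeric_columns <- function(data) {
  present <- intersect(numeric_cols, names(data))
  bad <- vapply(present, function(col) {
    values <- data[[col]]
    values <- values[!is.na(values)]
    converted <- suppressWarnings(as.numeric(as.character(values)))
    any(is.na(converted))
  }, logical(1))
  present[bad]
}

# A deliberately incomplete example with a bad Day entry
obs <- data.frame(Year = 2024, Month = 6, Day = "x5", Num = 3,
                  stringsAsFactors = FALSE)
missing_columns(obs)       # lists the required columns absent from obs
non_numeric_columns(obs)   # "Day"
```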

A file is given a "red light" and deemed inappropriate for analysis if it is missing required columns, specifies an undefined area, or contains incorrect Obs_Type values, unknown seat codes, unrecognized species codes, multiple observers, or non-numeric values in any of the numeric columns.
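
A minimal sketch of how some of these red light conditions could be collected is shown here. The function red_light_reasons is hypothetical and is not part of AKaerial. It assumes that the reversed seat codes FR, FL, and RL are recoverable rather than unknown, and that observer initials are compared without regard to case. The species and column checks are left out.

```r
# Hypothetical illustration; not the AKaerial implementation.
valid_areas <- c("YKD", "WBPHS", "YKG", "BLSC", "ACP", "CRD", "VIS")
valid_obs_types <- c("single", "pair", "open", "flkdrake")
valid_seats <- c("RF", "LF", "RR", "LR")
# Assumption: these reversed forms are treatable, so they are not "unknown"
known_reversed_seats <- c("FR", "FL", "RL")

red_light_reasons <- function(data, area) {
  reasons <- character(0)
  if (!(area %in% valid_areas)) {
    reasons <- c(reasons, "undefined area")
  }
  if (!all(data$Obs_Type %in% valid_obs_types)) {
    reasons <- c(reasons, "incorrect Obs_Type")
  }
  if (!all(data$Seat %in% c(valid_seats, known_reversed_seats))) {
    reasons <- c(reasons, "unknown seat code")
  }
  # Assumption: initials are compared case-insensitively
  if (length(unique(toupper(data$Observer))) > 1) {
    reasons <- c(reasons, "multiple observers")
  }
  reasons
}

obs <- data.frame(Obs_Type = c("single", "trio"),
                  Seat = c("RF", "FR"),
                  Observer = c("CJF", "cjf"),
                  stringsAsFactors = FALSE)
red_light_reasons(obs, "CRD")   # "incorrect Obs_Type"
```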

A file is given a "yellow light" if it contains any of several common mistakes that the function knows how to treat. These include incorrect swan transcription, reversed seat codes (FR for front right instead of RF), use of older species codes, or lower case observer initials.
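
Two of these treatments, reversed seat codes and lower case observer initials, could be sketched as follows. The helpers fix_seat and fix_observer and the seat_swaps lookup are hypothetical names used for illustration. The actual corrections applied by GreenLight may differ.

```r
# Hypothetical illustration; not the AKaerial implementation.
# Lookup from a reversed seat code to its accepted form
seat_swaps <- c(FR = "RF", FL = "LF", RL = "LR")

fix_seat <- function(seat) {
  seat <- as.character(seat)
  swap <- seat %in% names(seat_swaps)
  seat[swap] <- unname(seat_swaps[seat[swap]])
  seat
}

fix_observer <- function(observer) toupper(as.character(observer))

fix_seat(c("FR", "LF", "RL", "RR"))   # "RF" "LF" "LR" "RR"
fix_observer("cjf")                   # "CJF"
```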

If a file sufficiently passes quality checks (receives a green light) and raw2analysis = TRUE, a QCobs (archive quality) .csv file and associated report are produced 2 directories above the path.name specified. This is done according to the file structure proposed in current data management plans. If a file receives a yellow light and raw2analysis = TRUE, the associated report will detail the quality checks that were failed and how the function treated the offending data fields before creating the QCobs file. If a preliminary report has already been produced, setting report = FALSE before generating a QCobs file will stop another preliminary report from being generated.
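
The relationship between the light a file receives and the raw2analysis argument can be summarized in a small sketch. The functions file_light and writes_qcobs are hypothetical. They only restate the rules described above and do not reflect how GreenLight is implemented internally.

```r
# Hypothetical illustration of the decision rules described above.
# red_reasons and yellow_reasons are character vectors of failed checks.
file_light <- function(red_reasons, yellow_reasons) {
  if (length(red_reasons) > 0) return("red")
  if (length(yellow_reasons) > 0) return("yellow")
  "green"
}

# A QCobs file is only produced for green or yellow files when requested
writes_qcobs <- function(light, raw2analysis) {
  raw2analysis && light %in% c("green", "yellow")
}

light <- file_light(character(0), "lower case observer initials")
light                        # "yellow"
writes_qcobs(light, TRUE)    # TRUE
writes_qcobs("red", TRUE)    # FALSE
```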

Value

None

Author(s)

Charles Frost, charles_frost@fws.gov

References

https://github.com/USFWS/AKaerial

Examples

 GreenLight(path.name = "C:/DATA/MyData.csv", area = "CRD", report = TRUE, raw2analysis = FALSE)


USFWS/AKaerial documentation built on April 3, 2025, 4:06 p.m.