GreenLightScribe: QA/QC checks for standardized aerial survey data

View source: R/greenlightscribe.R

GreenLightScribe R Documentation

QA/QC checks for standardized aerial survey data

Description

GreenLightScribe will take input data, check for errors, create a summary report, and optionally output a new data file.

Usage

GreenLightScribe(
  path.name,
  area,
  report = TRUE,
  scribe2analysis = FALSE,
  archive.dir = "default"
)

Arguments

path.name

The path to the raw data file to be checked.

area

The area code for dedicated MBM Alaska region surveys. Acceptable values include:

  • YKD - Yukon Kuskokwim Delta MBM duck stratification

  • WBPHS - Waterfowl Breeding Population Habitat Survey ("North American")

  • YKG - Yukon Kuskokwim Delta MBM goose stratification

  • BLSC - Black Scoter

  • ACP - Arctic Coastal Plain

  • CRD - Copper River Delta

  • VIS - Aircraft Visibility

report

TRUE or FALSE, should a preliminary report be generated?

scribe2analysis

TRUE or FALSE, should the archive version of the raw data file be generated (if possible)?

archive.dir

Path to the desired archive directory for the new data file. Defaults to 3 directory levels above the input file to follow current MBM practices.

Details

GreenLightScribe is designed to automate common data QA/QC functions that are normally done live by an observer. These checks run on raw Scribe data starting in 2023. In the future, the checks will use a specific set of fields derived from a generic aerial survey protocol, but no such protocol exists yet. For now, the checks are intended to streamline the creation of population estimates with this R package and to provide some consistency in the data fields before a data set is shared with others. The specific tests performed are:

  1. Column Names - Are all required columns present and named correctly? Necessary columns are:

    Species

    4 character string representing the acceptable species code for an observation. See sppntable

    Count

    up to 5 digit integer representing the number seen

    Grouping

    character string representing observation type:

    1. single - one lone drake (dimorphic species) or lone bird (monomorphic species)

    2. pair - hen and drake in close association (dimorphic species) or 2 birds in close association (monomorphic species)

    3. open - a mixed sex flock that can't be classified as single, pair, or flkdrake

    4. flkdrake - 2 or more drakes in close association

    Stratum

    character string (or numeric, treated as string) representing the stratum the observation was in (if known by the observer)

    Transect

    character string (or numeric, treated as string) of the DESIGN FILE transect number

    Segment

    character string (or numeric, treated as string) of the transect segment (if known)

    A_G_Name

    character string of air to ground segment ID

    Wind_Dir

    character string of wind direction based on 8 point cardinal/intercardinal directions

    Wind_Vel

    integer representing wind speed in knots

    Sky

    character string representing sky condition (clear, scattered, etc.)

    Behavior

    character string representing observed behavior: diving, flying, swimming, or NA

    Code

    character string representing the use of the data in analysis:

    1. use in standard index estimate

    2. use as double observer only

    3. additional data collected but not used in analysis

    Notes

    character string reserved for additional comments

    Course

    character string of flight direction (numeric as degrees, character cardinal or intercardinal directions)

    Distance

    floating decimal representing distance from the nearest transect

    Latitude

    floating decimal representing decimal degrees of latitude in WGS84 datum

    Longitude

    floating decimal representing decimal degrees of longitude in WGS84 datum

    Year

    4 digit integer representing the year of the observation

    Month

    2 digit integer representing the month of the observation

    Day

    1 or 2 digit integer representing the day of the observation

    Observer

    3 character initials of the observer (such as CJF, all capitalized, or C_F)

    Seat

    2 character representation of seat assignment; RF (right front), LF (left front), RR (right rear), or LR (left rear)

    Time

    character string representing verbose time stamp

    Altitude

    floating decimal representing plane altitude in feet

    Speed

    floating decimal representing plane speed in miles per hour (?)

    Audio File

    character string representing .wav file recording for the associated observation

    # Satellites

    2 digit integer representing the number of satellites (?)

    HDOP

    2 digit integer, presumably the GPS horizontal dilution of precision (?)

  2. Area - Does the area specified have a defined QAQC process? Acceptable values include:

    • YKD - Yukon Kuskokwim Delta MBM duck stratification

    • WBPHS - Waterfowl Breeding Population Habitat Survey ("North American")

    • YKG - Yukon Kuskokwim Delta MBM goose stratification

    • BLSC - Black Scoter

    • ACP - Arctic Coastal Plain

    • CRD - Copper River Delta

    • VIS - Aircraft Visibility

  3. Swans - Are swans and swan nests appropriately recorded? Appropriate treatment is to record any observed swans separately from their nests, and to record each nest as open with a count of 1. This applies to tundra swans (TUSW) and trumpeter swans (TRSW), and their associated nests.

  4. Obs_Type - Are all Obs_Type recorded as one of the accepted 4 codes (single, pair, open, flkdrake)?

  5. Seat - Are the observer seat codes recorded as one of the 4 acceptable upper case codes (RF, LF, RR, LR)?

  6. Species - Are the species codes correct? See sppntable.

  7. Observer - Are all observer initials the same (1 observer per data file) and upper case?

  8. Numeric columns - The columns Count, Wind_Vel, Distance, Latitude, Longitude, Year, Month, Day, Altitude, Speed, # Satellites, and HDOP must contain only numeric values.
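
Checks 1 and 8 above can be illustrated with a minimal sketch in base R. This is not the AKaerial implementation: the helper names missing_columns and non_numeric_columns and the toy data are hypothetical, and treating blank cells as missing is an assumption.

```r
# Column names taken from the list under check 1.
required_cols <- c(
  "Species", "Count", "Grouping", "Stratum", "Transect", "Segment",
  "A_G_Name", "Wind_Dir", "Wind_Vel", "Sky", "Behavior", "Code", "Notes",
  "Course", "Distance", "Latitude", "Longitude", "Year", "Month", "Day",
  "Observer", "Seat", "Time", "Altitude", "Speed", "Audio File",
  "# Satellites", "HDOP"
)

# Columns that check 8 expects to hold only numeric values.
numeric_cols <- c(
  "Count", "Wind_Vel", "Distance", "Latitude", "Longitude", "Year",
  "Month", "Day", "Altitude", "Speed", "# Satellites", "HDOP"
)

# Hypothetical helper: required columns that are absent from the data.
missing_columns <- function(dat, required = required_cols) {
  setdiff(required, names(dat))
}

# Hypothetical helper: columns holding at least one value that cannot be
# read as a number. NA and blank cells are treated as missing (assumption).
non_numeric_columns <- function(dat, cols = numeric_cols) {
  cols <- intersect(cols, names(dat))
  bad <- vapply(cols, function(col) {
    x <- as.character(dat[[col]])
    converted <- suppressWarnings(as.numeric(x))
    any(!is.na(x) & nzchar(x) & is.na(converted))
  }, logical(1))
  cols[bad]
}

# Toy data with one non-numeric Count value.
obs <- data.frame(
  Species = c("TUSW", "TRSW"),
  Count = c("2", "x"),
  Wind_Vel = c(5, 10),
  stringsAsFactors = FALSE,
  check.names = FALSE
)

missing_columns(obs, c("Species", "Count", "Seat"))  # "Seat"
non_numeric_columns(obs)                             # "Count"
```

When reading a real file for a check like this, read.csv(path, check.names = FALSE) keeps names such as Audio File and # Satellites intact instead of converting them to syntactic R names.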

A file is given a "red light" and deemed inappropriate for analysis if it is missing required columns, specifies an undefined area, contains incorrect Obs_Types, unknown seat codes, or unrecognized species codes, has multiple observers, or has non-numeric values in any of the numeric columns.
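
Two of the red light conditions can be sketched as follows. The function red_light_reasons is a hypothetical illustration, not the package's code. It assumes the observation type codes are stored in the Grouping column, and it compares observer initials without regard to case, since lower case initials are described as a yellow light issue.

```r
accepted_groupings <- c("single", "pair", "open", "flkdrake")

# Hypothetical helper: collect reasons a file would fail outright.
red_light_reasons <- function(dat) {
  reasons <- character(0)

  # Any value outside the four accepted codes (including NA) is flagged.
  bad_groupings <- setdiff(unique(dat[["Grouping"]]), accepted_groupings)
  if (length(bad_groupings) > 0) {
    reasons <- c(reasons, paste("unaccepted Grouping codes:",
                                paste(bad_groupings, collapse = ", ")))
  }

  # One observer per file; case differences alone do not count as
  # separate observers in this sketch.
  observers <- unique(toupper(dat[["Observer"]]))
  if (length(observers) > 1) {
    reasons <- c(reasons, paste("multiple observers:",
                                paste(observers, collapse = ", ")))
  }

  reasons
}

# Toy data: "flock" is not an accepted code, and two observers appear.
obs <- data.frame(
  Grouping = c("single", "flock", "pair"),
  Observer = c("CJF", "cjf", "ABC"),
  stringsAsFactors = FALSE
)

red_light_reasons(obs)
```

An empty result means neither of these two conditions was triggered; the remaining red light conditions would need their own checks.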

A file is given a "yellow light" if it contains any of several common mistakes that the function knows how to treat. These include incorrect swan transcription, reversed seat codes (FR for front right instead of RF), use of older species codes, and lower case observer initials.
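
A minimal sketch of two yellow light treatments, assuming reversed seat codes are simply swapped and observer initials are upper-cased. The lookup reversed_seats and the function apply_yellow_treatments are hypothetical names, and the actual treatments in the package may differ.

```r
# Reversed seat codes and their accepted equivalents (hypothetical lookup).
# RR reads the same either way, so it needs no entry.
reversed_seats <- c(FR = "RF", FL = "LF", RL = "LR")

# Hypothetical helper: apply the treatments and keep notes for a report.
apply_yellow_treatments <- function(dat) {
  notes <- character(0)

  reversed <- dat[["Seat"]] %in% names(reversed_seats)
  if (any(reversed)) {
    dat[["Seat"]][reversed] <- unname(reversed_seats[dat[["Seat"]][reversed]])
    notes <- c(notes, sprintf("corrected %d reversed seat code(s)",
                              sum(reversed)))
  }

  obs_init <- dat[["Observer"]]
  lower <- !is.na(obs_init) & obs_init != toupper(obs_init)
  if (any(lower)) {
    dat[["Observer"]] <- toupper(obs_init)
    notes <- c(notes, sprintf("upper-cased observer initials in %d row(s)",
                              sum(lower)))
  }

  list(data = dat, notes = notes)
}

# Toy data: two reversed seat codes and one lower case set of initials.
obs <- data.frame(
  Seat = c("FR", "LF", "RL"),
  Observer = c("cjf", "CJF", "CJF"),
  stringsAsFactors = FALSE
)

treated <- apply_yellow_treatments(obs)
treated$data
treated$notes
```

Returning the notes alongside the treated data mirrors the idea that the report details which checks failed and how the offending fields were handled.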

If a file sufficiently passes quality checks (receives a green light) and scribe2analysis = TRUE, a QCobs (archive quality) .csv file and associated report are produced 2 directories above the path.name specified. This is done according to the file structure proposed in current data management plans. If a file receives a yellow light and scribe2analysis = TRUE, the associated report will detail the quality checks that were failed and the specific treatment by the function of the offending data fields before creating the QCobs file. If a preliminary report has already been produced, setting report = FALSE before generating a QCobs file will stop another preliminary report from being generated.
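
The default output location is described relative to the input file. A minimal sketch of walking up a path with base R follows; dir_up is a hypothetical helper and the example path is made up. Because this page mentions both 2 and 3 directory levels, the number of levels is left as a parameter, and counting the first level as the folder containing the file is an assumption.

```r
# Hypothetical helper: move `levels` steps up from a file path.
# The first step returns the folder that contains the file (assumption).
dir_up <- function(path, levels) {
  out <- path
  for (i in seq_len(levels)) {
    out <- dirname(out)
  }
  out
}

raw_file <- "/surveys/CRD/2023/raw/MyData.csv"  # made-up example path

dir_up(raw_file, 2)  # "/surveys/CRD/2023"
dir_up(raw_file, 3)  # "/surveys/CRD"
```

Passing an explicit archive.dir avoids depending on the default location altogether.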

Value

None

Author(s)

Charles Frost, charles_frost@fws.gov

References

https://github.com/USFWS/AKaerial

Examples

 GreenLightScribe(path.name = "C:/DATA/MyData.csv", area = "CRD", report = TRUE, scribe2analysis = FALSE)


USFWS/AKaerial documentation built on April 3, 2025, 4:06 p.m.