AKaerial: Analysis of Alaska Region Aerial Survey Data

Documented in ColMatch ColSwitch CommonFix GreenLight ObsByYear ObserverCheck Raw2Analysis SeatCheck ShouldBeNumeric SpeciesCheck SwanCheck SwitchMatch UnitCheck

#' QA/QC checks for standardized aerial survey data
#'
#' GreenLight will take input data, check for errors, create a summary report, and optionally output a new data file.
#'
#' GreenLight is designed to automate common data QA/QC functions that are normally done live by an observer.
#' In the future, these checks will use a specific set of fields derived from a generic aerial survey protocol, but
#' this protocol does not currently exist, so instead the checks provided here are simply to streamline the
#' creation of population estimates using this R package and provide some consistency in the data fields before
#' a data set is shared with others.  The specific tests performed are:
#' \enumerate{
#' \item Column Names - Are all required columns present and named correctly?  Necessary columns are:
#'   \describe{
#'   \item{Year}{4 digit integer representing the year of the observation}
#'   \item{Month}{2 digit integer representing the month of the observation}
#'   \item{Day}{1 or 2 digit integer representing the day of the observation}
#'   \item{Seat}{2 character representation of seat assignment; RF (right front), LF (left front), RR (right rear), or LR (left rear)}
#'   \item{Observer}{3 character initials of the observer (such as CJF, all capitalized, or C_F)}
#'   \item{Stratum}{character string (or numeric, treated as string) representing the stratum the observation was in (if known by the observer)}
#'   \item{Transect}{character string (or numeric, treated as string) of the DESIGN FILE transect number}
#'   \item{Segment}{character string (or numeric, treated as string) of the transect segment (if known)}
#'   \item{Flight_Dir}{character string of flight direction (numeric as degrees, character cardinal or intercardinal directions)}
#'   \item{A_G_Name}{character string of air to ground segment ID}
#'   \item{Wind_Dir}{character string of wind direction based on 8 point cardinal/intercardinal directions}
#'   \item{Wind_Vel}{integer representing wind speed in knots}
#'   \item{Sky}{character string representing sky condition (clear, scattered, etc.)}
#'   \item{Filename}{character string representing .wav file recording for the associated observation}
#'   \item{Lat}{floating decimal representing decimal degrees of latitude in WGS84 datum}
#'   \item{Lon}{floating decimal representing decimal degrees of longitude in WGS84 datum}
#'   \item{Time}{floating decimal representing computer clock seconds past midnight}
#'   \item{Delay}{floating decimal representing the delay in clock time and GPS system time in seconds}
#'   \item{Species}{4 character string representing the acceptable species code for an observation.  See \code{\link{sppntable}}}
#'   \item{Num}{up to 5 digit integer representing the number seen}
#'   \item{Obs_Type}{character string representing observation type: \enumerate{
#'   \item single - one lone drake (dimorphic species) or lone bird (monomorphic species)
#'   \item pair - hen and drake in close association (dimorphic species) or 2 birds in close association (monomorphic species)
#'   \item open - a mixed sex flock that can't be classified as single, pair, or flkdrake
#'   \item flkdrake - 2 or more drakes in close association}}
#'   \item{Behavior}{character string representing observed behavior: diving, flying, swimming, or NA}
#'   \item{Distance}{character string representing distance from the observer: near, far, NA}
#'   \item{Code}{integer representing the use of the data in analysis: \enumerate{
#'   \item use in standard index estimate
#'   \item use as double observer only
#'   \item additional data collected but not used in analysis}}
#'   \item{Notes}{character string reserved for additional comments}
#'   }
#' \item Area - Does the area specified have a defined QAQC process? Acceptable values include: \itemize{
#'   \item YKD - Yukon Kuskokwim Delta MBM duck stratification
#'   \item WBPHS - Waterfowl Breeding Population Habitat Survey ("North American")
#'   \item YKG - Yukon Kuskokwim Delta MBM goose stratification
#'   \item BLSC - Black Scoter
#'   \item ACP - Arctic Coastal Plain
#'   \item CRD - Copper River Delta
#'   \item VIS - Aircraft Visibility}
#' \item Swans - Are swans and swan nests appropriately recorded?  Appropriate treatment is recording any observed swans separately from their nests, and nests as open 1.
#'  This applies to tundra swans (TUSW) and trumpeter swans (TRSW), and their associated nests.
#' \item Obs_Type - Are all Obs_Type recorded as one of the accepted 4 codes (single, pair, open, flkdrake)?
#' \item Seat - Are the observer seat codes recorded as one of the 4 acceptable upper case codes (RF, LF, RR, LR)?
#' \item Species - Are the species codes correct?  See \code{\link{sppntable}}.
#' \item Observer - Are all observer initials the same (1 observer per data file) and upper case?
#' \item Numeric columns - The columns Year, Month, Day, Wind_Vel, Lat, Lon, Time, Delay, Num, and Code must contain only numeric values.
#' }
#'
#' A file is given a "red light" and deemed inappropriate for analysis if it fails checks on required columns, is in an undefined area,
#' incorrect Obs_Types detected, unknown seat code, unrecognized species codes, multiple observers per file, or any of the numeric columns contain non-numerics.
#'
#' A file is given a "yellow light" that indicates inconsistencies in the file, but with known treatments by the function, if any of several common mistakes occur.
#' These include incorrect swan transcription, reversed seat codes (FR for front right instead of RF), use of older species codes, or lower case observer initials.
#'
#' If a file sufficiently passes quality checks (receives a green light) and raw2analysis = TRUE, a QCobs (archive quality) .csv file and associated report is produced 2 directories above the path.name specified.
#' This is done according to the file structure proposed in current data management plans.  If a file receives a yellow light and raw2analysis = TRUE, the associated report will detail the
#' quality checks that were failed and the specific treatment by the function of the offending data fields before creating the QCobs file.  If a preliminary report has already been produced, setting
#' report = FALSE before generating a QCobs file will stop another preliminary report from being generated.
#'
#'
#'
#' @author Charles Frost, \email{charles_frost@@fws.gov}
#' @references \url{https://github.com/USFWS/AKaerial}
#'
#' @param path.name The path to the raw data file to be checked.
#' @param area The area code for dedicated MBM Alaska region surveys.
#'    Acceptable values include:
#'  \itemize{
#'  \item YKD - Yukon Kuskokwim Delta MBM duck stratification
#'  \item WBPHS - Waterfowl Breeding Population Habitat Survey ("North American")
#'  \item YKG - Yukon Kuskokwim Delta MBM goose stratification
#'  \item BLSC - Black Scoter
#'  \item ACP - Arctic Coastal Plain
#'  \item CRD - Copper River Delta
#'  \item VIS - Aircraft Visibility
#'  }
#' @param report TRUE or FALSE, should a preliminary report be generated?
#' @param raw2analysis TRUE or FALSE, should the archive version of the raw data file be generated (if possible)?
#' @param archive.dir Path to the desired archive directory for the new data file.  Defaults to 3 directory levels above the input file to follow current MBM practices.
#'
#' @return None
#'
#' @examples
#'  Greenlight(path.name = "C:/DATA/MyData.csv", area = "CRD", report = TRUE, raw2analysis = FALSE)
#'
#' @export
GreenLight=function(path.name, area, report=TRUE, raw2analysis=FALSE, archive.dir = "default"){

  can.fix=TRUE

  necessary=c("Year",
              "Month",
              "Day",
              "Seat",
              "Observer",
              "Stratum",
              "Transect",
              "Segment",
              "Flight_Dir",
              "A_G_Name",
              "Wind_Dir",
              "Wind_Vel",
              "Sky",
              "Filename",
              "Lat",
              "Lon",
              "Time",
              "Delay",
              "Species",
              "Num",
              "Obs_Type",
              "Behavior",
              "Distance",
              "Code",
              "Notes")


  if(area=="VIS"){necessary=c("Year",
                              "Month",
                              "Day",
                              "Seat",
                              "Observer",
                              "Stratum",
                              "Transect",
                              "Segment",
                              "Trans_Num",
                              "Flight_Dir",
                              "Area",
                              "Aircraft",
                              "Wind_Vel",
                              "Sky",
                              "Filename",
                              "Lat",
                              "Lon",
                              "Time",
                              "Delay",
                              "Species",
                              "Num",
                              "Obs_Type",
                              "Behavior",
                              "Distance",
                              "Code",
                              "Notes")
    }

  type <- tools::file_ext(path.name)

  if(type == "txt") {data <- read.table(path.name, header=TRUE, stringsAsFactors = FALSE, na.strings=c("NA", "na", "N/A"))}
  if(type == "csv") {data <- read.csv(path.name, header=TRUE, stringsAsFactors = FALSE, na.strings=c("NA", "na", "N/A"))}

  if(!area %in% c("ACP", "YKD", "YKG", "CRD", "WBPHS", "BLSC", "VIS")){

    stop("Area not supported or incorrect.  Currently supported areas are ACP, YKD, YKG, CRD, VIS, WBPHS.")

  }


  #are the columns there that need to be renamed?
  test.switchmatch=SwitchMatch(data)

  if(test.switchmatch==TRUE){
    s.switchmatch="green"
  }else{s.switchmatch="red"}



  #are all of the required columns there (and named correctly)?
  test.colmatch=ColMatch(data, necessary=necessary)

  if(test.colmatch==TRUE){s.colmatch="green"}else{s.colmatch="red"}


  #are there any swan nests without associated records (missing) or swan nests recorded as
  #anything but open?

  test.swan=SwanCheck(data)

  if(test.swan$fail==TRUE){s.swan="red"}else{s.swan="green"}


  #are there only 4 possible unit types? (Obs_type = single, pair, open, flkdrake)

  test.unit=UnitCheck(data)

  if(test.unit$fail==TRUE){s.unit="red"}else{s.unit="green"}


  #are there only 4 possible seats recorded? (Seat = RF, RR, LF, LR)

  test.seat=SeatCheck(data)

  if(test.seat$fail==TRUE){s.seat="red"}else{s.seat="green"}

  #are the species codes correct?

  test.species=SpeciesCheck(data, area=area)

  s.species="test"

  if(length(test.species$change[,1])>0){s.species="yellow"}

  if(length(test.species$bad[,1])>0){s.species="red"}

  if(test.species$fail==FALSE & s.species!="yellow"){s.species="green"}

  #are the observer initials all uppercase?

  test.observer=ObserverCheck(data)

  if(test.observer==TRUE){s.observer="green"}else{s.observer="red"}


  #one observer per file?

  test.oneobserver=(length(unique(data$Observer))==1)

  if(test.oneobserver==TRUE){s.oneobserver="green"}else{s.oneobserver="red"}


  #Numeric tests- Year, Month, Day, Wind_Vel, Lat, Lon, Time, Delay, Num

  test.year=ShouldBeNumeric(data$Year)
  if(test.year$fail==TRUE){s.year="red"}else{s.year="green"}

  test.month=ShouldBeNumeric(data$Month)
  if(test.month$fail==TRUE){s.month="red"}else{s.month="green"}

  test.day=ShouldBeNumeric(data$Day)
  if(test.day$fail==TRUE){s.day="red"}else{s.day="green"}

  test.wind=ShouldBeNumeric(data$Wind_Vel)
  if(test.wind$fail==TRUE){s.wind="red"}else{s.wind="green"}

  test.lat=ShouldBeNumeric(data$Lat)
  if(test.lat$fail==TRUE){s.lat="red"}else{s.lat="green"}

  test.lon=ShouldBeNumeric(data$Lon)
  if(test.lon$fail==TRUE){s.lon="red"}else{s.lon="green"}

  test.time=ShouldBeNumeric(data$Time)
  if(test.time$fail==TRUE){s.time="red"}else{s.time="green"}

  test.delay=ShouldBeNumeric(data$Delay)
  if(test.delay$fail==TRUE){s.delay="red"}else{s.delay="green"}

  test.num=ShouldBeNumeric(data$Num)
  if(test.num$fail==TRUE){s.num="red"}else{s.num="green"}

  # test.code=ShouldBeNumeric(data$Code)
  # if(test.code$fail==TRUE){s.code="red"}else{s.code="green"}



  ## E Osnas mapping function

  if(area != "WBPHS"){

  sf.obs <- data %>% sf::st_as_sf(coords=c("Lon", "Lat"), crs=4326) %>%
    sf::st_transform(crs=3338) %>%
    filter(Code == 1) #remove any non-survey or special observations

  sf.lines <- sf.obs %>%
    sf::st_transform(crs=4326) %>%
    group_split(Transect, Day) %>%
    purrr::map(points2line) %>%
    purrr::map_dfr(rbind)

  if(area=="ACP"){basemap="K:/Waterfowl/ACP_Survey/Design_Files/Design_Strata/ACP_2007to2018_DesignStrata.shp"}
  if(area %in% c("YKD", "YKG", "YKDV")){basemap="K:/Waterfowl/YKD_Coastal/Design_Files/Design_Strata/YK_DesignStrata.shp"}
  if(area=="CRD"){basemap="K:/Waterfowl/CRD_Survey/Design_Files/Design_Strata/CRD_2018_DesignStrata.shp"}

  basemap <- sf::st_read(dsn=basemap) %>%
    sf::st_transform(crs=3338)

  sf.obs <- sf.obs %>%  mutate(Day=as.character(Day))

}

  ## End mapping, plot the object in markdown

  rmd.path=system.file("rmd/greenlight.Rmd", package="AKaerial")

 if(report == TRUE){
 rmd.path=system.file("rmd/greenlight.Rmd", package="AKaerial")

 rmarkdown::render(rmd.path, output_dir=dirname(path.name), output_file=paste(basename(tools::file_path_sans_ext(path.name)), "_QAQC_", Sys.Date(), ".html", sep=''))
 }

  if(any("red" %in% c(s.num, s.delay, s.time, s.lon, s.lat, s.wind, s.day, s.month, s.year, s.oneobserver, s.species, s.colmatch, s.unit))){
    can.fix=FALSE
    print("can't fix")
    print(c(s.num, s.delay, s.time, s.lon, s.lat, s.wind, s.day, s.month, s.year, s.oneobserver, s.species, s.colmatch, s.unit))}

  if(can.fix==TRUE & raw2analysis==TRUE){

    rmd.path=system.file("rmd/raw2analysis.Rmd", package="AKaerial")

    data.obj=Raw2Analysis(data, area=area)

    data.new=data.obj$newdata
    data.new=data.new[data.new$Species != "XXXX",]
    fix=data.obj$fix

    if(area != "VIS"){
    proj=strsplit(basename(tools::file_path_sans_ext(path.name)),"_")[[1]][1]
    yr=strsplit(basename(tools::file_path_sans_ext(path.name)),"_")[[1]][2]
    type="QCObs"
    name=strsplit(basename(tools::file_path_sans_ext(path.name)),"_")[[1]][4]
    #fix initials cut off on initials containing "_"
    #name=substr(basename(tools::file_path_sans_ext(path.name)), nchar(basename(tools::file_path_sans_ext(path.name)))-2, nchar(basename(tools::file_path_sans_ext(path.name))))

    if(archive.dir != "default"){
    write.path=paste(archive.dir, "/", proj, "_", yr, "_QCObs_", name, ".csv", sep='')
    }else{write.path=paste(dirname(dirname(dirname(path.name))), "/", proj, "_", yr, "_QCObs_", name, ".csv", sep='')}

    }

    data.new$Notes=gsub(",","", data.new$Notes)

    if(area=="VIS"){
      proj=strsplit(basename(tools::file_path_sans_ext(path.name)),"_")[[1]][1]
      yr=strsplit(basename(tools::file_path_sans_ext(path.name)),"_")[[1]][2]
      region=strsplit(basename(tools::file_path_sans_ext(path.name)),"_")[[1]][3]
      aircraft=strsplit(basename(tools::file_path_sans_ext(path.name)),"_")[[1]][4]
      name=strsplit(basename(tools::file_path_sans_ext(path.name)),"_")[[1]][6]

      type="QCObs"

    if(archive.dir != "default"){
    write.path=paste(archive.dir, "/", proj, "_", yr, "_", region, "_", aircraft, "_QCObs_", name, ".csv", sep='')
    }else{write.path=paste(dirname(dirname(dirname(path.name))), "/", proj, "_", yr, "_", region, "_", aircraft, "_QCObs_", name, ".csv", sep='')}

    }


    print(write.path)

    if(area=="WBPHS"){write.csv(data.new[,1:25], write.path, quote=FALSE, row.names=FALSE, na="")
    }else{write.csv(data.new[,1:25], write.path, quote=FALSE, row.names=FALSE) }

    if(area!="VIS" & archive.dir != "default"){rmarkdown::render(rmd.path, output_dir=archive.dir, output_file=paste(proj,"_", yr, "_QCLog_",name, ".html", sep=''))}
    if(area=="VIS" & archive.dir != "default"){rmarkdown::render(rmd.path, output_dir=archive.dir, output_file=paste(proj, "_", yr, "_", region, "_", aircraft, "_QCLog_", name, ".html", sep=''))}
    if(area!="VIS" & archive.dir == "default"){rmarkdown::render(rmd.path, output_dir=dirname(dirname(dirname(path.name))), output_file=paste(proj,"_", yr, "_QCLog_",name, ".html", sep=''))}
    if(area=="VIS" & archive.dir == "default"){rmarkdown::render(rmd.path, output_dir=dirname(dirname(dirname(path.name))), output_file=paste(proj, "_", yr, "_", region, "_", aircraft, "_QCLog_", name, ".html", sep=''))}

    }


}



#' Check for the old style of transcribed data.
#'
#' SwitchMatch will test if the original style of transcribed column headings is present
#'
#' SwitchMatch is designed to check if the old style of column headings is present and the minimum set of
#' yr, se, mo, strat, da, grp, and sppn exists.  If TRUE, then ColSwitch can be used to overwrite to the current headings.
#'
#' @author Charles Frost, \email{charles_frost@@fws.gov}
#' @references \url{https://github.com/USFWS/AKaerial}
#'
#' @param data The data frame to be checked.
#'
#' @return TRUE/FALSE
#'
#' @export
SwitchMatch=function(data){

  necessary=c("yr", "se", "mo", "strat", "da", "grp", "sppn")
  return(all(necessary %in% colnames(data)))

}


#' Check if observer initials are uppercase.
#'
#' ObserverCheck checks to see if the observer initials were entered consistently in uppercase
#'
#' ObserverCheck checks to see if the observer initials were entered consistently in uppercase.  This is a basic QA/QC check
#' so the analysis is not split among upper and lowercase versions of the same observer.
#'
#' @author Charles Frost, \email{charles_frost@@fws.gov}
#' @references \url{https://github.com/USFWS/AKaerial}
#'
#' @param data The data frame to be checked.
#'
#' @return TRUE/FALSE
#'
#' @export
ObserverCheck=function(data){

  return(all(data$Observer==toupper(data$Observer)))

}

#' Switch necessary column names
#'
#' ColSwitch takes old column names and changes them to current styling
#'
#' ColSwitch will change yr, mo, da, se, strata, grp, and sppn to Year, Month, Day, Seat, Strata, Obs_Type, and Species, respectively.
#' This function was written to quickly allow an old data file to be formatted for analysis, but probably won't be used in the future.
#'
#' @author Charles Frost, \email{charles_frost@@fws.gov}
#' @references \url{https://github.com/USFWS/AKaerial}
#'
#' @param data The data frame to be checked.
#'
#' @return Data frame with new column names.
#'
#' @export
ColSwitch=function(data){

  colnames(data)[which(colnames(data)=="yr")]="Year"
  colnames(data)[which(colnames(data)=="mo")]="Month"
  colnames(data)[which(colnames(data)=="da")]="Day"
  colnames(data)[which(colnames(data)=="se")]="Seat"
  colnames(data)[which(colnames(data)=="strat")]="Strata"
  colnames(data)[which(colnames(data)=="grp")]="Obs_Type"
  colnames(data)[which(colnames(data)=="sppn")]="Species"

  return(data)
}

#' Check if column names are appropriate for analysis.
#'
#' ColMatch checks to see if the column headings match those needed by AKAerial for analysis.
#'
#' ColMatch is the first GreenLight QA/QC check performed on a data set.  If the column names do not match the necessary set of names in GreenLight,
#' the function will throw an error and end abruptly.  For the current list of acceptable names, see \code{\link{GreenLight}}
#'
#' @author Charles Frost, \email{charles_frost@@fws.gov}
#' @references \url{https://github.com/USFWS/AKaerial}
#'
#' @param data The data frame to be checked.
#'
#' @return TRUE/FALSE
#'
#' @export
ColMatch=function(data, necessary){

  return(all(necessary %in% colnames(data)))

}

#' Check if entries in a column are all numeric
#'
#' ShouldBeNumeric checks to see if column entries are numeric values needed for analysis.
#'
#' ShouldBeNumeric is a GreenLight QA/QC check performed on a data set.  If the column entries are not numeric values it will return a "red" status.
#'
#' @author Charles Frost, \email{charles_frost@@fws.gov}
#' @references \url{https://github.com/USFWS/AKaerial}
#'
#' @param col.data The column to be checked.
#'
#' @return data frame of TRUE/FALSE and offending entries
#'
#' @export
ShouldBeNumeric=function(col.data){

bad.na=suppressWarnings(which(is.na(col.data)))
bad=suppressWarnings(which(is.na(as.numeric(col.data))))
#bad=which(is.na(as.numeric(col.data)))

bad=bad[!(bad %in% bad.na)]

return(list("fail"=any(bad),"bad"=bad))

}

#' Check if Obs_Type entries are all correct
#'
#' UnitCheck checks to see if Obs_Type entries are correct values needed for analysis.
#'
#' UnitCheck is a GreenLight QA/QC check performed on a data set.  If the Obs_Type entries are not single, pair, flkdrake, or open, it will return a "red" status.
#'
#' @author Charles Frost, \email{charles_frost@@fws.gov}
#' @references \url{https://github.com/USFWS/AKaerial}
#'
#' @param col.data The column to be checked.
#'
#' @return data frame of TRUE/FALSE and offending entries
#'
#' @export
UnitCheck=function(data){

units=c("single", "pair", "flkdrake", "open", NA)

bad=data[which(!(data$Obs_Type %in% units)),]

return(list("fail"=(length(bad[,1])!=0), "bad"=bad))


}

#' Check if Seat entries are all correct
#'
#' SeatCheck checks to see if Seat entries are correct values needed for analysis.
#'
#' SeatCheck is a GreenLight QA/QC check performed on a data set.  If the Seat entries are not LF, RF, LR, or RR, it will return a "red" status.
#'
#' @author Charles Frost, \email{charles_frost@@fws.gov}
#' @references \url{https://github.com/USFWS/AKaerial}
#'
#' @param col.data The column to be checked.
#'
#' @return data frame of TRUE/FALSE and offending entries
#'
#' @export
SeatCheck=function(data){

  seats=c("LF", "RF", "LR", "RR")


  if("se" %in% colnames(data)){
  bad=data[which(!(data$se %in% seats)),]
  }

  if("Seat" %in% colnames(data)){
    bad=data[which(!(data$Seat %in% seats)),]
    }

  return(list("fail"=(length(bad[,1])!=0), "bad"=bad))


}


#' Check if Species column inputs are all correct
#'
#' SpeciesCheck checks to see if Species entries are correct values needed for analysis.
#'
#' SpeciesCheck is a GreenLight QA/QC check performed on a data set.  Species entries are cross-referenced with approved entries by project in
#' \code{\link{sppntable}} and values with known treatments are marked as "yellow light" issues that will be updated.  Completely unknown species codes
#' will result in a "red" status.
#'
#' @author Charles Frost, \email{charles_frost@@fws.gov}
#' @references \url{https://github.com/USFWS/AKaerial}
#'
#' @param col.data The column to be checked.
#'
#' @return data frame of TRUE/FALSE, incorrect unfixable entries, and incorrect but fixable entries
#'
#' @export
SpeciesCheck=function(data, area){


if(area!="WBPHS"){

  if("sppn" %in% colnames(data)){
    bad=data[which(!(data$sppn %in% sppntable$QAQC) & !(data$sppn %in% sppntable$INPUT)),]

    change=data[which(data$sppn %in% sppntable$INPUT & (!(data$sppn %in% sppntable$QAQC))),]

  }

  if("Species" %in% colnames(data)){


    bad=data[which(!(data$Species %in% sppntable$QAQC) & !(data$Species %in% sppntable$INPUT)),]

    change=data[which(data$Species %in% sppntable$INPUT & (!(data$Species %in% sppntable$QAQC))),]

  }

}

  if(area=="WBPHS"){


    if("sppn" %in% colnames(data)){
      bad=data[which(!(data$sppn %in% WBPHSsppntable$QAQC) & !(data$sppn %in% WBPHSsppntable$INPUT)),]

      change=data[which(data$sppn %in% WBPHSsppntable$INPUT & (!(data$sppn %in% WBPHSsppntable$QAQC))),]

    }

    if("Species" %in% colnames(data)){


      bad=data[which(!(data$Species %in% WBPHSsppntable$QAQC) & !(data$Species %in% WBPHSsppntable$INPUT)),]

      change=data[which(data$Species %in% WBPHSsppntable$INPUT & (!(data$Species %in% WBPHSsppntable$QAQC))),]

    }




  }

  return(list("fail"=(length(bad[,1])!=0), "bad"=bad, "change"=change))



}


#' Check if swan and swan nest entries are all correct
#'
#' SwanCheck checks to see if swan and swan nest entries were transcribed correctly for analysis.
#'
#' SwanCheck is a GreenLight QA/QC check performed on a data set.  The correct transcription of a swan and associated nest is to
#' record the swan Obs_Type (single or pair) and then the associated nest as a separate row with Obs_Type open and Num 1.  Anything
#' else will result in "yellow" status if the treatment of the data type is known, and "red" if unknown.
#'
#' @author Charles Frost, \email{charles_frost@@fws.gov}
#' @references \url{https://github.com/USFWS/AKaerial}
#'
#' @param data The data frame to be checked.
#'
#' @return data frame of TRUE/FALSE and offending entries
#'
#' @export
SwanCheck=function(data){

  nest=data[data$Species=="TUNE" |
              data$Species=="TRNE" |
              data$Species=="SWANN" |
              data$Species=="SWNE" |
              data$Species=="SW.N" |
              data$Species=="TS.N" |
              data$Species == "TSNE" |
              data$Species == "tune" |
              data$Species == "TUSWN" |
              data$Species == "SWANNEST",]

  swan=data[data$Species=="TUSW" |
              data$Species=="TRSW" |
              data$Species=="SWAN" |
              data$Species=="TRUS" |
              data$Species=="Tusw" |
              data$Species=="TUsw" |
              data$Species=="tusw",]

  bad=nest[which(nest$Obs_Type != "open" | nest$Num != 1),]
  bad.index=as.numeric(rownames(bad))

  missing=nest[which(!(nest$Lat %in% swan$Lat & nest$Lon %in% swan$Lon)),]
  missing.index=as.numeric(rownames(missing))


  return(list("fail"=(length(bad[,1])!=0 | length(missing[,1])!=0), "bad"=bad, "missing"=missing, "bad.index"=bad.index, "missing.index"=missing.index))

}





#' Split an old combined data file by observer and year
#'
#' ObsByYear will take a .csv or .txt file with 2 or more observers and parse it into observer- and year-specific files
#'
#' ObsByYear will take a .csv or .txt file with 2 or more observers and parse it into observer- and year-specific files. This is designed to fix
#' older files that may have been concatenated into a master file, splitting them into separate pieces.  Since this was an ACP problem,
#' the files default to the directory "Q:/Waterfowl/Parsed/ACP_" and would need recoding for any other survey.  Individual .csv files are written to the directory.
#'
#' @author Charles Frost, \email{charles_frost@@fws.gov}
#' @references \url{https://github.com/USFWS/AKaerial}
#'
#' @param path.name The path to the combined file.
#'
#' @return None
#'
#' @export
ObsByYear=function(path.name){

  #created to split a master file out into observer/year specific files

  type <- file_ext(path.name)

  if(type == "txt") {data <- read.table(path.name, header=TRUE, stringsAsFactors = FALSE)}
  if(type == "csv") {data <- read.csv(path.name, header=TRUE, stringsAsFactors = FALSE)}

  yr.list=unique(data$yr)

  data$obs=toupper(data$obs)

  for(year in seq_along(yr.list)){

    temp.data=data[data$yr==yr.list[year],]
    obs.list=unique(temp.data$obs)

    print(obs.list)
    print(length(obs.list))

    for (i in 1:length(obs.list)){
    write.csv(temp.data[temp.data$obs==obs.list[i],],
              file=paste("Q:/Waterfowl/Parsed/ACP_", yr.list[year], "_RawObs_", obs.list[i], ".csv", sep=""),
              row.names=FALSE,
              col.names=names(temp.data),
              quote=FALSE
              )
    }

  }


}



#' Apply "yellow light" changes to a data set
#'
#' CommonFix is used in conjunction with GreenLight to apply "no-loss" changes to a data set
#'
#' Commonfix will take "yellow light" issues and apply a known treatment to fix offending issues. The list of fixes includes \itemize{
#'   \item Seat - changes lower to uppercase, flips the characters (FR, FL, RL to RF, LF, LR)
#'   \item Observer - changes lower to uppercase
#'   \item Swan - breaks up a nest-only observation into 2 observations
#'   \item Obs_Type - changes open 2 or open 1 SWAN to pair or single, changes SWANN to open 1
#'   \item Species - changes incorrect or outdated species codes to current ones}
#'
#' @author Charles Frost, \email{charles_frost@@fws.gov}
#' @references \url{https://github.com/USFWS/AKaerial}
#'
#' @param data The data frame to be fixed.
#' @param fix The character string vector of the names of the fixes to be applied.
#' @param area The project area designation.
#'
#' @return data frame of fixed columns
#'
#' @export
CommonFix=function(data, fix, area){

  #change seats to uppercase, fix swapping (FR to RF, etc)
  if("Seat" %in% fix){

    data$Seat=toupper(data$Seat)

    data$Seat[data$Seat=="FL"]="LF"
    data$Seat[data$Seat=="FR"]="RF"
    data$Seat[data$Seat=="RL"]="LR"

  }

  #change Observer to uppercase
  if("Observer" %in% fix){

    data$Observer=toupper(data$Observer)

  }


  if("Swan" %in% fix){

    swan.probs=SwanCheck(data)

    for(i in swan.probs$missing.index){

      new.row=data[i,]

      if(new.row$Num==1){new.row$Obs_Type="single"}
      if(new.row$Num==2){new.row$Obs_Type="pair"}
      new.row$Species="SWAN"
      new.row$Num=1

      data=rbind(data, new.row)

    }

    for(j in swan.probs$bad.index){
      data$Obs_Type[j]="open"

      if(data$Year > 1998 || area != "WBPHS"){
      data$Num[j]=1
        }

    }

  }

  if("Obs_Type" %in% fix){


    for(k in 1:length(data$Obs_Type)){

      if(data$Species[k] %in% c("TUNE", "TRNE", "SWANN", "SWNE", "SW.N", "TS.N", "TSNE", "tune", "TUSWN"))
      {
        data$Num[k]=1
        data$Obs_Type[k]="open"
        next
        }

      if(data$Num[k]==1){

        data$Obs_Type[k]="single"

      }


      if(data$Num[k]==2){

        data$Obs_Type[k]="pair"
        data$Num[k]=1

      }

      if(data$Num[k] > 2){

        data$Obs_Type[k]="open"

      }

    }


  }

  if("Species" %in% fix){

  #data$OldSpecies=data$Species

    if(area!="WBPHS"){

  for(n in 1:length(data$Species)){

    data$Species[n]=sppntable$QAQC[sppntable$INPUT==data$Species[n]]

  }
    }


    if(area == "WBPHS"){

      for(n in 1:length(data$Species)){

        data$Species[n]=WBPHSsppntable$QAQC[WBPHSsppntable$INPUT==data$Species[n]]

      }
    }


    }


  return(data)

}

#' Check if common fixes need to be applied to a raw data file
#'
#' Raw2Analysis checks to see if any of the common fixes need to be applied to a data set.
#'
#' Raw2Analysis is a QA/QC check in GreenLight that will check for and apply common fixes to a data set through CommonFix.
#'
#' @author Charles Frost, \email{charles_frost@@fws.gov}
#' @references \url{https://github.com/USFWS/AKaerial}
#'
#' @param data The data frame to be checked.
#' @param area The project area designation.
#'
#' @return list of data and fix categories that should be applied
#'
#' @export
Raw2Analysis=function(data, area){
  fix=c("Species")

  if(length(unique(data$Obs_Type[!is.na(data$Obs_Type)])) <= 1){fix=c(fix, "Obs_Type")}

  seat.test=SeatCheck(data)
  if(seat.test$fail==TRUE){fix=c(fix, "Seat")}

  obs.test=ObserverCheck(data)
  if(obs.test==FALSE){fix=c(fix, "Observer")}

  swan.test=SwanCheck(data)
  if(swan.test$fail==TRUE){fix=c(fix, "Swan")}

  data=CommonFix(data=data, fix=fix, area=area)

  return(list("newdata"=data, "fix"=fix))

}
USFWS/AKaerial documentation built on July 16, 2025, 10:07 p.m.
rdrr.io home R language documentation Run R code online
CRAN packages Bioconductor packages R-Forge packages GitHub packages
Note that we can't provide technical support on individual packages. You should contact the package authors for that.
USFWS/AKaerial
Analysis of Alaska Region Aerial Survey Data

R/greenlight.R
In USFWS/AKaerial: Analysis of Alaska Region Aerial Survey Data

Defines functions Raw2Analysis CommonFix ObsByYear SwanCheck SpeciesCheck SeatCheck UnitCheck ShouldBeNumeric ColMatch ColSwitch ObserverCheck SwitchMatch GreenLight

Documented in ColMatch ColSwitch CommonFix GreenLight ObsByYear ObserverCheck Raw2Analysis SeatCheck ShouldBeNumeric SpeciesCheck SwanCheck SwitchMatch UnitCheck

R Package Documentation

Browse R Packages

We want your feedback!

USFWS/AKaerial Analysis of Alaska Region Aerial Survey Data

R/greenlight.R In USFWS/AKaerial: Analysis of Alaska Region Aerial Survey Data

Defines functions Raw2Analysis CommonFix ObsByYear SwanCheck SpeciesCheck SeatCheck UnitCheck ShouldBeNumeric ColMatch ColSwitch ObserverCheck SwitchMatch GreenLight

Documented in ColMatch ColSwitch CommonFix GreenLight ObsByYear ObserverCheck Raw2Analysis SeatCheck ShouldBeNumeric SpeciesCheck SwanCheck SwitchMatch UnitCheck

R Package Documentation

Browse R Packages

We want your feedback!

USFWS/AKaerial
Analysis of Alaska Region Aerial Survey Data

R/greenlight.R
In USFWS/AKaerial: Analysis of Alaska Region Aerial Survey Data