R/airdas_chop_equallength.R

Defines functions airdas_chop_equallength.airdas_df airdas_chop_equallength.data.frame airdas_chop_equallength

Documented in airdas_chop_equallength airdas_chop_equallength.airdas_df airdas_chop_equallength.data.frame

#' Chop AirDAS data - equal length
#' 
#' Chop AirDAS data into equal-length effort segments, averaging conditions by segment
#' 
#' @param x \code{airdas_df} object, 
#'   or a data frame that can be coerced to a \code{airdas_df} object. 
#'   This data must be filtered for 'OnEffort' events; 
#'   see the Details section below
#' @param ... ignored
#' @param conditions see \code{\link{airdas_effort}}
#' @param seg.km numeric; target segment length in kilometers
#' @param randpicks.load character, data frame, or \code{NULL}. 
#'   If character, must be filename of past randpicks output to load and use 
#'   (passed to \code{file} argument of \code{\link[utils:read.table]{read.csv}}).
#'   If data frame, randpicks values will be extracted from the data frame.
#'   If \code{NULL}, new randpicks values will be generated by the function
#' @param distance.method character; see \code{\link{airdas_effort}}.
#'   Default is \code{NULL} since these distances should have already been
#'   calculated in \code{\link{airdas_effort}}
#' @param num.cores See \code{\link{airdas_effort}}
#' 
#' @details WARNING - do not call this function directly!
#'   It is exported for documentation purposes, but is intended for internal package use only.
#'
#'   This function is intended to only be called by \code{\link{airdas_effort}} 
#'   when the "equallength" method is specified. 
#'   Thus, \code{x} must be filtered for events (rows) where either
#'   the 'OnEffort' column is \code{TRUE} or the 'Event' column is either "E" or "O"; 
#'   see \code{\link{airdas_effort}} for more details. 
#'   This function chops each continuous effort section (henceforth 'effort sections') 
#'   in \code{x} into modeling segments (henceforth 'segments') of equal length. 
#'   Each effort section runs from a "T"/"R" event to its corresponding "E"/"O" event. 
#'   After chopping, \code{\link{airdas_segdata}} is called to get relevant  
#'   segdata information for each segment.
#' 
#'   When chopping the effort sections in segments of length \code{seg.km}, 
#'   there are several possible scenarios:
#'   \itemize{
#'     \item The extra length remaining after chopping is greater than or equal to 
#'       half of the target segment length (i.e. \code{>= 0.5*seg.km}): 
#'       the extra length is assigned to a random portion of the effort section as its own segment 
#'       (\href{https://github.com/smwoodman/swfscAirDAS/blob/master/inst/AirDAS_chop_equallength_figures.pdf}{see Fig. 1a})
#'     \item The extra length remaining after chopping is less than half of the 
#'       target segment length (i.e. \code{< 0.5*seg.km}): 
#'       the extra length is added to one of the (randomly selected) equal-length segments 
#'       (\href{https://github.com/smwoodman/swfscAirDAS/blob/master/inst/AirDAS_chop_equallength_figures.pdf}{see Fig. 1b})
#'     \item The length of the effort section is less than or equal to 
#'       the target segment length: the entire segment becomes a segment 
#'       (\href{https://github.com/smwoodman/swfscAirDAS/blob/master/inst/AirDAS_chop_equallength_figures.pdf}{see Fig. 1c})
#'     \item The length of the effort section is zero: a segment of length zero. 
#'       If there are more than two events (the "T"/R" and "E"/"O" events),
#'       the function throws a warning
#'   }
#'   
#'   Therefore, the length of each segment is constrained to be between 
#'   one half and one and one half of \code{seg.km} (i.e. \code{0.5*seg.km <=} 
#'   segment length \code{>=1.5*seg.km}), 
#'   and the central tendency is approximately equal to the target segment length. 
#'   The only exception is when a continuous effort section is less than 
#'   one half of the target segment length (i.e. \code{< 0.5*seg.km}; 
#'   \href{https://github.com/smwoodman/swfscAirDAS/blob/master/inst/AirDAS_chop_equallength_figures.pdf}{see Fig. 1c}).
#'   
#'   Note the PDF with Figs. 1a - 1c is included in the package, and can be found at:
#'   \code{system.file("AirDAS_chop_equallength_figures.pdf", package = "swfscAirDAS")}   
#'   
#'   'Randpicks' is a record of the random assignments that were made when 
#'   chopping the effort sections into segments, and can be saved to allow 
#'   users to recreate the same random allocation of extra km when chopping. 
#'   The randpicks returned by this function is a data frame with two columns: 
#'   the number of the effort section and the randpick value. 
#'   Users should save the randpicks output to a CSV file, 
#'   which then can be specified using the \code{randpicks.load} argument
#'   to recreate the same effort segments from \code{x} 
#'   (i.e., using the same AirDAS data) in the future.
#'   Note that when saving with \code{\link[utils:read.table]{write.csv}}, users must 
#'   specify \code{row.names = FALSE} so that the CSV file only has two columns.
#'   For an example randpicks file, see 
#'   \code{system.file("airdas_sample_randpicks.csv", package = "swfscAirDAS")}
#'
#'   If the column \code{dist_from_prev} does not exist, the distance between
#'   subsequent events is calculated as described in \code{\link{airdas_effort}}
#'      
#' @return List of three data frames:
#' \itemize{
#'   \item \code{x}, with columns added for the corresponding unique segment code and number
#'   \item segdata: data frame with one row for each segment, and columns with
#'     relevant data (see \code{\link{airdas_effort}} for specifics)
#'   \item randpicks: data frame with record of length allocations 
#'     (see Details section above)
#' }
#' 
#' @export
airdas_chop_equallength <- function(x, ...) UseMethod("airdas_chop_equallength")


#' @name airdas_chop_equallength
#' @export
airdas_chop_equallength.data.frame <- function(x, ...) {
  airdas_chop_equallength(as_airdas_df(x), ...)
}


#' @name airdas_chop_equallength
#' @export
airdas_chop_equallength.airdas_df <- function(
    x, conditions, seg.km, 
    randpicks.load = NULL, distance.method = NULL, num.cores = NULL, 
    ...) 
{
  #----------------------------------------------------------------------------
  # Input checks
  if (missing(seg.km))
    stop("You must specify a 'seg.km' argument when using the \"equallength\" ", 
         "method. See `?airdas_chop_equallength` for more details")
  
  if (!all(x$OnEffort | x$Event %in% c("O", "E"))) 
    stop("x must be filtered for on effort events; see `?airdas_chop_equallength")
  
  conditions <- .airdas_conditions_check(conditions)
  
  
  #----------------------------------------------------------------------------
  # Calculate distance between points if necessary
  #Check for distance.method happens in .dist_from_prev()
  if (!("dist_from_prev" %in% names(x))) {
    if (is.null(distance.method))
      stop("If the distance between consectutive points (events) ",
           "has not already been calculated, ",
           "then you must provide a valid argument for distance.method")
    
    x$dist_from_prev <- .dist_from_prev(x, distance.method)
  }
  
  
  #----------------------------------------------------------------------------
  # Load randpicks if applicable
  if (is.null(randpicks.load)) {
    r.pos <- NULL
    message("No argument was passed via randpicks.load, and thus new ", 
            "randpicks values will be generated")
    
  } else {
    randpicks.df <- if (inherits(randpicks.load, "data.frame")) {
      randpicks.load
    } else if (inherits(randpicks.load, "character")){
      read.csv(randpicks.load)
    } else {
      stop("randpicks.load must either be a data frame or ", 
           "file path (character)")
    }
    
    if (all(c("effort_section", "randpicks") %in% names(randpicks.df))) {
      r.eff.sect <- randpicks.df$effort_section
      r.pos <- randpicks.df$randpicks
      
    } else {
      warning("For the provided randpicks, it is assumed that ", 
              "the first column is the continuous effort section numbers, ", 
              "and the second column is the randpick values for that ", 
              "continuous effort section", 
              immediate. = TRUE)
      r.eff.sect <- randpicks.df[[1]]
      r.pos <- randpicks.df[[2]]
    }
  }
  
  
  #----------------------------------------------------------------------------
  # ID continuous effort sections, and if appl check against randpicks
  if (!("cont_eff_section" %in% names(x))) {
    x$cont_eff_section <- cumsum(x$Event %in% c("T", "R"))
  }
  
  eff.uniq <- unique(x$cont_eff_section)
  if (exists("r.eff.sect")) {
    if (length(eff.uniq) != length(r.eff.sect))
      stop("The provided AirDAS data (x) does not have the same number of ", 
           "continuous effort sections as the provided randpicks file has rows. ", 
           "Did you load the correct randpicks file, and does it have ", 
           "proper column names? See `?airdas_chop_equallength` for more details")
  }
  
  
  #----------------------------------------------------------------------------
  # Parallel thorugh each continuous effort section,
  #   getting segment lengths and segdata
  call.x <- x
  call.conditions <- conditions
  call.seg.km <- seg.km
  call.r.pos <- r.pos
  call.func1 <- airdas_segdata
  
  # Setup number of cores
  if(is.null(num.cores)) num.cores <- parallel::detectCores() - 1
  if(is.na(num.cores)) num.cores <- 1
  num.cores <- max(1, num.cores)
  num.cores <- min(parallel::detectCores() - 1, num.cores)
  
  
  cl <- swfscMisc::setupClusters(num.cores)
  eff.chop.list <- tryCatch({
    if(is.null(cl)) { # Don't parallelize if num.cores == 1
      lapply(
        eff.uniq, .chop_equallength_eff,
        call.x = call.x, call.conditions = call.conditions, 
        call.seg.km = call.seg.km, call.r.pos = call.r.pos,
        call.func1 = call.func1
      )
      
    } else { # Run lapply using parLapplyLB
      parallel::clusterExport(
        cl = cl,
        varlist = c("call.x", "call.conditions", "call.seg.km", "call.r.pos", 
                    "call.func1"),
        envir = environment()
      )
      parallel::parLapplyLB(
        cl, eff.uniq, .chop_equallength_eff,
        call.x = call.x, call.conditions = call.conditions, 
        call.seg.km = call.seg.km, call.r.pos = call.r.pos,
        call.func1 = call.func1
      )
    }
  }, finally = if(!is.null(cl)) parallel::stopCluster(cl) else NULL)
  
  
  #----------------------------------------------------------------------------
  # Extract information from eff.chop.list, and return
  
  ### Randpicks; including writing to csv if specified
  randpicks <- data.frame(
    effort_section = eff.uniq,
    randpicks = vapply(eff.chop.list, function(j) j[["pos"]], 1)
  )
  
  ### Segdata
  segdata <- data.frame(
    do.call(rbind, lapply(eff.chop.list, function(i) i[["das.df.segdata"]])), 
    stringsAsFactors = FALSE
  ) %>%
    mutate(segnum = seq_along(.data$file), 
           dist = round(.data$dist, 4)) %>%
    select(.data$segnum, .data$seg_idx, everything())
  
  ### Each das data point, along with segnum
  x.eff <- data.frame(
    do.call(rbind, lapply(eff.chop.list, function(i) i[["das.df"]])), 
    stringsAsFactors = FALSE
  ) %>% 
    left_join(segdata[, c("seg_idx", "segnum")], by = "seg_idx")
  
  ### Message about segments with length 0
  segs.message <- na.omit(vapply(eff.chop.list, function(i) i[["seg.0"]], 1))
  if (length(segs.message) > 0)
    warning("The following continuous effort section(s) had a length of zero ",
            "and events between the start and end points: ",
            paste(segs.message, collapse = ", "))
  
  
  #----------------------------------------------------------------------------
  # Return
  list(as_airdas_df(x.eff), segdata, randpicks)
}

Try the swfscAirDAS package in your browser

Any scripts or data that you put into this service are public.

swfscAirDAS documentation built on Aug. 9, 2023, 1:06 a.m.