R/das_chop_equallength.R

Defines functions .chop_equallength_eff das_chop_equallength.das_df das_chop_equallength.data.frame das_chop_equallength

Documented in .chop_equallength_eff das_chop_equallength das_chop_equallength.das_df das_chop_equallength.data.frame

#' Chop DAS data - equal length
#'
#' Chop DAS data into approximately equal-length effort segments, averaging conditions by segment
#'
#' @param x an object of class \code{das_df},
#'   or a data frame that can be coerced to class \code{das_df}.
#'   This data must be filtered for 'continuous effort sections;
#'   see the Details section below
#' @param ... ignored
#' @param conditions see \code{\link{das_effort}}
#' @param seg.km numeric; target segment length in kilometers
#' @param randpicks.load character, data frame, or \code{NULL}.
#'   If character, must be filename of past randpicks output to load and use
#'   (passed to \code{file} argument of \code{\link[utils:read.table]{read.csv}}).
#'   If data frame, randpicks values will be extracted from the data frame.
#'   If \code{NULL}, new randpicks values will be generated by the function
#' @param distance.method character; see \code{\link{das_effort}}.
#'   Default is \code{NULL} since these distances should have already been calculated
#' @param num.cores see \code{\link{das_effort}}
#'
#' @details WARNING - do not call this function directly!
#'   It is exported for documentation purposes, but is intended for internal package use only.
#'
#'   This function is intended to be called by \code{\link{das_effort}}
#'   when the "equallength" method is specified.
#'   Thus, \code{x} must be filtered for events (rows) where either
#'   the 'OnEffort' column is \code{TRUE} or the 'Event' column is "E";
#'   see \code{\link{das_effort}} for more details.
#'   This function chops each continuous effort section (henceforth 'effort sections')
#'   in \code{x} into modeling segments (henceforth 'segments') of equal length.
#'   Each effort section runs from an "R" event to its corresponding "E" event.
#'   After chopping, \code{\link{das_segdata}} is called to get relevant
#'   segdata information for each segment.
#'
#'   When chopping the effort sections in segments of length \code{seg.km},
#'   there are several possible scenarios:
#'   \itemize{
#'     \item The extra length remaining after chopping is greater than or equal to
#'       half of the target segment length (i.e. \code{>= 0.5*seg.km}):
#'       the extra length is assigned to a random portion of the effort section as its own segment
#'       (\href{https://github.com/smwoodman/swfscDAS/blob/master/inst/DAS_chop_equallength_figures.pdf}{see Fig. 1a})
#'     \item The extra length remaining after chopping is less than half of the
#'       target segment length (i.e. \code{< 0.5*seg.km}):
#'       the extra length is added to one of the (randomly selected) equal-length segments
#'       (\href{https://github.com/smwoodman/swfscDAS/blob/master/inst/DAS_chop_equallength_figures.pdf}{see Fig. 1b})
#'     \item The length of the effort section is less than or equal to
#'       the target segment length: the entire segment becomes a segment
#'       (\href{https://github.com/smwoodman/swfscDAS/blob/master/inst/DAS_chop_equallength_figures.pdf}{see Fig. 1c})
#'     \item The length of the effort section is zero: a segment of length zero.
#'       If there are more than two events (the "B"/R" and "E" events),
#'       the function throws a warning
#'   }
#'
#'   Therefore, the length of each segment is constrained to be between
#'   one half and one and one half of \code{seg.km} (i.e. \code{0.5*seg.km <=}
#'   segment length \code{>=1.5*seg.km}),
#'   and the central tendency is approximately equal to the target segment length.
#'   The only exception is when a continuous effort section is less than
#'   one half of the target segment length (i.e. \code{< 0.5*seg.km};
#'   \href{https://github.com/smwoodman/swfscDAS/blob/master/inst/DAS_chop_equallength_figures.pdf}{see Fig. 1c}).
#'
#'   Note the PDF with Figs. 1a - 1c is included in the package, and can be found at:
#'   \code{system.file("DAS_chop_equal_figures.pdf", package = "swfscDAS")}
#'
#'   'Randpicks' is a record of the random assignments that were made when
#'   chopping the effort sections into segments, and can be saved to allow
#'   users to recreate the same random allocation of extra km when chopping.
#'   The randpicks returned by this function is a data frame with two columns:
#'   the number of the effort section and the randpick value.
#'   Users should save the randpicks output to a CSV file,
#'   which then can be specified using the \code{randpicks.load} argument
#'   to recreate the same effort segments from \code{x}
#'   (i.e., using the same DAS data) in the future.
#'   Note that when saving with \code{\link[utils:read.table]{write.csv}}, users must
#'   specify \code{row.names = FALSE} so that the CSV file only has two columns.
#'   For an example randpicks file, see
#'   \code{system.file("das_sample_randpicks.csv", package = "swfscDAS")}
#'
#'   If the column \code{dist_from_prev} does not exist, the distance between
#'   subsequent events is calculated as described in \code{\link{das_effort}}
#'
#' @return List of three data frames:
#' \itemize{
#'   \item \code{x}, with columns added for the corresponding unique segment code and number
#'   \item segdata: data frame with one row for each segment, and columns with
#'     relevant data (see \code{\link{das_effort}} for specifics)
#'   \item randpicks: data frame with record of length allocations
#'     (see Details section above)
#' }
#'
#' @export
das_chop_equallength <- function(x, ...) UseMethod("das_chop_equallength")


#' @name das_chop_equallength
#' @export
das_chop_equallength.data.frame <- function(x, ...) {
  das_chop_equallength(as_das_df(x), ...)
}


#' @name das_chop_equallength
#' @export
das_chop_equallength.das_df <- function(x, conditions, seg.km, randpicks.load = NULL,
                                        distance.method = NULL, num.cores = NULL, ...) {
  #----------------------------------------------------------------------------
  # Input checks
  conditions <- .das_conditions_check(conditions, "equallength")

  if (missing(seg.km))
    stop("You must specify a 'seg.km' argument when using the \"equallength\" ",
         "method. See `?das_chop_equallength` for more details")

  if (!all(x$OnEffort | x$Event %in% "E"))
    stop("x must be filtered for on effort events; see `?das_chop_equallength")


  #----------------------------------------------------------------------------
  # Add columns if necessary

  # Calculate distance between points; checks happen in .dist_from_prev()
  if (!("dist_from_prev" %in% names(x))) {
    if (is.null(distance.method))
      stop("If the distance between consectutive points (events) ",
           "has not already been calculated, ",
           "then you must provide a valid argument for distance.method")

    x$dist_from_prev <- .dist_from_prev(x, distance.method)
  }

  # Determine continuous effort sections
  if (!("cont_eff_section" %in% names(x))) {
    x$cont_eff_section <- cumsum(x$Event %in% c("R", "strataR"))
  }


  #----------------------------------------------------------------------------
  # Load randpicks if applicable
  if (is.null(randpicks.load)) {
    r.pos <- NULL
    message("No argument was passed via randpicks.load, and thus new ",
            "randpicks values will be generated")

  } else {
    randpicks.df <- if (inherits(randpicks.load, "data.frame")) {
      randpicks.load

    } else if (inherits(randpicks.load, "character")) {
      if (length(randpicks.load) != 1) {
        stop("randpicks.load must be a vector consisting of one filename")
      } else if (!all(file.exists(randpicks.load))) {
        stop("randpicks.load does not name an existing file, ",
             "aka file.exists(randpicks.load) is FALSE")
      }
      read.csv(randpicks.load)

    } else {
      stop("randpicks.load must either be a data frame or ",
           "character (file path)")
    }

    if (all(c("effort_section", "randpicks") %in% names(randpicks.df))) {
      r.eff.sect <- randpicks.df$effort_section
      r.pos <- randpicks.df$randpicks

    } else {
      message("It is assumed that the first column of ",
              "randpicks.load is the continuous effort section numbers, ",
              "and the second column is the corresponding randpick values")
      r.eff.sect <- randpicks.df[[1]]
      r.pos <- randpicks.df[[2]]
    }
  }


  #----------------------------------------------------------------------------
  # Check continuous effort sections against randpicks if applicable
  eff.uniq <- unique(x$cont_eff_section)
  stopifnot(length(eff.uniq) == sum(x$Event %in% c("R", "strataR")))
  if (exists("r.eff.sect")) {
    if (length(eff.uniq) != length(r.eff.sect)) {
      stop("The provided DAS data (x) does not have the same number of ",
           "continuous effort sections as the provided randpicks file has rows. ",
           "Did you load the correct randpicks file, and does it have ",
           "proper column names? See `?das_chop_equallength` for more details. ",
           "Alternatively, did you provide the right value for seg0.drop ",
           "in das_effort()?")
    }
  }


  #----------------------------------------------------------------------------
  # Parallel through each continuous effort section,
  #   getting segment lengths and segdata
  call.x <- x
  call.conditions <- conditions
  call.seg.km <- seg.km
  call.r.pos <- r.pos
  call.func1 <- das_segdata

  # Setup number of cores
  if(is.null(num.cores)) num.cores <- parallel::detectCores() - 1
  if(is.na(num.cores)) num.cores <- 1
  num.cores <- max(1, num.cores)
  num.cores <- min(parallel::detectCores() - 1, num.cores)


  cl <- swfscMisc::setupClusters(num.cores)
  eff.chop.list <- tryCatch({
    if(is.null(cl)) { # Don't parallelize if num.cores == 1
      lapply(
        eff.uniq, .chop_equallength_eff,
        call.x = call.x, call.conditions = call.conditions,
        call.seg.km = call.seg.km, call.r.pos = call.r.pos,
        call.func1 = call.func1
      )

    } else { # Run lapply using parLapplyLB
      parallel::clusterExport(
        cl = cl,
        varlist = c("call.x", "call.conditions", "call.seg.km", "call.r.pos",
                    "call.func1"),
        envir = environment()
      )
      parallel::parLapplyLB(
        cl, eff.uniq, .chop_equallength_eff,
        call.x = call.x, call.conditions = call.conditions,
        call.seg.km = call.seg.km, call.r.pos = call.r.pos,
        call.func1 = call.func1
      )
    }
  }, finally = if(!is.null(cl)) parallel::stopCluster(cl) else NULL)


  #----------------------------------------------------------------------------
  # Extract information from eff.chop.list, and return

  ### Randpicks
  randpicks <- data.frame(
    effort_section = eff.uniq,
    randpicks = vapply(eff.chop.list, function(j) j[["pos"]], 1)
  )

  ### Segdata
  segdata <- data.frame(
    do.call(rbind, lapply(eff.chop.list, function(i) i[["das.df.segdata"]])),
    stringsAsFactors = FALSE
  ) %>%
    mutate(segnum = seq_along(.data$file),
           dist = round(.data$dist, 4)) %>%
    select(.data$segnum, everything())

  ### Each das data point, along with segnum
  x.eff <- data.frame(
    do.call(rbind, lapply(eff.chop.list, function(i) i[["das.df"]])),
    stringsAsFactors = FALSE
  ) %>%
    left_join(segdata[, c("seg_idx", "segnum")], by = "seg_idx")

  ### Message about segments with length 0
  ###   Must be outside b/c no messages come out of parallel
  segs.message <- na.omit(vapply(eff.chop.list, function(i) i[["seg.0"]], 1))
  if (length(segs.message) > 0)
    warning("The following continuous effort section(s) had a length of zero ",
            "and events between the start and end points: ",
            paste(segs.message, collapse = ", "))


  ### Message about segments with length 0 and time diff > 10s
  ###   Must be outside b/c no messages come out of parallel
  segs.message.10s <- vapply(eff.chop.list, function(i) {
    c1 <- isTRUE(.equal(i[["das.df.segdata"]][["dist"]], 0))
    i.das.dt <- i[["das.df"]][["DateTime"]]
    c2 <- abs(difftime(i.das.dt[1], tail(i.das.dt, 1), units = "sec")) > 10
    c1 && c2
  }, as.logical(1))

  if (sum(segs.message.10s) > 0)
    warning("The following continuous effort section(s) had a length of zero ",
            "and events between the start and end points. ",
            ifelse(sum(segs.message.10s) > 1,
                   "It is strongly recommended that you review these effort sections in the DAS file: ",
                   "It is strongly recommended that you review this effort section in the DAS file: "),
            paste(which(segs.message.10s), collapse = ", "))



  #----------------------------------------------------------------------------
  # Return
  list(as_das_df(x.eff), segdata, randpicks)
}



#' @name swfscAirDAS-internals
#' @param i ignore
#' @param call.x ignore
#' @param call.conditions ignore
#' @param call.seg.km ignore
#' @param call.r.pos ignore
#' @param call.func1 ignore
#' @export
.chop_equallength_eff <- function(i, call.x, call.conditions, call.seg.km,
                                  call.r.pos, call.func1) {
  ### Inputs
  # i: Index of current continuous effort section
  # call.x: x argument from das_chop_equallength(), with a few additional columns
  # call.conditions: conditions argument from das_chop_equallength()
  # call.seg.km: seg.km argument from das_chop_equallength()
  # call.r.pos: randpicks value; if NULL, a new value is generated
  # call.func1: _segdata_ function - needs to be passed in since
  #   this function is used by swfscAirDAS as well

  ### Output
  # List with, for this continuous effort section:
  #   1) DAS data frame, 2) segment lengths, 3) randpicks value, and 4) segdata

  #------------------------------------------------------
  ### Get lengths of effort segments
  # Prep
  das.df <- filter(call.x, .data$cont_eff_section == i)
  pos <- call.r.pos[i]

  das.df$dist_from_prev[1] <- 0 #Ignore distance from last effort

  seg.dist <- sum(das.df$dist_from_prev)
  seg.dist.mod <- seg.dist %% call.seg.km

  seg.0 <- NA
  seg.0.10s <- NA

  # Determine segment lengths
  if (.equal(seg.dist, 0)) {
    # If current segment length is 0 and there are other events, throw warning
    if (nrow(das.df) > 2) seg.0 <- i

    seg.lengths <- 0
    pos <- NA

    # EAB makes a 0.1km segment if it includes a sighting
    # if (nrow(das.df) > 2) {
    #   seg.0 <- i
    #   if ("S" %in% das.df$Event)
    #
    # } else {
    #   seg.lengths <- 0
    #   pos <- NA
    # }

  } else {
    if (.less_equal(seg.dist, call.seg.km)) {
      # If current segment length is less than target length,
      #   only make one segment
      n.subseg <- 1
      if (is.null(pos)) pos <- NA
      seg.lengths <- seg.dist

    } else if (.greater_equal(seg.dist.mod, (call.seg.km / 2))) {
      # If current segment length is greater than the target length and
      #   remainder is greater than or equal to half of the target length,
      #   the remainder is its own (randomly placed) segment
      n.subseg <- ceiling(seg.dist/call.seg.km)
      if (is.null(pos)) pos <- ceiling(runif(1, 0, 1) * n.subseg)
      if (is.na(pos) | !between(pos, 1, n.subseg))
        stop("Randpicks value is not in proper range")
      seg.lengths <- rep(call.seg.km, n.subseg)
      seg.lengths[pos] <- seg.dist.mod

    } else if (.less(seg.dist.mod, (call.seg.km / 2))) {
      # If current segment length is greater than the target length and
      #   remainder is less than half of the target length,
      #   the remainder added to a random segment
      n.subseg <- floor(seg.dist/call.seg.km)
      if (is.null(pos)) pos <- ceiling(runif(1, 0, 1) * n.subseg)
      if (is.na(pos) | !between(pos, 1, n.subseg))
        stop("Randpicks value is not in proper range")
      seg.lengths <- rep(call.seg.km, n.subseg)
      seg.lengths[pos] <- call.seg.km + seg.dist.mod

    } else {
      stop("Error in das_chop_equallength() - unrecognized effort situation. ",
           "Please report this as an issue")
    }
  }

  #------------------------------------------------------
  ### Assign each event to a segment
  subseg.cumsum <- cumsum(seg.lengths)
  das.cumsum <- cumsum(das.df$dist_from_prev)

  effort_subseg <- findInterval(
    round(das.cumsum, 4), round(c(-1, subseg.cumsum), 4),
    left.open = TRUE, rightmost.closed = TRUE
  )
  das.df$seg_idx <- paste0(i, "_", effort_subseg)


  #------------------------------------------------------
  ### Get segdata and return
  # das.df.segdata <- das_segdata(as_das_df(das.df), seg.lengths, i)
  das.df.segdata <- call.func1(
    x = das.df, conditions = call.conditions, segdata.method = "avg",
    seg.lengths = seg.lengths, section.id = i
  )

  list(
    das.df = das.df, seg.lengths = seg.lengths, pos = pos,
    das.df.segdata = das.df.segdata,
    seg.0 = seg.0
  )
}

Try the swfscDAS package in your browser

Any scripts or data that you put into this service are public.

swfscDAS documentation built on Aug. 10, 2023, 9:06 a.m.