R/tempo_wrangle.R

#' Format covariates for tempo
#'
#' Function for formatting a long form covariate data frame into a format
#' suitable for tempo.
#'
#' The function extracts specified covariates (\code{vars}) from a long format
#' data frame (\code{x}) for a dataset (\code{y}) for use in tempo. Returns
#' a list of covariate matrices. Rows are named using \code{y$obs_id},
#' columns are named using the time steps in \code{x$time_step},
#' and list elements are named for the covariates using the \code{vars}
#' argument.
#'
#' @param x data.frame; Long form covariate data containing covariate info for
#' every observation/sample unit at every time step. Must contain a column,
#' \code{time_step}, containing an integer representation of the time
#' step (e.g. year, DOY, month, minute, etc.) for each covariate observation.
#' Must also contain a column \code{obs_id} to relate covariates correctly to
#' the observations in \code{y}.
#'
#' @param y data.frame; The observation data containing a column called
#' \code{obs_id} with unique observation IDs for each observation (row).
#'
#' @param vars character; A character vector with the column names of the
#' covariates to be extracted and formatted from \code{x}.
#'
#' @param n_time_steps The number of time steps you want to include in the
#' model. This argument is useful if you have covariate data for every day of
#' the year (365 time steps), but the event can only occur in the first
#' \emph{n} days of the year. In this case, you would specify
#' \code{n_time_steps} as \emph{n}. If \code{NULL}, defaults to the total
#' number of time steps available in \code{x},
#' \code{length(unique(x$time_step))}.
#'
#' @importFrom dplyr one_of left_join select
#' @importFrom tidyr spread
#' @importFrom magrittr "%>%"
#' @rdname tempo_wrangle
#' @export
# TODO: allow option for joining field name != "obs_id"
tempo_wrangle <- function(x, y, vars, n_time_steps = NULL) {
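  # Default to every time step present in the covariate data.
  if (is.null(n_time_steps)) {
    n_time_steps <- length(unique(x$time_step))
  }
  # Keep only the join key, the time index, and the requested covariates.
  # one_of() makes it explicit that vars is an external character vector.
  covs <- x %>%
    select("obs_id", "time_step", one_of(vars))

  covariates <- list()

  for (i in seq_along(vars)) {
    # Drop the other covariates, then pivot to one column per time step.
    leave_out <- vars[-i]
    wide <- covs %>%
      select(-one_of(leave_out)) %>%
      spread("time_step", vars[i])

    # Align rows to y, then keep the first n_time_steps time-step columns
    # (they follow the original columns of y in the joined data frame).
    covariates[[vars[i]]] <- as.matrix(
      left_join(y, wide, by = "obs_id")[
        , (ncol(y) + 1):(n_time_steps + ncol(y))])
    # Rows are named by observation ID, columns by time step.
    dimnames(covariates[[vars[i]]]) <-
      list(y$obs_id,
           sort(unique(covs$time_step))[seq_len(n_time_steps)])
  }
  names(covariates) <- vars
  covariates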
}
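
The shape of the returned list can be illustrated with a small base-R sketch. This is not the package's implementation: `wrangle_base` is a hypothetical helper that uses `tapply()` in place of the dplyr/tidyr pipeline, the toy data are made up, and it assumes every `y$obs_id` appears in `x` (the `left_join()` above would instead give `NA` rows for unmatched IDs).

```r
# Toy long-form covariate data: 3 observation units x 4 time steps.
x <- data.frame(
  obs_id    = rep(c("a", "b", "c"), each = 4),
  time_step = rep(1:4, times = 3),
  temp      = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12),
  precip    = c(0, 1, 0, 1, 1, 1, 0, 0, 2, 2, 2, 2)
)
# Observation data; the row order differs from x on purpose.
y <- data.frame(obs_id = c("c", "a"), stringsAsFactors = FALSE)

# Hypothetical helper (not part of tempo): one matrix per covariate.
wrangle_base <- function(x, y, vars, n_time_steps = NULL) {
  steps <- sort(unique(x$time_step))
  if (!is.null(n_time_steps)) steps <- steps[seq_len(n_time_steps)]
  out <- lapply(vars, function(v) {
    # tapply() pivots long -> wide: rows are obs_id, columns are time_step.
    wide <- tapply(x[[v]], list(x$obs_id, x$time_step), function(z) z[1])
    # Reorder rows to match y and keep only the requested time steps.
    wide[as.character(y$obs_id), as.character(steps), drop = FALSE]
  })
  names(out) <- vars
  out
}

# Keep only the first 3 of the 4 available time steps.
covs <- wrangle_base(x, y, vars = c("temp", "precip"), n_time_steps = 3)
str(covs)
```

Each list element is a matrix with one row per observation in `y`, in the order of `y`, and one column per retained time step, which is the layout described in the documentation above.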
vlandau/tempo documentation built on March 18, 2020, 12:04 a.m.