R/ghcnd_search.R

#' Get a cleaned version of GHCND data from a single weather site
#'
#' This function accesses Global Historical Climatology Network Daily (GHCND)
#' weather data on NOAA's FTP server for a single weather monitor site. It
#' requires the site's identification number and pulls the entire weather
#' dataset for that site. It then cleans the data into a tidier format and, if
#' requested, filters it to a date range and to selected weather variables.
#'
#' @export
#' @inheritParams ghcnd
#' @param date_min A character string giving the earliest date of the daily
#'    weather time series to keep in the final output, formatted as
#'    "yyyy-mm-dd". Observations on this date are included. If not specified,
#'    all daily data for the queried weather site are kept from the earliest
#'    available date.
#' @param date_max A character string giving the latest date of the daily
#'    weather time series to keep in the final output, formatted as
#'    "yyyy-mm-dd". Observations on this date are included. If not specified,
#'    all daily data for the queried weather site are kept through the most
#'    recent available date.
#' @param var A character vector specifying either `"all"` (pull all
#'    available weather parameters for the site) or the weather parameters to
#'    keep in the final data (e.g., `c("TMAX", "TMIN")` to keep only
#'    maximum and minimum temperature). Matching is case-insensitive. Example
#'    choices for this argument include:
#'
#'    - `PRCP`: Precipitation, in tenths of millimeters
#'    - `TAVG`: Average temperature, in tenths of degrees Celsius
#'    - `TMAX`: Maximum temperature, in tenths of degrees Celsius
#'    - `TMIN`: Minimum temperature, in tenths of degrees Celsius
#'
#'    A full list of possible weather variables is available in NOAA's README
#'    file for the GHCND data
#'    (https://www1.ncdc.noaa.gov/pub/data/ghcn/daily/readme.txt).
#'    Most weather stations have only a small subset of all the possible
#'    weather variables, so the data generated by this function may not include
#'    all of the variables the user specifies through this argument; in that
#'    case, a warning names the missing variables and lists those available.
#'
#' @return A named list with one element for each requested weather variable
#'    that is available for the site; element names are the lowercase
#'    variable codes (e.g., `prcp`). Each element is a data.frame holding the
#'    daily time series for one weather variable, with observations and their
#'    flag values. The flag values give information on the quality and source
#'    of each observation; see the NOAA README file linked above for more
#'    information. Each data.frame is sorted by date, with the earliest date
#'    first.
#'
#' @author Scott Chamberlain \email{myrmecocystus@@gmail.com},
#' Adam Erickson \email{adam.erickson@@ubc.ca}
#'
#' @note This function calls [ghcnd()], which will download and save
#'    data from all available dates and weather variables for the queried
#'    weather station. The dataset is limited to certain dates and/or weather
#'    variables, via the `date_min`, `date_max`, and `var` arguments, only
#'    after the full data has been pulled.
#'
#' @details
#' Messages about the file path and the file's last-modified time, as well as
#' the minimum and maximum dates in the downloaded file, are printed to the
#' console; suppress them with `suppressMessages()`.
#'
#' @seealso [meteo_pull_monitors()], [meteo_tidy_ghcnd()]
#'
#' @examples \dontrun{
#' # Search based on variable and/or date
#' ghcnd_search("AGE00147704", var = "PRCP")
#' ghcnd_search("AGE00147704", var = "PRCP", date_min = "1920-01-01")
#' ghcnd_search("AGE00147704", var = "PRCP", date_max = "1915-01-01")
#' ghcnd_search("AGE00147704", var = "PRCP", date_min = "1920-01-01",
#'              date_max = "1925-01-01")
#' ghcnd_search("AGE00147704", date_min = "1920-01-01", date_max = "1925-01-01")
#' ghcnd_search("AGE00147704", var = c("PRCP","TMIN"))
#' ghcnd_search("AGE00147704", var = c("PRCP","TMIN"), date_min = "1920-01-01")
#' # a variable not available for the site triggers a warning
#' ghcnd_search("AGE00147704", var = "adfdf")
#'
#' # refresh the cached file
#' ghcnd_search("AGE00147704", var = "PRCP", refresh = TRUE)
#' }
ghcnd_search <- function(stationid, date_min = NULL, date_max = NULL,
                         var = "all", refresh = FALSE, ...) {
  out <- ghcnd(stationid, refresh = refresh, ...)
  dat <- ghcnd_splitvars(out)

  # report the date range covered by the downloaded file
  message("file min/max dates: ", min(dat[[1]]$date), " / ", max(dat[[1]]$date))

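  possvars <- paste0(names(dat), collapse = ", ")
  vars_null <- character(0)
  if (any(var != "all")) {
    # variable names in `dat` are lowercase, so match case-insensitively
    vars_req <- tolower(var)
    vars_null <- sort(vars_req[!vars_req %in% names(dat)])
    # requested variables absent from the station data become NULL elements
    dat <- dat[vars_req]
  }
  # drop NULL elements and warn about the missing variables
  if (any(sapply(dat, is.null))) {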
    dat <- noaa_compact(dat)
    warning(
      sprintf("%s not in the dataset\nAvailable variables: %s",
              paste0(vars_null, collapse = ", "), possvars), call. = FALSE)
  }
  if (!is.null(date_min)) {
    dat <- lapply(dat, function(z) dplyr::filter(z, date >= date_min))
  }
  if (!is.null(date_max)) {
    dat <- lapply(dat, function(z) dplyr::filter(z, date <= date_max))
  }
  # sort each data.frame chronologically by daily date
  dat <- lapply(dat, function(z) dplyr::arrange(z, date))
  return(dat)
}
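
The post-download steps of `ghcnd_search()` (case-insensitive variable selection with a warning for missing variables, inclusive date filtering, then sorting by date) can be illustrated offline. The sketch below is a simplified base-R approximation, not rnoaa's implementation: `toy` is hypothetical data standing in for the per-variable list that `ghcnd_search()` builds internally (real elements carry more columns, such as flags), and `select_and_filter()` is a hypothetical helper. It subsets directly to the variables that exist instead of calling rnoaa's internal `noaa_compact()`, and uses base subsetting in place of `dplyr::filter()` and `dplyr::arrange()`.

```r
# Hypothetical stand-in for the per-variable list that ghcnd_search() builds
# internally (one data.frame per lowercase variable name); not real data.
toy <- list(
  prcp = data.frame(
    date = as.Date(c("1921-03-02", "1919-05-01", "1920-07-15")),
    prcp = c(12, 0, 35)
  ),
  tmax = data.frame(
    date = as.Date(c("1920-07-15", "1919-05-01")),
    tmax = c(301, 245)
  )
)

# Hypothetical helper approximating ghcnd_search()'s post-download steps in
# base R: case-insensitive variable selection with a warning for missing
# variables, inclusive date filtering, then sorting by date.
select_and_filter <- function(dat, var = "all", date_min = NULL, date_max = NULL) {
  possvars <- paste0(names(dat), collapse = ", ")
  if (!identical(var, "all")) {
    requested <- tolower(var)
    present <- requested %in% names(dat)
    missing_vars <- sort(requested[!present])
    # keep only variables that exist (rnoaa instead subsets, then drops NULLs)
    dat <- dat[requested[present]]
    if (length(missing_vars) > 0) {
      warning(sprintf("%s not in the dataset\nAvailable variables: %s",
                      paste0(missing_vars, collapse = ", "), possvars),
              call. = FALSE)
    }
  }
  if (!is.null(date_min)) {
    dat <- lapply(dat, function(z) z[z$date >= as.Date(date_min), , drop = FALSE])
  }
  if (!is.null(date_max)) {
    dat <- lapply(dat, function(z) z[z$date <= as.Date(date_max), , drop = FALSE])
  }
  # sort each data.frame chronologically
  lapply(dat, function(z) z[order(z$date), , drop = FALSE])
}

# Emits a warning because "SNOW" is not in the toy data
res <- select_and_filter(toy, var = c("PRCP", "SNOW"), date_min = "1920-01-01")
print(res$prcp)
```

With `var = c("PRCP", "SNOW")`, the call keeps only `prcp`, warns that `snow` is not in the dataset, drops rows dated before 1920-01-01, and returns the remaining rows in date order.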

rnoaa documentation built on April 27, 2023, 9:08 a.m.