R/format_weather.R

Defines functions .check_weather format_weather

Documented in format_weather

#' Format weather data into an an object suitable for use in ascotraceR spore dispersal models
#'
#' Formats raw weather data into an object suitable for use in the
#' [trace_asco()] function ensuring that the supplied weather data meet the
#' requirements of the model to run.
#'
#' @param x A [data.frame] object of weather station data for formatting.
#'   `character`.
#' @param YYYY Column name or index in `x` that refers to the year when the
#'   weather was logged. `character`.
#' @param MM Column name or index in `x` that refers to the month (numerical)
#'   when the weather was logged. `character`.
#' @param DD Column name or index in `x` that refers to the day of month when
#'   the weather was logged. `character`.
#' @param hh Column name or index in `x` that refers to the hour (24 hour) when
#'   the weather was logged. `character`.
#' @param mm Column name or index in `x` that refers to the minute when the
#'   weather was logged. `character`.
#' @param POSIXct_time Column name or index in `x` which contains a `POSIXct`
#'   formatted time. This can be used instead of arguments `YYYY`, `MM`, `DD`,
#'   `hh`, `mm.` `character`.
#' @param time_zone Time zone (Olsen time zone format) where the weather station
#'   is located. May be in a column or supplied as a character string. Optional,
#'   see also `r`. `character`. See details.
#' @param temp Column name or index in `x` that refers to temperature in degrees
#'   Celsius. `character`.
#' @param rain Column name or index in `x` that refers to rainfall in mm.
#'   `character`.
#' @param ws Column name or index in `x` that refers to wind speed in km / h.
#'   `character`.
#' @param wd Column name or index in `x` that refers to wind direction in
#'   degrees. `character`.
#' @param wd_sd Column name or index in `x` that refers to wind speed columns
#'   standard deviation .`character`.  This is only applicable if weather data
#'   is already summarised to hourly increments. See details.
#' @param station Column name or index in `x` that refers to the weather station
#'   name or identifier. `character`. See details.
#' @param lon Column name or index in `x` that refers to weather station's
#'   longitude. `character`. See details.
#' @param lat Column name or index in `x` that refers to weather station's
#'   latitude. `character`. See details.
#' @param r Spatial raster which is intended to be used with this weather data
#'   for use in the blackspot model. Used to set `time_zone` if it is not
#'   supplied in data. `character`. Optional, see also `time_zone`.
#' @param lonlat_file A file path to a \acronym{CSV} which included station
#'   name/id and longitude and latitude coordinates if they are not supplied in
#'   the data. `character`. Optional, see also `lon` and `lat`.
#'
#' @details `time_zone` All weather stations must fall within the same time
#'   zone.  If the required stations are located in differing time zones,
#'   separate `ascotraceR.weather` objects must be created for each time zone.
#'   If a raster object, `r`, of previous crops is provided that spans time
#'   zones, an error will be emitted.
#'
#' @details `wd_sd` If weather data is provided in hourly increments, a column
#'   with the standard deviation of the wind direction over the hour is required
#'   to be provided. If the weather data are sub-hourly, the standard deviation
#'   will be calculated and returned automatically.
#'
#' @details `lon`, `lat` and `lonlat_file` If `x` provides longitude and
#'   latitude values for station locations, these may be specified in the `lon`
#'   and `lat` columns.  If the coordinates are not relevant to the study
#'   location `NA` can be specified and the function will drop these column
#'   variables.  If these data are not included, (`NULL`) a separate file may be
#'   provided that contains the longitude, latitude and matching station name to
#'   provide station locations in the final `ascotraceR.weather` object that is
#'   created by specifying the file path to a \acronym{CSV} file using
#'   `lonlat_file`.
#'
#' @return A `ascotraceR.weather` object (an extension of \CRANpkg{data.table})
#'   containing the supplied weather aggregated to each hour in a suitable
#'   format for use with [trace_asco()] containing the following columns:
#'   \tabular{rl}{
#'   **times**: \tab Time in POSIXct format \cr
#'   **rain**: \tab Rainfall in mm \cr
#'   **ws**: \tab Wind speed in km / h \cr
#'   **wd**: \tab Wind direction in compass degrees \cr
#'   **wd_sd**: \tab Wind direction standard deviation in compass degrees \cr
#'   **lon**: \tab Station longitude in decimal degrees \cr
#'   **lat**: \tab Station latitude in decimal degrees \cr
#'   **station**: \tab Unique station identifying name \cr
#'   **YYYY**: \tab Year \cr
#'   **MM**: \tab Month \cr
#'   **DD**: \tab Day \cr
#'   **hh**: \tab Hour \cr
#'   **mm**: \tab Minute \cr}
#'
#' @examples
#' # Weather data files for Newmarracara for testing and examples have been
#' # included in ascotraceR. The weather data files both are of the same format,
#' # so they will be combined for formatting here.
#'
#' Newmarracarra <- read.csv(
#'  system.file("extdata",
#'              "1998_Newmarracarra_weather_table.csv",
#'              package = "ascotraceR")
#')
#'
#' station_data <- system.file("extdata",
#'                             "stat_dat.csv",
#'                             package = "ascotraceR")
#'
#' weather <- format_weather(
#'   x = Newmarracarra,
#'   POSIXct_time = "Local.Time",
#'   temp = "mean_daily_temp",
#'   rain = "rain_mm",
#'   ws = "ws",
#'   wd = "wd",
#'   wd_sd = "wd_sd",
#'   station = "Location",
#'   time_zone = "Australia/Perth",
#'   lonlat_file = station_data
#' )
#'
#' # Saving weather data and reimporting can lose the object class
#' # Reimported data can be quickly reformatted, adding the 'asco.weather' class
#' #  with this same function
#' temp_file_path <- paste0(tempdir(),"weather_file.csv")
#' write.csv(weather, file = temp_file_path, row.names = FALSE)
#' weather_imported <- read.csv(temp_file_path)
#' weather <- format_weather(weather_imported,
#'                           time_zone = "Australia/Perth")
#' unlink(temp_file_path) # remove temporary weather file
#' @export
format_weather <- function(x,
                           YYYY = NULL,
                           MM = NULL,
                           DD = NULL,
                           hh = NULL,
                           mm = NULL,
                           POSIXct_time = NULL,
                           time_zone = NULL,
                           temp,
                           rain,
                           ws,
                           wd,
                           wd_sd,
                           station,
                           lon = NULL,
                           lat = NULL,
                           r = NULL,
                           lonlat_file = NULL) {
  # CRAN Note avoidance
  times <- NULL #nocov

  # Check x class
  if (!is.data.frame(x)) {
    stop(call. = FALSE,
         "`x` must be provided as a `data.frame` object for formatting.")
  }

  # is this a pre-formatted data.frame that needs to be reformatted?
  if (all(
    c(
      "times",
      "temp",
      "rain",
      "ws",
      "wd",
      "wd_sd",
      "wet_hours",
      "station",
      "YYYY",
      "MM",
      "DD",
      "hh",
      "mm"
    ) %in% colnames(x)
  )) {
    # set as data.table
    x <- data.table(x)

    if (is.null(time_zone)) {
      stop(
        "Please provide the timezone of the source weather stations. If this
              was pre-formatted, use 'UTC'"
      )
    } else{
      x[, times := lubridate::ymd_hms(times, tz = time_zone)]
    }

    .check_weather(x)
    setattr(x, "class", union("asco.weather", class(x)))

    return(x)
  }

  # Check for missing inputs before proceeding
  if (is.null(POSIXct_time) &&
      is.null(YYYY) && is.null(MM) && is.null(DD) && is.null(hh)) {
    stop(
      call. = FALSE,
      "You must provide time values either as a `POSIXct_time` column or ",
      "values for `YYYY``, `MM`, `DD` and `hh`."
    )
  }

  if (is.null(lon) && is.null(lat) && is.null(lonlat_file)) {
    stop(
      call. = FALSE,
      "You must provide lonlat values for the weather station(s) either in ",
      "the `lon` & `lat` cols or as a file through `lonlat_file`."
    )
  }

  # Ensure only one object is provided for a time zone (raster or time_zone)
  if (!is.null(r) & !is.null(time_zone)) {
    stop(
      call. = FALSE,
      "Please only provide one way of determining the time zone.\n",
      "Either the time zone as a character string, in a column or ",
      "provide a raster of the area of interest for the time zone to be ",
      "automatically derived from."
    )
  }

  if (is.null(r) & is.null(time_zone)) {
    stop(
      call. = FALSE,
      "Please ensure that either a raster object for the area of interest, ",
      "`r`, or `time_zone` is provided to calculate the time zone for the ",
      "area of interest."
    )
  }

  if (is.null(hh) & is.null(POSIXct_time)) {
    stop(
      call. = FALSE,
      "Can't detect the hour time increment in supplied data (hh), Weather ",
      "data defining hour increments, must be supplied"
    )
  }


  # Assign a `time_zone` based on the raster centroid and check to ensure only
  # one time zone is provided
  if (is.null(time_zone)) {
    time_zone <-
      unique(
        lutz::tz_lookup_coords(
          lat = stats::median(as.vector(terra::ext(r))[3:4]),
          lon = stats::median(as.vector(terra::ext(r))[1:2]),
          method = "accurate"
        )
      )
  }
  if (length(time_zone) > 1) {
    stop(call. = FALSE,
         "Separate weather inputs for the model are required for",
         "each time zone.")
  }

  # convert to data.table and start renaming and reformatting -----------------
  x <- data.table(x)

  # check missing args
  # If some input are missing input defaults
  if (missing(mm)) {
    x[, mm := rep(0, .N)]
    mm <- "mm"
  }
  if (missing(wd_sd)) {
    x$wd_sd <- NA
    wd_sd <- "wd_sd"
  }
  if (missing(temp)) {
    x[, temp := rep(NA, .N)]
    temp <- "temp"
  }

  if (all(c(temp, rain, ws, wd, wd_sd, station) %in% colnames(x)) == FALSE) {
    stop(call. = FALSE,
         "Supplied column names are not found in column names of `x`.")
  }

  # import and assign longitude and latitude from a file if provided
  if (!is.null(lonlat_file)) {
    ll_file <- data.table(fread(lonlat_file))

    if (any(c("station", "lon", "lat") %notin% colnames(ll_file))) {
      stop(
        call. = FALSE,
        "The CSV file of weather station coordinates should contain ",
        "column names 'station','lat' and 'lon'."
      )
    }

    if (any(as.character(unique(x[, get(station)])) %notin%
            as.character(ll_file[, station]))) {
      stop(
        call. = FALSE,
        "The CSV file of weather station coordinates should contain ",
        "station coordinates for each weather station identifier."
      )
    }

    r_num <-
      which(as.character(ll_file[, station]) ==
              as.character(unique(x[, get(station)])))

    x[, lat := rep(ll_file[r_num, lat], .N)]
    x[, lon := rep(ll_file[r_num, lon], .N)]
  }

  # If lat and long are specified as NA
  if (!is.null(lat) & !is.null(lon)) {
    if (is.na(lat) & is.na(lon)) {
      x[, lat := rep(NA, .N)]
      x[, lon := rep(NA, .N)]
      lat <- "lat"
      lon <- "lon"
    }
  }

  # rename the columns if needed
  if (!is.null(YYYY)) {
    setnames(
      x,
      old = c(YYYY, MM, DD, hh, mm),
      new = c("YYYY", "MM", "DD", "hh", "mm"),
      skip_absent = TRUE
    )
  }


  setnames(x,
           old = temp,
           new = "temp",
           skip_absent = TRUE)

  setnames(x,
           old = rain,
           new = "rain",
           skip_absent = TRUE)

  setnames(x,
           old = ws,
           new = "ws",
           skip_absent = TRUE)

  setnames(x,
           old = wd,
           new = "wd",
           skip_absent = TRUE)

  setnames(x,
           old = wd_sd,
           new = "wd_sd",
           skip_absent = TRUE)

  setnames(x,
           old = station,
           new = "station",
           skip_absent = TRUE)

  if (!is.null(lat)) {
    setnames(x,
             old = lat,
             new = "lat",
             skip_absent = TRUE)
  }

  if (!is.null(lon)) {
    setnames(x,
             old = lon,
             new = "lon",
             skip_absent = TRUE)
  }

  if (!is.null(POSIXct_time)) {
    setnames(x,
             old = POSIXct_time,
             new = "times",
             skip_absent = TRUE)
    x[, times := as.POSIXct(times)]
    x[, YYYY := lubridate::year(x[, times])]
    x[, MM := lubridate::month(x[, times])]
    x[, DD := lubridate::day(x[, times])]
    x[, hh :=  lubridate::hour(x[, times])]
    x[, mm := lubridate::minute(x[, times])]

    # Add time_zone if there is no timezone for the station and coerce to
    # POSIXct class
    if (lubridate::tz(x[, times]) == "" ||
        lubridate::tz(x[, times]) == "UTC") {
      x[, times := lubridate::force_tz(x[, times],
                                       tzone = time_zone)]
    }
  } else {
    # if POSIX formatted times were not supplied, create a POSIXct
    # formatted column named 'times'

    x[, times := paste(YYYY, "-",
                       MM, "-",
                       DD, " ",
                       hh, ":",
                       mm, sep = "")][, times :=
                                        lubridate::ymd_hm(times,
                                                          tz = time_zone)]

  }

  if (any(is.na(x[, times]))) {
    stop(
      call. = FALSE,
      times,
      "Time records contain NA values or impossible time combinations,",
      "e.g., 11:60 am. Check time inputs"
    )
  }

  # workhorse of this function that does the reformatting
  .do_format <- function(x_dt,
                         YYYY = YYYY,
                         MM = MM,
                         DD = DD,
                         hh = hh,
                         mm = mm,
                         temp = temp,
                         rain = rain,
                         ws = ws,
                         wd = wd,
                         wd_sd = wd_sd,
                         station = station,
                         lon = lon,
                         lat = lat,
                         lonlat_file = lonlat_file,
                         times = times,
                         time_zone = time_zone) {
    # calculate the approximate logging frequency of the weather data

    log_freq <-
      lubridate::int_length(lubridate::int_diff(c(x_dt[1, times],
                                                  x_dt[.N, times]))) /
      (nrow(x_dt) * 60)

    # if the logging frequency is less than 50 minutes aggregate to hourly
    if (log_freq < 50) {
      w_dt_agg <- x_dt[, list(
        times = unique(lubridate::floor_date(times,
                                             unit = "hours")),
        temp = mean(temp, na.rm = TRUE),
        rain = sum(as.numeric(rain), na.rm = TRUE),
        ws = mean(ws, na.rm = TRUE),
        wd = as.numeric(
          circular::mean.circular(
            circular::circular(wd,
                               units = "degrees",
                               modulo = "2pi"),
            na.rm = TRUE
          ) # ** see line 310 below
        ),
        wd_sd = as.numeric(
          circular::sd.circular(
            circular::circular(wd,
                               units = "degrees",
                               modulo = "2pi"),
            na.rm = TRUE
          )
        ) * 57.29578,
        # this is equal to (180 / pi)
        # why multiply by (180 / pi) here but not on mean.circular above **
        lon = unique(lon),
        lat = unique(lat)
      ),
      by = list(YYYY, MM, DD, hh, station)]

      # insert a minute col that was removed during this aggregation
      w_dt_agg[, mm := rep(0, .N)]
      mm <- "mm"

      return(w_dt_agg)

    } else{
      if (all(is.na(x_dt[, wd_sd]))) {
        stop(
          call. = FALSE,
          "`format_weather()` was unable to detect or calculate `wd_sd`. ",
          "Please supply a standard deviation of wind direction."
        )
      }
      return(x_dt)
    }
  }

  if (length(unique(x[, "station"])) > 1) {
    # split data by weather station
    x <- split(x, by = "station")

    x_out <- lapply(
      X = x,
      FUN = .do_format,
      YYYY = YYYY,
      MM = MM,
      DD = DD,
      hh = hh,
      mm = mm,
      temp = temp,
      rain = rain,
      ws = ws,
      wd = wd,
      wd_sd = wd_sd,
      station = station,
      lon = lon,
      lat = lat,
      lonlat_file = lonlat_file,
      times = times,
      time_zone = time_zone
    )
    x_out <- rbindlist(x_out)
  } else {
    x_out <- .do_format(
      x_dt = x,
      YYYY = YYYY,
      MM = MM,
      DD = DD,
      hh = hh,
      mm = mm,
      temp = temp,
      rain = rain,
      ws = ws,
      wd = wd,
      wd_sd = wd_sd,
      station = station,
      lon = lon,
      lat = lat,
      lonlat_file = lonlat_file,
      times = times,
      time_zone = time_zone
    )
  }

  setcolorder(
    x_out,
    c(
      "times",
      "temp",
      "rain",
      "ws",
      "wd",
      "wd_sd",
      "lon",
      "lat",
      "station",
      "YYYY",
      "MM",
      "DD",
      "hh",
      "mm"
    )
  )

  # remove lat lon columns if they are NA. Presume that if lat is `NA` then the
  # values of lon have no real utility either so we only check if lat is `NA`
  if (all(is.na(unique(x_out[, lat])))) {
    x_out[, lat := NULL]
    x_out[, lon := NULL]
  }

  .check_weather(x_out)

  setattr(x_out, "class", union("asco.weather", class(x_out)))
  return(x_out[])
}

.check_weather <- function(final_w) {
  # note on cran avoidance (nocov) from data.table
  temp <- times <- rain <- ws <- wd <- NULL

  # Check temperatures
  # For NAs
  if (nrow(final_w[is.na(temp), ]) != 0)
    stop(
      call. = FALSE,
      "NA values in temperature; \n",
      paste(as.character(final_w[is.na(temp), times])),
      "\nplease use a complete dataset"
    )
  # for outside range
  if (nrow(final_w[temp < -30 |
                   temp > 60, ]) != 0)
    stop(
      call. = FALSE,
      "Temperature inputs are outside expected ranges (-30 and +60 degrees C); \n",
      paste(as.character(final_w[temp < -30 |
                                   temp > 60, times])),
      "\nplease correct these inputs and run again"
    )

  # Check rainfall
  # For NAs
  if (nrow(final_w[is.na(rain), ]) != 0)
    stop(
      call. = FALSE,
      "NA values in rainfall; \n",
      paste(as.character(final_w[is.na(rain), times])),
      "\nplease use a complete dataset"
    )
  # for outside range
  if (nrow(final_w[rain < 0 |
                   rain > 100, ]) != 0)
    stop(
      call. = FALSE,
      "rain inputs are outside expected ranges (0 and 100 mm); \n",
      paste(as.character(final_w[rain < 0 |
                                   rain > 100, times])),
      "\nplease correct these inputs and run again"
    )

  # Check windspeed
  # For NAs
  if (nrow(final_w[is.na(ws), ]) != 0)
    stop(
      call. = FALSE,
      "NA values in wind speed; \n",
      paste(as.character(final_w[is.na(ws), times])),
      "\nplease use a complete dataset"
    )
  # for outside range
  if (nrow(final_w[ws < 0 |
                   ws > 150, ]) != 0)
    stop(
      call. = FALSE,
      "wind speed inputs are outside expected ranges (0 and 150 kph); \n",
      paste(as.character(final_w[ws < 0 |
                                   ws > 150, times])),
      "\nplease correct these inputs and run again"
    )

  # Check Wind direction
  # For NAs
  if (nrow(final_w[is.na(wd), ]) != 0)
    stop(
      call. = FALSE,
      "NA values in wind direction; \n",
      paste(as.character(final_w[is.na(wd), times])),
      "\nplease use a complete dataset"
    )
  # for outside range
  if (nrow(final_w[wd < 0 |
                   wd > 360, ]) != 0)
    stop(
      call. = FALSE,
      "wind direction are outside expected ranges (0 and 360); \n",
      paste(as.character(final_w[wd < 0 |
                                   rain > 360, times])),
      "\nplease correct these inputs and run again"
    )
}

Try the ascotraceR package in your browser

Any scripts or data that you put into this service are public.

ascotraceR documentation built on Dec. 20, 2021, 5:07 p.m.