R/handle_gsod.R

#' List, download or convert to chillR format data from the Global Summary of
#' the Day database
#' 
#' @description This function can do four things related to the Global Summary of the Day
#' ("GSOD") database from the National Climatic Data Centre (NCDC) of the
#' National Oceanic and Atmospheric Administration (NOAA): \itemize{
#' 
#' \item{1. It can list stations that are close to a specified position (geographic coordinates).}
#' 
#' \item{2. It can retrieve weather data for a named weather station (or a vector of multiple stations).
#' For the name, the chillRcode from the list returned by the \code{list_stations} operation
#' should be used.}
#' 
#' \item{3. It can 'clean' downloaded data (for one or multiple stations), so that they can easily be used in chillR}
#' 
#' \item{4. It can delete the downloaded intermediate weather files from the machine
#' 
#' Which of these functions is carried out depends on the \code{action} argument.}
#' 
#' }
#'  
#' This function can run independently, but it is also called by the
#' \code{\link{get_weather}} and \code{\link{weather2chillR}} functions, which some users might find a bit
#' easier to handle.
#' 
#' @details The GSOD database is described here:
#' \url{https://www.ncei.noaa.gov/access/metadata/landing-page/bin/iso?id=gov.noaa.ncdc:C00516}
#' 
#' under the \code{'list_stations'} mode, several formats are possible for specifying
#' the location vector, which can consist of either two or three coordinates
#' (it can include elevation). Possible formats include \code{c(1, 2, 3)}, \code{c(1, 2)},
#' \code{c(x = 1, y = 2, z = 3)}, \code{c(lat = 2, long = 1, elev = 3)}. If elements of the vector are not
#' names, they are interpreted as c(Longitude, Latitude, Elevation).
#' 
#' The 'chillRCode' is generated by this function, when it is run with
#' geographic coordinates as location inputs. In the list of nearby stations
#' that is returned then, the chillRCode is provided and can then be used as
#' input for running the function in 'downloading' mode. For downloading the
#' data, use the same call as before but replace the location argument with the
#' chillRCode.
#' 
#' @param action accepts 4 types of inputs to decide on the mode of action for the function.\itemize{
#' 
#' \item{if this is the character string \code{"list_stations"}, the function
#' will return a list of the weather stations from the database that are
#' closest to the geographic coordinates specified by location.}
#' 
#' \item{if this is the character string \code{"download_weather"}, the function will attempt to download
#' weather data from the database for the station named by the location
#' argument, which should then be a character string corresponding to the
#' \code{chillRcode} of the station (which you can get by running this function in
#' \code{'list_stations'} mode).}
#' 
#' \item{if this is the character string \code{"delete"}, the function will attempt to remove
#' the intermediate downloaded weather data, which was saved in the folder specified by \code{"path"} argument.}
#' 
#' \item{if this is a collection of outputs obtained by
#' running this function in the \code{'download weather'} mode), the function cleans the
#' weather files and make them ready for use in \code{chillR}. If the input is just a dataframe
#' (not a list, as produced with this function), you have to specify the
#' database name with the database argument.}}
#' 
#' @param location either a vector of geographic coordinates (for the
#' \code{'list_stations'} mode), or the 'chillRcode' of a weather station in the
#' specified database (for the \code{'download_weather'} mode). When running this
#' function for data cleaning only, this is not needed. For the
#' \code{'download_weather'} mode, this can also be a vector of 'chillRcodes',
#' in which case records for all stations will be downloaded. The data cleaning
#' mode can also handle a list of downloaded weather datasets.
#'  
#' @param time_interval numeric vector with two elements, specifying the start
#' and end date of the period of interest. Only required when running in
#' \code{'list_stations'} or \code{'download_weather'} mode. The default is c(1950,2020).
#' 
#' @param stations_to_choose_from if the location is specified by geographic
#' coordinates, this argument determines the number of nearby stations in the
#' list that is returned.
#' 
#' @param end_at_present boolean variable indicating whether the interval of
#' interest should end on the present day, rather than extending until the end
#' of the year specified under \code{time_interval[2]} (if \code{time_interval[2]} is the
#' current year).
#' 
#' @param add.DATE is a boolean parameter to be passed to \code{\link{make_all_day_table}} if \code{action} is 
#' a collection of outputs (in the form of list) from the function in the downloading format.
#' 
#' @param update_station_list boolean, by default set FALSE. Decides if the weather station list is read from the disk (if present) or if it is newly downloaded in case of action = list_stations.
#' 
#' @param path character, by default "climate_data". Specifies the folder, relative to the working directory where the weather data is downloaded to.
#' 
#' @param update_all boolean, by default set to FALSE. If set TRUE, it will download every stations data, even if previously downloaded and 
#' still present in the temporary folder, specifief by the function argument \code{path}. If set FALSE, already downloaded years of a station will be skipped
#' when download action is carried out again.
#' 
#' @param clean_up character, by default set to NULL. In combination with 'action = delete', this can be set to 'all' to delete all weather data, or 'station' if only data from specific stations ('location') should be deleted
#' 
#' @param override_confirm_delete Boolean, request whether the delete function needs user confirmation to run. Defaults to \code{FALSE}, and
#' Should be set to \code{TRUE} if the function needs to be run without user intervention.
#' 
#' @param max_distance numeric, by default 150. Expresses the distance in kilometers how far away
#' weather stations can be located from the original location, when searching for weather stations
#' 
#' @param min_overlap numeric, by default set to 0. Expresses in percent how much of the specified period needs to be covered by weather station to be 
#' included in the list, when searching for stations.
#' 
#' @param verbose is a character, deciding how much information is returned while downloading
#' the weather data. By default set to "normal". If set to "detailed" the function
#' will say how many years of data have been successfully downloaded for each station. If set "quiet" no
#' information is printed during download.
#' 
#' 
#' @return The output depends on the action argument. If it is \code{'list_stations'},
#' the function returns a list of \code{station_to_choose_from} weather stations that
#' are close to the specified location. This list also contains information
#' about how far away these stations are (in km), how much the elevation
#' difference is (if elevation is specified; in m) and how much overlap there
#' is between the data contained in the database and the time period specified
#' by \code{time_interval}. If action is \code{'download_weather'} the output is a list of
#' the downloaded weather record, extended
#' to the full duration of the specified time interval. If the \code{location} input
#' was a vector of stations, the output will be a list of such objects.
#' If action is a weather \code{data.frame} or a weather record downloaded with
#' this function (in \code{'download_weather'} mode), the data structure remains
#' in the same, but the data are processed for easy use with \code{chillR}.
#' If drop_most was set to \code{TRUE}, most columns are dropped. If the
#' \code{location} input was a list of weather datasets, all elements of the
#' list will be processed.
#' **IMPORTANT NOTE:** as of \code{chillR} version 0.73, the output format no
#' longer contains a list element that specifies the database name, because this
#' has been considered confusing (and annoying) by various users. This means,
#' however, that some earlier calls to results from the \code{handle_gsod} function
#' may produce errors now. 
#' Also note that a few parameters, \code{station_list}, \code{drop_most},
#' \code{quiet}, \code{add_station_name} are no longer needed due to some
#' reworking of the function's mechanisms. After careful consideration, we
#' decided to drop these parameters entirely, which may lead to some downward
#' compatibility problems.
#' Apologies for any inconvenience caused by this transition. If you want to
#' keep using the previous function (which is much slower), feel free to adopt
#' the deprecated \code{handle_gsod_old} function - but note that this will no
#' longer be updated and may disappear eventually.
#'  
#' @note Many databases have data quality flags, which may sometimes indicate
#' that data aren't reliable. These are not considered by this function!
#' 
#' For many places, the GSOD database is quite patchy, and the length of the
#' record indicated in the summary file isn't always very useful (e.g. there
#' could only be two records for the first and last date). Files are downloaded
#' by year, so if we specify a long interval, this may take a bit of time.
#' 
#' @importFrom dplyr mutate filter arrange rowwise ungroup rename
#' @importFrom stringr str_replace str_split_i str_extract
#' @importFrom httr GET
#' @importFrom progress progress_bar
#' @importFrom utils head
#' @importFrom rlang .data
#' 
#' @author Adrian Fülle, Lars Caspersen, Eike Luedeling
#' @references The chillR package:
#' 
#' Luedeling E, Kunz A and Blanke M, 2013. Identification of chilling and heat
#' requirements of cherry trees - a statistical approach. International Journal
#' of Biometeorology 57,679-689.
#' @keywords utilities
#' 
#' @examples
#' 
#' #coordinates of Bonn
#' long <- 7.0871843
#' lat <- 50.7341602
#' 
#' #get a list of close-by weather stations
#' # stationlist <-
#' #   handle_gsod(action = "list_stations",
#' #               time_interval = c(1995,2000),
#' #               location = c(long,lat))
#' 
#' #download data
#' # test_data <-
#' #   handle_gsod(action = "download_weather",
#' #               time_interval = c(1995,2000),
#' #               location = stationlist$chillR_code[c(1,2)])
#' # 
#' # format downloaded data
#' # test_data_clean <- handle_gsod(action = test_data)
#' 
#' ## data deletion on disk for clean_up
#' 
#' # functions will ask for confirmation in the console - 'y' for yes to
#' # confirm deletion, anything else cancels the deletion
#' 
#' # handle_gsod(action = "delete",
#' #             clean_up = "all",
#' #             override_confirm_delete = TRUE)
#' 
#' @export handle_gsod
handle_gsod <- function (action, location = NULL, time_interval = c(1950, 2020), 
                         stations_to_choose_from = 25, end_at_present = FALSE, add.DATE = FALSE, 
                         update_station_list = FALSE, path = "climate_data", update_all = FALSE, 
                         clean_up = NULL, override_confirm_delete = FALSE, max_distance = 150, 
                         min_overlap = 0, verbose = "normal") {
  
  
  if(is.character(action)){
    if(length(action) != 1) stop('Action can only be a character of length one or a list of downloaded weather. Choose either "list_stations", "download_weather", or "delete".')
    if(action %in% c('download_weather', 'list_stations', 'delete') == FALSE) stop('You chose an action which is not covered by that funciton. Choose either "list_stations", "download_weather", or "delete". Or supply the list of downloaded weather for reformating and cleaning.')
  }
  
  
  
  overlap_desctools <- function(x, y) {
    x <- cbind(apply(rbind(x), 1L, min), apply(rbind(x), 
                                               1L, max))
    y <- cbind(apply(rbind(y), 1L, min), apply(rbind(y), 
                                               1L, max))
    maxdim <- max(nrow(x), nrow(y))
    x <- x[rep(1L:nrow(x), length.out = maxdim), , drop = FALSE]
    y <- y[rep(1L:nrow(y), length.out = maxdim), , drop = FALSE]
    d1 <- x[, 2L]
    idx <- x[, 2L] > y[, 2L]
    d1[idx] <- y[idx, 2L]
    d2 <- y[, 1L]
    idx <- x[, 1L] > y[, 1L]
    d2[idx] <- x[idx, 1L]
    d <- d1 - d2
    d[d <= 0L] <- 0L
    unname(d)
  }
  list_weather_stations <- function(time_interval = time_interval, 
                                    location, end_at_present = FALSE, path = "climate_data", 
                                    update_station_list = FALSE, verbose = "normal", max_distance = 150, 
                                    min_overlap = 0, stations_to_choose_from = 25) {
    
    #check if location is of right format
    if(length(location) != 2 | all(is.numeric(location)) == FALSE){
      stop('When generating the station list, the argument of "location" needs to be a numeric of length two, with first element being the longitude and second element being the latitude.')
    }
    if(length(time_interval) != 2 | all(is.numeric(time_interval)) == FALSE){
      stop('When generating the station list, the argument of "time_interval" needs to be a numeric of length two, with first element being the start year and second element being the end year.')
    }
    
    
    start_y <- time_interval[1]
    ifelse(length(time_interval) == 2, end_y <- time_interval[2], 
           end_y <- start_y)
    if (end_at_present == T) 
      end_y <- format(Sys.Date(), "%Y")
    years <- c(start_y:end_y)
    amount_of_years <- length(years)
    if (!dir.exists(path)) 
      dir.create(path)
    if (!file.exists(paste0(path, "/station_list.csv")) | 
        update_station_list == T) {
      httr::GET("https://www.ncei.noaa.gov/pub/data/noaa/isd-history.csv", 
                httr::write_disk(paste0(path, "/station_list.csv"), 
                                 overwrite = T)) %>% invisible()
      if (verbose == "detailed") 
        cat("Downloaded newest version of the stationlist.\n  'update_station_list' can be set to 'FALSE'.")
    }
    else if (verbose == "detailed") 
      cat("Stationlist was loaded from disk.\n  Set 'update_station_list' to 'TRUE' to update it (~3MB).")
    stat_list <- read.csv(paste0(path, "/station_list.csv")) %>% 
      rename(Long = contains("lo"), Lat = contains("la"), 
             Elev = contains("ele"))
    stat_list <- stat_list %>% 
      dplyr::mutate(chillR_code = paste(stat_list$USAF, 
                                        stat_list$WBAN, sep = "")) %>% 
      dplyr::filter(!is.na(.data$Lat) | !is.na(.data$Long)) %>%
      dplyr::filter(.data$Lat != 0, .data$Long != 0)
    
    lat_rad <- location[2]*pi/180
    lon_rad <- location[1]*pi/180
    lat_rad_stat <- stat_list$Lat*pi/180
    lon_rad_stat <- stat_list$Long*pi/180
    
    stat_list_filtered <- stat_list %>%
      dplyr::mutate(
        Distance = round(6378.388 * acos(sin(lat_rad) * sin(lat_rad_stat) +
                         cos(lat_rad) * cos(lat_rad_stat) *
                           cos(lon_rad_stat - lon_rad)), 
                         2)) %>%
      dplyr::arrange(.data$Distance) %>% 
      dplyr::filter(.data$Distance <= max_distance) %>% 
      select(c(12, 3, 4, 7, 8, 10, 11, 13))
    start_date <- as.Date(paste0(start_y, "0101"), format = "%Y%m%d")
    end_date <- as.Date(paste0(end_y, "1231"), format = "%Y%m%d")
    stat_list_filtered <- stat_list_filtered %>% dplyr::rowwise() %>% 
      dplyr::mutate(Overlap_years = round(((overlap_desctools(c(start_date, 
                                                                end_date), c(as.Date(as.character(.data$BEGIN), 
                                                                                     format = "%Y%m%d"), as.Date(as.character(.data$END), 
                                                                                                                 format = "%Y%m%d"))) + 1)/365.25), 2), Perc_interval_covered = round(((overlap_desctools(c(start_date, 
                                                                                                                                                                                                            end_date), c(as.Date(as.character(.data$BEGIN), 
                                                                                                                                                                                                                                 format = "%Y%m%d"), as.Date(as.character(.data$END), 
                                                                                                                                                                                                                                                             format = "%Y%m%d"))) + 1)/365.25/amount_of_years * 
                                                                                                                                                                                         100), 0)) %>% dplyr::ungroup() %>% dplyr::filter(.data$Perc_interval_covered >= 
                                                                                                                                                                                                                                            min_overlap) %>% head(stations_to_choose_from)
    return(stat_list_filtered)
  }
  download_weather_stations <- function(time_interval, location, 
                                        path = "climate_data", end_at_present = FALSE, update_all = FALSE, 
                                        verbose = "normal") {
    
    if(length(time_interval) != 2 | all(is.numeric(time_interval)) == FALSE){
      stop('When downloading the station data, the argument of "time_interval" needs to be a numeric of length two, with first element being the start year and second element being the end year.')
    }
    if(all(is.character(location)) == FALSE){
      stop('When downloading the station data, the argument of "location" needs to be a character with the chillR code of the weather station. You can access the chillR code after generating the station list.')
    }
    
    
    if (!dir.exists(path)) 
      dir.create(path)
    start_y <- time_interval[1]
    ifelse(length(time_interval) == 2, end_y <- time_interval[2], 
           end_y <- start_y)
    if (end_at_present == T) 
      end_y <- format(Sys.Date(), "%Y")
    years <- c(start_y:end_y)
    amount_of_years <- length(years)
    raw_link <- "https://www.ncei.noaa.gov/data/global-summary-of-the-day/access/"
    time_link <- rep(raw_link, times = amount_of_years) %>% 
      paste0(years, "/")
    location <- sapply(location, function(x) str_replace(x, 
                                                         "_", ""))
    subsets_list <- split(rep(time_link, each = length(location)), 
                          location)
    dataset_links <- list()
    for (i in 1:length(location)) {
      dataset_links[i] <- lapply(subsets_list[i], function(x) {
        paste0(x, names(subsets_list)[i], ".csv")
      })
      stat_list <- read.csv(paste0(path, "/station_list.csv")) %>% 
        dplyr::rename(Long = contains("lo"), Lat = contains("la"), 
                      Elev = contains("ele"))
      stat_list <- stat_list %>% dplyr::mutate(chillR_code = paste(stat_list$USAF, 
                                                                   stat_list$WBAN, sep = "")) %>% dplyr::filter(!is.na(.data$Lat) | 
                                                                                                                  !is.na(.data$Long))
      names(dataset_links)[i] <- dplyr::filter(stat_list, 
                                               .data$chillR_code == names(subsets_list)[i])$STATION.NAME
    }
    for (j in 1:length(location)) {
      if (!dir.exists(paste0(path, "/", location[j]))) 
        dir.create(paste0(path, "/", location[j]))
    }
    download <- function(dataset_links) {
      for (i in 1:length(location)) {
        if (i == 1) 
          if (verbose == "detailed" | verbose == "normal") 
            cat(paste0("Loading data for ", amount_of_years, 
                       " years from station '", names(dataset_links)[i], 
                       "'\n"))
        if (i != 1) 
          if (verbose == "normal") 
            cat(paste0("\nLoading data for ", amount_of_years, 
                       " years from station '", names(dataset_links)[i], 
                       "'\n"))
        if (i != 1) 
          if (verbose == "detailed") 
            cat(paste0("\n\nLoading data for ", amount_of_years, 
                       " years from station '", names(dataset_links)[i], 
                       "'\n"))
        if (verbose == "normal") 
          pb <- txtProgressBar(min = 0, max = length(dataset_links[[i]]), 
                               initial = 0)
        lapply(dataset_links[[i]], function(link) {
          if (verbose == "normal") 
            setTxtProgressBar(pb, which(dataset_links[[i]] == 
                                          link))
          station_id <- stringr::str_split_i(link, "/", 
                                             -1) %>% stringr::str_split_i(".csv", 1)
          year <- stringr::str_extract(link, "\\d+")
          filenames <- paste0(path, "/", station_id, 
                              "/", year, ".csv")
          if (update_all == F & file.exists(filenames)) 
            link <- ""
          if (!link == "") {
            if (!httr::http_error(link)) {
              httr::GET(link, httr::write_disk(filenames, 
                                               overwrite = T))
              if (verbose == "detailed") 
                cat(paste0("\nDataset of year ", year, 
                           " loading from GSOD database."))
            }
            else if (verbose == "detailed") 
              cat(paste0("\nDataset of year ", year, 
                         " for this station does not exist in the GSOD database."))
          }
          else if (verbose == "detailed") 
            cat(paste0("\nDataset of year ", year, " already present, loading from disk.(", 
                       filenames, ")"))
        })
        invisible()
        if (verbose == "normal") 
          close(pb)
      }
      if (verbose == "detailed") 
        cat("\n\nExplanations:")
      if (verbose == "detailed") 
        cat("\nLoaded in all specified data raw, if available.\n  Use the function on the dataset again (handle_gsod(dataset_name)) to dop columns and reformat to SI units (?C,liter/m?)")
      if (verbose == "detailed") 
        cat("\nIf datasets couldn't be found, they are missing in the GSOD database.\n  Nevertheless, doublecheck if the correct 'location' and 'time_interval' were choosen.")
      if (verbose == "detailed") 
        cat("\nIf station datasets shouldn't be loaded from disk set 'update_all' to 'TRUE'.\n  This updates/downloads all datasets from the GSOD database and takes a lot longer, if you already have the data partially on disk.")
    }
    download(dataset_links)
    multimerge <- function() {
      subfolders <- list.dirs(path = path, full.names = TRUE, 
                              recursive = FALSE)
      subfolders <- subfolders[stringr::str_extract(subfolders, 
                                                    "\\d+") %in% location]
      all_data <- list()
      all_data <- lapply(subfolders, function(x) {
        csv_files <- list.files(path = x, pattern = paste0(years, 
                                                           collapse = "|"), full.names = TRUE)
        if (length(csv_files) == 0) {
          csv_data <- data.frame(STATION = NA, DATE = NA, 
                                 LATITUDE = NA, LONGITUDE = NA, ELEVATION = NA, 
                                 NAME = NA, TEMP = NA, TEMP_ATTRIBUTES = NA, 
                                 DEWP = NA, DEWP_ATTRIBUTES = NA, SLP = NA, 
                                 SLP_ATTRIBUTES = NA, STP = NA, STP_ATTRIBUTES = NA, 
                                 VISIB = NA, VISIB_ATTRIBUTES = NA, WDSP = NA, 
                                 WDSP_ATTRIBUTES = NA, MXSPD = NA, GUST = NA, 
                                 MAX = NA, MAX_ATTRIBUTES = NA, MIN = NA, 
                                 MIN_ATTRIBUTES = NA, PRCP = NA, PRCP_ATTRIBUTES = NA, 
                                 SNDP = NA, FRSHTT = NA)
        }
        else {
          csv_data <- suppressWarnings(lapply(csv_files, 
                                              read.csv))
          for (i in 1:length(csv_data)) {
            if (ncol(csv_data[[i]]) == 1) {
              csv_data[[i]] <- data.frame(STATION = NA, 
                                          DATE = NA, LATITUDE = NA, LONGITUDE = NA, 
                                          ELEVATION = NA, NAME = NA, TEMP = NA, 
                                          TEMP_ATTRIBUTES = NA, DEWP = NA, DEWP_ATTRIBUTES = NA, 
                                          SLP = NA, SLP_ATTRIBUTES = NA, STP = NA, 
                                          STP_ATTRIBUTES = NA, VISIB = NA, VISIB_ATTRIBUTES = NA, 
                                          WDSP = NA, WDSP_ATTRIBUTES = NA, MXSPD = NA, 
                                          GUST = NA, MAX = NA, MAX_ATTRIBUTES = NA, 
                                          MIN = NA, MIN_ATTRIBUTES = NA, PRCP = NA, 
                                          PRCP_ATTRIBUTES = NA, SNDP = NA, FRSHTT = NA)
            }
          }
        }
        subfolder_df <- do.call(rbind, csv_data)
        return(subfolder_df)
      })
      return(all_data)
    }
    all_climate_data <- multimerge()
    names(all_climate_data) <- names(dataset_links)
    return(all_climate_data)
  }
  delete_weather_stations <- function(clean_up = NULL, override_confirm_delete = FALSE) {
    if (clean_up == "all") {
      cat(paste0("Are you sure you want to delete '", 
                 getwd(), "/", path, "'?", "\nThis path contains ", 
                 length(list.dirs(path)) - 1, " sub-folders, holding ", 
                 length(list.files("climate_data", recursive = T)) - 
                   1, " files in total.\n"))
      if (!override_confirm_delete) {
        user_input <- readline("Press 'y' to confirm deletion (y/n).")
        if (user_input != "y") 
          stop(call. = F, "Exiting since you did not press y.")
      }
      if (dir.exists(path)) {
        a <- unlink(path, recursive = T)
        unlink(path, recursive = T)
        if (a == 0) 
          cat(paste0("Removal of '", getwd(), "/", path, 
                     "' complete."))
      }
      else cat(paste0("\nPath '", getwd(), "/", path, 
                      "' does not exist."))
    }
    if (clean_up == "station") {
      if (!exists("location")) 
        cat("You need to specify the stations you want to delete in 'location'.")
      cat(paste0("Are you sure you want to delete the data for station(s) '", 
                 paste0(location, collapse = "', '"), "'?", "\nPath(s) contain(s) ", 
                 length(list.dirs(paste0(path, "/", location, 
                                         "/"))) - length(location), " sub-folders, holding ", 
                 length(list.files(paste0(path, "/", location, 
                                          "/"))), " files in total.\n"))
      if (!override_confirm_delete) {
        user_input <- readline("Press 'y' to confirm deletion (y/n).")
        if (user_input != "y") 
          stop(call. = F, "Exiting since you did not press y.")
      }
      dirs <- paste0(path, "/", location, "/")
      if (all(dir.exists(dirs))) {
        a <- unlink(dirs, recursive = T)
        unlink(dirs, recursive = T)
        if (a == 0) 
          cat(paste0("Removal of '", getwd(), "/", path, 
                     "' complete."))
      }
      else cat(paste0("\nPath(s)"), paste0("\n'./", dirs[dir.exists(dirs)], 
                                           "/'", collapse = ""), "\nnot existing.")
    }
  }
  clean_downloaded_weather <- function(action, add.DATE = TRUE) {
    clean_up <- function(data) {
      if (nrow(as.data.frame(data)[1]) == 28) {
        temp_data_complete <- data.frame(Date = NA, 
                                         Year = NA, Month = NA, Day = NA, Tmin = NA, 
                                         Tmax = NA, Tmean = NA, Prec = NA)
      }
      else {
        if ("weather" %in% names(data)) 
          old_format <- TRUE
        else old_format <- FALSE
        if (old_format) 
          dat <- data$weather
        else dat <- data
        temp_data_complete <- data.frame(Date = as.Date(dat$DATE), 
                                         Year = format(as.Date(dat$DATE), "%Y") %>% 
                                           as.numeric(), Month = format(as.Date(dat$DATE), 
                                                                        "%m") %>% as.numeric(), Day = format(as.Date(dat$DATE), 
                                                                                                             "%d") %>% as.numeric(), Tmin = ifelse(dat$MIN == 
                                                                                                                                                     9999.9, NA, round(((dat$MIN - 32) * 5/9), 
                                                                                                                                                                       3)), Tmax = ifelse(dat$MAX == 9999.9, NA, 
                                                                                                                                                                                          round(((dat$MAX - 32) * 5/9), 3)), Tmean = ifelse(dat$TEMP == 
                                                                                                                                                                                                                                              9999.9, NA, round(((dat$TEMP - 32) * 5/9), 
                                                                                                                                                                                                                                                                3)), Prec = ifelse(dat$PRCP == 99.99, NA, 
                                                                                                                                                                                                                                                                                   round((dat$PRCP * 25.4), 3)))
        if (old_format) 
          temp_data_complete <- list(database = "GSOD", 
                                     weather = temp_data_complete)
      }
      temp_data_complete
    }
    if (!inherits(action, "list")) {
      temp_data_incomplete <- clean_up(action)
    }
    else {
      temp_data_incomplete <- lapply(action, clean_up)
    }
    temp_data_complete <- lapply(temp_data_incomplete, function(tab) {
      if ("weather" %in% names(tab)) 
        old_format <- TRUE
      else old_format <- FALSE
      if (old_format) 
        tab <- tab$weather
      if (nrow(tab) == 1) {
        out <- data.frame(Date = NA, Year = NA, Month = NA, 
                          Day = NA, Tmin = NA, Tmax = NA, Tmean = NA, 
                          Prec = NA)
      }
      else {
        if (min(tab$Date) != as.Date(paste0(min(tab$Year), 
                                            "-01-01"))) {
          tab <- rbind.data.frame(data.frame(Date = as.Date(paste0(min(tab$Year), 
                                                                   "-01-01")), Year = min(tab$Year), Month = 1, 
                                             Day = 1, Tmin = NA, Tmax = NA, Tmean = NA, 
                                             Prec = NA), tab)
        }
        if (max(tab$Date) != as.Date(paste0(max(tab$Year), 
                                            "-12-31"))) {
          tab <- rbind.data.frame(tab, data.frame(Date = as.Date(paste0(max(tab$Year), 
                                                                        "-12-31")), Year = max(tab$Year), Month = 12, 
                                                  Day = 31, Tmin = NA, Tmax = NA, Tmean = NA, 
                                                  Prec = NA))
        }
        out <- chillR::make_all_day_table(tab, add.DATE = add.DATE)
      }
      if (old_format) 
        out <- list(database = "GSOD", weather = out)
      out
    })
    if (inherits(action, "list")) {
      names(temp_data_complete) <- names(action)
    }
    return(temp_data_complete)
  }
  if (is.character(action)) {
    if (action == "list_stations") {
      return(list_weather_stations(time_interval = time_interval, 
                                   location = location, end_at_present = end_at_present, 
                                   path = path, update_station_list = update_station_list, 
                                   verbose = verbose, max_distance = max_distance, 
                                   min_overlap = min_overlap, stations_to_choose_from = stations_to_choose_from))
    }
    if (action == "download_weather") {
      return(download_weather_stations(time_interval = time_interval, 
                                       location = location, path = path, end_at_present = end_at_present, 
                                       update_all = update_all, verbose = verbose))
      stoerr <- warnings()
    }
    if (action == "delete") {
      delete_weather_stations(clean_up = clean_up, override_confirm_delete = override_confirm_delete)
    }
  }
  if (is.list(action) | is.data.frame(action)) {
    if(is.data.frame(action)) action<- list(action)
    return(clean_downloaded_weather(action = action, add.DATE = add.DATE))
  }
}

Try the chillR package in your browser

Any scripts or data that you put into this service are public.

chillR documentation built on Nov. 28, 2023, 1:09 a.m.