
handle_gsod    R Documentation

List, download or convert data from the Global Summary of the Day database to chillR format

Description

This function can do four things related to the Global Summary of the Day ("GSOD") database from the National Climatic Data Center (NCDC) of the National Oceanic and Atmospheric Administration (NOAA):

  • It can list stations that are close to a specified position (geographic coordinates).

  • It can retrieve weather data for a named weather station (or a vector of multiple stations). For the name, the chillRcode from the list returned by the list_stations operation should be used.

  • It can 'clean' downloaded data (for one or multiple stations), so that they can easily be used in chillR.

  • It can delete the downloaded intermediate weather files from the machine.

Which of these operations is carried out depends on the action argument.

This function can run independently, but it is also called by the get_weather and weather2chillR functions, which some users might find a bit easier to handle.

Usage

handle_gsod(
  action,
  location = NULL,
  time_interval = c(1950, 2020),
  stations_to_choose_from = 25,
  end_at_present = FALSE,
  add.DATE = FALSE,
  update_station_list = FALSE,
  path = "climate_data",
  update_all = FALSE,
  clean_up = NULL,
  override_confirm_delete = FALSE,
  max_distance = 150,
  min_overlap = 0,
  verbose = "normal"
)

Arguments

action

determines the mode of action of the function. Four types of input are accepted:

  • if this is the character string "list_stations", the function will return a list of the weather stations from the database that are closest to the geographic coordinates specified by location.

  • if this is the character string "download_weather", the function will attempt to download weather data from the database for the station named by the location argument, which should then be a character string corresponding to the chillRcode of the station (which you can get by running this function in 'list_stations' mode).

  • if this is the character string "delete", the function will attempt to remove the intermediate downloaded weather data, which were saved in the folder specified by the path argument.

  • if this is a collection of outputs obtained by running this function in 'download_weather' mode, the function cleans the weather files and makes them ready for use in chillR. A single weather data.frame (rather than a list, as produced by this function) can also be supplied.
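As a rough illustration of how a single action argument selects one of the four modes, the following R sketch classifies an action value. The helper gsod_mode is hypothetical and not part of chillR; it only mirrors the rules listed above.

```r
# Hypothetical helper (not part of chillR): map an 'action' value
# to one of the four modes described above.
gsod_mode <- function(action) {
  modes <- c("list_stations", "download_weather", "delete")
  # A single recognised character string selects that mode
  if (is.character(action) && length(action) == 1 && action %in% modes) {
    return(action)
  }
  # A weather data.frame or a list of downloaded records triggers cleaning
  if (is.data.frame(action) || is.list(action)) {
    return("clean")
  }
  stop("unrecognised 'action': ", deparse(action))
}

gsod_mode("list_stations")                # "list_stations"
gsod_mode(data.frame(Year = 2000))        # "clean"
gsod_mode(list(station_a = data.frame())) # "clean"
```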

location

either a vector of geographic coordinates (for the 'list_stations' mode), or the 'chillRcode' of a weather station in the GSOD database (for the 'download_weather' mode). When running this function for data cleaning only, this is not needed. For the 'download_weather' mode, this can also be a vector of 'chillRcodes', in which case records for all stations will be downloaded. The data cleaning mode can also handle a list of downloaded weather datasets.

time_interval

numeric vector with two elements, specifying the start and end year of the period of interest. Only required when running in 'list_stations' or 'download_weather' mode. The default is c(1950,2020).

stations_to_choose_from

if the location is specified by geographic coordinates, this argument determines the number of nearby stations in the list that is returned.

end_at_present

boolean variable indicating whether the interval of interest should end on the present day, rather than extending until the end of the year specified under time_interval[2] (if time_interval[2] is the current year).
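The end_at_present rule can be sketched as follows. The helper interval_end_date and its today argument are hypothetical (today is exposed so the logic can be checked with a fixed date); chillR's internal handling may differ.

```r
# Hypothetical sketch (not chillR's code): last date of the period of
# interest, depending on end_at_present.
interval_end_date <- function(time_interval, end_at_present = FALSE,
                              today = Sys.Date()) {
  end_year <- as.integer(time_interval[2])
  current_year <- as.integer(format(today, "%Y"))
  # Stop at the present day only if the interval ends in the current year
  if (end_at_present && end_year == current_year) {
    return(today)
  }
  # Otherwise the period extends to 31 December of time_interval[2]
  as.Date(sprintf("%d-12-31", end_year))
}

interval_end_date(c(2000, 2023), TRUE, today = as.Date("2023-06-15"))
# 2023-06-15
interval_end_date(c(2000, 2023), FALSE, today = as.Date("2023-06-15"))
# 2023-12-31
```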

add.DATE

boolean, passed to make_all_day_table when action is a collection of outputs (in the form of a list) produced by this function in 'download_weather' mode.

update_station_list

boolean, FALSE by default. When action = "list_stations", determines whether the weather station list is read from disk (if present) or downloaded anew.

path

character, by default "climate_data". Specifies the folder, relative to the working directory, to which the weather data are downloaded.

update_all

boolean, by default set to FALSE. If set to TRUE, the data of every station will be downloaded, even if they were previously downloaded and are still present in the temporary folder specified by the path argument. If set to FALSE, years of a station that have already been downloaded will be skipped when the download action is carried out again.
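A minimal sketch of the skipping behaviour described for update_all, assuming the years already on disk are known. years_to_download and downloaded_years are hypothetical names; how chillR detects existing files under path is not shown here.

```r
# Hypothetical sketch: which years of one station still need downloading.
# 'downloaded_years' stands in for the yearly files already under 'path'.
years_to_download <- function(time_interval, downloaded_years,
                              update_all = FALSE) {
  all_years <- seq(time_interval[1], time_interval[2])
  # update_all = TRUE re-downloads everything
  if (update_all) {
    return(all_years)
  }
  # Otherwise skip the years that are already present
  setdiff(all_years, downloaded_years)
}

years_to_download(c(1995, 2000), downloaded_years = c(1995, 1996))
# 1997 1998 1999 2000
years_to_download(c(1995, 2000), c(1995, 1996), update_all = TRUE)
# 1995 1996 1997 1998 1999 2000
```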

clean_up

character, by default set to NULL. In combination with action = "delete", this can be set to "all" to delete all weather data, or to "station" if only data from specific stations (given by location) should be deleted.

override_confirm_delete

boolean, determines whether the delete action skips the request for user confirmation. Defaults to FALSE, in which case confirmation is requested; should be set to TRUE if the function needs to run without user intervention.

max_distance

numeric, by default 150. Maximum distance (in km) between the specified location and a weather station for the station to be included when searching for weather stations.
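To illustrate what max_distance filters on, here is a great-circle (haversine) distance in R, applied to two made-up stations near the Bonn coordinates used in the examples. This is an assumption about how distance might be measured, not a description of chillR's internal calculation; the 6371 km mean Earth radius is likewise assumed.

```r
# Great-circle (haversine) distance in km between points given in
# decimal degrees; vectorised over the second point.
haversine_km <- function(long1, lat1, long2, lat2, radius = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlong <- (long2 - long1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlong / 2)^2
  # pmin guards against rounding pushing 'a' slightly above 1
  2 * radius * asin(sqrt(pmin(1, a)))
}

# Hypothetical stations (coordinates made up for illustration)
stations <- data.frame(name = c("A", "B"),
                       long = c(7.2, 9.0), lat = c(50.8, 52.5))
d <- haversine_km(7.0871843, 50.7341602, stations$long, stations$lat)
stations[d <= 150, ]  # keeps only station A (about 11 km away)
```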

min_overlap

numeric, by default set to 0. Minimum percentage of the specified period that must be covered by a weather station's record for the station to be included in the list when searching for stations.
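One way to express such an overlap percentage is to compare date spans, as in the hypothetical overlap_percent helper below (not chillR's code). Note that a span-based measure ignores gaps inside the record, which matters for patchy GSOD stations (see the Note section).

```r
# Hypothetical sketch: percentage of the period of interest covered by a
# station's record, based only on the record's first and last dates.
overlap_percent <- function(record_start, record_end,
                            period_start, period_end) {
  record_start <- as.Date(record_start); record_end <- as.Date(record_end)
  period_start <- as.Date(period_start); period_end <- as.Date(period_end)
  overlap_start <- max(record_start, period_start)
  overlap_end <- min(record_end, period_end)
  # No common days at all
  if (overlap_end < overlap_start) return(0)
  overlap_days <- as.numeric(overlap_end - overlap_start) + 1
  period_days <- as.numeric(period_end - period_start) + 1
  100 * overlap_days / period_days
}

# Record 1990-1997 versus time_interval = c(1995, 2000)
overlap_percent("1990-01-01", "1997-12-31", "1995-01-01", "2000-12-31")
# 50
```

Assuming a station needs at least min_overlap percent coverage, this station would be kept with min_overlap = 50 but not with min_overlap = 60.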

verbose

character, deciding how much information is printed while downloading the weather data. By default set to "normal". If set to "detailed", the function reports how many years of data have been successfully downloaded for each station. If set to "quiet", no information is printed during download.

Details

The GSOD database is described here: https://www.ncei.noaa.gov/access/metadata/landing-page/bin/iso?id=gov.noaa.ncdc:C00516

In 'list_stations' mode, several formats are possible for specifying the location vector, which can consist of either two or three coordinates (it can include elevation). Possible formats include c(1, 2, 3), c(1, 2), c(x = 1, y = 2, z = 3), c(lat = 2, long = 1, elev = 3). If the elements of the vector are not named, they are interpreted as c(Longitude, Latitude, Elevation).
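The interpretation rules above can be sketched with a hypothetical helper, parse_location, which normalizes any of the listed formats to c(long, lat, elev). It illustrates the documented rules only and is not chillR's own parsing code.

```r
# Hypothetical helper (not chillR's code): normalise a location vector
# to c(long, lat, elev), with elev = NA when only two values are given.
parse_location <- function(location) {
  nms <- names(location)
  if (is.null(nms)) {
    # Unnamed vectors are read as c(Longitude, Latitude, Elevation)
    long <- location[1]; lat <- location[2]
    elev <- if (length(location) >= 3) location[3] else NA
  } else if (all(c("x", "y") %in% nms)) {
    long <- location[["x"]]; lat <- location[["y"]]
    elev <- if ("z" %in% nms) location[["z"]] else NA
  } else if (all(c("long", "lat") %in% nms)) {
    long <- location[["long"]]; lat <- location[["lat"]]
    elev <- if ("elev" %in% nms) location[["elev"]] else NA
  } else {
    stop("unrecognised location names: ", paste(nms, collapse = ", "))
  }
  c(long = unname(long), lat = unname(lat), elev = unname(elev))
}

parse_location(c(7.09, 50.73))                         # elev is NA
parse_location(c(lat = 50.73, long = 7.09, elev = 60)) # reordered
```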

The 'chillRCode' is generated by this function when it is run with geographic coordinates as the location input. The list of nearby stations returned in that case contains the chillRCode, which can then be used as input for running the function in 'download_weather' mode. To download the data, use the same call as before, but set action = "download_weather" and replace the location argument with the chillRCode.

Value

The output depends on the action argument. If it is 'list_stations', the function returns a list of stations_to_choose_from weather stations that are close to the specified location. This list also contains information about how far away these stations are (in km), how much the elevation difference is (if elevation is specified; in m) and how much overlap there is between the data contained in the database and the time period specified by time_interval. If action is 'download_weather', the output is a list of the downloaded weather record, extended to the full duration of the specified time interval. If the location input was a vector of stations, the output will be a list of such objects. If action is a weather data.frame or a weather record downloaded with this function (in 'download_weather' mode), the data structure remains the same, but the data are processed for easy use with chillR. If action was a list of weather datasets, all elements of the list will be processed.

**IMPORTANT NOTE:** as of chillR version 0.73, the output format no longer contains a list element that specifies the database name, because various users considered this confusing (and annoying). This means, however, that some code written to work with results from earlier versions of handle_gsod may now produce errors. Also note that a few parameters (station_list, drop_most, quiet, add_station_name) are no longer needed due to some reworking of the function's mechanisms. After careful consideration, we decided to drop these parameters entirely, which may lead to some backward compatibility problems. Apologies for any inconvenience caused by this transition. If you want to keep using the previous function (which is much slower), you can use the deprecated handle_gsod_old function, but note that it will no longer be updated and may eventually disappear.

Note

Many databases have data quality flags, which may sometimes indicate that data aren't reliable. These are not considered by this function!

For many places, the GSOD database is quite patchy, and the length of the record indicated in the summary file isn't always very useful (e.g. there may be only two records, one on the first and one on the last date). Files are downloaded by year, so downloading a long interval may take a while.

Author(s)

Adrian Fülle, Lars Caspersen, Eike Luedeling

References

The chillR package:

Luedeling E, Kunz A and Blanke M, 2013. Identification of chilling and heat requirements of cherry trees - a statistical approach. International Journal of Biometeorology 57, 679-689.

Examples


#coordinates of Bonn
long <- 7.0871843
lat <- 50.7341602

#get a list of close-by weather stations
# stationlist <-
#   handle_gsod(action = "list_stations",
#               time_interval = c(1995,2000),
#               location = c(long,lat))

#download data
# test_data <-
#   handle_gsod(action = "download_weather",
#               time_interval = c(1995,2000),
#               location = stationlist$chillR_code[c(1,2)])
# 
# format downloaded data
# test_data_clean <- handle_gsod(action = test_data)

## data deletion on disk for clean_up

# by default (override_confirm_delete = FALSE), the function asks for
# confirmation in the console - 'y' for yes confirms deletion, anything
# else cancels it; override_confirm_delete = TRUE skips this prompt

# handle_gsod(action = "delete",
#             clean_up = "all",
#             override_confirm_delete = TRUE)


chillR documentation built on Nov. 28, 2023, 1:09 a.m.