write_daily_timeseries: Write daily weather timeseries files for U.S. counties.

View source: R/daily_fips.R

write_daily_timeseries    R Documentation

Write daily weather timeseries files for U.S. counties.

Description

Given a vector of U.S. county FIPS codes, this function saves each element of the lists created by the function daily_fips to a separate subdirectory of a given directory. This lets you pull and save weather time series for several counties at once. The daily_data dataframe is saved to a subdirectory called "data". This time series gives, for each day in the specified date range, the values of the specified weather variables and the number of weather stations contributing to that day's average value. The station_metadata element, which describes the stations contributing to the time series and gives statistical information about the values they contributed, is saved to a subdirectory called "metadata". The station_map element, a map of contributing station locations, is saved to a subdirectory called "maps".

Usage

write_daily_timeseries(
  fips,
  coverage = NULL,
  date_min = NULL,
  date_max = NULL,
  var = "all",
  out_directory,
  data_type = "rds",
  metadata_type = "rds",
  average_data = TRUE,
  station_label = FALSE,
  keep_map = TRUE,
  verbose = TRUE
)

Arguments

fips

A vector of five-digit U.S. county FIPS codes, in numeric, character, or factor format.
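Numeric input drops leading zeros (Autauga County, Alabama, 01001, becomes 1001), so codes supplied as numbers have to be padded back to five characters. The base R sketch below shows one way to do that; normalize_fips is a hypothetical helper, not a countyweather function, and the package's own handling may differ.

```r
# Hypothetical helper (not part of countyweather): coerce FIPS codes given as
# numeric, character, or factor values to five-digit character strings.
normalize_fips <- function(fips) {
  # Go through as.character() first: calling as.integer() directly on a
  # factor would return its internal level codes, not the labels.
  fips_chr <- as.character(fips)
  # Left-pad with zeros so that, e.g., 1001 becomes "01001".
  sprintf("%05d", as.integer(fips_chr))
}

normalize_fips(c(1001, 37055))             # numeric input
normalize_fips(factor(c("15005", "37055"))) # factor input
```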

coverage

A numeric value between 0 and 1 giving the required coverage for each weather variable, that is, the minimum proportion of non-missing values a monitor must have for a variable for its data to be included when calculating daily values averaged across monitors. The default is to include all monitors with any available data (i.e., coverage = 0).
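To make the threshold concrete, the sketch below computes per-monitor coverage on toy data and keeps only monitors that meet it. It assumes coverage is the share of non-missing values for a variable over the requested date range; the monitors, values, and this exact computation are illustrative assumptions, not the package's internal code.

```r
# Toy daily data for three hypothetical monitors over four days.
obs <- data.frame(
  id   = rep(c("A", "B", "C"), each = 4),
  date = rep(as.Date("1995-01-01") + 0:3, times = 3),
  tmax = c(10, 12, 11, 13,   # A: 4 of 4 days non-missing (coverage 1.00)
           NA, 14, NA, 15,   # B: 2 of 4 days non-missing (coverage 0.50)
           9, NA, 10, 12),   # C: 3 of 4 days non-missing (coverage 0.75)
  stringsAsFactors = FALSE
)

# Share of non-missing tmax values for each monitor.
coverage_by_id <- tapply(!is.na(obs$tmax), obs$id, mean)

# Keep only monitors that meet the requested coverage threshold.
coverage <- 0.70
keep_ids <- names(coverage_by_id)[coverage_by_id >= coverage]
filtered <- obs[obs$id %in% keep_ids, ]
```

With coverage = 0.70, monitor B is dropped and only A and C contribute to the daily averages.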

date_min

A string with the desired starting date in ISO format ("yyyy-mm-dd"). The output will include only stations that have data on or after the specified date. For example, if you set date_min = "1981-02-16", only stations with at least some data recorded on or after Feb. 16, 1981 are included; a station that stopped recording data before that date is removed from the set of stations. If not specified, the function includes all available stations, regardless of when they started recording data.

date_max

A string with the desired ending date in ISO format ("yyyy-mm-dd"). The output will include only stations that have data on or before the specified date. If not specified, the function includes all available stations, regardless of when they stopped recording data.
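Taken together, date_min and date_max keep the stations whose period of record overlaps the requested range. The base R sketch below applies that rule to hypothetical station metadata; the column names first_date and last_date are assumptions, not the package's metadata fields.

```r
# Hypothetical station metadata: first and last dates with recorded data.
stations <- data.frame(
  id         = c("S1", "S2", "S3"),
  first_date = as.Date(c("1950-01-01", "1985-06-01", "1960-01-01")),
  last_date  = as.Date(c("1980-12-31", "2010-12-31", "2000-12-31")),
  stringsAsFactors = FALSE
)

date_min <- as.Date("1981-02-16")
date_max <- as.Date("1990-12-31")

# Keep a station if it has data on or after date_min (it stopped no earlier
# than date_min) and data on or before date_max (it started no later than
# date_max).
keep <- stations$last_date >= date_min & stations$first_date <= date_max
stations$id[keep]
```

Station S1 stopped recording before Feb. 16, 1981, so only S2 and S3 remain.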

var

A character vector specifying the desired weather variables. For example, var = c("tmin", "tmax", "prcp") requests minimum temperature, maximum temperature, and precipitation. The default, "all", includes every weather variable available at any weather station in the county. For a full list of possible variable names, see NOAA's README file for the Daily Global Historical Climatology Network (GHCN-Daily) at http://www1.ncdc.noaa.gov/pub/data/ghcn/daily/readme.txt. Many weather variables are available for some, but not all, monitors, so the output may not include every variable you specify. If a variable you specify is missing from the output, no monitor in the county recorded it within the specified date range.
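Because unavailable variables are simply absent from the output, it can be worth checking which requested variables came back. In the sketch below, saved_cols stands in for the column names of a saved data file (for example, names(readRDS(path)) for an .rds file); the column names shown are hypothetical.

```r
# Variables requested through `var`.
requested <- c("tmin", "tmax", "prcp", "snwd")

# Hypothetical column names of a saved timeseries file.
saved_cols <- c("date", "tmax", "prcp")

# Requested variables that no monitor in the county reported in the date range.
missing_vars <- setdiff(requested, saved_cols)
if (length(missing_vars) > 0) {
  message("Not available for this county: ", paste(missing_vars, collapse = ", "))
}
```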

out_directory

The absolute or relative path of the directory in which to create the subdirectories "data", "metadata", and "maps" (the last only if keep_map = TRUE).

data_type

A character string specifying the file format for the timeseries output: "rds" (the default) for .rds files or "csv" for .csv files.

metadata_type

A character string specifying the file format for the station metadata output: "rds" (the default) for .rds files or "csv" for .csv files.
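The data_type and metadata_type switches can be pictured as a choice between base R's saveRDS() and write.csv(). The sketch below illustrates that choice with a hypothetical save_table helper writing to a temporary directory; it is not countyweather's implementation, and the file-naming scheme (<fips>.<type>) is an assumption.

```r
# Hypothetical helper (not part of countyweather): write a dataframe as
# .rds or .csv depending on a data_type-style switch.
save_table <- function(df, dir, name, type = c("rds", "csv")) {
  type <- match.arg(type)
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  path <- file.path(dir, paste0(name, ".", type))
  if (type == "rds") {
    saveRDS(df, path)                      # preserves R classes (e.g., Date)
  } else {
    write.csv(df, path, row.names = FALSE) # plain text, readable elsewhere
  }
  path
}

out <- tempfile("timeseries")
df  <- data.frame(date = as.Date("1995-01-01") + 0:1, tmax = c(12.1, 13.4))
rds_path <- save_table(df, file.path(out, "data"), "37055", "rds")
csv_path <- save_table(df, file.path(out, "data"), "37055", "csv")
```

One practical difference: the .rds file restores the date column as a Date, while reading the .csv back with read.csv() returns it as character unless you convert it.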

average_data

TRUE / FALSE to indicate whether the function should average daily weather data across multiple monitors. If FALSE, the saved data will contain a separate entry for each monitor; if TRUE (the default), it will contain a single estimate for each day in the dataset, giving the average value of the weather metric across all monitors in the county with data that day.
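The difference between the two settings can be illustrated with base R's aggregate(): starting from per-monitor rows, compute each day's mean across monitors with data and the number of monitors contributing. The toy data and the n_monitors column name are hypothetical; the package's own column names and averaging code may differ.

```r
# Toy per-monitor rows (the shape average_data = FALSE keeps).
by_monitor <- data.frame(
  id   = c("A", "B", "A", "B"),
  date = as.Date("1995-01-01") + c(0, 0, 1, 1),
  prcp = c(2.0, 4.0, 1.0, NA),
  stringsAsFactors = FALSE
)

# The formula method drops rows with NA (na.action = na.omit), so each day
# is averaged only over monitors that reported a value that day.
avg <- aggregate(prcp ~ date, data = by_monitor, FUN = mean)
n   <- aggregate(prcp ~ date, data = by_monitor, FUN = length)

# Both results are grouped and sorted by date, so rows line up.
avg$n_monitors <- n$prcp
avg
```

On the second day only monitor A reported, so that day's average is 1 with one contributing monitor.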

station_label

TRUE / FALSE to indicate whether to include station labels in the station map.

keep_map

TRUE / FALSE indicating whether a map of the stations should be saved. The maps can substantially increase the size of the output, so if file size is a concern, consider setting this option to FALSE. If FALSE, the "maps" subdirectory will not be created.

verbose

TRUE / FALSE to indicate whether the function should print which county it is saving files for as it runs.

Value

Creates the subdirectories "data", "metadata", and (if keep_map = TRUE) "maps" within the given directory and, for each specified FIPS code with available data, saves the daily weather data in "data", the station metadata in "metadata", and a map of weather station locations in "maps". Use the arguments data_type and metadata_type to choose .rds or .csv format for the data and metadata files, respectively. Maps are saved as .png files.

Note

If the function cannot pull weather data for a particular county given the specified coverage, date range, and/or weather variables, write_daily_timeseries will not produce files for that county.

Examples

## Not run: 
write_daily_timeseries(fips = c("37055", "15005"), coverage = 0.90,
                       date_min = "1995-01-01", date_max = "1995-01-31",
                       var = c("tmax", "tmin", "prcp"),
                       out_directory = "~/timeseries")

## End(Not run)

leighseverson/countyweather documentation built on April 9, 2022, 11:38 a.m.