library(tidyverse)
library(ecmwfr)
library(tidync)

Download a single file

This will download a single nc file. At the moment, it's hard-wired to pull down temperature at 2 meters; we'll look into making that more flexible.
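For reference, here's a minimal sketch of what such a function might look like. The request follows ecmwfr's CDS conventions for the ERA5 single-levels dataset; the hourly time stamps, the target file name, and the "data" directory are assumptions for illustration, not the package's actual implementation.

download_one_file <- function(user, date, variables) {

  # Build a CDS request for one day of hourly data (assumed layout)
  request <- list(
    dataset_short_name = "reanalysis-era5-single-levels"
    , product_type = "reanalysis"
    , variable = variables
    , year = format(date, "%Y")
    , month = format(date, "%m")
    , day = format(date, "%d")
    , time = sprintf("%02d:00", 0:23)
    , format = "netcdf"
    , target = paste0("era5_", format(date, "%Y_%m_%d"), ".nc")
  )

  # Submit the request and wait for the transfer to finish
  ecmwfr::wf_request(
    user = user
    , request = request
    , transfer = TRUE
    , path = "data"
  )

  file.path("data", request$target)
}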

The file extract function will process a single nc file. Again, the variable of interest is (semi) hard-wired. The function is fragile in that it presumes consistency between the data in the file and the date associated with it.
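Something like the following would do it, assuming ERA5 names the 2m temperature variable t2m and that tidync can read the grid straight into a tibble. Note that the date column is attached from the argument rather than read from the file, which is exactly the fragility mentioned above.

extract_one_file <- function(nc_file, date) {
  # Read every variable and dimension coordinate into a long tibble,
  # then tag the rows with the date we believe the file holds
  tidync::tidync(nc_file) %>% 
    tidync::hyper_tibble() %>% 
    mutate(date = date)
}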

This function will reduce the geographic resolution of the data. Based on conversations with other researchers, this function will be modified to process a day with observations at every hour; we would then pick the maximum and minimum across the 24 hours. We will also consider simply filtering out data points, rather than rounding the longitude and latitude. A rough sketch of both steps appears below.
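These sketches assume column names of longitude, latitude, time, date, and t2m. Rounding to whole degrees is a placeholder choice, and the daily summary keeps a mean column (which the plotting snippet at the end of this page uses) alongside the max and min described above.

trim_grid <- function(tbl_in) {
  # Coarsen the grid by rounding coordinates and averaging within each cell
  tbl_in %>% 
    mutate(
      longitude = round(longitude)
      , latitude = round(latitude)
    ) %>% 
    group_by(longitude, latitude, time, date) %>% 
    summarize(t2m = mean(t2m), .groups = "drop")
}

summarize_day <- function(tbl_in) {
  # Collapse the 24 hourly observations to daily statistics
  tbl_in %>% 
    group_by(date, longitude, latitude) %>% 
    summarize(
      mean = mean(t2m)
      , max = max(t2m)
      , min = min(t2m)
      , .groups = "drop"
    )
}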

Establish time range

The function below will create a data frame that houses a set of dates for extraction. Storing this in a data frame is overkill. Earlier I'd had some additional columns which I decided I didn't need. This structure will allow us to reintroduce them later if the mood strikes us.
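In that spirit, here is a sketch that returns a one-column data frame of dates. It assumes lubridate is available for the year and month accessors; the approach of enumerating every day in the requested years and then filtering handles non-contiguous inputs.

create_tracker_table <- function(months, years) {
  # Enumerate every day spanning the requested years,
  # then keep only the wanted year/month combinations
  tibble(
    date = seq.Date(
      from = as.Date(paste0(min(years), "-01-01"))
      , to = as.Date(paste0(max(years), "-12-31"))
      , by = "day"
    )
  ) %>% 
    filter(
      lubridate::year(date) %in% years
      , lubridate::month(date) %in% months
    )
}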

All together now

First, create the tracker file if it doesn't exist. For now, this is ad hoc to fetch one or two years at a time.

tbl_tracker <- create_tracker_table(
  months = 1
  , years = 1982
)

Now we open the output file and filter down to everything that hasn't yet been downloaded and extracted. Note that we check for the existence of the output file. This is pretty general and could support options like storing all the data from a single year in a particular file; simply change the name.

file_out <- file.path("data", "out.csv")

if (file.exists(file_out)) {
  tbl_written <- read_csv(file_out) %>% 
    select(date) %>% 
    unique()

  tbl_unwritten <- tbl_tracker %>% 
    anti_join(tbl_written, by = "date")

} else {
  tbl_unwritten <- tbl_tracker %>% 
    select(date)
}

We'll cheat slightly and use a for loop to churn through each day. This is fine performance-wise, since the remote calls can't really be parallelized. We've also set a maximum file size. That isn't strictly necessary; it's just there as a circuit breaker in case something weird happens.

Because I'm putting this on GitHub, I'm storing my CDS user name in an environment variable. That sits in the .Renviron file, which lives in the user's home directory.
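For completeness, the one-time setup looks roughly like this. The UID below is a placeholder, and the service argument may vary by ecmwfr version:

# In ~/.Renviron (the value is a placeholder):
# CDS_USER_NAME=00000

# The API key itself is registered once with ecmwfr, which stores it in
# the system keyring rather than in the project:
ecmwfr::wf_set_key(
  user = Sys.getenv("CDS_USER_NAME")
  , key = "your-cds-api-key"
  , service = "cds"
)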

max_file_out_size <- 200e6
for (i_date in seq_len(nrow(tbl_unwritten))) {

  # Circuit breaker: stop if the output file has grown past the size limit
  if (file.exists(file_out) && file.size(file_out) > max_file_out_size) break

  # Fetch one day's worth of data from CDS
  a_file <- download_one_file(
    Sys.getenv('CDS_USER_NAME')
    , tbl_unwritten$date[i_date]
    , c('2m_temperature')
    # , c('2m_temperature', 'surface_pressure', 'total_precipitation')
  )

  # Pull the nc file into a tibble
  tbl_extract <- a_file %>% 
    extract_one_file(tbl_unwritten$date[i_date])

  # Coarsen the grid and summarize to daily values
  tbl_extract <- tbl_extract %>% 
    trim_grid() %>% 
    add_date_time() %>% 
    summarize_day()

  # Append to the output file, writing headers only on first creation
  if (file.exists(file_out)) {
    tbl_extract %>%
      write_csv(path = file_out, append = TRUE, col_names = FALSE)
  } else {
    tbl_extract %>%
      write_csv(path = file_out, append = FALSE, col_names = TRUE)
  }

}

This is a code snippet that plots temperature. I'm hanging on to it so that I don't have to think any more than necessary.

tbl_results <- read_csv(file_out)

tbl_results %>% 
  filter(date == min(date)) %>% 
  ggplot(aes(longitude, latitude)) + 
  geom_point(aes(color = mean)) + 
  scale_color_gradient(low = "blue", high = "red") + 
  theme_minimal()
