Setup

This example will use riri for data downloading and some pre-processing under the hood. It also uses the tidyverse package to facilitate post-processing and visualization of data.

library(riri)
library(tidyverse)

You can find the source code for the riri package on GitHub: https://github.com/ashiklom/riri. It is still in the early stages of development, so expect it to change frequently. I welcome comments and critiques in the issues, and, of course, pull requests!
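
Since the package lives on GitHub, you can install it from there; one common approach (assuming you have the devtools package installed) is:

# Install riri from GitHub (one-time setup; requires devtools)
devtools::install_github('ashiklom/riri')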

Introduction to the IRI/LDEO library

The IRI/LDEO library contains a variety of data sources, each accessible through a specific URL. The library can also do its own preprocessing of data sources, such as subsetting to specific coordinates and times, averaging over specific durations, and more advanced analyses like correlations and empirical orthogonal functions (EOFs). These operations are applied by appending the appropriate tags to the end of the URL.
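
For instance, selecting a single grid point amounts to appending tags along these lines (an illustrative fragment sketching the IRI tag syntax, not output generated by riri):

# Illustrative IRI-style tags (a sketch, not riri output): select the
# grid cell nearest longitude -85.553 and latitude 45.67
'X/-85.553/VALUE/Y/45.67/VALUE/'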

The functions in riri are designed around this principle of progressively extending a string, and are therefore best understood via the magrittr pipe (%>%) operator.

base_url()
base_url() %>% ncep_var('air_temperature')
base_url() %>% ncep_var('precipitation')

These functions build up a character vector, which generate_url() concatenates into a single URL at the end.

base_url() %>% ncep_var('air_temperature') %>% generate_url()
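
Conceptually, each step appends one or more tags to the vector, and generate_url() collapses the result. A minimal sketch of that idea (not the actual riri source; the base URL and tags here are placeholders):

# Each helper appends tags to a character vector...
url_parts <- c('http://iridl.ldeo.columbia.edu', 'SOURCES', '.NOAA')
# ...and the final step collapses them into a single URL
paste(url_parts, collapse = '/')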

These examples provide the root URLs of each dataset. You can browse to those URLs to confirm that they work.

Although you can download a global dataset if you wish, most analyses will focus on a specific site or region. To filter a dataset down to a specific point, use the filter_point function.

mysite <- base_url() %>% 
    ncep_var('air_temperature') %>% 
    filter_point(lat = 45.67, lon = -85.553)
mysite
generate_url(mysite)

Note the IRI-specific syntax in the resulting URL.

Downloading data

To actually download a netCDF file containing the data, use the retrieve_data function.

mysite %>% 
    generate_url() %>% 
    retrieve_data()

By default, this downloads to a temporary file. Optionally, you can also provide a specific file path.

mysite %>% 
    generate_url() %>% 
    retrieve_data('mysite.nc')

retrieve_data returns the path to the downloaded file, which makes it easy to pass along to other functions.

mysite_file <- mysite %>% 
    generate_url() %>% 
    retrieve_data()

mysite_nc <- ncdf4::nc_open(mysite_file)
mysite_nc
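
If you prefer to work with the file directly, the ncdf4 package can extract the values and dimensions. For example (the variable names are dataset-specific, so check names(mysite_nc$var) for the actual ones):

# Read the first data variable and the time ('T') dimension values
vals <- ncdf4::ncvar_get(mysite_nc, names(mysite_nc$var)[1])
times <- ncdf4::ncvar_get(mysite_nc, 'T')
ncdf4::nc_close(mysite_nc)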

Preprocessing data

You may deal with the resulting netCDF files on your own if you wish. However, riri includes some functions to facilitate getting the data into a workable form.

read_nc2list reads the contents of a netCDF file into a list, with each item carrying its full set of attributes. Here, we want to grab the variable and only the time dimension (T), since we are dealing with a time series at a single point.

mysite_list <- read_nc2list(mysite_file, dims = 'T')
str(mysite_list)

A list format is convenient because, when the dimensions align, it can be easily made into a data.frame (or, in this case, a tibble).

as_data_frame(mysite_list)

Note that the time is in a weird format. You can extract the units from mysite_list via the object's attributes.

attributes(mysite_list[['T']])
attr(mysite_list[['T']], 'units')

"Months since 1960" is a weird date format. Fortunately, riri provides a conversion function, process_date, that uses some lubridate magic to convert this to a date and optionally drops the original T variable.

mysite_list2 <- process_date(mysite_list)
str(mysite_list2)
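
For reference, a hand-rolled equivalent of this conversion (a sketch only: it assumes an origin of 1960-01-01, and process_date is the authoritative version) might look like:

# Convert 'months since 1960' to dates by hand (assumed origin: 1960-01-01)
months_since <- as.numeric(mysite_list[['T']])
# floor() guards against fractional (e.g. mid-month) time values
lubridate::ymd('1960-01-01') +
    lubridate::period(floor(months_since), units = 'month')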

Putting it all together

All of these operations can be conveniently wrapped into a single long pipeline, which in turn can be wrapped in a function for retrieving one kind of data. Let's create a function for retrieving NCEP monthly temperature, called ncep_temp.

ncep_temp <- function(lat, lon) {
    base_url() %>% 
        ncep_var('air_temperature') %>% 
        filter_point(lat = lat, lon = lon) %>% 
        generate_url() %>% 
        retrieve_data() %>% 
        read_nc2list(dims = 'T') %>% 
        process_date() %>% 
        as_data_frame()
}
site1_temp <- ncep_temp(43.63, -89.76)
site1_temp
site2_temp <- ncep_temp(71.27, -156.65)
site2_temp

We can use some tidyverse magic to efficiently extend this to a bunch of sites.

First, create a tibble containing site information.

coords_df <- tribble(
    ~site_id, ~lat, ~lon,
    1, 43.63330,  -89.75830,
    2, 71.27576, -156.64139,
    3, -2.85000,  -54.96667,
    4, 33.52174, -116.15635,
    5, 32.81471, -115.44200,
    6, 29.70000,  -82.16670,
    7, 42.49500,  -71.79810,
    8, 42.53100,  -72.19000,
    9, 45.22220,  -68.73560,
    10, 47.47000,   -0.56000
    )
coords_df

Then, use the purrr::map2 function to apply our new ncep_temp function to each site's latitude and longitude.

temp_data <- coords_df %>% 
    mutate(temp_dat = map2(lat, lon, ncep_temp))
temp_data

Note that the temperature data is stored as a nested tibble. Read more about nested tibbles in the excellent book R for Data Science by Hadley Wickham and Garrett Grolemund.
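
You can inspect one site's data by indexing into the list column:

# temp_dat is a list column; each element is the tibble for one site
temp_data$temp_dat[[1]]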

Now, let's unnest this data and plot it.

temp_data_unnested <- unnest(temp_data)
ggplot(temp_data_unnested) + 
    aes(x = date, y = temp) +
    geom_line(color = 'grey60') + 
    geom_smooth() + 
    facet_wrap(~site_id, scales = 'free_y')

It may be more instructive to look at some summary statistics. To keep the plot readable, let's also filter down to just three of the sites.

summaries <- temp_data_unnested %>% 
    mutate(year = lubridate::year(date)) %>% 
    filter(year < 2017) %>% 
    group_by(year, site_id) %>% 
    summarize_at(vars(temp), funs(mean, min, max))
summaries
ggplot(summaries %>% 
       gather(stat, value, mean, min, max) %>% 
       filter(site_id %in% c(3, 6, 10))) + 
    aes(x = year, y = value) + 
    geom_line() + 
    geom_smooth() + 
    facet_wrap(~site_id + stat, scales = 'free', ncol = 3)

