knitr::opts_chunk$set(warnings = F, message = F)
This package is an extension and overhaul of the original cpcRain package by dlebauer. Many of the functions were written by Gopi Goteti in their first form and I have just updated and extended them. That package presented R code to download and analyze global precipitation data from the Climate Prediction Center (CPC)
Hydrological and climatological studies sometimes require rainfall data over the entire world for long periods of time. The Climate Prediction Center's (CPC) daily data, from 1979 to present, at a spatial resolution of 0.5 degrees lat-lon (~ 50 km at the equator) is a good resource. This data is available at CPC's ftp site (ftp://ftp.cpc.ncep.noaa.gov/precip/CPC_UNI_PRCP/GAUGE_GLB/). However, there are a number of issues:
Issues with obtaining data:
Issues with processing data:
This package is designed to facilitate two use cases that are likely to cover most users of the CPC data:
cpcQueryDateRange
function.cpcYearToNCDF
function downloads all CPC data (one year at a time) and stores the full year of data to a compressed NCDF (version 4) file. Next the cpcReadNCDF
function allows users to easily extract data from those files. Extracting data from multiple years at once is permitted, as long as the data files for all years are present.Further, this package supports users who wish to extract data in two formats:
1. 3D arrays. The CPC data is gridded on a 0.5 degree grid of longitude and latitude; the third dimension is time.
2. Tidy data, via the data.table
package. This format allows for easy manipulation of multiple variables, subsetting, grouping, and other fast operations on the data.
if(!require('devtools')) install.packages('devtools') devtools::install_github('jdossgollin/cpcRain', dependencies = T) library(cpcRain)
You can read this vignette at any time from your R session by calling:
vignette('Introduction', package = 'cpcRain')
For this example, we'll download data from the winter holidays of 1998.
Since we're just interested in a short date range, we don't need to download years of data, write .nc
files, or construct complicated databases.
We do this with the cpcQueryDateRange
function:
dt1 <- cpcQueryDateRange(start_date = ymd('1998-12-24'), end_date = ymd('1999-01-04'), tidy = T)
We'll look into the tidy
argument in a moment.
The cpcQueryDateRange
gives us a list of outputs.
The first tells us whether the dates we requested were successfully downloaded:
dt1$download_success
The second gives the rainfall data:
dt1$precip_data
Since we set tidy = TRUE
, we get a tidy data.table
.
If you're not familiar with this package and data format, you should read their introductions (on the CRAN package page, for example).
The data.table
syntax allows us to subset, group, and apply functions to the data easily.
We can also cast our data to create a matrix of dates versus grid cells using the data.table::dcast
function.
We can also set tidy = FALSE
to get an array of the data:
dt2 <- cpcQueryDateRange(start_date = ymd('1998-12-24'), end_date = ymd('1999-01-04'), tidy = F)
Be aware that every time you use the cpcQueryDateRange
function, data is re-downloaded from the server.
dt2$precip_data %>% dim()
As expected, our data has 720 values of longitude, 360 of latitude, and 12 of time.
The dimnames
gives us the longitude, latitude, and date parameters in an easy-to-parse format:
lapply(dimnames(dt2$precip_data), head)
If we want, we can use the cpcMeltArray
function to turn the 3D array of dt2$precip_data
into the data.table
of dt1$precip_data
:
dt3 <- dt2$precip_data %>% cpcMeltArray colMeans(dt3 == dt1$precip_data, na.rm = T)
This shows that we get the same results either way.
Now let's imagine that we're not content just querying data from a few weeks, but we want to look at all rainfall over the Northeast United States from 1997-2000.
Since this will be a large data set, we're going to use the cpcYearToNCDF
function to build a library of .nc
files, one for each year, to store our data.
That way, we can access it efficiently whenever we want.
download_years <- 1979:2016 success <- vector('list', length(download_years)) for(i in 1:length(download_years)){ success[[i]] <- cpcYearToNCDF( year = download_years[i], download_folder = '~/Documents/Data/CPC/', empty_raw = TRUE, overwrite = FALSE ) } success <- rbindlist(success)
The download_folder
parameter specifies where we want to save our .nc
files after we create them.
The empty_raw
is set to TRUE, so the program will automatically delete the raw files downloaded from the CPC server after the .nc
file is successfully created.
Finally, the overwrite
parameter is FALSE, so for a given year if the .nc
file already exists then the function won't download the data and write the data.
We can check the success of the data download:
success[order(success, date)]
If it shows up as "Not Attempted", it's because the .nc
file was already downloaded and overwrite
was set to FALSE.
The process of downloading the files and creating the .nc
files is slow, but once they're there, it's very easy to query them.
You don't even need to do it from R -- any software package that can read NCDF version 4 files can read them.
One way to read the files is with the excellent ncdf4
package like:
nc <- nc_open("~/Documents/Data/CPC/cpcRain_1997.nc")
You'll notice that the units on the time
variable are a bit odd:
nc$dim$time$vals[1:10] nc$dim$time$units
However, the advantage here is that the lubridate
package's as_date
function takes the origin to be 1970-01-01 by default, so you can call as_date(nc$dim$time$vals)
without any trouble.
Still, this is a needlessly complicated way to extract data from these files. You don't even have to call r nc_close(nc)
.
The much easier way is to use the cpcReadNCDF
function, which can extract data from multiple years at once.
Since the .nc
files are gridded by default, we can subset the data before reading it in:
dt4 <- cpcReadNCDF( start_date = ymd('1997-12-01'), end_date = ymd('1999-02-11'), lat_lims = c(35, 45), lon_lims = c(100, 110), download_folder = '~/Documents/Data/CPC/', tidy = TRUE, round_lonlat = TRUE ) print(dt4)
The start_date
, end_date
, lat_lims
, and lon_lims
parameters are pretty straightforward and do exactly what you would imagine.
The download_folder
needs to be where the .nc
files are stored -- this will be the same as the argument to the cpcYearToNCDF
function.
The tidy
argument, like before, causes the function to return a data.table
if TRUE and a 3D array if FALSE.
The array can, again, be melted with cpcMeltArray
.
Finally, the round_lonlat
argument is set to TRUE.
Under the hood, the function automatically rounded the lat_lims
and lon_lims
slightly to give results that fit within the data.
If round_lonlat
is FALSE, then entering values that aren't in the data set will throw an error.
This is a brand new version of this package, so there are likely to be bugs. Please use the issues tracker at https://github.com/jdossgollin/cpcRain/issues to report issues, bugs, improvements, bad documentation, etc. Thanks in advance for your help!
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.