# do not execute on CRAN: 
# https://stackoverflow.com/questions/28961431/computationally-heavy-r-vignettes
is_check <- ("CheckExEnv" %in% search()) || any(c("_R_CHECK_TIMINGS_",
             "_R_CHECK_LICENSE_") %in% names(Sys.getenv()))
#knitr::opts_chunk$set(eval = !is_check)
#rmarkdown::render("vignettes/useCase.Rmd")
#knitr::opts_knit$set(root.dir = '..')
knitr::opts_chunk$set(
    #, fig.align = "center"
    #, fig.width = 3.27, fig.height = 2.5, dev.args = list(pointsize = 10)
    #,cache = TRUE
    #, fig.width = 4.3, fig.height = 3.2, dev.args = list(pointsize = 10)
    #, fig.width = 6.3, fig.height = 6.2, dev.args = list(pointsize = 10)
    # works with html but causes problems with latex
    #,out.extra = 'style = "display:block; margin: auto"' 
    )
knitr::knit_hooks$set(spar = function(before, options, envir) {
    if (before) {
        par(las = 1 )                   #also y axis labels horizontal
        par(mar = c(2.0,3.3,0,0) + 0.3 )  #margins
        par(tck = 0.02 )                          #axe-tick length inside plots             
        par(mgp = c(1.1,0.2,0) )  #positioning of axis title, axis labels, axis
     }
})
#themeTw <- theme_bw(base_size = 10) + theme(axis.title = element_text(size = 9))
#bgiDir <- "~/bgi"

NetCDF in R Tutorial

from Dept. Geography, Univ. Oregon

Depends on variabls generated by read.Rmd

load.image("netCDF01.RData")

Conversions

NetCDF files or data sets are naturally raster slabs (e.g. a longitude by latitude “slice”), bricks (e.g. a longitude by latitude by time), or 4-d arrays (e.g. a longitude by latitude by height by time), while most data analysis routines in R expect 2-d variable-by-observation data frames. In addition, time is usually stored as the CF (Climate Forecast) “time since” format that is not usually human-readable. Here are some example conversions.

Load some necessary packages

library(chron)
library(lattice)
library(RColorBrewer)

Convert the time variable

The time variable, in “time-since” units can be converted into “real” (or more easily readable) time values by splitting the time tunits$value string into its component parts, and then using the chron() function to determine the absolute value of each time value from the time origin.

TODO

Replace netCDF fillvalues with R NAs

In NetCDF file, values of a variable that are either missing or simply not available (i.e. ocean grid points in a terrestrial data set) are flagged using specific “fill values” (_FillValue) or missing values (missing_value), the values of which are set as attributes of a variable. In R, such unavailable data are indicated using the “NA” value. The following code fragment illustrates how to replace the netCDF variable’s fill values with R NA’s .

When reading variables, this conversion is already done.

tmp_array[tmp_array == fillvalue$value] <- NA

The head() function can be used before and after executing the “square bracket” selection and replacement to verify that the NA values have indeed replace the netCDF fill values. The total number of non-missing (i.e. land, except for Antarctica) grid cells can be gotten by determining the length of a vector of values representing one slice from the brick, omitting the NA values:

length(na.omit(as.vector(tmp_array[,,1])))

Get a single time slice of the data, create an R data frame, and write a .csv file

NetCDF variables are read and written as one-dimensional vectors (e.g. longitudes), two-dimensional arrays or matrices (raster “slices”), or multi-dimensional arrays (raster “bricks”). In such data structures, the coordinate values for each grid point are implicit, inferred from the marginal values of, for example, longitude, latitude and time. In contrast, in R, the principal data structure for a variable is the data frame. In the kinds of data sets usually stored as netCDF files, each row in the data frame will contain the data for an individual grid point, with each column representing a particular variable, including explicit values longitude and latitude (and perhaps time). In the example CRU data set considered here, the variables would consist of longitude, latitude and 12 columns of long-term means for each month, with the full data set thus consisting of 259200 rows (720 by 360) and 14 columns.

This structure can be demonstrated by selecting a single slice from the temperature “brick”, turning it into a data frame with three variables and 720 by 360 rows, and writing it out as a .csv file.

m <- 1
tmp_slice <- tmp_array[,,m]

The dimensions of tmp_slice, e.g. 720, 360, can be verified using the dim() function.

A quick look (map) at the extracted slice of data can be gotten using the image() function.

image(lon,lat,tmp_slice, col = rev(brewer.pal(10,"RdBu")))

A better map can be obtained using the levelplot() function from the lattice package. The expand.grid() function is used to create a set of 720 by 360 pairs of latitude and longitude values (with latitudes varying most rapidly), one for each element in the tmp_slice array. Specific values of the cutpoints of temperature categories are defined to cover the range of temperature.

grid <- expand.grid(lon = lon, lat = lat)
cutpts <- c(-50,-40,-30,-20,-10,0,10,20,30,40,50)
levelplot(tmp_slice ~ lon * lat, data = grid, at = cutpts, cuts = 11, pretty = T, 
  col.regions = (rev(brewer.pal(10,"RdBu"))))

To create a data frame, the expand.grid() and as.matrix() functions are used to create the 259200 pairs (i.e. rows) of values of longitude and latitude (the columns), and the as.vector() function is used to “unstack” the slice of data into a long vector. The size of the objects that are created can be verified using the dim() and length() functions.

# matrix (nlon*nlon rows by 2 cols) of lons and lats
lonlat <- as.matrix(expand.grid(lon,lat))
dim(lonlat)
# vector of `tmp` values
tmp_vec <- as.vector(tmp_slice)
length(tmp_vec)

The data.frame() and cbind() functions are used to assemble the columns of the data frame, which are assigned appropriate names using the names() function (on the left-hand side of assignment operator). The head() function, applied on top of the na.omit() function lists the first rows of values without NAs:

tmp_df01 <- data.frame(cbind(lonlat,tmp_vec))
names(tmp_df01) <- c("lon","lat",paste(dname,as.character(m), sep = "_"))
head(na.omit(tmp_df01), 10)

Finally the data frame is written out to the working directory as a .csv file, using na.omit() again to drop the observations with missing data (i.e. ocean points and Antarctica).

csvfile <- "cru_tmp_1.csv"
write.table(na.omit(tmp_df01),csvfile, row.names = FALSE, sep = ",")

Convert the whole array to a data frame, and calculate MTWA, MTCO and the annual mean

The idea here is to convert the nlon by nlat by nt 3-d array into a (nlon by nlat) by nt 2-d matrix. (This will work if the netCDF data set was written as a CF-compliant data set, with arrays dimensioned as in Fortran, i.e. as nlon x nlat x nt arrays). First, create a long vector tmp_vec.long using the as.vector() reshaping function, and verify its length, which should be 3110400.

tmp_vec_long <- as.vector(tmp_array)
length(tmp_vec_long)

Then reshape that vector into a 259200 by 12 matrix using the matrix() function, and verify its dimensions, which should be 259200 by 12.

tmp_mat <- matrix(tmp_vec_long, nrow = nlon*nlat, ncol = nt)
dim(tmp_mat)
head(na.omit(tmp_mat))

Create the second data frame from the tmp_mat matrix.

lonlat <- as.matrix(expand.grid(lon,lat))
tmp_df02 <- data.frame(cbind(lonlat,tmp_mat))
names(tmp_df02) <- c("lon","lat","tmpJan","tmpFeb","tmpMar","tmpApr","tmpMay","tmpJun",
  "tmpJul","tmpAug","tmpSep","tmpOct","tmpNov","tmpDec")
options(width = 96)
head(na.omit(tmp_df02, 20))

Get annual mean, mtwa and mtco values and add them the second data frame.

tmp_df02$mtwa <- apply(tmp_df02[3:14],1,max) # mtwa
tmp_df02$mtco <- apply(tmp_df02[3:14],1,min) # mtco
tmp_df02$mat <- apply(tmp_df02[3:14],1,mean) # annual (i.e. row) means
head(na.omit(tmp_df02))
#dim(na.omit(tmp_df02))

Write the second data frame out as a .csv file, dropping NAs.

csvfile <- "cru_tmp_2.csv"
write.table(na.omit(tmp_df02),csvfile, row.names = FALSE, sep = ",")

Create a third data frame, with only non-missing values. This will be used later to demonstrate how to convert a “short” data frame into full matrix or array for writing out as a netCDF file.

tmp_df03 <- na.omit(tmp_df02)
head(tmp_df03)

Check what’s in the current workspace:

ls()
outworkspace = "netCDF02.RData"
save.image(file = outworkspace)


bgctw/ncdfTools documentation built on Jan. 29, 2020, 1:16 p.m.