Background

The output of WRF Hydro model runs is spread over multiple netcdf files in time and over multiple netcdf files in different output categories (e.g CHRTOUT, LDASOUT). This tool presents a list-based approach to gathering all the timeseries data you need at once (over both time and output file types) with parallelization at the file level for enhanced speed. The main limitation is that each variable specified may only return a scalar for a given time. That is, if you want a timeseries of rasters you might approach this differently. However, arbitrary statistics can summarize spatial fields on user-defined masks. Or the same variable can be specified multiple times with different index to pull out a subset of the full domain (probably not efficient for large subsets).

Setup

Load the rwrfhydro package.

library("rwrfhydro")
options(width = 190)

Set the WRF Hydro test case directory, this should be the only thing you need to customize to your machine. We'll use daily data from the Fourmile Creek test case where the full routing physics are used.

dataPath <- '~/wrfHydroTestCases/Fourmile_Creek_testcase_v2.0/run.FullRouting/'

List-based data retrieval

The GetMultiNcdf function needs 3 collated lists as input: filesList, variableList, and indexList (e.g. args(GetMultiNcdf)). (Note these are the formal argument names which may be shortend when calling the function, by R rules, upto uniqueness of the argument name. We construct variables in the global workspace to pass to these functions which have similar but not identical names.)

filesList

This is a named list of arbitrary length. The individual entries correspond to different file types of model output and contain all the files you want to look at in the time series. These are usually generated by running files.list() in a directory. Here output files from the land surface model and RESTART files from the hydro model are desired. These are found individually and then placed into fileList

lsmFiles <- list.files(path=dataPath, pattern='LDASOUT_DOMAIN', full.names=TRUE)
hydroFiles <- list.files(path=dataPath, pattern='HYDRO_RST', full.names=TRUE)
flList <- list(lsm=lsmFiles, hydro=hydroFiles)

Note that the output and restart frequencies are different from the number of files of each type.

variableList

All three lists are collated by the name assigned in the list. Now we define which variables are desired for each file group. From the land surface model (LSM) output files, we want the surface radiative temperature (TRAD) and the snow water equivalent (SNEQV). For the hydro restart files, we want streamflow and soil moisture on the four vertical soil layers. The layers are differentiated, for now, only by the names we give them (smc1-4). (HYDRO_RST files are somewhat unique in that they have variables with different dimensions, however they are typcially sparse in time, unless you are preforming data assimilation.)

lsmVars   <- list(TRAD='TRAD', SWE='SNEQV')
hydroVars <- list(streamflow='qlink1', smc1='sh2ox1', smc2='sh2ox2', smc3='sh2ox3', smc4='sh2ox4')
varList <- list(lsm=lsmVars, hydro=hydroVars)

indexList

The indexList defines what indices/stats are desired for each variable in each file group. This list is collated with both of the previous two lists in a nested way, illustrated below.

Only a scalar can be returned for each entry specified index argument. However, spatial fields (over a range of indices) can be summarized using arbitrary statistics. We show how to define your own useful statistics which can be used when specifying the indexList. (Note that the envir argument may be needed to get your function inside of GetMultiNcdf in special circumstances.)

Our statistic example is to calculate basin-average radiative temperature, basin-maximum snow water equivalent, and basin-average soil moisture on each layer. Since all of these variables are on the low-res grid, we need the basin mask from the high-res grid resampled to the low-res geogrid. We use the CreateBasinMask function to generate a basin mask weight grid (each cell value is the fraction of basin within that cell). We specify the path to the high-res routing grid (which contains the basin mask variable), the basin ID we want to run (1), and the aggregation factor between the high-res and low-res grids (10).

library(ncdf4)
basinMask <- CreateBasinMask(paste0(dataPath,'DOMAIN/Fulldom_hires_hydrofile.Fourmile100m.nc'), 
                             basid=1, aggfact=10)

Then, we use this resampled basin mask to setup basin mean and max functions, which determine the mean and maximum values of a variable within the basin mask.

basAvg <- function(var) sum(basinMask*var)/sum(basinMask)
basMax <- function(var) max(ceiling(basinMask)*var)

For fun, we can also calculate the size of the basin while we are here. The LSM pixels are 1km:

basinKm2 <- sum(basinMask)
basinKm2

Now we can construct the indexList. For the LSM, we want spatial summaries of TRAD and SNEQV at each time (note that SNEQV is a single layer). Statistical summaries are requested using a list for each variable. The list specifies a grid 'slice' or 'hyperslab' (as distinct from a subset, i.e. a slice is how ncdf4 lets one subset matrices) on which to compute a named statistic. The required names in this list are start, end, and stat in the list. We want the whole domain, and have to specify that. An ncdump of the first lsmFile shows the variable spatial dimensions, to help specify the start and end arguments.

ncdump(lsmFiles[1])
#File: ~/Fourmile_Creek_testcase_v2.0/run.FullRouting//201210020000.LDASOUT_DOMAIN1
#( NC_FORMAT_CLASSIC ):
#dimensions (5):
#    time = UNLIMITED ; // (1 currently)
#    west_east = 21 ; 
#    south_north = 7 ; 
#    soil_layers_stag = 4 ; 
#    snow_layers = 3 ; 
#...

So we construct

lsmInds   <- list(TRAD=list(start=c(1,1,1), end=c(21,7,1), stat='basAvg'),
                  SNEQV=list(start=c(1,1,1), end=c(21,7,1), stat='basMax'))

and the mask is applied to the whole domain.

Similarly, we need the indices for the HYDRO_RST files. Here we want to get layer-averaged soil moisture so the lists are specified for the individual soil moisture layers, 1-4. Note that the dimensions are reverse order from what is shown in "ncdump -h" on the command line and in rwrfhydro (thanks to ncdf4 package). For the discharge, qlink1, also in the HYDRO_RST file we dont supply a list. Instead we only give the integer index where flow is desired. (See the "WRF Hydro Domain and Channel Visualization" vignette to understand why index 1 on the stream channel just happens, conincidentally, to be the basin outlet.)

hydroInds <- list(qlink1=1,
                  smc1=list(start=c(1,1), end=c(21,7), stat='basAvg'),
                  smc2=list(start=c(1,1), end=c(21,7), stat='basAvg'),
                  smc3=list(start=c(1,1), end=c(21,7), stat='basAvg'),
                  smc4=list(start=c(1,1), end=c(21,7), stat='basAvg') )
## Then we are ready to put them together
indList <- list(lsm=lsmInds, hydro=hydroInds)

GetMultiNcdf()

All the work is really in setting up the lists. Now we just pass these to GetMultiNcdf. The preceeding two lines setup optional parallelization of the data grabs.

library(doMC)   ## Showing parallelization for most unix, which is at the file level within each file 
registerDoMC(3) ##  groups, pointless to be longer than your timeseries. Set parallel=TRUE to use.
fileData <- GetMultiNcdf(filesList=flList, variableList=varList, indexList=indList, parallel=FALSE)

What did we get?

str(fileData)

The fileData dataframe shows the time (POSIXct) at which certain indices (inds) were summarized with statistic stat for each variable (variable names in the file, e.g. sh2ox) The resulting value is given with the variableGroup (e.g. smc1-4 and not sh2ox) and fileGroup.

Plot the timeseries

This output format is easily plotted using ggplot2.

library(ggplot2)
library(scales)
ggplot(fileData, aes(x=POSIXct, y=value, color=fileGroup)) +
  geom_line() + geom_point() +
  facet_wrap(~variableGroup, scales='free_y', ncol=1) +
  scale_x_datetime(breaks = date_breaks("1 month")) + theme_bw()


mccreigh/rwrfhydro documentation built on Feb. 28, 2021, 1:53 p.m.