The vignette is a working draft and the package is under current development, thus documentation and functionality may be changing.

The RWDataPlot package provides a mechanism to import data from RiverWareTM simulations into R. Starting with data in the form of RiverWare data files (rdf), the data are read into R and can then be maniuplated. The package is intended to compliment the RiverWare RiverSMARTTM tool, which includes a plugin to run R code. The package provides functionality to read in data from multiple scenarios, add common attributes, and provides common plotting functions utilizing ggplot2. Many of the plotting functions are written so that figures that are commonly generated from results from the Bureau of Reclamation's Colorado River Simulation System (CRSS) can be automatically and quickly created, however; they can be utilized to plot any multi-trace simulation results.

Current Implicit Assumptions in Package

Background

Need to edit this section

Recent and ongoing enhancements to RiverWareTM, including the development of the RiverWare RiverSMARTTM tool have made it increasingly easy to generate large amounts of data from RiverWare simulations. As the many data sets become increasingly large, it R becomes a more appealing analysis tools over Excel. RiverWare natively creates ASCII text files called rdf, or RiverWare data files, that are commonly converted to Excel files for analsyis purposes. This package provides a tool to read those same rdf files into R. RiverWare stores simulation results in various "slots", or variables representing different reservoir or reach specific results. RiverWare is also commonly used in a "multiple run" mode, where many unique realizations, or traces, of hypothetical operations can be simulated. A suite of traces represents a single scenario, and it is common to want to compare results of different scenarios against each other. The simulation of many scenarios is handled by RiverSMART, as is the analysis of these many scenarios.

The RWDataPlot package provides documentation for the steps the Bureau of Reclamation takes when computing many of the common statistics on future simulations that are reported to Stakeholders. Hopefully, it will also alow for a more widespread utilization of existing analysis code currently maintained by various Reclamation employees.

Manipulating RDFs in R

The RWDataPlot package provides fucntionality to (1) read rdf data into R, (2) format the data as a simple matrix, and (3) aggregate rdf data from many scenarios into a single data frame.

Reading RDF Into R

The read.rdf() function will read rdf data into R. The function returns a multi-level list that contains all meta data embedded in an rdf file, including the start and end timesteps, the number of traces, units of each slot, and RiverWare object type for each slot. However, the list returned by read.rdf is not in a form that is particularly easy to manipulate. Since each rdf file can include data for many slots, data for a single trace and single slot are embedded in the list in rdfList$runs[[1]]$objects$'Powell.Pool Elevation'$values, where rdfList is a list returned by read.rdf, 1 is the trace number, and Powell.Pool Elevation is the slot.

The function rdfSlotToMatrix will create a time by trace matrix for a single slot so that the data are in a form that is easier to manipulate in R, as shown in the table below.

| |Trace 1|Trace 2|...|Trace N| |-------:|------:|------:|--:|------:| |Jan 2015|1050.23|1114.33|...|1077.77| |Feb 2015|1055.76|1117.95|...|1080.34| |Mar 2015|1056.23|1120.00|...|1079.67| | ...| ... | ... |...| ... | | Dec N|1100.55|1099.35|...|1095.12|

To allow the user to see the slots that are included in the rdf file once it is read into R, the function listSlots will return all of the slots included in the rdf file.

The steps below show an example of reading an rdf file into R, looking at the available slots, and then coverting one of the slots to a matrix.

library(RWDataPlot)
rdf <- read.rdf2(system.file('extdata',file.path('Scenario','DNF,CT,IG','KeySlots.rdf'),package = 'RWDataPlot'))
pe <- rdfSlotToMatrix(rdf, 'Powell.Pool Elevation')

Manipulation and Aggregation Functions

To assist in working with the data returned by rdfSlotToMatrix several other functions are provided that perform common annualization functions as follows.

  1. getMaxAnnValue returns a matrix of the annual maximum for each year and trace.
  2. getMinAnnValue returns a matrix of the the annual minimum for each year and trace.
  3. sumMonth2Annual returns a matrix of the sum of the monthly values for each year and trace. Can optionally scale the values with this function if, for example, an annual volume is desired by summing monthly flow data.
  4. flowWeightAvgAnnConc returns a matrix of the annual flow-weighted average annual concentration if given monthly flow and monthly concentrations.

If the first two years of elevations for two traces are:

pe[1:24,1:2]

Then, getMinAnnValue finds and returns the annual minimum values.

peMin <- getMinAnnValue(pe)

And the annual minimum values in peMin are:

peMin[1:2,1:2]

The help pages provide more details on the other aggregation functions.

Aggregating Multiple Scenarios

While the functionality provided by read.rdf, rdfSlotToMatrix, and the few simple aggregation methods is useful when a user needs to interactively explore the data, the real utility in the RWDataPlot package is in combining results from many scenarios into a single data frame. The data frame can then be utilized for further analysis or easily plotted using ggplot2.

Because the monthly data is typically annualized in some manner, and because storing all monthly values is somewhat inefficient, the scenario data can be aggregated in many ways before it is combined. The package provides several common functions that can be applied to the monthly data, prior to combining scenario results.

As this packages is written to interface with RiverSMART, it is assumed that different scenarios are represented by different folder names. The package will look into these different folders for the same rdf files, but the scenario names can be changed for the data frame in R.

The steps for annualizing the scenario results are: 1. Determine which slots, from which rdf files are needed. 1. Determine how the slots will be annualzied, i.e., summarized. 1. Create the table, i.e., slot aggregation table, that provides the information from steps 1 and 2 to the package. 1. Create the "slot aggregation list" from the table, using createSlotAggList. 1. Run the code that saves the data frame. 1. The saved data frame can then be read into R or other programs for plotting.

Determine necessary slots and rdfs

The user must know which slots are in which rdf files. The control file for the RiverWare simulation should provide this information.

Available Summary Functions

The following summary functions are available. One of the following must be selected for each slot and provided in the slot aggregation table. Note: Currently, if monthly and annual data is desired, it is safest to create to seperate data tables, one with only monthly data, and the other with annualized data. Each of the summary functions also takes a second argument that either scales the data, or compares the data to a threshold. Also, all NaNs are changed to 0s.

  1. Monthly: Returns the monthly scaled data.
  2. EOCY: Returns the End-of-calendar year scaled values.
  3. AnnMax: Returns the maximum annual scaled values.
  4. AnnSum: Returns the scaled annual sum of the monthly data.
  5. AnnMinLTE: Checks to see if the annual minimum value is less that or equal (LTE) to a threshold. Returns 100 if it is less than or equal to the threshold and 0 otherwise.
  6. WYMinLTE: Checks to see if the minimum water year value is less than or equal to a threshold. Returns 100 if it is less than or equal to the threshold and 0 otherwise. The water year is defined as October through September of the next year. For the first year, only January through September are evaluated as RiverWare does not typically export pre-simulation data.

NA can be used if there is no desire to scale the data. For methods that use thresholds, a numeric value should be used.

Create the Slot Aggregation Table

The slot aggregation table can either be created in R, or read in from an external csv. In either case, the table should take on the following form:

|rdf| Slot | Aggregation Method |Threshold or Scaling Factor| Variable Name (optional)| |:-------------:|:----------------------:|:---------:|:----------:|:---------------:| |'KeySlots.rdf' | 'Powell.Pool Elevation' | 'EOCY' | NA | Powell EOCY Elevation | |'KeySlots.rdf' | 'Mead.Pool Elevation' | 'AnnMinLTE' | '1100' | Mead < 1,100 | |'KeySlots.rdf' | 'Mead.Pool Elevation' | 'AnnMinLTE' | '1060' | Mead < 1,060 | |'Other.rdf' | 'Powell.Outflow' | 'AnnualSum' | '0.001' | Powell Annual Release |

The above table lists each slot, the rdf the slot is saved in, the summary function, the threshold to be used to scale the data by or compare the data to, and an optional specified variable name. The previous section describes which aggregation methods use a threshold and which use a scaling factor. For example, the first row will result in compiling all end-of-December values for Powell's pool elevation. The data will not be scaled, and is found in KeySlots.rdf. The second row will find the annual minimum Mead pool elevation and see if it is less than or equal to 1,100' feet in the second line and less than or equal to 1,060' feet in the third row. To scale the data by a value less than 1, use decimals rather than fractions, as shown in the fourth row. If the "Variable Name column"" was not specified, the variable name for the first row would be Powell.Pool Elevation_EOCY_1 as NA is replaced with a 1 when construcing the variable names.

The data will be processed for each row, for all scenarios.

Create the Data Table for All Scenarios

After creating the slot aggregation table, either as an R matrix or as an external csv file, use createSlotAggList to create the slot aggregation list. Then, use getDataForAllScens to aggregate the data for all scenarios. getDAtaForAllScens will save the data frame as a .txt file in the user specified folder. The .txt file can then be read into R for further analsysis, or for plotting. The choice was made to write the file out so that the user does not have to re-process the data everytime a small change to a plot is desired.

scenPath <- system.file('extdata','Scenario',package = 'RWDataPlot')
scenFolders <- c('DNF,CT,IG/','DNF,CT,IG,Ops1/') # two scenario folders that exist in scenPath
scenNames <- c('Observed Hydrology','Observed Hydrology w/Options') # scenarios names
slotAggList <- createSlotAggList(system.file('extdata','SlotAggTable.csv',package = 'RWDataPlot'))
oFile <- 'allScenariosProcessed.txt'
getDataForAllScens(scenFolders, scenNames, slotAggList, scenPath, oFile)

Generating Common Figures

plot(1:10)
plot(10:1)

Utilizing RWDataPlot within RiverSMARTTM



rabutler/RWDataPlot documentation built on May 26, 2019, 8:51 p.m.