vignettes/evalwaterfallr-vignette.md

title: "evalwaterfallr"
author: "JC"
date: "2017-02-22"
output:
  rmarkdown::html_vignette:
    keep_md: true
vignette: >
  %\VignetteIndexEntry{evalwaterfallr}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}

This document describes how to create order-independent permutations and waterfall graphics using evalwaterfallr. There are four functions in the package. Three of them are intended to be used directly: waterfallPrep(), addwaterfallPrep() and waterfallPlot(). The wParamPermute() function is called by waterfallPrep() and could be used on its own, but that scenario is not expected to occur often.

Permutations of multiplicative parameters can be completed by straightforward maths, but they are error prone and time consuming in spreadsheet applications. The time and risk increase with the number of parameters. A three-parameter equation requires six permutations, but a four-parameter equation requires 24. This package automates the permutation and creates tables that can be used for creating the waterfall plot (within the package) or exported for other uses. In addition, it extends the methods for additive parameters to ensure consistency in reporting for additive and multiplicative parameters.
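To illustrate why the order of multiplicative adjustments matters, here is a minimal sketch in base R with two made-up parameters (the names `a` and `b` and their values are assumptions for illustration, not package data). The change attributed to each parameter depends on which is applied first; averaging over both orderings gives one order-independent value per parameter, and the averages still sum to the total change.

```r
start <- 100
a <- 0.5   # hypothetical multiplicative parameter
b <- 0.7   # hypothetical multiplicative parameter

# change attributed to each parameter when a is applied first
a_first <- c(a = start * (a - 1), b = start * a * (b - 1))
# change attributed to each parameter when b is applied first
b_first <- c(a = start * b * (a - 1), b = start * (b - 1))

# order-independent attribution: the average over both orderings
avg <- (a_first + b_first) / 2
avg

# the averaged changes still account for the full change from start
total_change <- start * a * b - start
all.equal(sum(avg), total_change)
```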

For motivation and relevant readings, please see the package Readme.

waterfallPrep()

This function takes the input of multiplicative parameters (up to 10) as a dataframe and the key values of Gross Reported, NTG Reported and NTG Evaluated. Note: these terms are from the field of energy efficiency evaluation, but could be extended to other applications. Here are the conceptualizations of the key values:

All permutations of the order of multiplicative adjustments are determined by calling wParamPermute() inside waterfallPrep(). It is called separately for the net and hybrid permutations. For the net permutation, the multiplicative parameters and the NTG realization rate (the ratio NTG.eval/NTG.report) are permuted. For the hybrid permutation, the Reported NTG, the multiplicative parameters, and the NTG realization rate are permuted.

The data included in the package, rawparamdf, has four impact parameters for an imaginary lighting program. They are named "ISR", "deltaWatts", "HOU", and "IE", abbreviating In Service Rate, Delta Watts, Hours of Use, and Interactive Effects, respectively. These names are used throughout the code, but they could be any character string.

rawparamdf <- data.frame( # lighting example
                          params = c("ISR","deltaWatts","HOU","IE"),
                          value = c(0.5, 0.7, 1.2, 1.5),
                          stringsAsFactors = FALSE
                         )

rawparamdf
#>       params value
#> 1        ISR   0.5
#> 2 deltaWatts   0.7
#> 3        HOU   1.2
#> 4         IE   1.5

By default, waterfallPrep() only requires the parameter table, and it will use defaults for key values and assume all tables of output are desired. The defaults are: gross.report = 100, NTG.report = 1, and NTG.eval = 1.

waterfallPrep(rawparamdf) # minimal call
#> $`No Permutatation`
#>        variable given total base increase decrease
#> 1 Ex Ante Gross 100.0   100   NA       NA       NA
#> 2           ISR   0.5    NA   50        0       50
#> 3    deltaWatts   0.7    NA   35        0       15
#> 4           HOU   1.2    NA   35        7        0
#> 5            IE   1.5    NA   42       21        0
#> 6 Ex Post Gross    NA    63   NA       NA       NA
#> 7   Ex Post NTG   1.0    NA   63        0        0
#> 8   Ex Post Net    NA    63   NA       NA       NA
#> 
#> $`Gross Waterfall`
#>        variable given total     base increase decrease
#> 1 Ex Ante Gross 100.0   100       NA       NA       NA
#> 2           ISR   0.5    NA 42.20833  0.00000 57.79167
#> 3    deltaWatts   0.7    NA 12.08333  0.00000 30.12500
#> 4           HOU   1.2    NA 12.08333 15.70833  0.00000
#> 5            IE   1.5    NA 27.79167 35.20833  0.00000
#> 6 Ex Post Gross    NA    63       NA       NA       NA
#> 7   Ex Post NTG   1.0    NA 63.00000  0.00000  0.00000
#> 8   Ex Post Net    NA    63       NA       NA       NA
#> 
#> $`Net Waterfall`
#>        variable given total      base increase decrease
#> 1 Ex Ante Gross 100.0   100        NA       NA       NA
#> 2   Ex Ante NTG   1.0    NA 100.00000  0.00000  0.00000
#> 3   Ex Ante Net    NA   100        NA       NA       NA
#> 4           ISR   0.5    NA  42.20833  0.00000 57.79167
#> 5    deltaWatts   0.7    NA  12.08333  0.00000 30.12500
#> 6           HOU   1.2    NA  12.08333 15.70833  0.00000
#> 7            IE   1.5    NA  27.79167 35.20833  0.00000
#> 8        RR NTG   1.0    NA  63.00000  0.00000  0.00000
#> 9   Ex Post Net    NA    63        NA       NA       NA
#> 
#> $`Hybrid Waterfall`
#>        variable given total      base increase decrease       calc
#> 1 Ex Ante Gross 100.0   100        NA       NA       NA         NA
#> 2   Ex Ante NTG   1.0    NA 100.00000  0.00000  0.00000  0.0000000
#> 3           ISR   0.5    NA  42.20833  0.00000 57.79167 -0.5779167
#> 4    deltaWatts   0.7    NA  12.08333  0.00000 30.12500 -0.3012500
#> 5           HOU   1.2    NA  12.08333 15.70833  0.00000  0.1570833
#> 6            IE   1.5    NA  27.79167 35.20833  0.00000  0.3520833
#> 7        RR NTG   1.0    NA  63.00000  0.00000  0.00000  0.0000000
#> 8   Ex Post Net    NA    63        NA       NA       NA         NA


# this is equivalent to
# not run
# waterfallPrep(rawparamdf, 
#              gross.report = 100, NTG.report = 1, NTG.eval = 1, #defaults
#              altparamnames = NULL, # default
#              output="all") #defaults
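The values in the "No Permutation" table above are consistent with applying the parameters one after another in the order given. A minimal base R sketch, assuming that sequential reading of the table (the variable names `running` and `step` are illustrative only):

```r
gross.report <- 100
value <- c(ISR = 0.5, deltaWatts = 0.7, HOU = 1.2, IE = 1.5)

# level after each parameter is applied in the given order
running <- gross.report * cumprod(value)

# change contributed by each parameter in this one ordering
step <- diff(c(gross.report, running))

running
step
```

Negative steps correspond to the decrease column and positive steps to the increase column; the last running value is the Ex Post Gross total.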

Alternatively, we can store one of the output tables, here: "gross".

gross_tab <- waterfallPrep(rawparamdf, output="gross") 
gross_tab
#>        variable given total     base increase decrease
#> 1 Ex Ante Gross 100.0   100       NA       NA       NA
#> 2           ISR   0.5    NA 42.20833  0.00000 57.79167
#> 3    deltaWatts   0.7    NA 12.08333  0.00000 30.12500
#> 4           HOU   1.2    NA 12.08333 15.70833  0.00000
#> 5            IE   1.5    NA 27.79167 35.20833  0.00000
#> 6 Ex Post Gross    NA    63       NA       NA       NA
#> 7   Ex Post NTG   1.0    NA 63.00000  0.00000  0.00000
#> 8   Ex Post Net    NA    63       NA       NA       NA

Usually, Gross Reported and NTG values are known, so they should be adjusted from the defaults. The parameter names can also be changed.

Here, let's assume that the Gross Reported (gross.report) value is 200 and the NTG fractions (NTG.report and NTG.eval) are 0.8 and 0.6, respectively. We want to rename our parameters and only get the net permutation table. Note that altparamnames must have the same number of elements as df has rows. The function will stop with the message "alternate parameter name vector length not the same as parameter length" if they do not match.


# assume gross.report is 200, NTG.report = 0.8, and NTG.eval=0.6
net_tab <- waterfallPrep(rawparamdf, 200, .8, .6, 
                         altparamnames=c("Installation\nRates","delta\nWatts",
                                         "Hours\nof Use","Interactive\nEffects"), 
                         output="net") 
net_tab
#>               variable given total      base increase decrease
#> 1        Ex Ante Gross 200.0 200.0        NA       NA       NA
#> 2          Ex Ante NTG   0.8    NA 160.00000  0.00000 40.00000
#> 3          Ex Ante Net    NA 160.0        NA       NA       NA
#> 4  Installation\nRates   0.5    NA  79.53000  0.00000 80.47000
#> 5         delta\nWatts   0.7    NA  37.26000  0.00000 42.27000
#> 6        Hours\nof Use   1.2    NA  37.26000 22.31333  0.00000
#> 7 Interactive\nEffects   1.5    NA  59.57333 50.26333  0.00000
#> 8               RR NTG   0.6    NA  75.60000  0.00000 34.23667
#> 9          Ex Post Net    NA  75.6        NA       NA       NA

# this table can be exported for use in other programs or passed to the 
# waterfallPlot() function directly, as shown below
# not run
# write.csv(net_tab, file="/path/to/dir/net_tab.csv")
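The totals in the net table can be checked by hand. The sketch below uses only base R and the values printed above; it assumes the NTG realization rate is NTG.eval/NTG.report, as described earlier, and the variable names are illustrative.

```r
gross.report <- 200
NTG.report <- 0.8
NTG.eval <- 0.6
value <- c(0.5, 0.7, 1.2, 1.5)

ex_ante_net <- gross.report * NTG.report           # Ex Ante Net row
rr_ntg <- NTG.eval / NTG.report                    # NTG realization rate
ex_post_net <- ex_ante_net * prod(value) * rr_ntg  # Ex Post Net row

# signed steps copied from the printed table (increase minus decrease)
steps <- c(-80.47, -42.27, 22.31333, 50.26333, -34.23667)

# the permuted steps should bridge Ex Ante Net to Ex Post Net
abs((ex_ante_net + sum(steps)) - ex_post_net) < 1e-4
```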

The output of waterfallPrep() can be used directly by waterfallPlot() as shown below, or used in another software product, like Excel, to create desired graphs.

addwaterfallPrep()

This function creates the permutations for additive, rather than multiplicative parameters. The output of addwaterfallPrep() can be used directly by waterfallPlot() as shown below, or used in another software product, like Excel, to create desired graphs. The additive parameters have no permutation for the gross table, but are permuted with NTG realization rate for the net table and permuted with the NTG reported and NTG realization rate for the hybrid table.


# assume gross.report is 100, NTG.report = 0.8, and NTG.eval=0.6
# assume our parameters are:
addrawparamdf <- data.frame( # excel example
                          params = c("A","B","C"),
                          value = c(-30, 20, -40),
                          stringsAsFactors = FALSE
                         )

add_tab <- addwaterfallPrep(addrawparamdf, 100, .8, .6,  
                         output="all") 

We can see the gross, net, and hybrid permutation tables:

add_tab[[2]] # gross
#>        variable given total base increase decrease
#> 1 Ex Ante Gross 100.0   100   NA       NA       NA
#> 2             A -30.0    NA   70        0       30
#> 3             B  20.0    NA   70       20        0
#> 4             C -40.0    NA   50        0       40
#> 5 Ex Post Gross    NA    50   NA       NA       NA
#> 6   Ex Post NTG   0.6    NA   30        0       20
#> 7   Ex Post Net    NA    30   NA       NA       NA
add_tab[[3]] # net
#>        variable given total base increase decrease
#> 1 Ex Ante Gross 100.0   100   NA       NA       NA
#> 2   Ex Ante NTG   0.8    NA   80        0       20
#> 3   Ex Ante Net    NA    80   NA       NA       NA
#> 4             A -30.0    NA   59        0       21
#> 5             B  20.0    NA   59       14        0
#> 6             C -40.0    NA   45        0       28
#> 7        RR NTG   0.6    NA   30        0       15
#> 8   Ex Post Net    NA    30   NA       NA       NA
add_tab[[4]] # hybrid
#>        variable given total     base increase decrease
#> 1 Ex Ante Gross 100.0   100       NA       NA       NA
#> 2   Ex Ante NTG   0.8    NA 86.66667  0.00000 13.33333
#> 4             A -30.0    NA 62.91667  0.00000 23.75000
#> 5             B  20.0    NA 62.91667 15.83333  0.00000
#> 6             C -40.0    NA 47.08333  0.00000 31.66667
#> 7        RR NTG   0.6    NA 30.00000  0.00000 17.08333
#> 8   Ex Post Net    NA    30       NA       NA       NA
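The net table above can be reproduced with a short calculation. This is a sketch inferred from the printed values, not taken from the package source: it assumes that, averaged over all orderings, each additive parameter falls before the NTG realization rate half the time (so it is scaled by NTG.report) and after it half the time (so it is scaled by NTG.eval).

```r
gross.report <- 100
NTG.report <- 0.8
NTG.eval <- 0.6
add_value <- c(A = -30, B = 20, C = -40)

# each additive parameter is scaled by the average of the two NTG values
param_steps <- add_value * mean(c(NTG.report, NTG.eval))

# the NTG change acts on gross plus, on average, half of the additive total
rr_step <- (NTG.eval - NTG.report) * (gross.report + sum(add_value) / 2)

ex_ante_net <- gross.report * NTG.report
ex_post_net <- ex_ante_net + sum(param_steps) + rr_step

param_steps
rr_step
ex_post_net
```

Under these assumptions the steps match the increase and decrease columns of the net table, and the result agrees with the direct calculation (100 - 30 + 20 - 40) * 0.6.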

waterfallPlot()

This is the waterfall plotting function. It is inspired by the code in James Kierstead's post on waterfall plots with UK emissions data. waterfallPlot() assumes the input dataframe is already in order and creates fill categories based on the values in the table. It plots intermediate totals, which are not possible with James' waterfall(). For data that are more categorical or do not require intermediate totals, see his very useful waterfall gist.

We can see the multiplicative net permutation table from the example above.

waterfallPlot(net_tab)

[Figure: waterfall plot of net_tab]

What about all of the additive tables?

Gross:

waterfallPlot(add_tab[[2]])

[Figure: waterfall plot of the additive gross table]

Net:

waterfallPlot(add_tab[[3]])

[Figure: waterfall plot of the additive net table]

Hybrid:

waterfallPlot(add_tab[[4]])

[Figure: waterfall plot of the additive hybrid table]

Alternatively, we can send a table that we may have from outside this package.

library(ggplot2)
library(scales)
library(dplyr)
# let's assume that we have rrdf and just want to plot it:
rrdf <- data.frame( # made up example
         variable = c("Start","Factor 1","Factor 2","Factor 3","End"),
        total = c(100, rep(NA, 3), 75),
         base = c(NA, 75, 50, 50,NA),
         increase = c(NA, 0, 0, 25, NA),
         decrease = c(NA, 25, 25, 0, NA))
waterfallPlot(rrdf)

[Figure: waterfall plot of rrdf]
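A table in this layout can also be built from a starting value and a vector of signed steps. The helper below is a hypothetical sketch, not part of the package; it assumes the column convention seen in the tables above, where base is the lower of the levels before and after each step.

```r
# hypothetical helper: build a waterfall table from signed steps
steps_to_table <- function(start, steps, labels) {
  running <- start + cumsum(steps)        # level after each step
  before <- c(start, head(running, -1))   # level before each step
  data.frame(
    variable = c("Start", labels, "End"),
    total = c(start, rep(NA, length(steps)), tail(running, 1)),
    base = c(NA, pmin(before, running), NA),  # bottom of each floating bar
    increase = c(NA, pmax(steps, 0), NA),
    decrease = c(NA, pmax(-steps, 0), NA),
    stringsAsFactors = FALSE
  )
}

rrdf2 <- steps_to_table(100, c(-25, -25, 25),
                        c("Factor 1", "Factor 2", "Factor 3"))
rrdf2
```

With these inputs, rrdf2 holds the same values as the hand-built rrdf.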

It is straightforward to change the colors and the labels. However, if xfactors is supplied, it must have the same number of elements as the supplied data frame has rows. If it does not, the default values are used and the plot still renders.

library(dplyr)
# With another color palette. Note that totals stay grey.

[Figure: waterfall plot with an alternative color palette]

When specific fonts or other design issues are preferred, it may be best to modify the waterfallPlot() function to meet those needs rather than try to append ggplot2 calls.

wParamPermute()

This function does the underlying permutation. It is called by waterfallPrep() to permute the factors passed by df. It returns a dataframe with the same parameter names and the average permuted value.

For the function call to waterfallPrep() above, this is what was sent to and returned within the function. Note: had altparamnames been defined, that would have been sent rather than rawparamdf[,1].

wParamPermute(rawparamdf[,1],rawparamdf[,2])
#> Warning: doParallel may make this function faster for large order
#> permutations if it is installed.
#>   param.names     avg.xx
#> 1         ISR -0.5779167
#> 2  deltaWatts   -0.30125
#> 3         HOU  0.1570833
#> 4          IE  0.3520833
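The averaging can be sketched in base R. The functions below (`all_perms` and `avg_impact`) are hypothetical illustrations, not the package's implementation: they enumerate every ordering, record the change each parameter produces in a running product that starts at 1, and average those changes per parameter.

```r
# hypothetical helper: every ordering of the elements of v, one per row
all_perms <- function(v) {
  if (length(v) <= 1) return(matrix(v, nrow = 1))
  do.call(rbind, lapply(seq_along(v), function(i) {
    cbind(v[i], all_perms(v[-i]))   # put v[i] first, permute the rest
  }))
}

# hypothetical helper: average change per parameter over all orderings
avg_impact <- function(param.names, values) {
  perms <- all_perms(seq_along(values))
  impact <- matrix(0, nrow(perms), length(values))
  for (r in seq_len(nrow(perms))) {
    ord <- perms[r, ]
    level <- cumprod(values[ord])         # running product in this order
    impact[r, ord] <- diff(c(1, level))   # change caused by each parameter
  }
  data.frame(param.names = param.names, avg.xx = colMeans(impact),
             stringsAsFactors = FALSE)
}

res <- avg_impact(c("ISR", "deltaWatts", "HOU", "IE"), c(0.5, 0.7, 1.2, 1.5))
res
```

For these four parameters the averages agree with the avg.xx column printed above.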

The number of intermediate values generated is the factorial of the number of parameters. For 3 factors, there are 6 permutations (3!=6); for 5 factors, there are 120 (5!=120); and for 7 factors, there are 5,040 (7!=5,040).
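The growth is easy to tabulate with base R's factorial(); the column names here are illustrative, and the 10-parameter row corresponds to the maximum that waterfallPrep() accepts.

```r
n <- c(3, 5, 7, 10)
perm_counts <- data.frame(parameters = n, orderings = factorial(n))
perm_counts
```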

This function can take significant time for large-order permutations, even if the doParallel package is installed. A warning message suggests that doParallel may make the function faster for large permutations. Looking at the system time for a few arbitrary calls is informative.

# three
system.time(wParamPermute(c("ten","five","four"),
                          c(10,5,4)))
#> Warning: doParallel may make this function faster for large order
#> permutations if it is installed.
#>    user  system elapsed 
#>   0.010   0.000   0.011
# five
system.time(wParamPermute(c("ten","five","four","six","ten"),
                          c(10,5,4,6,10)))
#>    user  system elapsed 
#>   0.084   0.058   0.278
# seven
system.time(wParamPermute(c("ten","five","four","six","ten","two","six"),
                          c(10,5,4,6,10,2,6)))
#>    user  system elapsed 
#>   1.732   0.020   1.756


EMIjess/evalwaterfallr documentation built on May 6, 2019, 3:09 p.m.