knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "man/figures/README-", out.width = "80%", dpi = 300, fig.align = "center" )
The goal of horizon
is to infer operating policies from daily dam and reservoir data, including the use of foresight in release decisions.
You can install horizon
from this repository using devtools
:
devtools::install_github("IMMM-SFA/horizon")
Once installed, load the library:
library(horizon)
library(lubridate) library(ggplot2) library(dplyr)
The following example walks through the derivation of weekly operating policy and forecast use signature for Glen Canyon Dam (Lake Powell), which is operated by the US Bureau of Reclamation (https://www.usbr.gov/uc/water/crsp/cs/gcd.html).
horizon
is designed to read daily operational csv files in the format: date (yyyy-mm-dd), storage (acrefeet), release (cubic feet per second), and inflow (cubic feet per second).
The show_dams
function may be used to explore the default dataset. Running show_dams("usbr"
) will show all US Bureau of Reclamation dams available for analysis. The data for any dam is read into the environment using read_dam()
:
read_dam("usbr_lakepowell") -> lakepowell_raw lakepowell_raw
The function argument is simply a combined string indicating the data source ("usbr") follwed by the dam ("lakepowell") separated by an underscore.
The data include daily inflow and storage. Release data are unavailable and will have to be estimated later...
lakepowell_raw %>% mutate(jday = yday(date), year = year(date)) %>% select(jday, year, `Storage, acrefeet` = s_af, `Inflow, cfs` = i_cfs) %>% tidyr::gather(metric, value, -jday, -year) %>% ggplot(aes(jday, value, group = year)) + geom_line(aes(color = year), alpha = 0.7) + facet_wrap(~metric, scales = "free_y", ncol = 1) + theme_bw() + labs(title = "Lake Powell operations, 1963 - 2017", x = "Day of calendar year", y = NULL)
horizon
contains a variety of functions for pre-processing the data into the correct format for deriving "forecast use signatures".
convert_to_metric
First, we convert the units to metric volumes (so that inflow and release are daily totals rather than flow rates):
lakepowell_raw %>% convert_to_metric() -> lakepowell_metric # all units now in Million cubic meters lakepowell_metric
fill_NAs
Next we fill any small gaps in the time series. fill_NAs
fills gaps up to a maximum of max_fill_gap
days (default = 10); records containing longer gaps will be excluded from the analysis at a later stage.
lakepowell_metric %>% fill_NAs(max_fill_gap = 10) -> # ^^ gaps of maximum 10 days are filled using cubic spline interpolation lakepowell_gapfilled lakepowell_gapfilled
convert_to_water_years
lakepowell_gapfilled %>% convert_to_water_years() -> lakepowell_wateryrs lakepowell_wateryrs %>% filter(water_year == 2016) # the water year starts 1st October of the prior calendar year
aggregate_to_water_weeks
After identifying water years, the daily time series are aggregated to water weeks 1-52 (horizon:::gen_water_weeks()
may be used to view calendar days to water week mapping).
lakepowell_wateryrs %>% aggregate_to_water_weeks() -> lakepowell_weekly lakepowell_weekly
Some new variables are introduced, too. s_start
and s_end
are the starting and end storage volumes for each week, and s_change
is the resulting change in storage. The latter is used to back-calculate new variables i_
and r_
, which are the inflow and release volumes estimated using s_change
assuming convervation of mass (and no evaporation or other water losses).
back_calc_missing_flows
The final step is then to select the final set of inflow, release and storage variables for forecast use signature derivation:
lakepowell_weekly %>% back_calc_missing_flows(compute_from = "i") -> # ^^ compute_from = "i" tells the function to use the estimated release... # ... since observed release is missing lakepowell_final lakepowell_final
All of the above steps are rolled into the compute_availability
function, which takes the additional step of adding the availability variable (a
) for a chosen water week and given future inflow horizon. The availabiltiy is simply the sum of the starting storage and the cumulative inflow out to the chosen horizon (in weeks). For example:
compute_availability("usbr_lakepowell", # ^^ note that we can simply supply the name of the dam; # the function carries out pre-processing automatically. water_week = 1, horizon = 1, min_allowable_points = 10, cutoff_year = 1995)
This implementation of compute_availability
provides the required data to display the release-availability scatter for water week 1 with a horizon of 1 week ahead (from the start of the water week). Two additional arguments are supplied. min_allowable_points
sets a minimum number fo data points, and throws an error in cases where there are less than the specified number of years of release and availability data for a given water week (here set to 10 data points). cutoff_year
filters the input data to remove all points prior to the cutoff year (here 1995). Long records likely encompass different release policies (and use of forecasts may be relatively recent). The cutoff helps avoid conflating the analysis with multiple operating polcies across many years of operation.
horizon
features in-built functions for plotting these data:
hplot_ready_data("usbr_lakepowell", water_week = 1, horizon = 1, cutoff_year = 1995)
We can add an optimized piecewise linear function with a simple call to add_piecewise_fn
:
hplot_ready_data("usbr_lakepowell", water_week = 1, horizon = 1, cutoff_year = 1995, add_piecewise_fn = TRUE)
In the above case it appears that availabilty predics release quite well; the simple policy function fits nicely, particularly for the wetter years of operation when water availability is high. But often availability with . In the following example, where the water week is changed to week 25, we see that water availability with a horizon of 1 week is a poor predictor of the release decision:
hplot_ready_data("usbr_lakepowell", water_week = 25, horizon = 1, cutoff_year = 1995, add_piecewise_fn = TRUE)
One can use the hplot_ready_data
function to investigate multiple water weeks and horzions simultaneously:
hplot_ready_data("usbr_lakepowell", water_week = 24:25, horizon = c(1, 12), cutoff_year = 1995, add_piecewise_fn = TRUE)
Now we see that water availability with a horizon of 12 weeks results in a much closer policy fit than water availability with a horizon of one week. In deriving the foreccast use signature, we use the above form of analysis to infer the forecast horizon that might be used in determining water release decisions at different weeks of the water year.
We can generate the piecewise functions for all water weeks with all candidate horizons (1 - 30 weeks ahead) simultaneously using get_optimized_models
. An optional argument write_to
(not used here) can be used to specify an output directory where results will be saved in .csv
format.
get_optimized_models("usbr_lakepowell") -> usbr_lakepowell_pw_all
note: this can take several minutes to run, since all 30 * 52 piecewise functions must be optimized. horizon
is designed to use all cores of your machine to derive these piecewise models.
usbr_lakepowell_pw_all
The result includes the parameters (p1
, p2
, p3
, p4
) and goodness-of-fit (r_sq
, the coefficient of determination) for piecewise functions for all combinations of water weeks and possible inflow horizons from 1 to 30 weeks ahead. The four parameters represent the left-hand and right-hand side function slopes and break-point coordinates (p3
= release, p4
= availability), respectively.
If we plot for each water week the R-squared values as a function of the horizon, we begin to get a sense of how foresight may be driving operations:
usbr_lakepowell_pw_all %>% mutate(r_sq = round(r_sq, 1)) %>% # ^^ rounding R-squared helps draw a clearer distinction between policy fits ggplot(aes(horizon, r_sq)) + geom_line() + facet_wrap(~water_week, ncol = 13) + theme_bw() + scale_y_continuous(breaks = c(0, 1), limits = c(0, 1), minor_breaks = seq(0, 1, by = 0.2)) + scale_x_continuous(breaks = c(15, 30)) + labs(title = "Policy fits for all candidate horizons", subtitle = "Each graph represents a water week", x = "Horizon (weeks)", y = "Coefficient of determination of policy fit")
The plot reveals an interesting shift in pattern throught he water year. Early in the water year (top row) the policy fits (y-axes) are relatively stable across horizons (x-axes); there is no strong evidence of a release policy being driven by any particular inflow horizon more than another. In such cases we infer that releases are dictated by the currently available water (i.e., water in storage plus current period inflow). The shift occurs by around week 16 (mid-January), at which point the policy fits begin to improve with longer horizons. By week 21 (end February) the evidence for foresight in operations becomes very strong, with the 15-week horizon offering a policy fit of approximately 0.8, compared to ~0.2 without foresight use (i.e., horizon = 1 week).
The forecast use signature specifies the horizon used in operations for all weeks of the water year. It is created by selecting the horizon for each water week that provides the closest piecewise function fit.
usbr_lakepowell_pw_all %>% select_best_horizon() %>% hplot_selected_models()
The above forecast use signature is rather noisy--which is unsurprising given some of the uncertainties in the method. However, we want to avoid inferring forecast use in cases where the policy fits are weak. We also want to avoid sharp spikes. We address the noise with some smoothing:
usbr_lakepowell_pw_all %>% select_best_horizon() %>% horizon:::remove_low_rsq(rsq_cutoff = 0.2) %>% # ^^ sets horizon to 1 for all cases with R-squared < rsq_cutoff horizon:::despike() %>% # ^^ removes sharp spikes horizon:::post_smooth() %>% # ^^ gentle smoothing spline hplot_selected_models() + ylim(0, 30)
The forecast use signature suggests long-range forecast use (~3-4 months ahead) from mid-December (water week 10) through late spring, suggesting that release policy at Glen Canyon may be formed using a combination of planned upstream releases and snowpack information, which indicates the April-June snowmelt-driven inflows.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.