library(eiamod) library(earth) #library(randomForest) rsltsummary <- function(rslt, stat='r.squared') { dplyr::bind_rows( lapply(names(rslt), function(rgn) { r2 <- summary(rslt[[rgn]])[[stat]] data.frame(rgn=rgn, r2=r2, stringsAsFactors = FALSE) }) ) } DO_SAVE_DATA <- FALSE
stdata <- standardize_byrgn(region_hourly, c('load','temperature'))
Run linear models by region as a baseline for performance.
rsltlm <- load_model(hourly_data=stdata) rsltsummary(rsltlm)
We expect an increase in load for both high and low temperatures, so we can retry the linear model with a squared term.
rsltlmsqr <- load_model(hourly_data=stdata, formula=load~temperature + I(temperature^2)) rsltsummary(rsltlmsqr)
rsltearth <- load_model(earth, hourly_data=stdata) rsltsummary(rsltearth, 'grsq')
Interesting. These compare pretty favorably to the linear model using the squared term. It looks like it's a little higher across the board. We can try adding more terms to see if we get anything new.
rsltearthd2 <- load_model(earth, stdata, formula=load~temperature+I(temperature^2), degree=3) rsltsummary(rsltearth, 'grsq')
Same result as the above. I would interpret this as saying that the main thing here is allowing the model to increase in both directions, low and high. The $T$^2 term does this, and it also adds some curvature to the relation, but the curvature isn't doing anythng for us because the earth model with just temperature does just as well.
I don't expect this to add a whole lot of value, since in the end we have only the one predictor, but it's worth seeing what we can get with an off the shelf ML algorithm. (Removed because it takes a long time to run and doesn't perform very well.)
##rsltrf <- load_model(randomForest, stdata, ntree=100) ##rsltsummary(rsltrf, 'rsq')
The MARS model seems to perform the best, so we will use that one to take out the temperature component of the load.
ep <- as.vector(sapply(rsltearth, predict)) plot(stdata$load, ep, pch='.') stdata <- dplyr::mutate(stdata, residual_load = load - ep) hist(stdata$residual_load, breaks=100) bwidth <- 0.1 fx <- function(x) {nrow(stdata)*bwidth*dnorm(x, mean=0, sd=sd(stdata$residual_load))} ggplot(data=stdata, aes(x=residual_load)) + geom_histogram(binwidth=bwidth) + stat_function(fun=fx, size=1.25) + theme_bw(16) + xlab('Residual Electricity')
This data is going to be the basis for our modeling, so we will store this as
residual_load
in the package data.
if(DO_SAVE_DATA) { residual_load <- dplyr::select(stdata, EconYear, quarter, week, region, time, residual_load) usethis::use_data(residual_load, compress='xz') }
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.