library(eiamod)
library(earth)
#library(randomForest)

rsltsummary <- function(rslt, stat='r.squared') {
    dplyr::bind_rows(
        lapply(names(rslt), function(rgn) {
            r2 <- summary(rslt[[rgn]])[[stat]]
            data.frame(rgn=rgn, r2=r2, stringsAsFactors = FALSE)   
        })
    )
}

DO_SAVE_DATA <- FALSE
stdata <- standardize_byrgn(region_hourly, c('load','temperature'))

Linear model

Run linear models by region as a baseline for performance.

rsltlm <- load_model(hourly_data=stdata)
rsltsummary(rsltlm)

We expect an increase in load for both high and low temperatures, so we can retry the linear model with a squared term.

rsltlmsqr <- load_model(hourly_data=stdata, formula=load~temperature + I(temperature^2))
rsltsummary(rsltlmsqr)

Fits using MARS models

rsltearth <- load_model(earth, hourly_data=stdata)
rsltsummary(rsltearth, 'grsq')

Interesting. These compare pretty favorably to the linear model using the squared term. It looks like it's a little higher across the board. We can try adding more terms to see if we get anything new.

rsltearthd2 <- load_model(earth, stdata, formula=load~temperature+I(temperature^2), degree=3)
rsltsummary(rsltearth, 'grsq')

Same result as the above. I would interpret this as saying that the main thing here is allowing the model to increase in both directions, low and high. The $T$^2 term does this, and it also adds some curvature to the relation, but the curvature isn't doing anythng for us because the earth model with just temperature does just as well.

Random forest models

I don't expect this to add a whole lot of value, since in the end we have only the one predictor, but it's worth seeing what we can get with an off the shelf ML algorithm. (Removed because it takes a long time to run and doesn't perform very well.)

##rsltrf <- load_model(randomForest, stdata, ntree=100)
##rsltsummary(rsltrf, 'rsq')

Residual load dataset

The MARS model seems to perform the best, so we will use that one to take out the temperature component of the load.

ep <- as.vector(sapply(rsltearth, predict))
plot(stdata$load, ep, pch='.')
stdata <- dplyr::mutate(stdata, residual_load = load - ep)
hist(stdata$residual_load, breaks=100)
bwidth <- 0.1
fx <- function(x) {nrow(stdata)*bwidth*dnorm(x, mean=0, sd=sd(stdata$residual_load))}
ggplot(data=stdata, aes(x=residual_load)) + geom_histogram(binwidth=bwidth) + stat_function(fun=fx, size=1.25) + theme_bw(16) + xlab('Residual Electricity')

This data is going to be the basis for our modeling, so we will store this as residual_load in the package data.

if(DO_SAVE_DATA) {
    residual_load <- dplyr::select(stdata, EconYear, quarter, week, region, time, residual_load)
    usethis::use_data(residual_load, compress='xz')
}


JGCRI/eiamod documentation built on Feb. 21, 2020, 9:16 a.m.