README.md
In holub008/gdlm: Build linear models through gradient descent

gdlm

Gradient Descent for Linear Models in R

Perform gradient descent to build linear models and use bootstrapping to generate standard errors on estimators. Goal is to provide flexibility in loss function specification combined with general ease of use associated with other linear models in R.

devtools::install_git('https://github.com/holub008/gdlm')

Here we fit an OLS model predicting city mpg from highway mpg on the mpg dataset from ggplot:

data(mpg)
(p1 <- ggplot(mpg) + geom_point(aes(hwy, cty)))

m <- gdlm(cty ~ hwy, mpg, loss = LS_LOSS())
summary(m)
(p2 <- p1 + geom_abline(aes(intercept = m$estimators[1],
                            slope = m$estimators[2]), color = 'black'))

with results:

Formula: cty ~ hwy 

Estimators:
             Estimate Standard Error        5%       95%
(Intercept) 0.8416769     0.33131266 0.2702470 1.3416143
hwy         0.6832955     0.01508875 0.6613536 0.7078258

mpg_ls_fit

If we have a prediction problem where overpredictions are more heavily penalized than underpredictions, we may parameterize our loss function as:

m_negative_penalty <- gdlm(cty ~ hwy, mpg, loss = LS_LOSS(loss_asymmetry = .1))

p2 + geom_abline(aes(intercept = m_negative_penalty$estimators[1],
                     slope = m_negative_penalty$estimators[2]), color ='red')

mpg_underpred_fit

In practice the loss asymmetry would likely be determined by business . Loss asymmetry can similarly be applied to LAD/logistic loss functions:

m_lad <- gdlm(cty ~ hwy, mpg, loss = LAD_LOSS())
m_lad_negative_penalty <- gdlm(cty ~ hwy, mpg, loss = LAD_LOSS(loss_asymmetry = .1))
p1 +
  geom_abline(aes(intercept = m_lad$estimators[1], slope = m_lad$estimators[2]), color = 'blue') +
  geom_abline(aes(intercept = m_lad_negative_penalty$estimators[1],
                     slope = m_lad_negative_penalty$estimators[2]), color ='red') + ggtitle('Least Absolute Deviation Fits')

mpg_lad_fits

Note that the LAD case is quite similar to a quantile regression.

Ridge regression is a convex problem, so we may use gradient descent to find the unique optimal parameterization. LASSO/elastic net regularization are configurable, but do not necessarily lead to a globally optimal parameterization.

car_data <- mpg
car_data$is_big_car <- mpg$class %in% c('suv', 'pickup', 'minivan')

m_log <- gdlm(is_big_car ~ hwy, car_data, loss = LOGISTIC_LOSS())
summary(m_log)

log_odds <- predict(m_log)
ggplot() + geom_histogram(aes(log_odds, fill = car_data$is_big_car))

m_log_shrunk <- gdlm(is_big_car ~ hwy, car_data,
                     loss = compose_regularization(LOGISTIC_LOSS(), elastic_net_parameter = 0, lambda = 1e-3))
summary(m_log_shrunk)

log_odds <- predict(m_log_shrunk)
ggplot() + geom_histogram(aes(log_odds, fill = car_data$is_big_car))

If computing bootstrapped SEs with many trials, one may benefit by increasing the level of parallelism:

options(mc.cores = 1)
system.time(gdlm(Sepal.Width ~ Species * Petal.Width + Petal.Length, data = iris, loss = LS_LOSS(), 
                 bootstrap_trials = 1e4))
     user  system elapsed 
  145.614   6.377 152.595 

options(mc.cores = 4)
system.time(gdlm(Sepal.Width ~ Species * Petal.Width + Petal.Length, data = iris, loss = LS_LOSS(), 
                 bootstrap_trials = 1e4))
     user  system elapsed 
   92.386   4.420  47.913

Note this will only work on Unix variants.

holub008/gdlm documentation built on May 13, 2019, 6:14 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com