ploom provides tools for memory-efficient fitting of linear and generalized linear models. Inspired by biglm, ploom fits models using a bounded-memory algorithm, offering an alternative to lm() and glm() when memory is limited.
ploom models are compatible with tidy(), glance(), augment() and many stats functions such as predict() and residuals().
# development version from GitHub:
# install.packages("devtools")
devtools::install_github("blakeboswell/ploom")
Models are initialized with a formula, fit to data with calls to fit(), and summarized with standard functions such as tidy(), glance(), and summary().
library(ploom)

y <- oomlm(mpg ~ wt + qsec + factor(am))
y <- fit(y, data = mtcars)
tidy(y)
Models can be fit with repeated calls to fit() over chunks of data. Each call to fit() only needs to allocate memory for the provided chunk, thereby bounding the required memory.
y <- oomlm(mpg ~ wt + qsec + factor(am))
y <- fit(y, mtcars[1:16, ])
y <- fit(y, mtcars[17:32, ])
coef(y)
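To see why fitting over chunks can reproduce the full-data fit, here is a minimal base R sketch. It is not ploom's implementation: it accumulates the cross-products X'X and X'y over chunks and solves the normal equations, a simpler and numerically less stable stand-in for the incremental QR updating that biglm is known for. The helper `update_crossprod()` and the accumulator `acc` are hypothetical names used only for illustration.

```r
# Illustrative only: accumulate cross-products chunk by chunk.
# `update_crossprod` is a hypothetical helper, not part of ploom.
update_crossprod <- function(acc, chunk, formula) {
  mf <- model.frame(formula, chunk)
  X  <- model.matrix(attr(mf, "terms"), mf)  # design matrix for this chunk
  y  <- model.response(mf)
  if (is.null(acc)) {
    acc <- list(xtx = crossprod(X), xty = crossprod(X, y))
  } else {
    # Only the small p x p and p x 1 accumulators persist between chunks
    acc$xtx <- acc$xtx + crossprod(X)
    acc$xty <- acc$xty + crossprod(X, y)
  }
  acc
}

f <- mpg ~ wt + qsec + factor(am)

# Assumes every chunk contains both levels of `am`, so that each chunk
# produces the same design-matrix columns.
acc <- NULL
acc <- update_crossprod(acc, mtcars[1:16, ], f)
acc <- update_crossprod(acc, mtcars[17:32, ], f)

beta_chunked <- drop(solve(acc$xtx, acc$xty))
beta_full    <- coef(lm(f, data = mtcars))

# Compare the chunked estimate with the ordinary full-data fit
all.equal(unname(beta_chunked), unname(beta_full))
```

The memory held between chunks depends only on the number of coefficients, not on the number of rows, which is the sense in which the required memory is bounded.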
The function oomdata_tbl() enables iteration over an in-memory tibble or data.frame. When an oomdata_tbl() is provided as the data argument to fit(), all chunks are automatically iterated over.
chunks <- oomdata_tbl(mtcars, chunk_size = 16)
fit(oomlm(mpg ~ wt + qsec + factor(am)), chunks)
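As a rough picture of what iterating over chunks of a data.frame involves, the following base R sketch builds a closure that hands out consecutive row blocks until the data is exhausted. `make_chunker()` and `next_chunk()` are hypothetical names for illustration; this is an assumption about the general pattern, not a description of how oomdata_tbl() is written.

```r
# Hypothetical chunk iterator over a data.frame (not ploom's oomdata_tbl()).
make_chunker <- function(data, chunk_size) {
  start <- 1L
  n <- nrow(data)
  function() {
    if (start > n) return(NULL)            # NULL signals that no rows remain
    end <- min(start + chunk_size - 1L, n)
    chunk <- data[start:end, , drop = FALSE]
    start <<- end + 1L                     # advance the cursor for the next call
    chunk
  }
}

next_chunk <- make_chunker(mtcars, chunk_size = 10)

rows_seen   <- 0L
chunk_sizes <- integer(0)
while (!is.null(chunk <- next_chunk())) {
  rows_seen   <- rows_seen + nrow(chunk)
  chunk_sizes <- c(chunk_sizes, nrow(chunk))
}

# Every row is visited exactly once; the last chunk may be shorter
rows_seen == nrow(mtcars)
chunk_sizes
```

A fitting routine that accepts such an iterator can loop until it receives NULL, updating the model once per chunk.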
The function oomdata_dbi() enables iteration over a DBI result set. fit() will automatically fit the model over all chunks.
# connect to database
con <- DBI::dbConnect(RPostgres::Postgres(), dbname = "mtcars")
result <- DBI::dbSendQuery(con, "select mpg, wt, qsec, am from mtcars;")
chunks <- oomdata_dbi(result, chunk_size = 16)

# fit model to all chunks
y <- fit(oomlm(mpg ~ wt + qsec + factor(am)), chunks)

# inspect fit statistics
glance(y)
See the package articles for more on interfacing with databases.
Prediction with ploom models is performed with the predict() function. predict() provides options for confidence intervals, prediction intervals, and standard error in addition to fit.
Because ploom models do not store any data while fitting, we must also provide data.
predict(y, new_data = mtcars, std_error = TRUE, interval = "prediction")
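The following base R sketch illustrates why data must be passed at prediction time. A model that keeps only summary quantities (coefficients, the unscaled covariance matrix, the residual variance and its degrees of freedom) can still produce prediction intervals, but only once a design matrix is rebuilt from supplied data. An ordinary lm() fit stands in for the fitted model here; this uses the standard linear-model formulas and is not taken from ploom's source.

```r
# Illustrative only: prediction intervals from stored summary quantities.
f   <- mpg ~ wt + qsec + factor(am)
ref <- lm(f, data = mtcars)            # stands in for a fitted model

# Quantities a data-free model could retain after fitting
beta   <- coef(ref)
xtxinv <- summary(ref)$cov.unscaled    # (X'X)^-1
sigma2 <- summary(ref)$sigma^2         # residual variance estimate
df     <- df.residual(ref)

# Data must be supplied again to build the design matrix
X   <- model.matrix(delete.response(terms(f)), mtcars)
fit <- drop(X %*% beta)

# Standard error of the fitted mean: sqrt(sigma^2 * x' (X'X)^-1 x) per row
se_fit <- sqrt(rowSums((X %*% xtxinv) * X) * sigma2)

# 95% prediction interval adds the residual variance to the variance of the fit
half   <- qt(0.975, df) * sqrt(se_fit^2 + sigma2)
manual <- cbind(fit = fit, lwr = fit - half, upr = fit + half)

# Compare with the intervals computed by predict.lm()
reference <- predict(ref, newdata = mtcars, interval = "prediction")
all.equal(unname(manual), unname(reference))
```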
Similarly, residuals are accessible on demand with residuals():
sum(residuals(y, data = mtcars)^2)
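Computing residuals on demand also works chunk by chunk, so the residual sum of squares can be accumulated without holding all rows at once. The sketch below uses base R only, with coefficients from an ordinary lm() fit standing in for a fitted model; the chunk boundaries are arbitrary and chosen for illustration.

```r
# Illustrative only: accumulate the residual sum of squares over chunks.
f    <- mpg ~ wt + qsec + factor(am)
beta <- coef(lm(f, data = mtcars))     # stands in for fitted coefficients

rss <- 0
for (rows in list(1:16, 17:32)) {
  chunk <- mtcars[rows, ]
  X     <- model.matrix(f, chunk)      # design matrix for this chunk only
  res   <- chunk$mpg - drop(X %*% beta)
  rss   <- rss + sum(res^2)            # only a running total is kept
}

# Compare with the residual sum of squares from the full-data fit
rss_full <- sum(residuals(lm(f, data = mtcars))^2)
all.equal(rss, rss_full)
```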
Thanks to:
biglm