
ploom


Overview

ploom provides tools for memory-efficient fitting of linear and generalized linear models. Inspired by biglm, ploom fits models using a bounded-memory algorithm that enables:

ploom models are

Installation

# development version from GitHub:
# install.packages("devtools")
devtools::install_github("blakeboswell/ploom")

Usage

Models are initialized with a formula, fit to data with calls to fit(), and summarized with standard functions such as tidy(), glance(), and summary().

library(ploom)

y <- oomlm(mpg ~ wt + qsec + factor(am))
y <- fit(y, data = mtcars)

tidy(y)

Bounded Memory

Models can be fit with repeated calls to fit() over chunks of data. Each call to fit() only needs to allocate memory for the chunk it is given, which bounds the memory required.

y <- oomlm(mpg ~ wt + qsec + factor(am))
y <- fit(y, mtcars[1:16, ])
y <- fit(y, mtcars[17:32, ])

coef(y)
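
To see why fitting in chunks can reproduce the full-data fit, here is a minimal base R sketch that accumulates the cross-products X'X and X'y one chunk at a time and then solves the normal equations. This is an illustration only and is not ploom's internal algorithm; the variable names (xtx, xty, beta_chunked) are made up for this example, and it assumes every chunk contains both levels of am.

```r
# Illustration only: chunked least squares via accumulated cross-products.
# This is NOT how ploom is implemented; it only shows the idea of bounded memory.
form   <- mpg ~ wt + qsec + factor(am)
chunks <- list(mtcars[1:16, ], mtcars[17:32, ])

xtx <- 0  # running total of X'X
xty <- 0  # running total of X'y

for (chunk in chunks) {
  # Only the current chunk's design matrix is held in memory.
  x   <- model.matrix(form, data = chunk)
  xtx <- xtx + crossprod(x)
  xty <- xty + crossprod(x, chunk$mpg)
}

# Solve the accumulated normal equations and compare with a full-data fit.
beta_chunked <- drop(solve(xtx, xty))
beta_full    <- coef(lm(form, data = mtcars))

all.equal(beta_chunked, beta_full, check.attributes = FALSE)
```

The accumulated quantities have a fixed size that depends on the number of coefficients, not on the number of rows, which is what keeps memory bounded.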

Fitting over Chunks

The function oomdata_tbl() enables iteration over an in-memory tibble or data.frame. When an oomdata_tbl() object is passed as the data argument to fit(), fit() iterates over all chunks automatically.

chunks <- oomdata_tbl(mtcars, chunk_size = 16)
fit(oomlm(mpg ~ wt + qsec + factor(am)), chunks)
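
As a rough picture of what iterating over chunks means, the following base R sketch builds a small closure that hands out successive row ranges of a data.frame. The helper make_chunker() is hypothetical and written only for this example; it is not part of ploom and does not claim to mirror how oomdata_tbl() works internally.

```r
# Hypothetical helper (not part of ploom): returns a function that yields
# successive row chunks of a data.frame, then NULL once the data is exhausted.
make_chunker <- function(data, chunk_size) {
  start <- 1L
  function() {
    if (start > nrow(data)) return(NULL)
    last  <- min(start + chunk_size - 1L, nrow(data))
    chunk <- data[start:last, , drop = FALSE]
    start <<- last + 1L  # advance the cursor held in the enclosing environment
    chunk
  }
}

next_chunk <- make_chunker(mtcars, chunk_size = 16)

# Consume every chunk, counting the rows seen along the way.
rows_seen <- 0L
while (!is.null(chunk <- next_chunk())) {
  rows_seen <- rows_seen + nrow(chunk)
}

rows_seen == nrow(mtcars)
```

Keeping the cursor inside a closure means the caller only ever asks for the next chunk and never has to track row offsets itself.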

Working with Databases

The function oomdata_dbi() enables iteration over a DBI result set. fit() will automatically fit the model over all chunks.

# connect to database
con    <- DBI::dbConnect(RPostgres::Postgres(), dbname="mtcars")
result <- DBI::dbSendQuery(con, "select mpg, wt, qsec, am from mtcars;")
chunks <- oomdata_dbi(result, chunk_size = 16)

# fit model to all chunks
y <- fit(oomlm(mpg ~ wt + qsec + factor(am)), chunks)

# inspect fit statistics
glance(y)

See the package articles for more on interfacing with databases.

Prediction & Residuals

Prediction with ploom models is performed with the predict() function. In addition to the fitted values, predict() provides options for confidence intervals, prediction intervals, and standard errors.

Because ploom models do not store any data while fitting, the data to predict on must be supplied explicitly.

predict(y, new_data = mtcars, std_error = TRUE, interval = "prediction")

Similarly, residuals are accessible on demand with residuals():

sum(residuals(y, data = mtcars)^2)
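
The reason data has to be passed in can be sketched in base R: given only a coefficient vector, fitted values and residuals are recovered by rebuilding the design matrix from the supplied data. Here lm() is used purely as a stand-in source of coefficients, and the names beta, fit_values and res are illustrative; this is not a description of ploom's own predict() or residuals() code.

```r
# Illustration only: residuals from coefficients plus user-supplied data.
form <- mpg ~ wt + qsec + factor(am)

# Stand-in for the coefficients of a model that kept no copy of its data.
beta <- coef(lm(form, data = mtcars))

# Rebuild the design matrix from the supplied data, then compute fitted values.
x          <- model.matrix(form, data = mtcars)
fit_values <- drop(x %*% beta)
res        <- mtcars$mpg - fit_values

# Residual sum of squares computed on demand.
sum(res^2)
```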

Alternatives

Acknowledgements

Thanks to:



blakeboswell/yotta documentation built on May 30, 2019, 11:43 a.m.