The modelr package provides functions that help you create elegant pipelines when modelling. It was designed primarily to support teaching the basics of modelling for the 1st edition of R for Data Science.
We no longer recommend it and instead suggest https://www.tidymodels.org/ for a more comprehensive framework for modelling within the tidyverse.
# The easiest way to get modelr is to install the whole tidyverse:
install.packages("tidyverse")

# Alternatively, install just modelr:
install.packages("modelr")
library(modelr)
The resample class stores a "reference" to the original dataset and a vector of row indices. A resample can be turned into a data frame by calling as.data.frame(). The indices can be extracted using as.integer():
# a subsample of the first ten rows in the data frame
rs <- resample(mtcars, 1:10)
as.data.frame(rs)
as.integer(rs)
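To make the "reference plus indices" design concrete, the sketch below inspects a resample object directly. It assumes the object is a plain list with `data` and `idx` elements. That matches current modelr releases as far as we know, but it is an implementation detail rather than a documented interface, so prefer as.data.frame() and as.integer() in real code.

```r
library(modelr)

rs <- resample(mtcars, 1:10)

# Assumption: the object is a list holding the full data and the row indices.
names(rs)

# The stored data is the whole of mtcars, not a ten-row copy.
nrow(rs$data)

# Rows are only materialised when the resample is coerced to a data frame.
rs_df <- as.data.frame(rs)
nrow(rs_df)
identical(rs_df, mtcars[1:10, ])
```

Because only the indices differ between resamples, many resamples of the same data can share one copy of it.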
The class can be used to split a data frame into exclusive partitions, so that each row lands in exactly one of them:
# generate a 30% testing partition and a 70% training partition
ex <- resample_partition(mtcars, c(test = 0.3, train = 0.7))
lapply(ex, dim)
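As a quick check of what "exclusive" means here, the following sketch pulls the row indices out of each partition and confirms that they do not overlap and that together they cover the whole data frame. The variable names `test_idx` and `train_idx` are illustrative only.

```r
library(modelr)

ex <- resample_partition(mtcars, c(test = 0.3, train = 0.7))

test_idx <- as.integer(ex$test)
train_idx <- as.integer(ex$train)

# No row appears in both partitions.
length(intersect(test_idx, train_idx))

# Together the partitions cover every row of the original data.
setequal(c(test_idx, train_idx), seq_len(nrow(mtcars)))
```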
modelr offers several resampling methods that result in a list of resample objects (organized in a data frame):
# bootstrap
boot <- bootstrap(mtcars, 100)

# k-fold cross-validation
cv1 <- crossv_kfold(mtcars, 5)

# Monte Carlo cross-validation
cv2 <- crossv_mc(mtcars, 100)

dim(boot$strap[[1]])
dim(cv1$train[[1]])
dim(cv1$test[[1]])
dim(cv2$train[[1]])
dim(cv2$test[[1]])
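The list-columns of resample objects are meant to be mapped over. One possible workflow, sketched here with purrr (an extra package this section does not otherwise load) and an arbitrary `mpg ~ wt` model, fits one model per training split and scores it on the matching test split:

```r
library(modelr)
library(purrr)

set.seed(1014)
cv <- crossv_kfold(mtcars, k = 5)

# Fit one model per training split.
models <- map(cv$train, function(train) {
  lm(mpg ~ wt, data = as.data.frame(train))
})

# Score each model on the matching held-out split.
errs <- map2_dbl(models, cv$test, rmse)

errs
mean(errs)
```

The same pattern should carry over to bootstrap() and crossv_mc(); only the column names (`strap`, or `train` and `test`) differ.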
modelr includes several often-used model quality metrics:
mod <- lm(mpg ~ wt, data = mtcars)
rmse(mod, mtcars)
rsquare(mod, mtcars)
mae(mod, mtcars)
qae(mod, mtcars)
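To show what these metrics measure, the sketch below recomputes three of them by hand from the residuals and compares the results with the modelr helpers. The variable `res` is introduced here for illustration.

```r
library(modelr)

mod <- lm(mpg ~ wt, data = mtcars)

# Residuals on the data the model was fitted to.
res <- mtcars$mpg - predict(mod, mtcars)

# Root-mean-squared error and mean absolute error, computed by hand.
all.equal(rmse(mod, mtcars), sqrt(mean(res^2)))
all.equal(mae(mod, mtcars), mean(abs(res)))

# For a linear model with an intercept, rsquare() agrees with summary().
all.equal(rsquare(mod, mtcars), summary(mod)$r.squared)
```

Passing a different data frame as the second argument evaluates the same model on held-out data instead.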
A set of functions lets you add predictions and residuals as additional columns to an existing data frame:
set.seed(1014)
df <- tibble::tibble(
  x = sort(runif(100)),
  y = 5 * x + 0.5 * x ^ 2 + 3 + rnorm(length(x))
)

mod <- lm(y ~ x, data = df)
df %>% add_predictions(mod)
df %>% add_residuals(mod)
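Because both helpers return the data frame with one extra column, they can be chained. The sketch below relies on the default column names `pred` and `resid` and on the `var` argument for renaming. The name `pred_linear` is an arbitrary choice for illustration.

```r
library(modelr)

set.seed(1014)
df <- tibble::tibble(
  x = sort(runif(100)),
  y = 5 * x + 0.5 * x ^ 2 + 3 + rnorm(length(x))
)
mod <- lm(y ~ x, data = df)

# Chain both helpers; the default column names are `pred` and `resid`.
out <- df %>%
  add_predictions(mod) %>%
  add_residuals(mod)
names(out)

# The residual is the observed response minus the prediction.
all.equal(unname(out$resid), unname(out$y - out$pred))

# `var` renames the new column, which helps when comparing several models.
df %>% add_predictions(mod, var = "pred_linear")
```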
For visualization purposes it is often useful to use an evenly spaced grid of points from the data:
data_grid(mtcars, wt = seq_range(wt, 10), cyl, vs)

# For continuous variables, seq_range is useful
mtcars_mod <- lm(mpg ~ wt + cyl + vs, data = mtcars)
data_grid(mtcars, wt = seq_range(wt, 10), cyl, vs) %>% add_predictions(mtcars_mod)
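To see how the grid is built, this sketch compares seq_range() with an equivalent base R call and then checks the size of the resulting grid. The names `wt_grid` and `grid` are illustrative.

```r
library(modelr)

# seq_range() spans the observed range of a variable with n evenly spaced values.
wt_grid <- seq_range(mtcars$wt, 10)
all.equal(wt_grid, seq(min(mtcars$wt), max(mtcars$wt), length.out = 10))

# data_grid() crosses those values with the unique values of cyl and vs,
# so the grid has one row per combination.
grid <- data_grid(mtcars, wt = seq_range(wt, 10), cyl, vs)
nrow(grid) == 10 * length(unique(mtcars$cyl)) * length(unique(mtcars$vs))
```

The grid can include combinations that never occur in the data, which is worth keeping in mind when interpreting predictions made on it.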