README.md

parsnip a drawing of a parsnip on a beige background

R-CMD-check Codecov test
coverage CRAN
status Downloads lifecycle

Introduction

The goal of parsnip is to provide a tidy, unified interface to models that can be used to try a range of models without getting bogged down in the syntactical minutiae of the underlying packages.

Installation

# The easiest way to get parsnip is to install all of tidymodels:
install.packages("tidymodels")

# Alternatively, install just parsnip:
install.packages("parsnip")

# Or the development version from GitHub:
# install.packages("pak")
pak::pak("tidymodels/parsnip")

Getting started

One challenge with different modeling functions available in R that do the same thing is that they can have different interfaces and arguments. For example, to fit a random forest regression model, we might have:

# From randomForest
rf_1 <- randomForest(
  y ~ ., 
  data = dat, 
  mtry = 10, 
  ntree = 2000, 
  importance = TRUE
)

# From ranger
rf_2 <- ranger(
  y ~ ., 
  data = dat, 
  mtry = 10, 
  num.trees = 2000, 
  importance = "impurity"
)

# From sparklyr
rf_3 <- ml_random_forest(
  dat, 
  intercept = FALSE, 
  response = "y", 
  features = names(dat)[names(dat) != "y"], 
  col.sample.rate = 10,
  num.trees = 2000
)

Note that the model syntax can be very different and that the argument names (and formats) are also different. This is a pain if you switch between implementations.

In this example:

The goals of parsnip are to:

Using the example above, the parsnip approach would be:

library(parsnip)

rand_forest(mtry = 10, trees = 2000) %>%
  set_engine("ranger", importance = "impurity") %>%
  set_mode("regression")
#> Random Forest Model Specification (regression)
#> 
#> Main Arguments:
#>   mtry = 10
#>   trees = 2000
#> 
#> Engine-Specific Arguments:
#>   importance = impurity
#> 
#> Computational engine: ranger

The engine can be easily changed. To use Spark, the change is straightforward:

rand_forest(mtry = 10, trees = 2000) %>%
  set_engine("spark") %>%
  set_mode("regression")
#> Random Forest Model Specification (regression)
#> 
#> Main Arguments:
#>   mtry = 10
#>   trees = 2000
#> 
#> Computational engine: spark

Either one of these model specifications can be fit in the same way:

set.seed(192)
rand_forest(mtry = 10, trees = 2000) %>%
  set_engine("ranger", importance = "impurity") %>%
  set_mode("regression") %>%
  fit(mpg ~ ., data = mtcars)
#> parsnip model object
#> 
#> Ranger result
#> 
#> Call:
#>  ranger::ranger(x = maybe_data_frame(x), y = y, mtry = min_cols(~10,      x), num.trees = ~2000, importance = ~"impurity", num.threads = 1,      verbose = FALSE, seed = sample.int(10^5, 1)) 
#> 
#> Type:                             Regression 
#> Number of trees:                  2000 
#> Sample size:                      32 
#> Number of independent variables:  10 
#> Mtry:                             10 
#> Target node size:                 5 
#> Variable importance mode:         impurity 
#> Splitrule:                        variance 
#> OOB prediction error (MSE):       5.976917 
#> R squared (OOB):                  0.8354559

A list of all parsnip models across different CRAN packages can be found at https://www.tidymodels.org/find/parsnip.

Contributing

This project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.



Try the parsnip package in your browser

Any scripts or data that you put into this service are public.

parsnip documentation built on Aug. 18, 2023, 1:07 a.m.