parsnip: A Common API to Modeling and Analysis Functions

By default, model.matrix() generates binary indicator variables for factor predictors. When the formula does not remove the intercept, an incomplete set of indicators is created; no indicator is made for the first level of the factor.

For example, species and island both have three levels but model.matrix() creates two indicator variables for each:

library(dplyr)
library(modeldata)
data(penguins)

levels(penguins$species)

## [1] "Adelie"    "Chinstrap" "Gentoo"

levels(penguins$island)

## [1] "Biscoe"    "Dream"     "Torgersen"

model.matrix(~ species + island, data = penguins) %>% 
  colnames()

## [1] "(Intercept)"      "speciesChinstrap" "speciesGentoo"    "islandDream"     
## [5] "islandTorgersen"

For a formula with no intercept, the first factor is expanded to indicators for all of its levels, but every other factor is still expanded to all but one (as above):

model.matrix(~ 0 + species + island, data = penguins) %>% 
  colnames()

## [1] "speciesAdelie"    "speciesChinstrap" "speciesGentoo"    "islandDream"     
## [5] "islandTorgersen"

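For inference, this hybrid encoding can be problematic: the columns for the first factor are per-level indicators, while the columns for the remaining factors are contrasts against an omitted first level, so the corresponding coefficients are not interpreted in the same way.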
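One consequence of the hybrid encoding is that which factor receives the full set of indicators depends only on the order of terms in the formula. The following is a minimal sketch using base R and a small made-up data frame (`toy` is hypothetical and only mirrors the level names of the penguins data); it is meant as an illustration, not as part of the package's documented workflow.

```r
# A toy data frame (hypothetical) with the same factor levels as above
toy <- data.frame(
  species = factor(c("Adelie", "Chinstrap", "Gentoo", "Adelie")),
  island  = factor(c("Biscoe", "Dream", "Torgersen", "Dream"))
)

# No intercept, species listed first: species gets all three indicators
species_first <- colnames(model.matrix(~ 0 + species + island, data = toy))
species_first
## [1] "speciesAdelie"    "speciesChinstrap" "speciesGentoo"    "islandDream"
## [5] "islandTorgersen"

# No intercept, island listed first: now island gets all three indicators
island_first <- colnames(model.matrix(~ 0 + island + species, data = toy))
island_first
## [1] "islandBiscoe"     "islandDream"      "islandTorgersen"  "speciesChinstrap"
## [5] "speciesGentoo"
```

Swapping the two terms changes which level is left out of the design matrix, even though the two formulas describe the same set of predictors.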

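To generate all indicators, use the contr_one_hot() contrast from parsnip:

# contr_one_hot() is looked up by name, so parsnip must be attached
library(parsnip)

# Switch out the contrast method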
old_contr <- options("contrasts")$contrasts
new_contr <- old_contr
new_contr["unordered"] <- "contr_one_hot"
options(contrasts = new_contr)

model.matrix(~ species + island, data = penguins) %>% 
  colnames()

## [1] "(Intercept)"      "speciesAdelie"    "speciesChinstrap" "speciesGentoo"   
## [5] "islandBiscoe"     "islandDream"      "islandTorgersen"

# Restore the original contrast settings
options(contrasts = old_contr)

With this contrast, removing the intercept does not affect the factor encodings; every factor still receives an indicator for each of its levels.
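The claim about the intercept can be checked by comparing column names with and without it. The sketch below assumes that parsnip is installed and exports contr_one_hot(); the helper `one_hot_colnames()` and the data frame `toy` are hypothetical names introduced only for this illustration. The helper restores the previous contrast settings on exit, so the global option is not left modified.

```r
library(parsnip)  # assumed to provide contr_one_hot()

# A toy data frame (hypothetical) with the same factor levels as the penguins data
toy <- data.frame(
  species = factor(c("Adelie", "Chinstrap", "Gentoo", "Adelie")),
  island  = factor(c("Biscoe", "Dream", "Torgersen", "Dream"))
)

# Hypothetical helper: build the model matrix under one-hot contrasts
# and return its column names
one_hot_colnames <- function(formula, data) {
  new_contr <- getOption("contrasts")
  new_contr["unordered"] <- "contr_one_hot"
  old <- options(contrasts = new_contr)  # options() returns the old values
  on.exit(options(old), add = TRUE)      # always restore the previous setting
  colnames(model.matrix(formula, data = data))
}

with_intercept <- one_hot_colnames(~ species + island, toy)
no_intercept   <- one_hot_colnames(~ 0 + species + island, toy)

# Columns present with the intercept but absent without it
setdiff(with_intercept, no_intercept)
```

If the contrast behaves as described above, the two sets of column names should differ only by the intercept column.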

topepo/parsnip documentation built on April 16, 2024, 3:23 a.m.

topepo/parsnip
A Common API to Modeling and Analysis Functions

man/rmd/one-hot.md
In topepo/parsnip: A Common API to Modeling and Analysis Functions
