By default, model.matrix() generates binary indicator variables for factor predictors. When the formula does not remove the intercept, an incomplete set of indicators is created: no indicator is made for the first level of the factor.
For example, species and island both have three levels, but model.matrix() creates two indicator variables for each:
library(dplyr)
library(modeldata)
data(penguins)
levels(penguins$species)
levels(penguins$island)
model.matrix(~ species + island, data = penguins) %>% colnames()
For a formula with no intercept, the first factor is expanded to indicators for all factor levels but all other factors are expanded to all but one (as above):
model.matrix(~ 0 + species + island, data = penguins) %>% colnames()
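The two behaviours above can also be reproduced on a small scale with base R alone. The sketch below is a minimal illustration, assuming the default treatment contrasts are in effect; the data frame `toy` and its columns `grp` and `site` are hypothetical and not part of the penguins data.

```r
# Hypothetical toy data: two factors with three levels each
toy <- data.frame(
  grp  = factor(c("a", "b", "c", "a", "b", "c")),
  site = factor(c("x", "y", "z", "y", "z", "x"))
)

# With an intercept: the first level of each factor gets no column
with_int <- colnames(model.matrix(~ grp + site, data = toy))

# Without an intercept: only the first factor is fully expanded
no_int <- colnames(model.matrix(~ 0 + grp + site, data = toy))

print(with_int)
print(no_int)
```

Comparing the two vectors shows that dropping the intercept adds a column for the first level of `grp` only; `site` still loses its first level.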
For inference, this hybrid encoding can be problematic.
To generate all indicators, use this contrast:
# Switch out the contrast method
old_contr <- options("contrasts")$contrasts
new_contr <- old_contr
new_contr["unordered"] <- "contr_one_hot"
options(contrasts = new_contr)
model.matrix(~ species + island, data = penguins) %>% colnames()
options(contrasts = old_contr)
Removing the intercept here does not affect the factor encodings.
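For comparison, a similar full set of indicators can be obtained per call, without changing the global contrasts option. The sketch below is not the contrast function described on this page; it is a base R alternative that passes identity contrast matrices through the contrasts.arg argument of model.matrix(). The toy data frame and its column names are hypothetical.

```r
# Hypothetical toy data: two factors with three levels each
toy <- data.frame(
  grp  = factor(c("a", "b", "c", "a", "b", "c")),
  site = factor(c("x", "y", "z", "y", "z", "x"))
)

# Identity contrast matrices: one column per factor level
full <- lapply(toy, contrasts, contrasts = FALSE)

# All levels are kept, with or without the intercept
full_int <- colnames(
  model.matrix(~ grp + site, data = toy, contrasts.arg = full)
)
full_no_int <- colnames(
  model.matrix(~ 0 + grp + site, data = toy, contrasts.arg = full)
)

print(full_int)
print(full_no_int)
```

Because the contrast matrices are supplied per call, nothing needs to be restored afterwards, at the cost of having to list every factor explicitly.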