dummify | R Documentation |
Data dummification is also known as one hot encoding or feature binarization. It turns each category into a distinct column with binary (numeric) values.
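As a rough illustration of the idea, the sketch below one hot encodes a single factor in base R. It is not how dummify is implemented; the `color` vector and the `color_<level>` column naming are assumptions made for this example.

```r
# Toy categorical vector (made up for illustration)
color <- factor(c("red", "green", "red", "blue"))

# One indicator column per level: 1 where the value matches, 0 elsewhere
encoded <- sapply(levels(color), function(lv) as.integer(color == lv))

# Prefix each column with the feature name (assumed naming scheme)
colnames(encoded) <- paste("color", levels(color), sep = "_")
encoded
```

Each row contains exactly one 1, because every observation belongs to exactly one category.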
dummify(data, maxcat = 50L, select = NULL)
data | input data |
maxcat | maximum categories allowed for each discrete feature. Default is 50. |
select | names of selected features to be dummified. Default is NULL. |
Continuous features will be ignored if added in select.
select features will be ignored if categories exceed maxcat.
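The two rules above can be sketched with a small base R helper. `eligible_features` is hypothetical and not part of the package; treating factor and character columns as the discrete ones is an assumption of this sketch, and the package may classify columns differently.

```r
# Hypothetical helper (not a package function): lists the columns that the
# rules above would leave eligible for dummification.
eligible_features <- function(data, maxcat = 50L, select = NULL) {
  # Assumption: factor and character columns count as discrete
  is_discrete <- vapply(data, function(x) is.factor(x) || is.character(x), logical(1))
  # Number of distinct values per column
  n_cat <- vapply(data, function(x) length(unique(x)), integer(1))
  # Continuous columns and columns with too many categories are dropped
  keep <- is_discrete & n_cat <= maxcat
  # When select is given, restrict to the named columns
  if (!is.null(select)) keep <- keep & names(data) %in% select
  names(data)[keep]
}

eligible_features(iris)
eligible_features(iris, maxcat = 2L)
eligible_features(iris, select = c("Species", "Sepal.Length"))
```

In the last call, Sepal.Length is continuous, so it is dropped even though it was named in select.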
dummified dataset (discrete features only), with the other original features preserved. However, column order might be different.
This is different from model.matrix, which aims to create a full rank matrix for regression-like use cases. If your intention is to create a design matrix, use model.matrix instead.
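The difference can be seen with base R alone. In this sketch the toy data frame `df` and its column names are made up for illustration; only model.matrix from the stats package is used.

```r
# Toy data frame (made up for illustration)
df <- data.frame(
  size = factor(c("small", "large", "medium", "small")),
  weight = c(1.2, 3.4, 2.1, 0.9)
)

# With an intercept, default treatment contrasts make the first level
# ("large") the baseline, so it gets no column of its own.
design <- model.matrix(~ size, data = df)
colnames(design)

# Removing the intercept keeps one indicator column per level,
# which is closer to what one hot encoding produces.
one_hot <- model.matrix(~ size - 1, data = df)
colnames(one_hot)
```

The first matrix is suited to regression because the baseline level is absorbed into the intercept; the second keeps every category as its own column.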
## Dummify iris dataset
str(dummify(iris))
## Dummify diamonds dataset ignoring features with more than 5 categories
data("diamonds", package = "ggplot2")
str(dummify(diamonds, maxcat = 5))
## Dummify only the selected features of the diamonds dataset
str(dummify(diamonds, select = c("cut", "color")))