Data dummification is also known as one-hot encoding or feature binarization. It turns each category of a discrete feature into a distinct column with binary (numeric) values.
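As a rough illustration of the idea, the base R sketch below encodes a single factor by hand. The helper `one_hot` and its `prefix` argument are hypothetical names introduced here for illustration; this is not the package's implementation, only one way the transformation could be written.

```r
# Hypothetical helper (not part of any package): one-hot encode one factor.
one_hot <- function(x, prefix) {
  x <- as.factor(x)
  lv <- levels(x)
  # Build one integer 0/1 column per level
  out <- lapply(lv, function(l) as.integer(x == l))
  names(out) <- paste(prefix, lv, sep = "_")
  as.data.frame(out)
}

encoded <- one_hot(iris$Species, prefix = "Species")
str(encoded)
```

Each row of `encoded` has exactly one column set to 1, the column matching that row's original category.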
data | input data
maxcat | maximum categories allowed for each discrete feature. Default is 50.
select | names of selected features to be dummified. Default is
Continuous features are ignored if they are listed in select.
Features listed in select are ignored if their number of categories exceeds maxcat.
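The two rules above can be sketched in base R as a filter that decides which columns would be encoded. The helper `pick_features` is hypothetical and is not the package's code. It assumes that factor and character columns count as discrete, and that an unset `select` (represented here as `NULL`) means every discrete feature is a candidate.

```r
# Hypothetical helper illustrating the rules; not the package's implementation.
pick_features <- function(data, maxcat = 50, select = NULL) {
  # Assumption: factor and character columns are the discrete features
  is_discrete <- vapply(data, function(col) is.factor(col) || is.character(col), logical(1))
  candidates <- names(data)[is_discrete]
  # Continuous names in `select` drop out because only discrete names survive
  if (!is.null(select)) candidates <- intersect(candidates, select)
  # Count distinct categories and drop features that exceed maxcat
  n_cat <- vapply(data[candidates], function(col) length(unique(col)), integer(1))
  candidates[n_cat <= maxcat]
}

df <- data.frame(
  size = c(1.5, 2.0, 3.2, 4.1),
  grade = factor(c("a", "b", "a", "c")),
  id = c("w", "x", "y", "z"),
  stringsAsFactors = FALSE
)
over_limit <- pick_features(df, maxcat = 3)                  # id has 4 categories, so it is ignored
with_select <- pick_features(df, select = c("size", "grade")) # size is continuous, so it is ignored
```

In both calls only `grade` remains as a feature to encode.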
dummified dataset in which only discrete features are encoded and the remaining original features are preserved. However, column order might differ from the input.
This differs from model.matrix, which aims to create a full rank matrix for regression-like use cases. If your intention is to create a design matrix, use model.matrix instead.
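To make the contrast concrete, the sketch below uses only base R and the built-in iris data, assuming the default treatment contrasts. With an intercept, model.matrix drops one level of the factor to keep the matrix full rank; removing the intercept yields one indicator per level, which is closer to the dummified layout.

```r
# Design matrix with an intercept: the first level (setosa) is the baseline
design <- model.matrix(~ Species, data = iris)
colnames(design)
# "(Intercept)" "Speciesversicolor" "Speciesvirginica"

# Without an intercept, every level gets its own indicator column
all_levels <- model.matrix(~ Species - 1, data = iris)
colnames(all_levels)
# "Speciessetosa" "Speciesversicolor" "Speciesvirginica"
```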
'data.frame': 150 obs. of 7 variables:
$ Sepal.Length : num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
$ Petal.Length : num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
$ Species_setosa : int 1 1 1 1 1 1 1 1 1 1 ...
$ Species_versicolor: int 0 0 0 0 0 0 0 0 0 0 ...
$ Species_virginica : int 0 0 0 0 0 0 0 0 0 0 ...
- attr(*, ".internal.selfref")=<externalptr>
2 features with more than 5 categories ignored!
color: 7 categories
clarity: 8 categories
Classes 'tbl_df', 'tbl' and 'data.frame': 53940 obs. of 14 variables:
$ carat : num 0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
$ depth : num 61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
$ table : num 55 61 65 58 58 57 57 55 61 61 ...
$ price : int 326 326 327 334 335 336 336 337 337 338 ...
$ x : num 3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
$ y : num 3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
$ z : num 2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...
$ color : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
$ clarity : Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
$ cut_Fair : int 0 0 0 0 0 0 0 0 1 0 ...
$ cut_Good : int 0 0 1 0 1 0 0 0 0 0 ...
$ cut_Ideal : int 1 0 0 0 0 0 0 0 0 0 ...
$ cut_Premium : int 0 1 0 1 0 0 0 0 0 0 ...
$ cut_Very.Good: int 0 0 0 0 0 1 1 1 0 1 ...
- attr(*, ".internal.selfref")=<externalptr>
Classes 'tbl_df', 'tbl' and 'data.frame': 53940 obs. of 20 variables:
$ carat : num 0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
$ depth : num 61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
$ table : num 55 61 65 58 58 57 57 55 61 61 ...
$ price : int 326 326 327 334 335 336 336 337 337 338 ...
$ x : num 3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
$ y : num 3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
$ z : num 2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...
$ clarity : Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
$ cut_Fair : int 0 0 0 0 0 0 0 0 1 0 ...
$ cut_Good : int 0 0 1 0 1 0 0 0 0 0 ...
$ cut_Ideal : int 1 0 0 0 0 0 0 0 0 0 ...
$ cut_Premium : int 0 1 0 1 0 0 0 0 0 0 ...
$ cut_Very.Good: int 0 0 0 0 0 1 1 1 0 1 ...
$ color_D : int 0 0 0 0 0 0 0 0 0 0 ...
$ color_E : int 1 1 1 0 0 0 0 0 1 0 ...
$ color_F : int 0 0 0 0 0 0 0 0 0 0 ...
$ color_G : int 0 0 0 0 0 0 0 0 0 0 ...
$ color_H : int 0 0 0 0 0 0 0 1 0 1 ...
$ color_I : int 0 0 0 1 0 0 1 0 0 0 ...
$ color_J : int 0 0 0 0 1 1 0 0 0 0 ...
- attr(*, ".internal.selfref")=<externalptr>