one_hot_encode: one_hot_encode

View source: R/one_hot_encode.R

Description

Use one-hot encoding to handle meaningless or transposed columns.

Usage

one_hot_encode(df, encode_cols = NULL, keep = "exists",
  min_occurences = 1)

Arguments

df

The data frame (or matrix) to one-hot encode.

encode_cols

The names or indices of the columns to encode. Defaults to NULL, which encodes all columns.

keep

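Either "exists" or "sum". With "exists", each new column holds 1 if the value appears in the row and 0 otherwise. With "sum", each new column holds the number of times the value appears in the row. Defaults to "exists".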

min_occurences

Minimum number of appearances a value must have in the data before a column is added for it. Defaults to 1.
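
To make the keep and min_occurences arguments concrete, here is a minimal base-R sketch of the row-wise encoding they describe. It is an illustration under stated assumptions, not the package's implementation: sketch_encode_rows and its min_occurrences parameter are hypothetical names, and the threshold is assumed to count the number of rows in which a value appears, following the comment in the Examples section.

```r
# Illustrative sketch only: NOT the recolumnize implementation.
# sketch_encode_rows is a hypothetical helper written for this explanation.
sketch_encode_rows <- function(m, keep = "exists", min_occurrences = 1) {
  m <- as.matrix(m)
  values <- sort(unique(as.vector(m)))

  # counts[i, v] = number of times value v appears in row i
  counts <- matrix(0, nrow = nrow(m), ncol = length(values),
                   dimnames = list(NULL, values))
  for (v in values) {
    counts[, v] <- rowSums(m == v)
  }

  # Assumption: a value qualifies if it appears in at least
  # min_occurrences rows.
  rows_with_value <- colSums(counts > 0)
  counts <- counts[, rows_with_value >= min_occurrences, drop = FALSE]

  # "exists" collapses the counts to 0/1 indicators; "sum" keeps them.
  if (keep == "exists") {
    counts <- (counts > 0) * 1
  }
  as.data.frame(counts)
}

m <- matrix(c("a", "b", "a",
              "b", "b", "c"), nrow = 2, byrow = TRUE)

exists_df   <- sketch_encode_rows(m)                      # 0/1 per value
sum_df      <- sketch_encode_rows(m, keep = "sum")        # per-row counts
frequent_df <- sketch_encode_rows(m, min_occurrences = 2) # only "b" is in 2 rows
```

In this toy matrix, "a" appears twice in the first row, so the "sum" variant records 2 where the "exists" variant records 1.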

Value

A data frame with the columns named in encode_cols removed and replaced by numeric columns.
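
The shape of the return value can be sketched in base R as follows. This is only an assumption-laden illustration of "encoded columns removed, numeric columns added": replace_with_indicators is a hypothetical helper, and the column order and naming of the real function's output may differ.

```r
# Illustrative sketch only: NOT the recolumnize implementation.
# replace_with_indicators is a hypothetical helper.
replace_with_indicators <- function(df, encode_cols) {
  df <- as.data.frame(df, stringsAsFactors = FALSE)
  if (is.character(encode_cols)) {
    encode_cols <- match(encode_cols, names(df))
  }

  # Pool the values found in the columns being encoded.
  to_encode <- as.matrix(df[, encode_cols, drop = FALSE])
  values <- sort(unique(as.vector(to_encode)))

  # One 0/1 column per distinct value: 1 if it appears anywhere in the row.
  indicators <- matrix(0, nrow = nrow(df), ncol = length(values),
                       dimnames = list(NULL, values))
  for (v in values) {
    indicators[, v] <- as.numeric(rowSums(to_encode == v) > 0)
  }

  # Keep the untouched columns and append the indicator columns.
  rest <- df[, setdiff(seq_along(df), encode_cols), drop = FALSE]
  cbind(rest, as.data.frame(indicators))
}

df <- data.frame(id = 1:3,
                 x = c("a", "b", "a"),
                 y = c("b", "c", "a"),
                 stringsAsFactors = FALSE)
result <- replace_with_indicators(df, c(2, 3))
```

Here the id column is left in place, while x and y are dropped in favor of one numeric column per distinct value.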

Examples

(mat <- matrix(letters[sample(1:26, 20, replace = TRUE)], 5, 4))
one_hot_encode(mat)
one_hot_encode(mat, min_occurences = 2) # keep only values that appear in 2 or more rows
one_hot_encode(mat, keep = "sum") # store the number of times each value appears in a given row
one_hot_encode(mat, encode_cols = c(2, 3)) # encode only certain columns and leave the rest in place

jveech/recolumnize documentation built on Dec. 11, 2019, 2:05 a.m.