View source: R/onehotencoding.R
ohse | R Documentation |
This function lets the user automatically transform a dataframe with categorical columns into a numerical one using the one hot encoding technique.
ohse(
df,
redundant = FALSE,
drop = TRUE,
ignore = NULL,
dates = FALSE,
holidays = FALSE,
country = "Venezuela",
currency_pair = NA,
trim = 0,
limit = 10,
variance = 0.9,
other_label = "OTHER",
sep = "_",
quiet = FALSE,
...
)
df |
Dataframe |
redundant |
Boolean. Should we keep redundant columns? For example, if a
column only has two different values, should we keep both new columns?
Is set to |
drop |
Boolean. Automatically drop some useless features? |
ignore |
Vector or character. Which columns should be ignored? |
dates |
Boolean. Do you want the function to create more features out of the date/time columns? |
holidays |
Boolean. Include holidays as new columns? |
country |
Character or vector. For which countries should the holidays be included? |
currency_pair |
Character. Which currency exchange pair do you wish to get the history from? e.g., USD/COP, EUR/USD... |
trim |
Integer. Trim names until the nth character |
limit |
Integer. Limit one hot encoding to the n most frequent
values of each column. Set to |
variance |
Numeric. Drop columns with more than n variance, where variance is read as the share of unique values among all observations. Range: 0-1. For example, if a variable contains 91 unique values out of 100 observations, the column will be dropped when this value is set to 0.9 |
other_label |
Character. Which text should replace the filtered values? |
sep |
Character. Separator string |
quiet |
Boolean. Quiet all messages and summaries? |
... |
Additional parameters. |
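The `variance` example above (91 unique values out of 100 observations) can be reproduced with a few lines of base R. This is only a sketch of the idea, not the code `ohse()` runs: `unique_ratio()` and the `toy` data frame are hypothetical names introduced for illustration, and whether `ohse()` applies the rule to every column type is not stated here.

```r
# Hypothetical helper (not part of lares): share of distinct values per column.
unique_ratio <- function(df) {
  vapply(df, function(x) length(unique(x)) / length(x), numeric(1))
}

# Toy data: `id` has 91 distinct values in 100 rows, `group` has only 2.
toy <- data.frame(
  id = c(sprintf("id%02d", 1:91), rep("id01", 9)),
  group = rep(c("A", "B"), 50),
  stringsAsFactors = FALSE
)

ratios <- unique_ratio(toy)
threshold <- 0.9  # plays the role of the `variance` argument

# Columns whose ratio exceeds the threshold are candidates for dropping.
names(ratios)[ratios > threshold]
```

Here only `id` exceeds the threshold (its ratio is 0.91), so it is the only column flagged.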
data.frame in which all features are either numerical by nature or transformed with one hot encoding.
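To make the roles of `limit`, `other_label` and `sep` more concrete, the following base R sketch one-hot encodes a single character vector. It is an assumption-laden illustration, not the implementation behind `ohse()`: `one_hot_top_n()` is a hypothetical helper, and details such as column ordering, naming and handling of missing values may differ in the real function.

```r
# Hypothetical helper (not part of lares): one-hot encode a character vector,
# keeping the `limit` most frequent values and pooling the rest.
one_hot_top_n <- function(x, name, limit = 10, other_label = "OTHER", sep = "_") {
  counts <- sort(table(x), decreasing = TRUE)
  keep <- names(counts)[seq_len(min(limit, length(counts)))]
  # Values outside the top `limit` are replaced by `other_label`.
  pooled <- ifelse(x %in% keep, x, other_label)
  levels_out <- unique(c(keep, if (any(pooled == other_label)) other_label))
  # One 0/1 integer column per retained level.
  out <- lapply(levels_out, function(lv) as.integer(pooled == lv))
  names(out) <- paste(name, levels_out, sep = sep)
  data.frame(out, check.names = FALSE)
}

colour <- c("red", "red", "red", "blue", "blue", "green", "pink")
encoded <- one_hot_top_n(colour, name = "colour", limit = 2)
encoded
```

With `limit = 2`, the two most frequent values get their own columns (`colour_red`, `colour_blue`) and the remaining values share `colour_OTHER`.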
Other Data Wrangling: balance_data(), categ_reducer(), cleanText(), date_cuts(), date_feats(), file_name(), formatHTML(), holidays(), impute(), left(), normalize(), num_abbr(), ohe_commas(), quants(), removenacols(), replaceall(), replacefactor(), textFeats(), textTokenizer(), vector2text(), year_month(), zerovar()
Other Feature Engineering: date_feats(), holidays()
Other One Hot Encoding: date_feats(), holidays(), ohe_commas()
library(lares) # provides ohse() and the dft dataset
data(dft)
dft <- dft[, c(2, 3, 5, 9, 11)]
ohse(dft, limit = 3) %>% head(3)
ohse(dft, limit = 3, redundant = NULL) %>% head(3)
# Getting rid of columns with no (or too much) variance
dft$no_variance1 <- 0
dft$no_variance2 <- c("A", rep("B", nrow(dft) - 1))
dft$no_variance3 <- as.character(rnorm(nrow(dft)))
dft$no_variance4 <- c(rep("A", 20), round(rnorm(nrow(dft) - 20), 4))
ohse(dft, limit = 3) %>% head(3)
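The `redundant` argument concerns columns with only two distinct values, where one indicator column already carries all the information. The base R sketch below illustrates the `TRUE`/`FALSE` cases only. It is not how `ohse()` is implemented: `indicators()` is a hypothetical helper, and which of the two values it keeps is an arbitrary choice made for this example.

```r
# Hypothetical helper (not part of lares): indicator columns for a vector,
# optionally dropping one column when only two distinct values exist.
indicators <- function(x, name, redundant = FALSE, sep = "_") {
  lv <- sort(unique(x))
  # Arbitrary choice for this sketch: keep the second value in sort order.
  if (!redundant && length(lv) == 2) lv <- lv[2]
  out <- lapply(lv, function(v) as.integer(x == v))
  names(out) <- paste(name, lv, sep = sep)
  data.frame(out, check.names = FALSE)
}

survived <- c("yes", "no", "no", "yes")
single <- indicators(survived, "survived")                   # one column
both <- indicators(survived, "survived", redundant = TRUE)   # two columns
names(single)
names(both)
```

With `redundant = FALSE` the two-valued vector yields a single column, since the second one would just be its complement.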