Description Usage Arguments Value Examples
In a data frame, if there are columns in character or factor format, encodedummy_col can convert them into dummy columns, and each columns has either 1 or 0. Require data.table.
1 2 | dummy_to_cols(data, drop_first_level=FALSE,
keep_origin_cols=FALSE, sep_char='=', inplace=FALSE)
|
data |
Either a data.frame or data.table |
drop_first_level |
Whether drop the first level of category |
keep_origin_cols |
Whether keep the input data's non-numeric columns |
sep_char |
Character string between column names and unique category/level. Example: sep_char="=", and a column named 'Col A' with levels 'A','B'. Then the output columns will be named as 'Col A=A', 'Col A=B'. |
inplace |
Whether modify the data directly. Input data will be converted into data.table, so FALSE means a new copy is created and added new dummy columns, while the input data is not changed. TRUE means the input data is modified directly in memory, so no new object will be created. Details can be found by ?data.table::copy |
Returns a data.table object.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 | library(data.table)
get_test_data=function(N=1e5,K=6){
set.seed(1)
DT <- data.table(
id1 = factor(sample(letters[1:K], N, TRUE)),
id3 = factor(sample(LETTERS[1:K], N, TRUE)),
id4 = sample(K, N, TRUE),
id5 = factor(sample(K, N, TRUE)),
id6 = sample(N/K, N, TRUE),
v1 = factor(sample(5, N, TRUE)),
v3 = sample(round(runif(100,max=100),4), N, TRUE)
)
}
DT = get_test_data(1e5,6)
DT_ = copy(DT)
print(format(object.size(DT), units = 'Mb'))
# number of unique values in each column
print(DT[,lapply(.SD, uniqueN)])
system.time(
DT_new <- encodedummy_col(DT, drop_first_level=FALSE,
keep_origin_cols=FALSE, sep_char='=',inplace=FALSE)
)
# inplace is FALSE, so the input data is not modified.
# Instead, a new object is created.
identical(DT,DT_) # TRUE
identical(DT,DT_new) # FALSE
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.