View source: R/oneHotEncoder.R
| oneHot | R Documentation |
Encodes logical, categorical, integer and double type variables.
oneHot(x, type, omc = "dgCMatrix", verbose = TRUE)
| x | a (named) vector or list to encode. Missing data are removed. For decoding, a dense or sparse matrix (preferably the result of encoding) representing a single source data column |
| type | symbol. Choices: encode (one-hot encoding) or decode (revert to original) |
| omc | character of length 1. Output matrix class: 'dgCMatrix' (default) or 'matrix' |
| verbose | logical, default TRUE. Display messages |
This utility one-hot encodes when type = encode. When type = decode, it reverts the encoded result
(or any matrix of encodings obtained with the getEV extractor) and verifies it, detecting illicit states.
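The idea of an illicit state can be sketched in base R. This is only an illustration under an assumption: it treats a row as illicit when it does not have exactly one non-zero entry. The package's own definition is not spelled out here, and `illicit_rows` is a hypothetical helper, not part of the package.

```r
# Hypothetical helper (not part of the package): return the indices of rows
# that are not valid one-hot rows, ASSUMING "valid" means exactly one
# non-zero entry per row.
illicit_rows <- function(m) {
  m <- as.matrix(m)               # accept any matrix-like input
  which(rowSums(m != 0) != 1)     # rows with zero or several active columns
}

# A matrix that resembles a one-hot encoding of two "double" values
a <- cbind(`3.41` = c(1, 0, 0, 1, 1, 0, 0, 1),
           `0.12` = c(0, 1, 0, 0, 0, 1, 1, 0))

bad <- illicit_rows(a)
print(bad)                        # row 3 has no active column
```

Under this assumption, row 3 is flagged because both of its entries are zero.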
Encoding returns a matrix with length(x) rows and length(unique(x)) columns, or a warning.
Decoding returns a (named) vector, or a warning. List vectors are returned unlisted. Integer(ish) vectors
are converted to integer and character vectors to factor; double and logical vectors remain unchanged.
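To make the shape of the encoded result concrete, here is a minimal base-R sketch. It is not the package's implementation: `one_hot_sketch` is a hypothetical helper, it always returns a dense matrix, and it assumes that rows are counted after missing values have been dropped.

```r
# Hypothetical base-R helper (not the package's oneHot()): build a dense
# one-hot matrix with one row per non-missing element and one column per
# unique value.
one_hot_sketch <- function(x) {
  x <- x[!is.na(x)]                     # drop missing values first
  lev <- unique(x)                      # one column per distinct value
  m <- outer(x, lev, FUN = "==") * 1L   # TRUE/FALSE comparison grid -> 1/0
  dimnames(m) <- list(names(x), as.character(lev))
  m
}

y <- c(TRUE, TRUE, NA, FALSE, TRUE, NA) # logical with missing values
m <- one_hot_sketch(y)
print(dim(m))                           # 4 rows (non-missing), 2 columns
print(m)
```

Each row of the sketch's output contains exactly one 1, which is the property a decoder can rely on when mapping rows back to values.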
if (interactive()) {
# 1. Encode type "double"
x = runif(9) # numeric, length 9
names(x) = letters[1:9] # named
typeof(x)
a = oneHot(x, encode) # a sparse matrix of "dgCMatrix" class
b = oneHot(a, decode) # a type "double" named numeric, length 9
isTRUE(all.equal(x, b)) # TRUE
typeof(b)
print(x); print(b)
# 2. Type "logical" with missing values
y = c(TRUE, TRUE, NA, FALSE, TRUE, NA) # logical, length 6 with missing values
typeof(y)
a = oneHot(y, encode, 'matrix')
print(a) # a dense matrix
b = oneHot(a, decode) # revert
all.equal(y, b) # missing values in y removed
typeof(b)
print(y); print(b)
# 3. iris data
data(iris)
a = lapply(iris, oneHot, encode) # encode entire data
b = as.data.frame(
lapply(a, oneHot, decode) # revert
)
identical(iris, b) # TRUE. Now, replace iris data with
# mtcars data!
# 4. Illicit states in one-hot encoding
`3.41` = c(1,0,0,1,1,0,0,1) # encoded type "double"
`0.12` = c(0,1,0,0,0,1,1,0)
a = cbind(`3.41`, `0.12`) # form encoded matrix
print(a) # matrix resembling one-hot encoding
x = oneHot(a, decode) # illicit state detected
print(x) # list with 2 different data types
}