oneHot: One-hot Encoder And Decoder Of Variables

View source: R/oneHotEncoder.R

oneHot    R Documentation

One-hot Encoder And Decoder Of Variables

Description

One-hot encodes logical, categorical, integer and double type variables, and decodes the result back to the original vector.

Usage

oneHot(x, type, omc = "dgCMatrix", verbose = TRUE)

Arguments

x

a (named) vector or list for encoding; missing data are removed. For decoding, a dense or sparse matrix (preferably the result of encoding) representing a single source data column.

type

symbol. Choices: encode (one-hot encode x) or decode (revert an encoded matrix to the original vector).

omc

character vector of length 1. Output matrix class: 'dgCMatrix' (default, sparse) or 'matrix' (dense).

verbose

logical, default TRUE, display messages
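
To make the two omc choices concrete, the following is a minimal base-R sketch (not the package's implementation) that builds the same one-hot layout once as a sparse 'dgCMatrix' and once as a dense 'matrix'. It assumes the Matrix package is installed; the input vector and the names lev, sparse and dense are hypothetical.

```r
library(Matrix)  # assumption: the recommended Matrix package is available

x <- c("a", "b", "a", "c")               # hypothetical input vector
lev <- sort(unique(x))                   # one column per distinct value

# Sparse form: only the positions of the 1s are stored
sparse <- sparseMatrix(
  i = seq_along(x),                      # row index, one row per element
  j = match(x, lev),                     # column index of each element's value
  x = 1,                                 # stored value, recycled for every entry
  dims = c(length(x), length(lev)),
  dimnames = list(NULL, lev)
)

# Dense form: an ordinary base-R matrix with the zeros stored explicitly
dense <- as.matrix(sparse)

class(sparse)
class(dense)
dim(dense)                               # length(x) rows, length(unique(x)) columns
```

The sparse form is usually the better fit when x has many distinct values, because most cells of a one-hot matrix are zero.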

Details

This utility one-hot encodes when type = encode and verifies the encoded result (or any matrix of encodings obtained with getEV extractor) when type = decode. It detects illicit states.
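
As an illustration of what an illicit state could look like, here is a hedged sketch in base R. It assumes that a valid one-hot row contains only 0s and 1s with exactly one 1; the package's own check may differ. The helper find_illicit is hypothetical and not part of the package.

```r
# Hypothetical helper: return the indices of rows that are not valid one-hot rows.
# Assumption: "valid" means only 0/1 entries and exactly one 1 per row.
find_illicit <- function(m) {
  m <- as.matrix(m)
  only_binary <- apply(m, 1, function(r) all(r %in% c(0, 1)))
  which(!only_binary | rowSums(m) != 1)
}

# Same matrix as in example 4 of the Examples section
m <- cbind(`3.41` = c(1, 0, 0, 1, 1, 0, 0, 1),
           `0.12` = c(0, 1, 0, 0, 0, 1, 1, 0))

bad <- find_illicit(m)
print(bad)        # the third row has no active column
print(m[bad, ])
```

Under this assumption, a row of all zeros (no value selected) and a row with several 1s (more than one value selected) are both flagged.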

Value

Encoding returns a matrix of length(x) rows and length(unique(x)) columns or a warning. Decoding returns a (named) vector or a warning. List vectors are returned unlisted. Integer(ish) vectors are converted to integer and character vectors to factor; double and logical vectors remain unchanged.
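
The type rules above can be paraphrased in base R as follows. This is only a sketch of the documented behaviour, not the package's code; the helper name coerce_like_value is hypothetical, and treating "integer(ish)" as "every value equals its rounded value" is an assumption.

```r
# Hypothetical helper mirroring the documented return types
coerce_like_value <- function(v) {
  v <- unlist(v)                         # list vectors are returned unlisted
  v <- v[!is.na(v)]                      # missing data are removed
  if (is.character(v)) {
    return(factor(v))                    # character -> factor
  }
  if (is.numeric(v) && all(v == round(v))) {
    storage.mode(v) <- "integer"         # integer(ish) -> integer, names kept
  }
  v                                      # other double and logical vectors unchanged
}

typeof(coerce_like_value(c(a = 1, b = 2, c = 3)))  # integer(ish) input
typeof(coerce_like_value(c(1.5, 2)))               # genuine double input
class(coerce_like_value(c("u", "v", "u")))         # character input
coerce_like_value(list(TRUE, NA, FALSE))           # list input with a missing value
```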

Examples


if (interactive()) {

# 1. Encode type "double"

x = runif(9)                           # numeric, length 9
names(x) = letters[1:9]                # named
typeof(x)
a = oneHot(x, encode)                  # a sparse matrix of "dgCMatrix" class
b = oneHot(a, decode)                  # a type "double" named numeric, length 9
isTRUE(all.equal(x, b))                # TRUE
typeof(b)
print(x); print(b)

# 2. Type "logical" with missing values

y = c(TRUE, TRUE, NA, FALSE, TRUE, NA) # logical, length 6 with missing values
typeof(y)
a = oneHot(y, encode, 'matrix')
print(a)                               # a dense matrix
b = oneHot(a, decode)                  # revert
all.equal(y, b)                        # missing values in y removed
typeof(b)
print(y); print(b)

# 3. iris data

data(iris)
a = lapply(iris, oneHot, encode)       # encode entire data
b = as.data.frame(
         lapply(a, oneHot, decode)     # revert
     )
identical(iris, b)                     # TRUE. Now, replace iris data with
                                       # mtcars data!

# 4. Illicit states in one-hot encoding

`3.41` = c(1,0,0,1,1,0,0,1)            # encoded type "double"
`0.12` = c(0,1,0,0,0,1,1,0)
a = cbind(`3.41`, `0.12`)              # form encoded matrix
print(a)                               # matrix resembling one-hot encoding
x = oneHot(a, decode)                  # illicit state detected
print(x)                               # list with 2 different data types

}


akin documentation built on May 19, 2026, 5:07 p.m.