Home

/

GitHub

/

In Zelazny7/onehot: Fast Onehot Encoding for Data.frames

options(width=250)

Onehot package

Installation

devtools::install_github("https://github.com/Zelazny7/onehot")

Usage

set.seed(100)
test <- data.frame(
  factor    = factor(sample(c(NA, letters[1:3]), 100, T)),
  integer   = as.integer(runif(100) * 10),
  real      = rnorm(100),
  logical   = sample(c(T, F), 100, T),
  character = sample(letters, 100, T),
  stringsAsFactors = FALSE)

head(test)

Create a onehot object

A onehot object contains information about the data.frame. This is used to transform a data.frame into a onehot encoded matrix. It should be saved to transform future datasets into the same exact layout.

library(onehot)
encoder <- onehot(test)

## printe a summary
encoder

Transforming data.frames

The onehot object has a predict method which may be used to transform a data.frame. Factors are onehot encoded. Character variables are skipped. However calling predict with stringsAsFactors=TRUE will convert character vectors to factors first.

train_data <- predict(encoder, test)
head(train_data)

NA indicator columns

add_NA_factors=TRUE (the default) will create an indicator column for every factor column. Having NAs as a factor level will result in an indicator column being created without using this option.

encoder <- onehot(test, add_NA_factors=TRUE)
train_data <- predict(encoder, test)
head(train_data)

Sentinel values for numeric columns

The sentinel=VALUE argument will replace all numeric NAs with the provided value. Some ML algorithms such as randomForest and xgboost do not handle NA values. However, by using sentinel values such algorithms are usually able to separate them with enough decision-tree splits. The default value is -999

Sparse Matrices

onehot also provides support for predicting sparse, column compressed matrices from the Matrix package:

encoder <- onehot(test)
train_data <- predict(encoder, test, sparse=TRUE)
head(train_data)

Zelazny7/onehot documentation built on May 6, 2019, 1:30 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Zelazny7/onehot
Fast Onehot Encoding for Data.frames

In Zelazny7/onehot: Fast Onehot Encoding for Data.frames

Onehot package

Installation

Usage

Create a onehot object

Transforming data.frames

NA indicator columns

Sentinel values for numeric columns

Sparse Matrices

R Package Documentation

Browse R Packages

We want your feedback!

Zelazny7/onehot Fast Onehot Encoding for Data.frames

In Zelazny7/onehot: Fast Onehot Encoding for Data.frames

Onehot package

Installation

Usage

Create a onehot object

Transforming data.frames

NA indicator columns

Sentinel values for numeric columns

Sparse Matrices

R Package Documentation

Browse R Packages

We want your feedback!

Zelazny7/onehot
Fast Onehot Encoding for Data.frames