sparsify: Sparsify


View source: R/sparsify.R

Description

Convert a data.table object into a sparse matrix (with the same number of rows).

Usage

sparsify(
  dt,
  sparsifyNAs = FALSE,
  naCols = "none",
  sparsifyCols = NULL,
  memEfficient = FALSE
)

Arguments

dt

A data.table object

sparsifyNAs

Should NAs be converted to 0s and sparsified?

naCols

  • "none" Don't generate columns to identify NA values.

  • "identify" For each column of dt with an NA value, generate a column in the sparse matrix with 1s indicating NAs. Columns will be named like "color_NA".

  • "efficient" For each column of dt with an NA value, generate a column in the sparse matrix with 1s indicating either NAs or non-NAs, whichever is more memory efficient. Columns will be named like "color_NA" or "color_NotNA".

sparsifyCols

What columns to use. Use this to exclude columns of dt from being sparsified without having to build a column-subsetted copy of dt to input into sparsify(...). Default = NULL means use all columns of dt.
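As a minimal sketch of the sparsifyCols argument described above: the table and its column names (rowId, count, flag) are made up for illustration, and the example assumes sparsifyCols accepts a character vector of column names, which this page does not state explicitly.

```r
library(data.table)
library(Matrix)
library(mltools)  # provides sparsify()

# Illustrative table; rowId is an identifier we want to leave out
dt <- data.table(
  rowId = 1:3,
  count = c(0, 5, 0),
  flag  = c(TRUE, FALSE, FALSE)
)

# Assumption: sparsifyCols takes a character vector of column names.
# This avoids building a column-subsetted copy such as dt[, list(count, flag)].
m <- sparsify(dt, sparsifyCols = c("count", "flag"))

dim(m)       # the row count matches nrow(dt)
colnames(m)  # inspect which columns were sparsified
```

Passing the column names directly should be equivalent to sparsifying a subsetted copy of dt, without the cost of creating that copy.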

memEfficient

Default = FALSE. Set this to TRUE for a slower but more memory-efficient process.

Details

Converts a data.table object to a sparse matrix (class "dgCMatrix"). Requires the Matrix package. All sparsified data is assumed to take on the value 0/FALSE.
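To make "sparsified" concrete, the sketch below uses only the Matrix package (not sparsify itself) to show that a "dgCMatrix" stores non-zero entries explicitly and leaves zeros implicit. The input values are invented for illustration. Note that an NA is not a zero, so it is still stored, which is presumably why sparsifyNAs exists as a separate switch.

```r
library(Matrix)

# Illustrative dense data with several zeros and one NA
dense <- matrix(c(0, 2, 0, 0, NA, 3), nrow = 3)

# Convert to a column-compressed sparse matrix
sm <- Matrix(dense, sparse = TRUE)

class(sm)  # sparse matrix class used by the Matrix package
sm@x       # explicitly stored entries: zeros are absent, the NA is kept
```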

Data Type | Description & NA handling

numeric | If sparsifyNAs = FALSE, only 0s will be sparsified. If sparsifyNAs = TRUE, 0s and NAs will be sparsified.

factor (unordered) | Each level will generate a sparsified binary column. Column names are feature_level, e.g. "color_red", "color_blue".

factor (ordered) | Levels are converted to numeric values 1 through NLevels. If sparsifyNAs = FALSE, NAs will remain as NAs. If sparsifyNAs = TRUE, NAs will be sparsified.

logical | TRUE and FALSE values will be converted to 1s and 0s. If sparsifyNAs = FALSE, only FALSEs will be sparsified. If sparsifyNAs = TRUE, FALSEs and NAs will be sparsified.
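The unordered-factor rule can be checked with a small sketch. The color and n columns are hypothetical data chosen to match the "color_red" / "color_blue" naming mentioned above. No output is shown here because column order and printing may vary.

```r
library(data.table)
library(Matrix)
library(mltools)  # provides sparsify()

# Illustrative table: one unordered factor and one numeric column
dt <- data.table(
  color = factor(c("red", "blue", "red")),
  n     = c(0, 2, 0)
)

m <- sparsify(dt)

# Each factor level should appear as its own binary column, named feature_level
colnames(m)
```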

Examples

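library(data.table)
library(Matrix)
library(mltools)  # provides sparsify()

# Sample data covering each supported column type
dt <- data.table(
  intCol=c(1L, NA_integer_, 3L, 0L),
  realCol=c(NA, 2, NA, NA),
  logCol=c(TRUE, FALSE, TRUE, FALSE),
  ofCol=factor(c("a", "b", NA, "b"), levels=c("a", "b", "c"), ordered=TRUE),
  ufCol=factor(c("a", NA, "c", "b"), ordered=FALSE)
)

sparsify(dt)                                       # default: NAs are not sparsified
sparsify(dt, sparsifyNAs=TRUE)                     # NAs are converted to 0s and sparsified
sparsify(dt[, list(realCol)], naCols="identify")   # adds an indicator column for NAs
sparsify(dt[, list(realCol)], naCols="efficient")  # indicator marks NAs or non-NAs, whichever is cheaper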

ben519/mltools documentation built on Sept. 22, 2021, 4:30 p.m.