clean.dataset: Data Cleaning

View source: R/clean_dataset.R

Description

A function for scrubbing a dataset for use with most standard algorithms. This involves removing samples with invalid entries and, optionally, one-hot encoding columns that are probably categorical.

Usage

clean.dataset(dataset, clean.invalid = TRUE, clean.ohe = FALSE)

Arguments

dataset

a list with at least the following key-worded elements:

  • X: [n, d] matrix containing n samples in d dimensions.

  • Y: [n, r] matrix or [n] vector containing the regressors or class labels for the samples in X.
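
As an illustration of the expected input shape, the following minimal sketch builds such a list from toy data. The values, the seed, and the sizes are hypothetical and chosen only for demonstration; only the element names X and Y come from the description above.

```r
# Hypothetical toy data: n = 6 samples in d = 3 dimensions.
set.seed(1)
n <- 6
d <- 3
X <- matrix(rnorm(n * d), nrow = n, ncol = d)  # [n, d] feature matrix
Y <- rep(c(0, 1), each = 3)                    # [n] vector of class labels

# The list carries the two key-worded elements named in the documentation.
dataset <- list(X = X, Y = Y)

# Basic consistency check: one label per sample.
stopifnot(is.matrix(dataset$X), nrow(dataset$X) == length(dataset$Y))
```

A list built this way could then be passed as the dataset argument, for example clean.dataset(dataset), assuming the package is installed and loaded.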

clean.invalid

whether to remove samples with invalid entries. Defaults to TRUE.

  • TRUE: Remove samples that have features with NaN or non-finite entries.

  • FALSE: Do not remove samples that have features with NaN or non-finite entries.
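
The effect of clean.invalid = TRUE can be approximated in base R as follows. This is a sketch under stated assumptions, not the package's actual implementation: drop.invalid is a hypothetical helper name, and X is assumed to be a numeric matrix.

```r
# Hypothetical helper (not part of the package): keep only the samples
# whose features are all finite, and subset Y to match.
drop.invalid <- function(dataset) {
  X <- dataset$X
  Y <- dataset$Y
  # is.finite() is FALSE for NaN, NA, Inf and -Inf.
  keep <- rowSums(!is.finite(X)) == 0
  # Y may be an [n, r] matrix or an [n] vector.
  Y <- if (is.matrix(Y)) Y[keep, , drop = FALSE] else Y[keep]
  list(X = X[keep, , drop = FALSE], Y = Y)
}

# Toy input: row 2 contains NaN and row 3 contains Inf.
X <- matrix(c(1, NaN, 3, 4,
              5, 6, Inf, 8), nrow = 4, ncol = 2)
Y <- c("a", "b", "a", "b")
res <- drop.invalid(list(X = X, Y = Y))
```

In this toy input only the first and last samples survive, so res$X has two rows and res$Y has two labels.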

clean.ohe

options for whether to one-hot-encode columns. Defaults to FALSE.

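  • clean.ohe < 1: Converts columns with fewer than thr*n unique identifiers to one-hot encoding (thr here denotes the value of clean.ohe).

  • is.integer(clean.ohe): Converts columns with fewer than thr unique identifiers to one-hot encoding (thr here denotes the value of clean.ohe).

  • FALSE: Do not one-hot-encode any columns.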
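
The thresholding rule above can be sketched in base R as follows. This is one plausible reading of the documented behaviour, not the package's code: ohe.columns is a hypothetical helper, thr is assumed to be the value given as clean.ohe, and any threshold of 1 or more is treated as a count rather than being checked with is.integer.

```r
# Hypothetical helper (not part of the package): one-hot encode every
# column of X that has fewer distinct values than the threshold.
ohe.columns <- function(X, thr) {
  # FALSE means no encoding at all.
  if (identical(thr, FALSE)) return(X)
  n <- nrow(X)
  # A threshold below 1 is read as a fraction of the sample count.
  max.unique <- if (thr < 1) thr * n else thr
  blocks <- lapply(seq_len(ncol(X)), function(j) {
    col <- X[, j]
    lev <- sort(unique(col))
    if (length(lev) < max.unique) {
      # One 0/1 indicator column per distinct value.
      outer(col, lev, "==") + 0
    } else {
      # Too many distinct values: keep the column unchanged.
      matrix(col, ncol = 1)
    }
  })
  do.call(cbind, blocks)
}

# Toy input: column 1 looks continuous, column 2 has 3 distinct values.
X <- cbind(c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6),
           c(1, 2, 1, 3, 2, 1))
Z.count <- ohe.columns(X, 4L)   # count threshold: fewer than 4 values
Z.frac  <- ohe.columns(X, 0.6)  # fractional threshold: fewer than 0.6 * 6
Z.none  <- ohe.columns(X, FALSE)
```

With this toy input, both thresholds leave the first column untouched and expand the second into three indicator columns, giving a 6 by 4 matrix.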

Value

A list containing at least the following key-worded elements:

Author(s)

Eric Bridgeford


neurodata/slbR documentation built on May 22, 2019, 2:41 p.m.