View source: R/clean_dataset.R
A function for scrubbing a dataset for use with most standard algorithms. This can involve removing samples with invalid entries and one-hot-encoding columns that are probably categorical.
clean.dataset(dataset, clean.invalid = TRUE, clean.ohe = FALSE)
dataset: a list with at least the following key-worded elements:
clean.invalid: whether to remove samples with invalid entries. Defaults to TRUE.
clean.ohe: options for whether to one-hot-encode columns. Defaults to FALSE.
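To make the one-hot-encoding step concrete, here is a minimal base-R sketch. It is not the package's implementation: the helper name one_hot_sketch is hypothetical, and it assumes a column counts as categorical when it is a factor or character vector, which may differ from the heuristic clean.dataset uses to decide which columns are "probably categorical".

```r
# Hypothetical helper for illustration only; not part of the documented package.
# One-hot encode every factor or character column of a data frame.
one_hot_sketch <- function(df) {
  # assumption: factor/character columns are the categorical ones
  is_cat <- vapply(df, function(col) is.factor(col) || is.character(col),
                   logical(1))
  # numeric columns are carried over unchanged
  num <- as.matrix(df[, !is_cat, drop = FALSE])
  # each categorical column becomes one 0/1 indicator column per level
  enc <- lapply(names(df)[is_cat], function(nm) {
    f <- factor(df[[nm]])
    m <- outer(as.character(f), levels(f), "==") * 1
    colnames(m) <- paste(nm, levels(f), sep = ".")
    m
  })
  do.call(cbind, c(list(num), enc))
}

df <- data.frame(height = c(1.5, 1.8, 1.6),
                 color = c("red", "blue", "red"))
X <- one_hot_sketch(df)
colnames(X)  # "height" "color.blue" "color.red"
```

In this sketch the original categorical column is replaced by its indicator columns; whether clean.dataset replaces or appends is not stated here.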
A list containing at least the following key-worded elements:
X: [m, d+r] the array with m samples in d+r dimensions, where r is the number of additional columns appended for encodings. m < n when there are non-finite or NaN entries. colnames(dataset) returns the column names of the cleaned columns.
Y: [m, r] matrix or [n] vector containing regressors or class labels for samples in X. m < n when there are non-finite or NaN entries.
samples: [m] the sample ids that are included in the final array, where samp[i] is the original row id corresponding to Xc[i,]. If m < n, there were non-finite or NaN entries that were purged.
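The relationship between the purged rows and the returned sample ids can be illustrated with a short base-R sketch. This is an assumption-laden illustration, not the package's code: the helper name drop_invalid_sketch is hypothetical, X is assumed to be a numeric matrix, and Y is assumed to be a numeric vector.

```r
# Hypothetical helper for illustration only; not part of the documented package.
drop_invalid_sketch <- function(X, Y) {
  # keep a row only if every entry is finite (no NA, NaN, Inf or -Inf)
  keep <- rowSums(!is.finite(X)) == 0 & is.finite(Y)
  list(X = X[keep, , drop = FALSE],
       Y = Y[keep],
       samples = which(keep))  # original row ids of the retained samples
}

# n = 4 samples; row 2 contains Inf and row 3 contains NaN
X <- matrix(c(1, 2, NaN, 4,
              5, Inf, 7, 8), ncol = 2)
Y <- c(0, 1, 0, 1)
res <- drop_invalid_sketch(X, Y)
res$samples  # 1 4
```

Here m = 2 < n = 4, and res$samples[i] gives the original row of the input that ended up as row i of res$X.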
Author(s): Eric Bridgeford