preprocess_ | R Documentation |
Prepare data for analysis and visualization
preprocess_(
x,
removeFeatures.thres = NULL,
missingness = FALSE,
integer2factor = FALSE,
integer2numeric = FALSE,
logical2factor = FALSE,
logical2numeric = FALSE,
numeric2factor = FALSE,
numeric2factor.levels = NULL,
len2factor = 0,
character2factor = FALSE,
factorNA2missing = FALSE,
factorNA2missing.level = "missing",
scale = FALSE,
center = scale,
removeConstants = FALSE,
oneHot = FALSE,
exclude = NULL,
verbose = TRUE
)
x |
data.frame or data.table to be preprocessed. If data.frame, it will be converted to data.table in-place. |
removeFeatures.thres |
Float (0, 1): Remove features with missing values in at least this fraction of cases. |
missingness |
Logical: If TRUE, generate new boolean columns for each feature with missing values, indicating which cases were missing data. |
integer2factor |
Logical: If TRUE, convert all integers to factors |
integer2numeric |
Logical: If TRUE, convert all integers to numeric
(will only work if |
logical2factor |
Logical: If TRUE, convert all logical variables to factors |
logical2numeric |
Logical: If TRUE, convert all logical variables to numeric |
numeric2factor |
Logical: If TRUE, convert all numeric variables to factors |
numeric2factor.levels |
Character vector: Optional - If |
len2factor |
Integer (>=2): Convert all numeric variables with at most
this number of unique values to factors.
For example, if binary variables are encoded with 1, 2, you could use
|
character2factor |
Logical: If TRUE, convert all character variables to factors |
factorNA2missing |
Logical: If TRUE, make NA values in factors be of
level |
factorNA2missing.level |
Character: Name of level if
|
scale |
Logical: If TRUE, scale columns of |
center |
Logical: If TRUE, center columns of |
removeConstants |
Logical: If TRUE, remove constant columns. |
oneHot |
Logical: If TRUE, convert all factors using one-hot encoding |
exclude |
Integer, vector: Exclude these columns from preprocessing. |
verbose |
Logical: If TRUE, write messages to console. |
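The two missingness-related arguments can be illustrated with a minimal base R sketch. This is not the function's implementation: the toy data, the variable names and the "_missing" column suffix are assumptions made for illustration only.

```r
# Toy data: column a is missing in 1 of 4 cases, column b in 3 of 4
d <- data.frame(
  a = c(1, NA, 3, 4),
  b = c(NA, NA, NA, 2),
  c = c(5, 6, 7, 8)
)

thres <- 0.5  # plays the role of removeFeatures.thres

# Fraction of cases with a missing value, per feature
na_frac <- vapply(d, function(col) mean(is.na(col)), numeric(1))

# Features missing in >= thres of cases are dropped
dropped <- names(na_frac)[na_frac >= thres]
kept <- d[, na_frac < thres, drop = FALSE]

# For each remaining feature with missing values, add a logical
# indicator column (the "_missing" suffix is an arbitrary choice here)
has_na <- names(kept)[vapply(kept, anyNA, logical(1))]
for (nm in has_na) {
  kept[[paste0(nm, "_missing")]] <- is.na(kept[[nm]])
}
```

In this sketch only b reaches the threshold, so it is dropped, and a single indicator column is added for a.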
This function (ending in "_") performs operations in-place and returns the preprocessed data.table silently (e.g. for piping). Imputation is not currently supported; use preprocess for imputation.
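What returning "silently" means can be sketched in base R with invisible(). The helper drop_constants below is hypothetical and belongs to no package. It also works on a copy of a data.frame, so it does not demonstrate in-place modification, which presumably relies on data.table's by-reference semantics.

```r
# Hypothetical helper (not part of any package): removes constant columns
# and returns the result invisibly, as a function built for piping might
drop_constants <- function(d) {
  keep <- vapply(d, function(v) length(unique(v)) > 1, logical(1))
  invisible(d[, keep, drop = FALSE])
}

d <- data.frame(a = c(1, 2, 3), z = c(99, 99, 99))

# withVisible() reports whether a result would auto-print at top level
res <- withVisible(drop_constants(d))
res$visible

# The invisible value can still be captured or passed to another call
n_cols <- ncol(drop_constants(d))
```

An invisible result is not printed when the function is called at the console, but it is returned as usual, which is what makes chaining calls convenient.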
Operations are applied in the order listed below, which broadly follows the order of arguments in usage:
keep complete cases only
remove duplicates
remove cases by missingness threshold
remove features by missingness threshold
integer to factor
integer to numeric
logical to factor
logical to numeric
numeric to factor
numeric with at most N unique values to factor
character to factor
factor NA to named level
add missingness column
scale and/or center
remove constants
one-hot encoding
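A subset of the later steps above can be sketched in base R, in the same order. This is an illustration under stated assumptions, not the function's own code: the toy data, the value N = 2, the "missing" level name and the one-hot column naming scheme are all chosen for this example.

```r
d <- data.frame(
  grp = c(1, 2, 1, 2, 1),                          # numeric, 2 unique values
  val = c(10, 20, 30, 40, 50),                     # continuous numeric
  col = factor(c("red", NA, "blue", "red", NA)),   # factor with NAs
  z = "k",                                         # constant column
  stringsAsFactors = FALSE
)

# numeric with at most N unique values to factor
N <- 2
for (nm in names(d)) {
  if (is.numeric(d[[nm]]) && length(unique(d[[nm]])) <= N) {
    d[[nm]] <- factor(d[[nm]])
  }
}

# factor NA to named level
for (nm in names(d)) {
  if (is.factor(d[[nm]]) && anyNA(d[[nm]])) {
    lv <- levels(d[[nm]])
    f <- as.character(d[[nm]])
    f[is.na(f)] <- "missing"
    d[[nm]] <- factor(f, levels = c(lv, "missing"))
  }
}

# scale and center the remaining numeric columns
num <- names(d)[vapply(d, is.numeric, logical(1))]
for (nm in num) {
  d[[nm]] <- as.numeric(scale(d[[nm]], center = TRUE, scale = TRUE))
}

# remove constants
is_const <- vapply(d, function(v) length(unique(v)) == 1, logical(1))
d <- d[, !is_const, drop = FALSE]

# one-hot encoding: one 0/1 column per factor level
fac <- names(d)[vapply(d, is.factor, logical(1))]
onehot <- do.call(cbind, lapply(fac, function(nm) {
  m <- sapply(levels(d[[nm]]), function(lv) as.integer(d[[nm]] == lv))
  colnames(m) <- paste(nm, levels(d[[nm]]), sep = "_")
  m
}))
d <- cbind(d[, setdiff(names(d), fac), drop = FALSE], onehot)
```

Because conversion to factor happens before scaling, grp is never scaled; it reaches the one-hot step as a factor. The order matters in the same way for the constant column, which is removed before encoding.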
E.D. Gennatas
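## Not run:
library(data.table)
## simulate 30 cases: two low-cardinality integers and six continuous features
x <- data.table(a = sample(1:3, 30, replace = TRUE),
                b = rnorm(30, 12),
                c = rnorm(30, 200),
                d = sample(21:22, 30, replace = TRUE),
                e = rnorm(30, -100),
                f = rnorm(30, 950),
                g = rnorm(30),
                h = rnorm(30))
## add duplicates
x <- rbind(x, x[c(1, 3), ])
## add constant
x[, z := 99]
## preprocess in-place with default settings
preprocess_(x)
## End(Not run)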