preprocess | R Documentation |
Prepare data for analysis and visualization
preprocess(
  x,
  completeCases = FALSE,
  removeCases.thres = NULL,
  removeFeatures.thres = NULL,
  missingness = FALSE,
  impute = FALSE,
  impute.type = c("missRanger", "micePMM", "meanMode"),
  impute.missRanger.params = list(pmm.k = 3, maxiter = 10, num.trees = 500),
  impute.discrete = get_mode,
  impute.numeric = mean,
  integer2factor = FALSE,
  integer2numeric = FALSE,
  logical2factor = FALSE,
  logical2numeric = FALSE,
  numeric2factor = FALSE,
  numeric2factor.levels = NULL,
  numeric.cut.n = 0,
  numeric.cut.labels = FALSE,
  numeric.quant.n = 0,
  numeric.quant.NAonly = FALSE,
  len2factor = 0,
  character2factor = FALSE,
  factorNA2missing = FALSE,
  factorNA2missing.level = "missing",
  factor2integer = FALSE,
  factor2integer_startat0 = TRUE,
  scale = FALSE,
  center = scale,
  removeConstants = FALSE,
  removeConstants.skipMissing = TRUE,
  removeDuplicates = FALSE,
  oneHot = FALSE,
  add_date_features = FALSE,
  date_features = c("weekday", "month", "year"),
  add_holidays = FALSE,
  exclude = NULL,
  xname = NULL,
  verbose = TRUE
)
x |
data.frame to be preprocessed |
completeCases |
Logical: If TRUE, only retain complete cases (no missing data). Default = FALSE |
removeCases.thres |
Float (0, 1): Remove cases whose fraction of missing features is >= this value. |
removeFeatures.thres |
Float (0, 1): Remove features whose fraction of cases with missing values is >= this value. |
missingness |
Logical: If TRUE, generate new boolean columns for each feature with missing values, indicating which cases were missing data. |
impute |
Logical: If TRUE, impute missing cases. See |
impute.type |
Character: How to impute data: "missRanger" and
"missForest" use the packages of the same name to impute by iterative random
forest regression. "rfImpute" uses |
impute.missRanger.params |
Named list with elements such as "pmm.k", "maxiter", and
"num.trees" (see the default in the usage), which are passed to |
impute.discrete |
Function that returns single value: How to impute
discrete variables for |
impute.numeric |
Function that returns single value: How to impute
continuous variables for |
integer2factor |
Logical: If TRUE, convert all integers to factors. This includes
|
integer2numeric |
Logical: If TRUE, convert all integers to numeric
(will only work if |
logical2factor |
Logical: If TRUE, convert all logical variables to factors |
logical2numeric |
Logical: If TRUE, convert all logical variables to numeric |
numeric2factor |
Logical: If TRUE, convert all numeric variables to factors |
numeric2factor.levels |
Character vector: Optional - will be passed to
|
numeric.cut.n |
Integer: If > 0, convert all numeric variables to factors by
binning using |
numeric.cut.labels |
Logical: The |
numeric.quant.n |
Integer: If > 0, convert all numeric variables to factors by
binning using |
numeric.quant.NAonly |
Logical: If TRUE, only bin numeric variables with missing values |
len2factor |
Integer (>=2): Convert all variables with this many
unique values or fewer to factors. Default = 0, as shown in the usage.
For example, if binary variables are encoded with 1, 2, you could use
|
character2factor |
Logical: If TRUE, convert all character variables to factors |
factorNA2missing |
Logical: If TRUE, make NA values in factors be of
level |
factorNA2missing.level |
Character: Name of level if
|
factor2integer |
Logical: If TRUE, convert all factors to integers |
factor2integer_startat0 |
Logical: If TRUE, start integer coding at 0 |
scale |
Logical: If TRUE, scale columns of |
center |
Logical: If TRUE, center columns of |
removeConstants |
Logical: If TRUE, remove constant columns. |
removeConstants.skipMissing |
Logical: If TRUE, ignore missing values when checking whether a feature is constant. |
removeDuplicates |
Logical: If TRUE, remove duplicate cases. |
oneHot |
Logical: If TRUE, convert all factors using one-hot encoding. |
add_date_features |
Logical: If TRUE, extract date features from date columns. |
date_features |
Character vector: Features to extract from dates. |
add_holidays |
Logical: If TRUE, extract holidays from date columns. |
exclude |
Integer vector: Indices of columns to exclude from preprocessing. |
xname |
Character: Name of |
verbose |
Logical: If TRUE, write messages to console. |
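The threshold and mean/mode imputation arguments can be illustrated with a minimal base R sketch. This is not the package's implementation: the toy data frame and the helper `mode_value` (a stand-in for a mode function such as `get_mode`) are hypothetical, and the sketch assumes that "meanMode" imputation applies `impute.numeric` to numeric columns and `impute.discrete` to discrete ones, as the defaults suggest.

```r
# Toy data with missing values (hypothetical example data)
dat <- data.frame(
  age   = c(25, NA, 40, 31, NA),
  score = c(1.5, 2.5, NA, 4.0, NA),
  group = factor(c("a", "b", "a", NA, NA))
)

# Fraction of missing features per case, and of missing cases per feature
frac_missing_by_row <- rowMeans(is.na(dat))
frac_missing_by_col <- colMeans(is.na(dat))

# Analogue of removeCases.thres = 0.9:
# drop cases whose fraction of missing features is >= 0.9
dat_kept <- dat[frac_missing_by_row < 0.9, , drop = FALSE]

# Hypothetical stand-in for a mode function: most frequent non-NA value
mode_value <- function(v) {
  tab <- table(v)  # table() ignores NA by default
  names(tab)[which.max(tab)]
}

# Mean/mode imputation: mean for numeric columns, mode for everything else
dat_imputed <- dat_kept
for (nm in names(dat_imputed)) {
  col <- dat_imputed[[nm]]
  if (!anyNA(col)) next
  if (is.numeric(col)) {
    col[is.na(col)] <- mean(col, na.rm = TRUE)
  } else {
    col[is.na(col)] <- mode_value(col)
  }
  dat_imputed[[nm]] <- col
}
```

In this sketch the fully missing fifth case is removed by the threshold before imputation, so it does not receive imputed values in every column.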
Order of operations (the argument order in the usage above does not follow this sequence exactly):
keep complete cases only
remove constants
remove duplicates
remove cases by missingness threshold
remove features by missingness threshold
integer to factor
integer to numeric
logical to factor
logical to numeric
numeric to factor
cut numeric to n bins
cut numeric to n quantiles
numeric with N or fewer unique values to factor
character to factor
factor NA to named level
add missingness column
impute
scale and/or center
one-hot encoding
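A subset of the sequence above can be sketched in base R to show why the order matters. This is only an illustration under stated assumptions, not the function's actual code: the toy data, the column name `height_missing`, and the choice of mean imputation are hypothetical.

```r
# Toy data: one constant column, one duplicate row, NAs in both other columns
dat <- data.frame(
  id_const = rep(1, 5),
  height   = c(170, 170, NA, 180, 160),
  colour   = c("red", "red", "blue", NA, "blue"),
  stringsAsFactors = FALSE
)

# 1. Remove constants (missing values are ignored when counting unique values)
is_constant <- vapply(
  dat,
  function(v) length(unique(v[!is.na(v)])) <= 1,
  logical(1)
)
dat <- dat[, !is_constant, drop = FALSE]

# 2. Remove duplicate cases
dat <- dat[!duplicated(dat), , drop = FALSE]
rownames(dat) <- NULL

# 3. Character to factor
dat$colour <- factor(dat$colour)

# 4. Factor NA to a named level
levels(dat$colour) <- c(levels(dat$colour), "missing")
dat$colour[is.na(dat$colour)] <- "missing"

# 5. Missingness indicator, recorded before imputation overwrites the NAs
dat$height_missing <- is.na(dat$height)

# 6. Impute numeric NAs with the column mean
dat$height[is.na(dat$height)] <- mean(dat$height, na.rm = TRUE)

# 7. Scale and center the numeric column
dat$height <- as.numeric(scale(dat$height, center = TRUE, scale = TRUE))

# 8. One-hot encode the factor: one indicator column per level
onehot <- model.matrix(~ colour - 1, data = dat)
out <- cbind(dat[, c("height", "height_missing")], as.data.frame(onehot))
```

Recording missingness before imputing keeps the information about which values were originally absent, and scaling before one-hot encoding leaves the indicator columns as plain 0/1 values.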
E.D. Gennatas