preprocess_data | R Documentation
Function to preprocess your data for input into run_ml().
preprocess_data(
  dataset,
  outcome_colname,
  method = c("center", "scale"),
  remove_var = "nzv",
  collapse_corr_feats = TRUE,
  to_numeric = TRUE,
  group_neg_corr = TRUE,
  prefilter_threshold = 1
)
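The default method = c("center", "scale") refers to centering and scaling each feature. As a rough illustration only (not mikropml's own code), the sketch below applies that transformation in base R to a small made-up data frame. It assumes all features are numeric with no missing values, and center_scale() is a hypothetical helper.

```r
# Hypothetical helper: shift a numeric feature to mean 0, then rescale it to
# standard deviation 1 (the "center" and "scale" steps).
center_scale <- function(x) {
  (x - mean(x)) / sd(x)
}

# Made-up features on very different scales.
feats <- data.frame(a = c(1, 2, 3, 4), b = c(10, 20, 30, 40))
standardized <- as.data.frame(lapply(feats, center_scale))
print(standardized)  # columns a and b become identical after standardizing
```

Base R's scale() produces the same values (returned as a matrix); the helper just makes the two steps explicit.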
dataset
: Data frame with an outcome variable and other columns as features.
outcome_colname
: Column name, as a string, of the outcome variable (default …).
method
: Methods to preprocess the data, described in … (default: c("center", "scale")).
remove_var
: Whether to remove variables with near-zero variance (default: "nzv"; …).
collapse_corr_feats
: Whether to keep only one of each set of perfectly correlated features (default: TRUE).
to_numeric
: Whether to convert features to numeric where possible (default: TRUE).
group_neg_corr
: Whether to group negatively correlated features together, e.g. c(0,1) and c(1,0) (default: TRUE).
prefilter_threshold
: Remove features that have non-zero, non-NA values in only N rows or fewer, where N is this threshold (default: 1). Set this to -1 to keep all columns at this step. This step is also skipped if …
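The prefilter_threshold rule can be sketched in base R as follows. This is a hypothetical helper, prefilter_features(), written for illustration rather than the package's implementation; it assumes every feature column is numeric, and the OTU column names and counts are made up.

```r
# Hypothetical helper illustrating the prefilter_threshold rule (not mikropml code).
# A feature is dropped when the number of rows in which it is both non-NA and
# non-zero is less than or equal to prefilter_threshold. With a threshold of -1,
# no count can be <= -1, so every column is kept.
prefilter_features <- function(feats, prefilter_threshold = 1) {
  # Count informative (non-NA, non-zero) rows per column; NA entries count as FALSE.
  informative_rows <- vapply(
    feats,
    function(col) sum(!is.na(col) & col != 0),
    integer(1)
  )
  keep <- informative_rows > prefilter_threshold
  list(kept = feats[, keep, drop = FALSE], removed = names(feats)[!keep])
}

# Made-up OTU counts: otu1 is non-zero in one row, otu3 in none.
feats <- data.frame(
  otu1 = c(0, 0, 5, 0),
  otu2 = c(1, 2, 0, 3),
  otu3 = c(NA, 0, 0, 0)
)
res <- prefilter_features(feats, prefilter_threshold = 1)
print(res$removed)  # "otu1" "otu3"
```

Calling prefilter_features(feats, -1) keeps all three columns, matching the "-1 keeps all columns" behavior described above.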
Named list including:
dat_transformed
: Preprocessed data.
grp_feats
: If features were grouped together, a named list of the features corresponding to each group.
removed_feats
: Any features that were removed during preprocessing (e.g. because there was zero variance or near-zero variance for those features).
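To show what collapse_corr_feats, group_neg_corr, and grp_feats refer to, here is a minimal base R sketch that groups features whose correlation is exactly 1 (or exactly -1 when negative correlations are grouped) and keeps one column per group. It is an assumption-laden illustration, not mikropml's algorithm: group_perfect_corr(), the grp1, grp2, ... names, and the tolerance are all hypothetical, and it assumes numeric features with non-zero variance.

```r
# Hypothetical sketch (not mikropml code): group perfectly correlated features
# and keep the first feature of each group as its representative.
group_perfect_corr <- function(feats, group_neg_corr = TRUE, tol = 1e-10) {
  cors <- cor(feats)
  # When grouping negative correlations too, treat r = -1 like r = 1.
  if (group_neg_corr) cors <- abs(cors)
  unassigned <- colnames(feats)
  groups <- list()
  while (length(unassigned) > 0) {
    first <- unassigned[1]
    # Still-unassigned features whose correlation with `first` is (numerically) 1.
    matches <- which(abs(cors[first, unassigned] - 1) < tol)
    members <- union(first, unassigned[matches])
    groups[[length(groups) + 1]] <- members
    unassigned <- setdiff(unassigned, members)
  }
  # Report only groups with more than one feature, under hypothetical names.
  multi <- Filter(function(g) length(g) > 1, groups)
  names(multi) <- paste0("grp", seq_along(multi))
  keep <- vapply(groups, function(g) g[1], character(1))
  list(collapsed = feats[, keep, drop = FALSE], grp_feats = multi)
}

feats <- data.frame(
  a = c(0, 1, 0, 1),
  b = c(0, 1, 0, 1),  # identical to a
  c = c(1, 0, 1, 0),  # perfectly negatively correlated with a
  d = c(3, 1, 4, 1)   # correlated with a, but not perfectly
)
res <- group_perfect_corr(feats, group_neg_corr = TRUE)
print(res$grp_feats)            # one group, grp1, containing a, b and c
print(colnames(res$collapsed))  # "a" "d"
```

With group_neg_corr = FALSE, only a and b are grouped, and a, c and d are kept.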
If the progressr
package is installed, a progress bar with time elapsed
and estimated time to completion can be displayed.
See the preprocessing vignette for more details.
Note that if any values in the outcome_colname column contain spaces, they will be
converted to underscores for compatibility with caret.
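The space-to-underscore conversion amounts to a string substitution on the outcome column. The sketch below shows the idea in base R on a made-up data frame whose outcome column is named dx, as in the package example; the exact substitution mikropml performs is an assumption here.

```r
# Made-up data; "dx" mirrors the outcome column used in the package example.
dat <- data.frame(
  dx = c("normal", "colon cancer", "colon cancer", "normal"),
  otu1 = c(1, 0, 3, 2)
)
# Replace each space in the outcome values with an underscore (fixed = TRUE
# treats " " as a literal string rather than a regular expression).
dat$dx <- gsub(" ", "_", dat$dx, fixed = TRUE)
print(unique(dat$dx))  # "normal" "colon_cancer"
```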
Zena Lapp, zenalapp@umich.edu
Kelly Sovacool, sovacool@umich.edu
preprocess_data(mikropml::otu_small, "dx")
# the function can show a progress bar if you have the progressr package installed
## optionally, specify the progress bar format
progressr::handlers(progressr::handler_progress(
format = ":message :bar :percent | elapsed: :elapsed | eta: :eta",
clear = FALSE,
show_after = 0
))
## tell progressr to always report progress
## Not run:
progressr::handlers(global = TRUE)
## run the function and watch the live progress updates
dat_preproc <- preprocess_data(mikropml::otu_small, "dx")
## End(Not run)