| preprocess_data | R Documentation |
Function to preprocess your data for input into run_ml().
preprocess_data(dataset, ...)
## S4 method for signature 'TreeSummarizedExperiment'
preprocess_data(
dataset,
outcome_colname,
assay.type = "counts",
col.var = NULL,
altexp = NULL,
name = "preprocessed",
...
)
## S4 method for signature 'ANY'
preprocess_data(
dataset,
outcome_colname,
method = c("center", "scale"),
remove_var = "nzv",
collapse_corr_feats = TRUE,
corr_method = "spearman",
corr_thresh = 1,
to_numeric = TRUE,
group_neg_corr = TRUE,
prefilter_threshold = 1,
...
)
dataset |
Data frame with an outcome variable and other columns as
features. Alternatively, the input can be in |
... |
All additional arguments are passed on to |
outcome_colname |
Column name as a string of the outcome variable
(default |
assay.type |
The name of the assay from |
col.var |
The name of sample metadata variables from |
altexp |
The name of alternative experiment ( |
name |
Name of results used when the input is
|
method |
Methods to preprocess the data, described in
|
remove_var |
Whether to remove variables with near-zero variance
( |
collapse_corr_feats |
Whether to keep only one of correlated features
(see |
corr_method |
Correlation method. Options are the same as those supported
by |
corr_thresh |
Group correlations above or equal to |
to_numeric |
Whether to change features to numeric where possible. |
group_neg_corr |
Whether to group negatively correlated features together (e.g. c(0,1) and c(1,0)). |
prefilter_threshold |
Remove features which only have non-zero & non-NA
values in N rows or fewer (default: 1). Set this to -1 to keep all columns
at this step. This step will also be skipped if |
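The prefilter_threshold rule described above can be illustrated with a small base R sketch. The helper name prefilter_sketch and the toy data frame are hypothetical and exist only for this illustration; this is not the package's internal code, only one reading of "non-zero & non-NA values in N rows or fewer".

```r
# Hypothetical helper, for illustration only; not the package's implementation.
prefilter_sketch <- function(features, threshold = 1) {
  if (threshold < 0) {
    return(features) # a negative threshold (e.g. -1) keeps all columns
  }
  # Count, per column, the rows holding a value that is neither zero nor NA.
  n_informative <- colSums(!is.na(features) & features != 0)
  # Keep only columns with more than `threshold` informative rows.
  features[, n_informative > threshold, drop = FALSE]
}

# Toy features: `a` has 1 informative row, `b` has 3, `c` has 0.
toy <- data.frame(
  a = c(0, 0, 3, 0),
  b = c(1, 2, 0, 4),
  c = c(NA, 0, 0, 0)
)

colnames(prefilter_sketch(toy, threshold = 1)) # "b"
colnames(prefilter_sketch(toy, threshold = -1)) # "a" "b" "c"
```

With the default threshold of 1, columns a and c are dropped because each has at most one informative row.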
Named list including:
dat_transformed: Preprocessed data.
grp_feats: If features were grouped together, a named list of the features corresponding to each group.
removed_feats: Any features that were removed during preprocessing (e.g. because there was zero variance or near-zero variance for those features).
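As a minimal sketch of how the returned list might be inspected, the snippet below reuses the call from the examples on this page. It assumes that the mikropml package is installed and that attaching it makes preprocess_data() available; if the function comes from a different package in your setup, attach that package instead.

```r
# Assumption: preprocess_data() is available after attaching mikropml.
library(mikropml)

result <- preprocess_data(mikropml::otu_small, "dx")

# The three elements described above.
names(result)

# Preprocessed data, including the outcome column.
dim(result$dat_transformed)

# Features grouped together, if any grouping took place.
result$grp_feats

# Features dropped during preprocessing, if any.
result$removed_feats
```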
If the input is a TreeSummarizedExperiment, the output is added to the
input object as additional data. If the set of features in the output matches
that of the input, the results are stored directly in the assay slot. If they
do not match, the output is stored in the altExp slot of the object.
If the progressr package is installed, a progress bar with time elapsed
and estimated time to completion can be displayed.
See the preprocessing vignette for more details.
Note that if any values in outcome_colname contain spaces, they will be
converted to underscores for compatibility with caret.
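The effect of the note above on outcome values can be pictured with a short base R sketch. The outcome vector is made up for illustration, and gsub() is used here only to show the end result; the mechanism the function uses internally is not stated in this page.

```r
# Hypothetical outcome values containing spaces.
outcome <- c("no cancer", "cancer", "no cancer")

# Replace every space with an underscore, mirroring the described conversion.
cleaned <- gsub(" ", "_", outcome)
cleaned # "no_cancer" "cancer" "no_cancer"
```

If downstream code matches on the original labels, it may be simpler to rename the outcome values before preprocessing so that they stay the same afterwards.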
Zena Lapp, zenalapp@umich.edu
Kelly Sovacool, sovacool@umich.edu
preprocess_data(mikropml::otu_small, "dx")
# the function can show a progress bar if you have the progressr package installed
## optionally, specify the progress bar format
progressr::handlers(progressr::handler_progress(
format = ":message :bar :percent | elapsed: :elapsed | eta: :eta",
clear = FALSE,
show_after = 0
))
## tell progressr to always report progress
## Not run:
progressr::handlers(global = TRUE)
## run the function and watch the live progress updates
dat_preproc <- preprocess_data(mikropml::otu_small, "dx")
# Create TreeSE object
library(TreeSummarizedExperiment)
df <- mikropml::otu_small
assay <- df[, !colnames(df) %in% c("dx"), drop = FALSE] |> t() |> as.matrix()
tse <- TreeSummarizedExperiment(assays = SimpleList(counts = assay))
colData(tse)[["dx"]] <- df[["dx"]]
# Preprocess
tse <- preprocess_data(
dataset = tse,
assay.type = "counts",
outcome_colname = "dx"
)
# The result is stored in the assay slot
tse
## End(Not run)