View source: R/ranger_crossRF_plot_util.R
rf_clf.by_datasets.summ | R Documentation |
An integrated pipeline for rf_clf.by_datasets. It splits the data into sub-datasets by s_category, then runs standard random forests with out-of-bag (OOB) error estimation to classify c_category within each sub-dataset. The output includes a summary of the random forest models in the sub-datasets and all relevant statistics for each feature.
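The splitting step described above can be pictured with a minimal base-R sketch. This is an illustration under assumptions, not the package's internal code: the data, the column names "site" and "status", and the variable names are hypothetical. It only shows how samples are partitioned by the s_category column; in the pipeline, a random forest with OOB estimation would then be fit to each resulting pair of feature table and labels.

```r
# Minimal sketch (assumption: not crossRanger's internal code) of splitting
# samples into sub-datasets by the s_category column of the metadata.
set.seed(1)
df <- data.frame(f1 = rnorm(12), f2 = rnorm(12))   # hypothetical feature table
metadata <- data.frame(
  site   = rep(c("A", "B", "C"), each = 4),          # hypothetical s_category column
  status = rep(c("case", "control"), times = 6)      # hypothetical c_category column
)
s_category <- "site"
c_category <- "status"

# One feature table and one label vector per level of s_category
sub_x <- split(df, metadata[[s_category]])
sub_y <- split(factor(metadata[[c_category]]), metadata[[s_category]])

# Rows per sub-dataset and class counts within each one
sizes <- vapply(sub_x, nrow, integer(1))
print(sizes)
print(lapply(sub_y, table))
```

Each element of sub_x is paired with the element of sub_y that has the same name, so every sub-dataset keeps its own labels for the per-subset classification.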
rf_clf.by_datasets.summ(
df,
metadata,
s_category,
c_category,
positive_class = NA,
rf_imp_pvalues = FALSE,
nfolds = 3,
verbose = FALSE,
ntree = 500,
p_cutoff = 0.05,
p.adj.method = "bonferroni",
q_cutoff = 0.05,
outdir = NULL
)
| Argument | Description |
|---|---|
| df | Training data: a data.frame. |
| metadata | Sample metadata with at least two categorical variables. |
| s_category | A string naming the column in the sample metadata (a 'factor') that defines the sample grouping used to split the data. |
| c_category | A string naming the column in the sample metadata (a 'factor') used as the sample label for random forest classification in each of the split datasets. |
| positive_class | A string naming one class in the 'c_category' column of metadata. |
| rf_imp_pvalues | Whether to compute both the importance score and the p-value for each feature. |
| nfolds | The number of folds in the cross-validation. |
| verbose | Whether to show computation status and estimated runtime. |
| ntree | The number of trees. |
| p_cutoff | The p-value cutoff for features; the default is 0.05. |
| p.adj.method | The p-value correction method; the default is "bonferroni". |
| q_cutoff | The q-value cutoff for features; the default is 0.05. |
| outdir | The output directory; the documented default location is "./" (the argument itself defaults to NULL). |
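The three filtering arguments (p_cutoff, p.adj.method, q_cutoff) can be illustrated with base R's p.adjust. This is a hedged sketch: the p-values are invented for illustration, and it assumes that features are kept when both the raw p-value and the adjusted q-value fall below their cutoffs, which may differ from how the package actually combines them.

```r
# Hypothetical per-feature p-values (illustrative only, not package output)
p_values <- c(f1 = 0.001, f2 = 0.004, f3 = 0.03, f4 = 0.20)
p_cutoff <- 0.05
q_cutoff <- 0.05
p.adj.method <- "bonferroni"

# Bonferroni multiplies each p-value by the number of tests, capped at 1
q_values <- p.adjust(p_values, method = p.adj.method)

# Assumption: a feature is kept only if it passes both cutoffs
selected <- names(p_values)[p_values < p_cutoff & q_values < q_cutoff]
print(q_values)
print(selected)
```

Here f3 passes the raw p-value cutoff but not the Bonferroni-adjusted one, which shows why the correction method matters when many features are tested.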
A list containing a summary of the random forest models in the sub-datasets, all relevant statistics for each feature, and plots.
See also: ranger, rf_clf.by_datasets