rf_clf.by_datasets.summ: rf_clf.by_datasets.summ

View source: R/ranger_crossRF_plot_util.R

rf_clf.by_datasets.summ    R Documentation

rf_clf.by_datasets.summ

Description

An integrated pipeline for rf_clf.by_datasets. It runs standard random forests with out-of-bag (OOB) estimation to classify c_category within each of the sub-datasets split by s_category. The output includes a summary of the RF models in the sub-datasets and all important statistics for each feature.

Usage

rf_clf.by_datasets.summ(
  df,
  metadata,
  s_category,
  c_category,
  positive_class = NA,
  rf_imp_pvalues = FALSE,
  nfolds = 3,
  verbose = FALSE,
  ntree = 500,
  p_cutoff = 0.05,
  p.adj.method = "bonferroni",
  q_cutoff = 0.05,
  outdir = NULL
)

Arguments

df

Training data: a data.frame.

metadata

A sample metadata table with at least two categorical variables.

s_category

A string indicating the category in the sample metadata: a 'factor' that defines the sample grouping for data splitting.

c_category

A string indicating the category in the sample metadata: a 'factor' used as the sample label for RF classification in each of the split datasets.
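The roles of s_category and c_category can be illustrated with a minimal base-R sketch. The toy data, the column names body_site and disease, and the use of split() are assumptions made for illustration; the package's internal splitting code may differ.

```r
# Toy feature table: 6 samples x 2 features (hypothetical data).
df <- data.frame(
  f1 = c(1, 2, 3, 4, 5, 6),
  f2 = c(6, 5, 4, 3, 2, 1),
  row.names = paste0("s", 1:6)
)

# Toy metadata with two categorical variables (hypothetical column names).
metadata <- data.frame(
  body_site = factor(c("gut", "gut", "gut", "skin", "skin", "skin")),
  disease   = factor(c("yes", "no", "yes", "no", "yes", "no")),
  row.names = rownames(df)
)

s_category <- "body_site"  # defines the sub-datasets
c_category <- "disease"    # label classified within each sub-dataset

# Split the feature rows and the labels by the grouping factor.
sub_x <- split(df, metadata[[s_category]])
sub_y <- split(metadata[[c_category]], metadata[[s_category]])

names(sub_x)         # one sub-dataset per level of s_category
sapply(sub_x, nrow)  # number of samples in each sub-dataset
```

Each element of sub_x would then be paired with the matching element of sub_y to fit one classifier per sub-dataset.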

positive_class

A string indicating one class in the 'c_category' column of metadata.

rf_imp_pvalues

Whether to compute both an importance score and a p-value for each feature.

nfolds

The number of folds in the cross-validation.

verbose

Show computation status and estimated runtime.

ntree

The number of trees.

p_cutoff

The p-value cutoff for features; the default is 0.05.

p.adj.method

The p-value correction method; the default is "bonferroni".

q_cutoff

The q-value (adjusted p-value) cutoff for features; the default is 0.05.
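How p_cutoff, p.adj.method and q_cutoff relate can be sketched with base R's p.adjust(). The p-values and feature names below are hypothetical, and the strict "<" comparison is an assumption; the function may compute and filter its statistics differently.

```r
# Hypothetical per-feature p-values.
p <- c(featA = 0.001, featB = 0.02, featC = 0.04, featD = 0.30)

p_cutoff <- 0.05
q_cutoff <- 0.05

# Bonferroni correction: multiply each p-value by the number of tests
# (4 here) and cap the result at 1.
q <- p.adjust(p, method = "bonferroni")

# Features passing the raw p-value cutoff and the adjusted (q-value) cutoff.
sig_by_p <- names(p)[p < p_cutoff]
sig_by_q <- names(p)[q < q_cutoff]
```

In this toy case three features pass the raw cutoff but only featA survives the correction, which shows why the two cutoffs can select different feature sets.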

outdir

The output directory; the default is NULL.

Value

A list that includes a summary of the RF models in the sub-datasets, all important statistics for each feature, and plots.
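A call might look like the following sketch, which uses only the arguments listed under Usage. The simulated data, the column names cohort and status, and the assumption that the rows of df are samples aligned with the rows of metadata are all hypothetical. The call is skipped when the crossRanger package is not installed, and the element names of the returned list are not assumed here.

```r
set.seed(1)
n <- 60

# Hypothetical metadata: 'cohort' splits the data, 'status' is the label.
metadata <- data.frame(
  cohort = factor(rep(c("A", "B"), each = n / 2)),
  status = factor(rep(c("case", "control"), times = n / 2)),
  row.names = paste0("s", seq_len(n))
)

# Hypothetical feature table: 5 random features, rows assumed to be samples.
df <- data.frame(matrix(
  rnorm(n * 5), nrow = n,
  dimnames = list(rownames(metadata), paste0("feat", 1:5))
))

# Make feat1 informative by shifting it in the "case" samples.
df$feat1 <- df$feat1 + ifelse(metadata$status == "case", 2, 0)

if (requireNamespace("crossRanger", quietly = TRUE)) {
  res <- crossRanger::rf_clf.by_datasets.summ(
    df, metadata,
    s_category = "cohort",
    c_category = "status",
    positive_class = "case",
    nfolds = 3,
    ntree = 500,
    p.adj.method = "bonferroni"
  )
  # Inspect the top-level structure of the returned list.
  str(res, max.level = 1)
}
```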

See Also

ranger, rf_clf.by_datasets


shihuang047/crossRanger documentation built on Nov. 8, 2024, 2:49 a.m.