rf_clf.by_datasets: rf_clf.by_datasets

View source: R/ranger_crossRF_util.R

rf_clf.by_datasets    R Documentation

rf_clf.by_datasets

Description

Runs standard random forests with out-of-bag (OOB) estimation to classify c_category within each sub-dataset defined by s_category, then applies each model to all the other sub-datasets. The output includes accuracy, AUC and Kappa statistics.

Usage

rf_clf.by_datasets(
  df,
  metadata,
  s_category,
  c_category,
  positive_class = NA,
  rf_imp_pvalues = FALSE,
  clr_transform = TRUE,
  nfolds = 1,
  verbose = FALSE,
  ntree = 500,
  p.adj.method = "BH",
  q_cutoff = 0.05
)

Arguments

df

Training data: a data.frame.

metadata

Sample metadata with at least two columns.

s_category

A string indicating the category in the sample metadata: a 'factor' that defines the sample grouping used for data splitting.

c_category

A string indicating the category in the sample metadata used as the response vector: if it is a 'factor', rf classification is performed in each of the split datasets.

positive_class

A string indicating one class in the 'c_category' column of metadata.

rf_imp_pvalues

A boolean value indicating whether to compute both the importance score and the p-value for each feature.

clr_transform

A boolean value indicating whether the clr transformation is applied.
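
For reference, the centered log-ratio (clr) transformation takes the log of each value in a sample and subtracts the sample's mean log value. The sketch below is a generic base-R version; the helper name clr_sketch and the pseudocount of 1 (used to avoid log(0)) are assumptions for illustration and may differ from what rf_clf.by_datasets does internally.

```r
# Hypothetical helper: centered log-ratio transform, one sample per row.
# The pseudocount is an assumption to keep log() finite for zero counts.
clr_sketch <- function(x, pseudocount = 1) {
  log_x <- log(as.matrix(x) + pseudocount)
  # Subtract each row's mean log value (the log of its geometric mean).
  sweep(log_x, 1, rowMeans(log_x), "-")
}

counts <- data.frame(a = c(10, 0, 5), b = c(20, 3, 5), c = c(70, 9, 5))
clr_counts <- clr_sketch(counts)
rowSums(clr_counts)  # each row sums to zero, up to floating-point error
```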

nfolds

The number of folds in the cross validation.

verbose

A boolean value indicating whether to show computation status and estimated runtime.

ntree

The number of trees.

p.adj.method

The p-value correction method, default is "BH".
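
The method names used here match those accepted by base R's p.adjust. A minimal sketch of what the correction does, assuming the argument is forwarded to p.adjust (not confirmed by this page); the p-values are made up for illustration.

```r
# Made-up p-values for five features (illustration only).
p <- c(0.001, 0.008, 0.02, 0.03, 0.5)

q_bh   <- p.adjust(p, method = "BH")          # Benjamini-Hochberg FDR
q_bonf <- p.adjust(p, method = "bonferroni")  # Bonferroni

# Indices of features passing a 0.05 cutoff under each method.
which(q_bh < 0.05)
which(q_bonf < 0.05)
```

Bonferroni is the more conservative of the two, so it never passes more features than BH at the same cutoff.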

q_cutoff

The q-value cutoff for features; the default is 0.05.

Value

...

Author(s)

Shi Huang

See Also

ranger

Examples

# Simulated feature table: 120 samples (rows) x 5 features (columns)
df <- data.frame(rbind(t(rmultinom(14, 14*5, c(.21,.6,.12,.38,.099))),
            t(rmultinom(16, 16*5, c(.001,.6,.42,.58,.299))),
            t(rmultinom(30, 30*5, c(.011,.6,.22,.28,.289))),
            t(rmultinom(30, 30*5, c(.091,.6,.32,.18,.209))),
            t(rmultinom(30, 30*5, c(.001,.6,.42,.58,.299)))))
# A second simulated table; it is not used in the calls below
df0 <- data.frame(t(rmultinom(120, 600, c(.001,.6,.2,.3,.299))))
# Metadata: f_s defines the four sub-datasets (A-D), f_c is a two-class
# label and f_d is a three-class label
metadata <- data.frame(f_s=factor(c(rep("A", 30), rep("B", 30), rep("C", 30), rep("D", 30))),
                       f_c=factor(c(rep("C", 14), rep("H", 16), rep("C", 14), rep("H", 16),
                                    rep("C", 14), rep("H", 16), rep("C", 14), rep("H", 16))),
                       f_d=factor(rep(c(rep("a", 10), rep("b", 10), rep("c", 10)), 4)))
# Two-class classification within each sub-dataset, with timing
system.time(rf_clf.by_datasets(df, metadata, s_category='f_s',
            c_category='f_c', positive_class="C"))
# Same classification, also computing feature importance scores and p-values
rf_clf.by_datasets(df, metadata, s_category='f_s', c_category='f_c',
                   positive_class="C", rf_imp_pvalues=TRUE)
# Three-class classification (no positive class given)
rf_clf.by_datasets(df, metadata, s_category='f_s', c_category='f_d')
rf_clf.by_datasets(df, metadata, s_category='f_s', c_category='f_d', rf_imp_pvalues=TRUE)
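
The scheme in the Description (train a forest on one sub-dataset, use OOB estimates within it, then apply the model to the other sub-datasets) can be sketched directly with ranger. This is a generic illustration on simulated data, not the internals of rf_clf.by_datasets; the variable names and the simulated features are assumptions, and only accuracy is computed (AUC and Kappa are omitted).

```r
library(ranger)

set.seed(1)
# Simulated data: 3 datasets (A, B, C), 40 samples each, 2 classes.
n <- 120
dataset <- factor(rep(c("A", "B", "C"), each = 40))
y <- factor(rep(c("C", "H"), times = 60))
x <- data.frame(f1 = rnorm(n, mean = ifelse(y == "C", 1, -1)),
                f2 = rnorm(n))
dat <- cbind(x, y = y)

# Train one forest per dataset, then apply it to every other dataset.
groups <- levels(dataset)
acc <- matrix(NA_real_, length(groups), length(groups),
              dimnames = list(train = groups, test = groups))
for (tr in groups) {
  fit <- ranger(dependent.variable.name = "y",
                data = dat[dataset == tr, ], num.trees = 500)
  for (te in groups) {
    if (te == tr) {
      # Within-dataset accuracy comes from the OOB error estimate.
      acc[tr, te] <- 1 - fit$prediction.error
    } else {
      pred <- predict(fit, data = dat[dataset == te, ])$predictions
      acc[tr, te] <- mean(pred == dat$y[dataset == te])
    }
  }
}
acc  # rows: training dataset; columns: dataset the model is applied to
```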

shihuang047/crossRanger documentation built on Feb. 7, 2023, 10:03 p.m.