View source: R/ranger_crossRF_util.R
rf_clf.by_datasets | R Documentation |
Runs standard random forests with OOB estimation to classify c_category within each of the sub-datasets defined by s_category, then applies each model to all the other sub-datasets. The output includes accuracy, AUC and Kappa statistics.
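The train-on-one, apply-to-the-others idea described above can be sketched with the ranger package directly. This is a minimal illustration under stated assumptions, not the internals of rf_clf.by_datasets: the toy data, the object names (x, y, s, idx, models, oob_acc, cross_acc) and the use of plain accuracy are all hypothetical choices made for the sketch.

```r
library(ranger)

set.seed(42)
# Hypothetical toy data: 60 samples x 4 features, two classes, two sub-datasets.
n <- 60
y <- factor(rep(c("C", "H"), times = n / 2))   # class labels (the c_category role)
x <- data.frame(matrix(rnorm(n * 4), nrow = n))
x$X1 <- x$X1 + ifelse(y == "C", 2, 0)          # make one feature informative
s <- factor(rep(c("A", "B"), each = n / 2))    # grouping (the s_category role)

# Row indices of each sub-dataset
idx <- split(seq_len(n), s)

# One random forest per sub-dataset
models <- lapply(idx, function(i) ranger(x = x[i, ], y = y[i], num.trees = 500))

# Out-of-bag accuracy within each sub-dataset
oob_acc <- sapply(models, function(m) 1 - m$prediction.error)

# Accuracy of each model on every other sub-dataset
# (rows: test sub-dataset, columns: training sub-dataset)
cross_acc <- sapply(names(models), function(tr) {
  sapply(names(idx), function(te) {
    if (tr == te) return(NA_real_)             # covered by the OOB estimate
    pred <- predict(models[[tr]], data = x[idx[[te]], ])$predictions
    mean(pred == y[idx[[te]]])
  })
})
```

Here oob_acc gives the within-dataset performance and cross_acc the between-dataset performance; the function documented on this page additionally reports AUC and Kappa.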
rf_clf.by_datasets(
df,
metadata,
s_category,
c_category,
positive_class = NA,
rf_imp_pvalues = FALSE,
clr_transform = TRUE,
nfolds = 1,
verbose = FALSE,
ntree = 500,
p.adj.method = "BH",
q_cutoff = 0.05
)
df |
Training data: a data.frame. |
metadata |
Sample metadata with at least two columns. |
s_category |
A string naming a column in the sample metadata: a 'factor' that defines the sample grouping for data splitting.
c_category |
A string naming a column in the sample metadata to use as the response vector: if it is a 'factor', RF classification is performed in each of the split datasets.
positive_class |
A string indicating one class in the 'c_category' column of the metadata.
rf_imp_pvalues |
A boolean value indicating whether to compute both an importance score and a p-value for each feature.
clr_transform |
A boolean value indicating whether the clr transformation is applied.
nfolds |
The number of folds in the cross validation. |
verbose |
A boolean value indicating whether to show computation status and estimated runtime.
ntree |
The number of trees. |
p.adj.method |
The p-value correction method; the default is "BH".
q_cutoff |
The q-value cutoff for features; the default is 0.05.
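The clr_transform, p.adj.method and q_cutoff arguments can be illustrated in base R. The sketch below is an assumption about what these options conceptually do, not the package's own code: the helper clr(), its pseudocount, and the example counts and p-values are all hypothetical, and whether the package uses stats::p.adjust internally is not confirmed by this page.

```r
# Centered log-ratio (clr) transform of each row (sample):
# log of each value minus the mean log value of that sample.
# The pseudocount is an assumption made here to avoid log(0).
clr <- function(counts, pseudocount = 0.5) {
  logx <- log(as.matrix(counts) + pseudocount)
  sweep(logx, 1, rowMeans(logx), "-")
}

counts <- rbind(c(10, 0, 5, 85),
                c(20, 30, 40, 10))
z <- clr(counts)
rowSums(z)  # each row sums to zero, up to floating-point error

# Benjamini-Hochberg adjustment of hypothetical per-feature p-values,
# followed by a q-value cutoff like q_cutoff = 0.05
p <- c(0.001, 0.008, 0.04, 0.30, 0.80)
q <- p.adjust(p, method = "BH")
selected <- which(q < 0.05)  # indices of features passing the cutoff
```

A clr-transformed row always sums to zero, which is a quick sanity check that the transform was applied per sample rather than per feature.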
...
Shi Huang
ranger
# Simulated count table: 120 samples (rows) x 5 features (columns)
df <- data.frame(rbind(t(rmultinom(14, 14*5, c(.21,.6,.12,.38,.099))),
                       t(rmultinom(16, 16*5, c(.001,.6,.42,.58,.299))),
                       t(rmultinom(30, 30*5, c(.011,.6,.22,.28,.289))),
                       t(rmultinom(30, 30*5, c(.091,.6,.32,.18,.209))),
                       t(rmultinom(30, 30*5, c(.001,.6,.42,.58,.299)))))
# A second simulated table of the same shape (not used in the calls below)
df0 <- data.frame(t(rmultinom(120, 600, c(.001,.6,.2,.3,.299))))
# f_s: four sub-datasets of 30 samples; f_c: two classes; f_d: three classes
metadata <- data.frame(f_s=factor(c(rep("A", 30), rep("B", 30), rep("C", 30), rep("D", 30))),
                       f_c=factor(c(rep("C", 14), rep("H", 16), rep("C", 14), rep("H", 16),
                                    rep("C", 14), rep("H", 16), rep("C", 14), rep("H", 16))),
                       f_d=factor(rep(c(rep("a", 10), rep("b", 10), rep("c", 10)), 4)))
# Two-class response, with timing
system.time(rf_clf.by_datasets(df, metadata, s_category='f_s',
                               c_category='f_c', positive_class="C"))
# Two-class response with feature importance p-values
rf_clf.by_datasets(df, metadata, s_category='f_s', c_category='f_c',
                   positive_class="C", rf_imp_pvalues=TRUE)
# Three-class response, without and with feature importance p-values
rf_clf.by_datasets(df, metadata, s_category='f_s', c_category='f_d')
rf_clf.by_datasets(df, metadata, s_category='f_s', c_category='f_d', rf_imp_pvalues=TRUE)
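To see how the example metadata partitions the samples before running the function, a cross-tabulation in base R is enough. The block below rebuilds an equivalent metadata frame so that it runs on its own; the object names meta, tab_c and tab_d are hypothetical and used only for this check.

```r
# Rebuild the example metadata: four sub-datasets (f_s) of 30 samples each
meta <- data.frame(
  f_s = factor(rep(c("A", "B", "C", "D"), each = 30)),
  f_c = factor(rep(rep(c("C", "H"), times = c(14, 16)), times = 4)),
  f_d = factor(rep(rep(c("a", "b", "c"), each = 10), times = 4))
)

# Samples per class within each sub-dataset
tab_c <- table(meta$f_s, meta$f_c)  # two-class response
tab_d <- table(meta$f_s, meta$f_d)  # three-class response
tab_c
tab_d
```

Every sub-dataset contains all levels of both response columns, so a model trained on one sub-dataset can be evaluated on each of the others.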