View source: R/ranger_crossRF_util.R
rf_clf.by_datasets    R Documentation
Runs standard random forests with out-of-bag (OOB) error estimation to classify c_category within each of the sub-datasets split by s_category, and applies each model to all the other sub-datasets. The output includes accuracy, AUC, and Kappa statistics.
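The cross-dataset scheme described above (fit one model per sub-dataset, then apply it to every other sub-dataset) can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: a nearest-centroid classifier stands in for the random forest so that the example runs without ranger, the within-dataset OOB estimate is omitted, only accuracy and Cohen's kappa are computed (no AUC), and the helper names fit_centroids, predict_centroids, cohen_kappa and cross_apply are hypothetical.

```r
# Minimal sketch of "train on one sub-dataset, apply to all the others".
# ASSUMPTION: a nearest-centroid classifier stands in for the random forest;
# all function names here are hypothetical, not part of the package.

# Fit one centroid (vector of column means) per class.
fit_centroids <- function(x, y) {
  y <- as.character(y)
  lv <- sort(unique(y))
  cents <- do.call(rbind, lapply(lv, function(l) colMeans(x[y == l, , drop = FALSE])))
  list(centroids = cents, levels = lv)
}

# Predict the class whose centroid is nearest in squared Euclidean distance.
predict_centroids <- function(model, x) {
  d <- apply(model$centroids, 1, function(cen) rowSums(sweep(x, 2, cen)^2))
  d <- matrix(d, nrow = nrow(x))  # keep an n x k shape even when n == 1
  model$levels[apply(d, 1, which.min)]
}

# Cohen's kappa: observed agreement corrected for chance agreement.
cohen_kappa <- function(truth, pred) {
  lv <- union(unique(truth), unique(pred))
  tab <- table(factor(truth, levels = lv), factor(pred, levels = lv))
  n <- sum(tab)
  po <- sum(diag(tab)) / n
  pe <- sum(rowSums(tab) * colSums(tab)) / n^2
  (po - pe) / (1 - pe)
}

# Fit on each level of `groups` and evaluate on every other level.
cross_apply <- function(x, y, groups) {
  x <- as.matrix(x)
  y <- as.character(y)
  groups <- as.character(groups)
  res <- list()
  for (g in sort(unique(groups))) {
    model <- fit_centroids(x[groups == g, , drop = FALSE], y[groups == g])
    for (h in setdiff(sort(unique(groups)), g)) {
      te <- groups == h
      pred <- predict_centroids(model, x[te, , drop = FALSE])
      res[[length(res) + 1]] <- data.frame(
        train = g, test = h,
        accuracy = mean(pred == y[te]),
        kappa = cohen_kappa(y[te], pred),
        stringsAsFactors = FALSE)
    }
  }
  do.call(rbind, res)
}

# Toy data: 4 sub-datasets (A-D), two classes shifted apart by 3 units.
set.seed(1)
groups <- rep(c("A", "B", "C", "D"), each = 30)
y <- rep(rep(c("C", "H"), c(14, 16)), 4)
x <- matrix(rnorm(120 * 5), ncol = 5) + ifelse(y == "C", 3, 0)
res <- cross_apply(x, y, groups)
print(res)
```

Each row of `res` is one (training sub-dataset, test sub-dataset) pair. Pairs with train equal to test are left out on purpose, since the documented function uses OOB estimation for the within-dataset case; swapping the stand-in classifier for a real random forest (for example `ranger::ranger()` with `predict()`) would bring the sketch closer to what is described above.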
rf_clf.by_datasets( df, metadata, s_category, c_category, positive_class = NA, rf_imp_pvalues = FALSE, clr_transform = TRUE, nfolds = 1, verbose = FALSE, ntree = 500, p.adj.method = "BH", q_cutoff = 0.05 )
df: Training data, a data.frame with samples as rows and features as columns.
metadata: Sample metadata with at least two columns, including the columns named by s_category and c_category.
s_category: A string naming a column in metadata; this column must be a factor and defines the sample grouping used to split the data.
c_category: A string naming the column in metadata used as the response vector; if it is a factor, RF classification is performed in each of the split datasets.
positive_class: A string naming one class in the c_category column of metadata, used as the positive class.
rf_imp_pvalues: A boolean value indicating whether to compute both an importance score and a p-value for each feature.
clr_transform: A boolean value indicating whether the centered log-ratio (CLR) transformation is applied.
nfolds: The number of folds in the cross-validation.
verbose: A boolean value indicating whether to show computation status and estimated runtime.
ntree: The number of trees.
p.adj.method: The p-value correction method; the default is "BH" (Benjamini-Hochberg).
q_cutoff: The cutoff of q-values for features; the default is 0.05.
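Two steps named in the arguments can be illustrated with base R: the CLR transform toggled by clr_transform, and the q-value filtering implied by p.adj.method and q_cutoff. This is a hedged sketch, not documented behaviour of rf_clf.by_datasets: the helper name clr_rows, the pseudocount used to avoid log(0), the toy p-values, and the strict `<` comparison against q_cutoff are all assumptions. The adjustment itself uses base R's `p.adjust()` with the documented default method "BH".

```r
# CLR (centered log-ratio) per sample: log of each value minus the mean log
# value of that row. ASSUMPTION: a pseudocount of 1 avoids log(0); the
# package's own zero handling may differ. clr_rows is a hypothetical name.
clr_rows <- function(x, pseudocount = 1) {
  lx <- log(as.matrix(x) + pseudocount)
  lx - rowMeans(lx)  # the length-n vector recycles down columns, i.e. per row
}

counts <- matrix(c(10, 0, 5,
                    3, 7, 0), nrow = 2, byrow = TRUE)
z <- clr_rows(counts)
print(rowSums(z))  # each CLR-transformed row sums to (numerically) zero

# Benjamini-Hochberg adjustment of HYPOTHETICAL per-feature p-values,
# then keep features whose q-value is below the cutoff.
# ASSUMPTION: a strict "<" comparison against q_cutoff.
p <- c(f1 = 0.001, f2 = 0.01, f3 = 0.02, f4 = 0.045, f5 = 0.2)
q <- p.adjust(p, method = "BH")
q_cutoff <- 0.05
selected <- names(p)[q < q_cutoff]
print(selected)  # "f1" "f2" "f3"
```

With five tests, BH scales each sorted p-value by 5/rank and then takes a running minimum from the largest p-value down, so f4 (0.045 × 5/4 = 0.05625) just misses the 0.05 cutoff while f1 to f3 pass.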
...
Author: Shi Huang
See also: ranger
# Simulated feature counts: 120 samples (rows) x 5 features (columns)
df <- data.frame(rbind(t(rmultinom(14, 14*5, c(.21,.6,.12,.38,.099))),
                       t(rmultinom(16, 16*5, c(.001,.6,.42,.58,.299))),
                       t(rmultinom(30, 30*5, c(.011,.6,.22,.28,.289))),
                       t(rmultinom(30, 30*5, c(.091,.6,.32,.18,.209))),
                       t(rmultinom(30, 30*5, c(.001,.6,.42,.58,.299)))))
df0 <- data.frame(t(rmultinom(120, 600, c(.001,.6,.2,.3,.299))))
# Sample metadata: f_s splits the samples into four sub-datasets (A-D);
# f_c is a two-class response and f_d a three-class response.
metadata <- data.frame(f_s = factor(c(rep("A", 30), rep("B", 30), rep("C", 30), rep("D", 30))),
                       f_c = factor(c(rep("C", 14), rep("H", 16), rep("C", 14), rep("H", 16),
                                      rep("C", 14), rep("H", 16), rep("C", 14), rep("H", 16))),
                       f_d = factor(rep(c(rep("a", 10), rep("b", 10), rep("c", 10)), 4)))
# Binary classification of f_c within each f_s group, with "C" as the positive class
system.time(rf_clf.by_datasets(df, metadata, s_category='f_s', c_category='f_c', positive_class="C"))
rf_clf.by_datasets(df, metadata, s_category='f_s', c_category='f_c', positive_class="C", rf_imp_pvalues=TRUE)
# Multi-class classification of f_d within each f_s group
rf_clf.by_datasets(df, metadata, s_category='f_s', c_category='f_d')
rf_clf.by_datasets(df, metadata, s_category='f_s', c_category='f_d', rf_imp_pvalues=TRUE)