Description Usage Arguments Value Author(s) Examples
plot_binary_RF
Plot binary heatmaps for datasets based on rule-based random forest classifier
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 | plot_binary_RF(Data,
classifier,
ref = NULL,
prediction = NULL,
as_training = FALSE,
platform = NULL,
classes = NULL,
platforms_ord = NULL,
top_anno = c("ref", "prediction", "platform")[1],
title = "",
binary_col = c("white", "black", "gray"),
ref_col = NULL,
pred_col = NULL,
platform_col = NULL,
show_ref = TRUE,
show_predictions = TRUE,
show_platform = TRUE,
show_scores = TRUE,
show_rule_name = TRUE,
legend = TRUE,
cluster_cols = TRUE,
cluster_rows = TRUE,
anno_height = 0.03,
score_height = 0.03,
margin = c(0, 5, 0, 5))
|
Data |
a matrix or a dataframe, samples as columns and row as features/genes. Can also take ExpressionSet, or data_object generated by ReadData function. |
classifier |
Classifier as a rule_based_RandomForest object, generated by |
ref |
Optional vector with the reference labels. Ref labels in data_object will be used if not ref input provided. For ExpressionSet, the name of the ref variable in the pheno data can be used. |
prediction |
Optional. "ranger.prediction" object for the class scores generated by |
as_training |
Logical indicates if the plot is for the training data. It means the predictions will be extracted from the classifier itself and any prediction object will be ignored. If TRUE, then the training data/object should be used for Data argument. |
platform |
Optional vector with the platform/study labels or any additional annotation. Platform labels in data_object will be used if no platform input is provided. For ExpressionSet, the name of the variable in the pheno data can be used. |
classes |
Optional vector with class names. This will determine which classes will be plotted and in which order. It is not recommended to use both "classes" and "platforms_ord" arguments together to avoid sample order conflict and may result in an improper plotting for samples. |
platforms_ord |
Optional vector with the platform/study names. This will determine which platform/study will be plotted and in which order. This will be used when top_anno="platform". It is not recommended to use both "classes" and "platforms_ord" arguments together. |
top_anno |
Determine the top annotation level. Samples will be grouped based on the top_anno. Input can be one of three options: "ref", "prediction", "platform". Default is "ref". |
title |
Character input as a title for the whole heatmap. Default is "". |
binary_col |
Vector determines the colors of the binary heatmap. Default is c("white", "black", "gray"). First color for the color when the rule is false in the sample. Second color for the color when the rule is true. Third color is for NAs. |
ref_col |
Optional named vector determines the colors of classes for the reference labels. Default is NULL. Vector names should match with the ref labels. |
pred_col |
Optional named vector determines the colors of classes for the prediction labels. Default is NULL. Vector names should match with the prediction labels in the prediction labels. |
platform_col |
Optional named vector determines the colors of platforms/study labels. Default is NULL. Vector names should match with the platforms/study labels. |
show_ref |
Logical. Determines if the ref labels will be plotted or not. If the top_anno argument is "ref" then show_ref will be ignored and ref labels will be plotted. |
show_predictions |
Logical. Determines if the prediction labels will be plotted or not. If the top_anno argument is "prediction" then show_predictions will be ignored and predictions will be plotted. |
show_platform |
Logical. Determines if the platform/study labels will be plotted or not. If the top_anno argument is "platform" then show_platform will be ignored and platforms will be plotted. |
show_scores |
Logical. Determines if the prediction scores will be plotted or not. To visualize scores, the classifier should be trained with probability=TRUE otherwise show_scores will be turned FALSE automatically. |
show_rule_name |
Logical. Determines if the rule names will be plotted on the left side of the heatmapp or not. |
legend |
Logical. Determines if a legend will be plotted under the heatmap. |
cluster_cols |
Logical. Clustering the samples in each class (i.e. not all samples in the cohort) based on the binary rules for that class. If top_anno is "platform" then the rules from all classes are used to cluster the samples in each platform. |
cluster_rows |
Logical. Clustering the rules in each class. |
anno_height |
Determines the height of the annotations. It is recommended not to go out of this range 0.01<height<0.1. Default is 0.03. |
score_height |
Determines the height of the score bars. It is recommended not to go out of this range 0.01<height<0.1. Default is 0.03. |
margin |
Determines the margins of the heatmap. Default is c(0, 5, 0, 5). |
returns a heatmap plot for the binary rule
Nour-al-dain Marzouka <nour-al-dain.marzouka at med.lu.se>
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 | # generate random data
Data <- matrix(runif(8000), nrow=100, ncol=80,
dimnames = list(paste0("G",1:100), paste0("S",1:80)))
# generate random labels
L <- sample(x = c("A","B","C","D"), size = 80, replace = TRUE)
# generate random platform labels
P <- sample(c("P1","P2","P3"), size = 80, replace = TRUE)
# create data object
object <- ReadData(Data = Data,
Labels = L,
Platform = P,
verbose = FALSE)
# sort genes
genes_RF <- sort_genes_RF(data_object = object,
seed=123456, verbose = FALSE)
# to get an idea of how many genes we will use
# and how many rules will be generated
# summary_genes_RF(sorted_genes_RF = genes_RF,
# genes_altogether = c(10,20,50,100,150,200),
# genes_one_vs_rest = c(10,20,50,100,150,200))
# creat and sort rules
# rules_RF <- sort_rules_RF(data_object = object,
# sorted_genes_RF = genes_RF,
# genes_altogether = 100,
# genes_one_vs_rest = 100,
# seed=123456,
# verbose = FALSE)
# parameters <- data.frame(
# gene_repetition=c(3,2,1),
# rules_one_vs_rest=0,
# rules_altogether=c(2,3,10),
# run_boruta=c(FALSE,"produce_error",FALSE),
# plot_boruta = FALSE,
# num.trees=c(100,200,300),
# stringsAsFactors = FALSE)
# parameters
# Or you can use expand.grid to generate dataframe with all parameter combinations
# parameters <- expand.grid(
# gene_repetition=c(3,2,1),
# rules_one_vs_rest=0,
# rules_altogether=c(2,3,10),
# num.trees=c(100,500,1000),
# stringsAsFactors = FALSE)
# parameters
# test <- optimize_RF(data_object = object,
# sorted_rules_RF = rules_RF,
# test_object = NULL,
# overall = c("Accuracy"),
# byclass = NULL, verbose = FALSE,
# parameters = parameters)
# test
# test$summary[which.max(test$summary$Accuracy),]
#
# # train the final model
# # it is preferred to increase the number of trees and rules in case you have
# # large number of samples and features
# # for quick example, we have small number of trees and rules here
# # based on the optimize_RF results we will select the parameters
# RF_classifier <- train_RF(data_object = object,
# gene_repetition = 1,
# rules_altogether = 0,
# rules_one_vs_rest = 10,
# run_boruta = FALSE,
# plot_boruta = FALSE,
# probability = TRUE,
# num.trees = 300,
# sorted_rules_RF = rules_RF,
# boruta_args = list(),
# verbose = TRUE)
#
# # training accuracy
# # get the prediction labels
# # if the classifier trained using probability = FALSE
# training_pred <- RF_classifier$RF_scheme$RF_classifier$predictions
# if (is.factor(training_pred)) {
# x <- as.character(training_pred)
# }
#
# # if the classifier trained using probability = TRUE
# if (is.matrix(training_pred)) {
# x <- colnames(training_pred)[max.col(training_pred)]
# }
#
# # training accuracy
# caret::confusionMatrix(data =factor(x),
# reference = factor(object$data$Labels),
# mode = "everything")
# not to run
# visualize the binary rules in training dataset
# plot_binary_RF(Data = object,
# classifier = RF_classifier,
# prediction = NULL, as_training = TRUE,
# show_scores = TRUE,
# top_anno = "ref",
# show_predictions = TRUE,
# title = "Training data")
# not to run
# Extract and plot the proximity matrix from the classifier for the training data
# it takes long time for large data
# proximity_mat <- proximity_matrix_RF(object = object,
# classifier = RF_classifier,
# plot=TRUE,
# return_matrix=TRUE,
# title = "Test",
# cluster_cols = TRUE)
# not to run
# predict
# test_object # any test data
# results <- predict_RF(classifier = RF_classifier, impute = TRUE,
# Data = test_object)
#
# # visualize the binary rules in training dataset
# plot_binary_RF(Data = test_object,
# classifier = RF_classifier,
# prediction = results, as_training = FALSE,
# show_scores = TRUE,
# top_anno = "ref",
# show_predictions = TRUE,
# title = "Test data")
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.