summary_genes_RF: Summarize sorted genes to rules
In NourMarzouka/multiclassPairs: Build MultiClass Pair-Based Classifiers using TSPs or RF

summary_genes_RF

R Documentation

Summarize sorted genes to rules

Description

After sorting genes with the sort_genes_RF function, summary_genes_RF gives an idea of how many genes you need to use to generate a specific number of rules in the sort_rules_RF function.

Usage

summary_genes_RF(sorted_genes_RF,
                 genes_altogether,
                 genes_one_vs_rest)

Arguments

`sorted_genes_RF`	sorted genes object with class `RandomForest_sorted_genes` generated by `sort_genes_RF` function
`genes_altogether`	numeric vector indicating how many genes from the altogether slot (i.e. 'all') should be used each time. `genes_altogether` should be a vector of zero or positive numbers with the same length as the `genes_one_vs_rest` vector. Each element in this vector is used together with the element at the same index in the `genes_one_vs_rest` vector.
`genes_one_vs_rest`	numeric vector indicating how many genes from the one_vs_rest slots (i.e. per class) should be used each time. `genes_one_vs_rest` should be a vector of zero or positive numbers with the same length as the `genes_altogether` vector. Each element in this vector is used together with the element at the same index in the `genes_altogether` vector.

Details

summary_genes_RF helps the user decide how many genes to use to obtain the required number of rules in the sort_rules_RF function. NOTE: the estimate does not account for gene repetition in the rules, because the rules are not sorted yet.

summary_genes_RF works as follows: it takes the first element of genes_altogether and of genes_one_vs_rest, then extracts that many top genes from the altogether slot and from the one_vs_rest slots (this number of genes is taken from each class), respectively, of the sorted_genes_RF object. It pools the extracted genes, keeps the unique genes, and calculates the number of possible combinations. The number of unique genes and the number of rules are stored in the first row of the output dataframe. The function then picks the second element of genes_altogether and genes_one_vs_rest and repeats the steps.
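
The procedure can be illustrated with a small sketch. It is not the package's implementation: `ranked` is a hypothetical stand-in for the sorted genes (a plain named list, not the real `RandomForest_sorted_genes` structure), `sketch_summary` and the output column names are invented for illustration, and the rule count assumes that each rule is an unordered pair of genes, so that n unique genes give choose(n, 2) possible rules.

```r
# Hypothetical ranked gene lists: one vector for the altogether slot ("all")
# and one vector per class. NOT the real sorted_genes_RF object layout.
ranked <- list(
  all = paste0("G", 1:10),
  A   = paste0("G", c(3, 1, 12, 7, 15)),
  B   = paste0("G", c(2, 3, 20, 1, 9))
)

# Illustrative helper (hypothetical name and column names).
sketch_summary <- function(ranked, genes_altogether, genes_one_vs_rest) {
  stopifnot(length(genes_altogether) == length(genes_one_vs_rest))
  per_class <- ranked[setdiff(names(ranked), "all")]

  rows <- lapply(seq_along(genes_altogether), function(i) {
    # pool the top genes from the altogether slot and from every class
    pooled <- c(
      head(ranked$all, genes_altogether[i]),
      unlist(lapply(per_class, head, n = genes_one_vs_rest[i]),
             use.names = FALSE)
    )
    # keep unique genes, then count the possible gene pairs
    n_unique <- length(unique(pooled))
    data.frame(genes_altogether  = genes_altogether[i],
               genes_one_vs_rest = genes_one_vs_rest[i],
               unique_genes      = n_unique,
               rules             = choose(n_unique, 2))  # assumed pair count
  })
  do.call(rbind, rows)
}

res <- sketch_summary(ranked,
                      genes_altogether  = c(2, 5),
                      genes_one_vs_rest = c(2, 3))
print(res)
```

With these toy lists the first parameter pair pools G1, G2 and G3, and the second pools seven distinct genes, which shows how overlap between the altogether and per-class rankings keeps the number of unique genes below the simple sum of the requested counts.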

Value

Returns a dataframe with the parameters used and the expected number of unique genes and rules. The number of rows in the dataframe equals the length of genes_altogether and genes_one_vs_rest.

Author(s)

Nour-al-dain Marzouka <nour-al-dain.marzouka at med.lu.se>

Examples

# generate random data
Data <- matrix(runif(8000), nrow=100, ncol=80,
               dimnames = list(paste0("G",1:100), paste0("S",1:80)))

# generate random labels
L <- sample(x = c("A","B","C","D"), size = 80, replace = TRUE)

# generate random platform labels
P <- sample(c("P1","P2","P3"), size = 80, replace = TRUE)

# create data object
object <- ReadData(Data = Data,
                   Labels = L,
                   Platform = P,
                   verbose = FALSE)

# sort genes
genes_RF <- sort_genes_RF(data_object = object,
                          seed=123456, verbose = FALSE)

# to get an idea of how many genes we will use
# and how many rules will be generated
# summary_genes_RF(sorted_genes_RF = genes_RF,
#                  genes_altogether = c(10,20,50,100,150,200),
#                  genes_one_vs_rest = c(10,20,50,100,150,200))

# create and sort rules
# rules_RF <- sort_rules_RF(data_object = object,
#                           sorted_genes_RF = genes_RF,
#                           genes_altogether = 100,
#                           genes_one_vs_rest = 100,
#                           seed=123456,
#                           verbose = FALSE)

# parameters <- data.frame(
#   gene_repetition=c(3,2,1),
#   rules_one_vs_rest=0,
#   rules_altogether=c(2,3,10),
#   run_boruta=c(FALSE,"produce_error",FALSE),
#   plot_boruta = FALSE,
#   num.trees=c(100,200,300),
#   stringsAsFactors = FALSE)
# parameters

# Or you can use expand.grid to generate dataframe with all parameter combinations
# parameters <- expand.grid(
#   gene_repetition=c(3,2,1),
#   rules_one_vs_rest=0,
#   rules_altogether=c(2,3,10),
#   num.trees=c(100,500,1000),
#   stringsAsFactors = FALSE)
# parameters


# test <- optimize_RF(data_object = object,
#                     sorted_rules_RF = rules_RF,
#                     test_object = NULL,
#                     overall = c("Accuracy"),
#                     byclass = NULL, verbose = FALSE,
#                     parameters = parameters)
# test
# test$summary[which.max(test$summary$Accuracy),]
#
# # train the final model
# # it is preferred to increase the number of trees and rules in case you have
# # large number of samples and features
# # for quick example, we have small number of trees and rules here
# # based on the optimize_RF results we will select the parameters
# RF_classifier <- train_RF(data_object = object,
#                           gene_repetition = 1,
#                           rules_altogether = 0,
#                           rules_one_vs_rest = 10,
#                           run_boruta = FALSE,
#                           plot_boruta = FALSE,
#                           probability = TRUE,
#                           num.trees = 300,
#                           sorted_rules_RF = rules_RF,
#                           boruta_args = list(),
#                           verbose = TRUE)
#
# # training accuracy
# # get the prediction labels
# # if the classifier was trained using probability = FALSE
# training_pred <- RF_classifier$RF_scheme$RF_classifier$predictions
# if (is.factor(training_pred)) {
#   x <- as.character(training_pred)
# }
#
# # if the classifier was trained using probability = TRUE
# if (is.matrix(training_pred)) {
#   x <- colnames(training_pred)[max.col(training_pred)]
# }
#
# # training accuracy
# caret::confusionMatrix(data =factor(x),
#                 reference = factor(object$data$Labels),
#                 mode = "everything")

# not to run
# visualize the binary rules in training dataset
# plot_binary_RF(Data = object,
#                classifier = RF_classifier,
#                prediction = NULL, as_training = TRUE,
#                show_scores = TRUE,
#                top_anno = "ref",
#                show_predictions = TRUE,
#                title = "Training data")

# not to run
# Extract and plot the proximity matrix from the classifier for the training data
# it takes long time for large data
# proximity_mat <- proximity_matrix_RF(object = object,
#                       classifier = RF_classifier,
#                       plot=TRUE,
#                       return_matrix=TRUE,
#                       title = "Test",
#                       cluster_cols = TRUE)

# not to run
# predict
# test_object # any test data
# results <- predict_RF(classifier = RF_classifier, impute = TRUE,
#                       Data = test_object)
#
# # visualize the binary rules in the test dataset
# plot_binary_RF(Data = test_object,
#                classifier = RF_classifier,
#                prediction = results, as_training = FALSE,
#                show_scores = TRUE,
#                top_anno = "ref",
#                show_predictions = TRUE,
#                title = "Test data")

NourMarzouka/multiclassPairs documentation built on May 3, 2023, 7:20 p.m.
