filter_genes_TSP: Filter genes/features for multiclass one-vs-rest classifier...

Description Usage Arguments Details Value Author(s) Examples

View source: R/functions.R

Description

filter_genes_TSP filters genes/features prior of downstream training steps that will involve top scores pairs using switchBox package.

Usage

1
2
3
4
5
6
filter_genes_TSP(data_object,
                filter = c("one_vs_one", "one_vs_rest"),
                platform_wise = FALSE,
                featureNo = 1000,
                UpDown = TRUE,
                verbose = TRUE)

Arguments

data_object

data object generated by ReadData function. Object contains the data and labels.

filter

a character indicating the method of comparison to be used in the filtering. Should be "one_vs_one" or "one_vs_rest".

platform_wise

a logical indicating if the comparisons will be done in each platform alone then the features should be ranked high in all platforms to be selected. If TRUE then the data object should contain platform vector to be used in splitting samples based on the platform before the comparisons

featureNo

an integer indicating the number of features to be returned after the filtering per class

UpDown

a logical value indicating whether an equal number of up and down genes should be considered. If FALSE then the number of features will be collected regardless of the portion of the up genes and down genes

verbose

a logical value indicating whether processing messages will be printed or not. Default is TRUE.

Details

Input data will be ranked (rank the features inside each sample) before appling the comparison methods. Sufficient number of returned features is recommended if large number of rules is used in the downstream training steps.

For one vs rest comparisons, Wilcoxon test will be performed through SWAP.Filter.Wilcoxon function from switchBox package. For one vs one comparisons, dunn test will be performed through dunn.test function from dunn.test package, and this method is recommended in case of class imbalance or if the classes are so close to each other in their properties.

P-values from dunn test are ranked in each one vs one comparison then the ranks are combined, the selected genes should be ranked at the top in all comparisons.

If platform-wise is TRUE, then the gene that is ranked high in all comparisons and in all platforms will be selected.

For example, if we have five classes (i.e. C1-C5), and dunn.test was performed, then C1 will have four comparison agains C2-C5 (so four lists of p-values), pvalues will be ranked (smaller number means smaller p.value) in each list, and the gene that is ranked (5,5,5,5) will be prioritized over the gene that is ranked (1,1,1,6), and in case we have two platforms and we truned the platform-wise to TRUE then we will have 8 lists of p-values, and the top genes will be selected in the same way. And this is apply also on the platform-wise one-vs-rest comparison. So in brief, the lowest rank in the different list will determine the position of the gene in the output.

Other p-value combining methods could be added in the future.

Value

OnevsrestScheme_genes_TSP object that contains the names of the top filtered features for each class

Author(s)

Nour-al-dain Marzouka <nour-al-dain.marzouka at med.lu.se>

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
# random data
Data <- matrix(runif(10000), nrow=100, ncol=100,
               dimnames = list(paste0("G",1:100), paste0("S",1:100)))

# labels
L <- sample(x = c("A","B","C"), size = 100, replace = TRUE)

# study/platform
P <- sample(c("P1","P2"), size = 100, replace = TRUE)

object <- ReadData(Data = Data,
             Labels = L,
                   Platform = P)

# not to run
# switchBox package from Bioconductor is needed
# Visit their website or install switchBox package using:
# if(!requireNamespace("switchBox", quietly = TRUE)){
#       if (!requireNamespace('BiocManager', quietly = TRUE)) {
#       install.packages('BiocManager')
#      }
#      BiocManager::install('switchBox')", call. = FALSE)
#  }

# filtered_genes <- filter_genes_TSP(data_object = object,
#                                  filter = "one_vs_rest",
#                                  platform_wise = FALSE,
#                                  featureNo = 10,
#                                  UpDown = TRUE,
#                                  verbose = FALSE)

# training
# classifier <- train_one_vs_rest_TSP(data_object = object,
#                              filtered_genes = filtered_genes,
#                              k_range = 10:50,
#                              include_pivot = FALSE,
#                              one_vs_one_scores = FALSE,
#                              platform_wise_scores = FALSE,
#                              seed = 1234,
#                              verbose = FALSE)

# results <- predict_one_vs_rest_TSP(classifier = classifier,
#                                   Data = object,
#                                   tolerate_missed_genes = TRUE,
#                                   weighted_votes = TRUE,
#                                   verbose = FALSE)

# Confusion Matrix and Statistics on training data
#  caret::confusionMatrix(data = factor(results$max_score, levels = unique(L)),
#                         reference = factor(L, levels = unique(L)),
#                         mode="everything")

# plot_binary_TSP(Data = object, classes=c("A","B","C"),
#                 classifier = classifier,
#                 prediction = results,
#                 title = "Test")

multiclassPairs documentation built on May 17, 2021, 1:06 a.m.