rosetta: ROSETTA classifier.


View source: R/rosetta.R

Description

Performs a rule-based classification.

Usage

rosetta(dt, classifier = "StandardVoter", cvNum = 10, discrete = FALSE, discreteMethod = "EqualFrequency",
discreteParam = 3, discreteMask = TRUE, reducer = "Johnson", reducerDiscernibility = "Object",
roc = FALSE, clroc = "autism", fallBack = TRUE, fallBackClass = "autism", maskFeatures = FALSE, maskFeaturesNames = c(),
underSample = FALSE, underSampleNum = 0, underSampleSize = 0, ruleFiltration = FALSE, ruleFiltrSupport = c(1, 3),
ruleFiltrAccuracy = c(0, 0.5), ruleFiltrCoverage = c(0, 0), ruleFiltrStability = c(0, 0),
JohnsonParam = list(Modulo=TRUE, BRT=FALSE, BRTprec=0.9, Precompute=FALSE, Approximate=TRUE, Fraction=0.95),
GeneticParam = list(Modulo=TRUE, BRT=FALSE, BRTprec=0.9, Precompute=FALSE, Approximate=TRUE, Fraction=0.95, Algorithm="Simple"),
ManualNames = c(), pAdjust = TRUE, pAdjustMethod = "bonferroni", seed = 1, invert = FALSE, fraction=0.5, calibration = FALSE,
fillNA = FALSE, fillNAmethod = "meanOrMode", remSpChars = FALSE)

Arguments

dt

A data frame containing the decision table. The last column is the decision.

classifier

A character containing the classifier type: StandardVoter, ObjectTrackingVoter or NaiveBayesClassifier. Default is StandardVoter.

cvNum

A numeric value giving the number of cross-validation folds. Default is 10.

discrete

Logical. Set TRUE if the data are already discrete, so that no discretization is needed. Default is FALSE.

discreteMethod

A character containing discretization method: EqualFrequency, MDL, Naive, SemiNaive or BROrthogonal. Default is EqualFrequency.

discreteParam

A vector or list of discretization parameters. Its length and values depend on the chosen discreteMethod. Default is 3. See examples.

discreteMask

Logical. Set FALSE to disable discretization mask. Default is TRUE.

reducer

A character containing name of reducer method: Johnson or Genetic. Default is Johnson.

reducerDiscernibility

A character containing reducer discernibility option: Full or Object. Default is Object.

roc

Logical. Set TRUE to calculate the AUC and ROC values. Default is FALSE.

clroc

A character containing the name of the class for which the ROC and AUC values are calculated. Default is "autism".

fallBack

Logical. Set TRUE to support classifier with fallback class. Default is TRUE.

fallBackClass

A character containing the name of the fallback class. Default is "autism".

maskFeatures

Logical. Set TRUE to mask features during the classification process. Default is FALSE.

maskFeaturesNames

A character vector of the feature names to mask. Names shall correspond to the column names.

underSample

Logical. Set TRUE to perform undersampling. Default is FALSE.

underSampleNum

The number of subsets for undersampling. For 0, the minimum number of subsets that cover all the objects is selected. Default is 0.

underSampleSize

The size of each subset for undersampling. For 0, the size is taken from the smallest decision class. Default is 0.

ruleFiltration

Logical. Set TRUE to filter out rules. Default is FALSE.

ruleFiltrSupport

A vector of two integers containing interval of support values to filter out. Default is c(1,3).

ruleFiltrAccuracy

A vector of two numbers containing interval of accuracy values to filter out. Default is c(0,0.5).

ruleFiltrCoverage

A vector of two numbers containing interval of coverage values to filter out. Default is c(0,0).

ruleFiltrStability

A vector of two numbers containing interval of stability values to filter out. Default is c(0,0).

JohnsonParam

A list containing Johnson reducer parameters: Modulo, BRT, BRTprec, Precompute, Approximate and Fraction. See Usage for the defaults.

GeneticParam

A list containing Genetic reducer parameters: Modulo, BRT, BRTprec, Precompute, Approximate, Fraction and Algorithm. See Usage for the defaults.

ManualNames

A vector containing manual names for the manual reducer. Default is c().

pAdjust

Logical. Set TRUE to apply rule p-value and relative risk p-value adjustment. Default is TRUE.

pAdjustMethod

A character containing the name of the method: holm, hochberg, hommel, bonferroni, BH, BY, fdr or none. Default is bonferroni.

seed

An integer. Seed to the random number generator. Default is 1.

invert

Logical. Set TRUE to swap training for test set. Default is FALSE.

fraction

Numeric. Hitting fraction for the classifier. Default is 0.5.

calibration

Logical. Set TRUE for calibration. Default is FALSE.

fillNA

Logical. Set TRUE to fill NA values. Default is FALSE.

fillNAmethod

Character. Method of filling NA values: meanOrMode or combinatorial. Default is meanOrMode.

remSpChars

Logical. Remove special characters from feature names. Default is FALSE.
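
The method names listed for pAdjustMethod coincide with those accepted by p.adjust from the base R stats package. Whether rosetta calls p.adjust internally is an assumption here, not something stated on this page. The sketch below only illustrates, on made-up p-values, what two of the listed corrections do.

```r
# Toy p-values standing in for rule p-values (made up for illustration).
p <- c(0.001, 0.01, 0.02, 0.04)

# Bonferroni: each p-value is multiplied by the number of tests, capped at 1.
p_bonf <- p.adjust(p, method = "bonferroni")

# Benjamini-Hochberg ("BH"; "fdr" is an alias): controls the false discovery rate.
p_bh <- p.adjust(p, method = "BH")

print(p_bonf)
print(p_bh)

# All method names known to p.adjust.
print(p.adjust.methods)
```

Bonferroni is the most conservative of the listed methods, so with pAdjust = TRUE and the default method fewer rules are likely to remain significant than with BH or fdr.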

Value

main

A data frame containing rule information: features, discretization levels, decision, accuracy, support, coverage, stability, p-value and other statistics. The table is sorted decreasingly by p-value.

quality

A table of model quality: accuracy statistics, ROC and AUC measures.

usMeanAccs

A vector containing accuracies of the models from undersampling. Only if underSample = TRUE.

usn

An integer indicating the minimum number of required subsets for undersampling. Only if underSample = TRUE.

ROCstats

A data frame containing statistics of the model: 1 - specificity, sensitivity, specificity, PPV, NPV, accuracy and threshold. Only if roc = TRUE.
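
The measures listed for ROCstats follow the usual confusion-matrix definitions. As a minimal base R sketch, assuming a binary problem and a numeric score for the positive class, they can be computed at a single threshold as shown below. The helper confusionStats and the toy data are hypothetical and not part of R.ROSETTA; how rosetta derives these values internally is not shown on this page.

```r
# Hypothetical helper, not part of R.ROSETTA.
# truth:     vector of true class labels
# score:     numeric score for the `positive` class
# positive:  label treated as the positive class
# threshold: objects with score >= threshold are predicted positive
confusionStats <- function(truth, score, positive, threshold = 0.5) {
  pred_pos <- score >= threshold
  is_pos <- truth == positive
  tp <- sum(pred_pos & is_pos)    # true positives
  fp <- sum(pred_pos & !is_pos)   # false positives
  tn <- sum(!pred_pos & !is_pos)  # true negatives
  fn <- sum(!pred_pos & is_pos)   # false negatives
  c(sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp),
    PPV = tp / (tp + fp),
    NPV = tn / (tn + fn),
    accuracy = (tp + tn) / length(truth))
}

# Made-up labels and scores for illustration only.
truth <- c(rep("autism", 3), rep("control", 5))
score <- c(0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05)
stats <- confusionStats(truth, score, positive = "autism", threshold = 0.5)
print(stats)
```

Repeating the calculation over a grid of thresholds gives the pairs of 1 - specificity and sensitivity that make up a ROC curve.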

Author(s)

Mateusz Garbulowski, Karolina Smolinska

Examples

library(R.ROSETTA)
set.seed(1)

### default settings ###
ruleModel <- rosetta(autcon)
ruleModel$quality$accuracyMean

### undersampling ###
ruleModelUS <- rosetta(autcon, underSample=TRUE, underSampleNum=10, underSampleSize=50)
ruleModelUS$quality$accuracyMean

### classifiers ###
# StandardVoter
ruleModelSV <- rosetta(autcon, classifier="StandardVoter")
ruleModelSV$quality$accuracyMean

# ObjectTrackingVoter
ruleModelOTV <- rosetta(autcon, classifier="ObjectTrackingVoter")
ruleModelOTV$quality$accuracyMean

# NaiveBayesClassifier
ruleModelNBC <- rosetta(autcon, classifier="NaiveBayesClassifier")
ruleModelNBC$quality$accuracyMean

### reducers ###
# Johnson
ruleModelJohnson <- rosetta(autcon, reducer="Johnson", JohnsonParam=list(Modulo=TRUE, BRT=TRUE, BRTprec=0.1, Precompute=FALSE, Approximate=TRUE, Fraction=0.8))
ruleModelJohnson$quality$accuracyMean

# Genetic
ruleModelGenetic <- rosetta(autcon, reducer="Genetic", GeneticParam=list(Modulo=TRUE, BRT=TRUE, BRTprec=0.1, Precompute=FALSE, Approximate=TRUE, Fraction=0.8, Algorithm="Simple"))
ruleModelGenetic$quality$accuracyMean

### discernibility ###
# Full
ruleModelFull <- rosetta(autcon, reducerDiscernibility="Full")
ruleModelFull$quality$accuracyMean

# Object
ruleModelObject <- rosetta(autcon, reducerDiscernibility="Object")
ruleModelObject$quality$accuracyMean

### discretization ###
# EqualFrequencyScaler
ruleModelEF <- rosetta(autcon, discrete=FALSE, discreteMethod="EqualFrequency", discreteParam=3)
ruleModelEF$quality$accuracyMean

# MDL
ruleModelMDL <- rosetta(autcon, discrete=FALSE, discreteMethod="MDL")
ruleModelMDL$quality$accuracyMean

# Naive
ruleModelNaive <- rosetta(autcon, discrete=FALSE, discreteMethod="Naive")
ruleModelNaive$quality$accuracyMean

# SemiNaive
ruleModelSemiNaive <- rosetta(autcon, discrete=FALSE, discreteMethod="SemiNaive")
ruleModelSemiNaive$quality$accuracyMean

# BRO
ruleModelBRO <- rosetta(autcon, discrete=FALSE, discreteMethod="BROrthogonal", discreteParam=list(TRUE, 0.95))
ruleModelBRO$quality$accuracyMean

### for discrete data ###
# generate discrete synthetic data
dt <- synData(nFeatures=c(5,5,3,2,2), rf=c(0.2,0.3,0.2,0.4,0.4), 
              rd=c(0.2,0.3,0.4,0.5,0.6), discrete = TRUE, levels = 3, labels = c("low", "medium", "high"))
ruleModelDiscrete <- rosetta(dt, discrete = TRUE)
ruleModelDiscrete$quality$accuracyMean

### for mixed data (discrete and non-discrete), the data frame columns should have specific types: ###
# for discrete values: logical, character or factor
# for non-discrete values: numeric (double) or integer

# generate continuous synthetic data
dt <- synData(nFeatures=c(20,2,2,3,3), rf=c(0.1,0.1,0.1,0.8,0.8), rd=c(0.5,0.1,0.7,0.8,0.2), nObjects=100, nOutcome=2, unbalanced=FALSE, seed=1)
# change two of the features from the group 5 to discrete
dt$f5.2_rf0.8_rd0.2 <- as.factor(cut(dt$f5.2_rf0.8_rd0.2, 3, labels = c("low", "medium", "high")))
dt$f5.3_rf0.8_rd0.2 <- as.factor(cut(dt$f5.3_rf0.8_rd0.2, 3, labels = c("low", "medium", "high")))
ruleModelMixed <- rosetta(dt, discrete=FALSE)
ruleModelMixed$quality$accuracyMean

### calculate AUC ###
# for class: autism
ruleModelAUCa <- rosetta(autcon, roc=TRUE, clroc="autism")
ruleModelAUCa$quality

# for class: control
ruleModelAUCc <- rosetta(autcon, roc=TRUE, clroc="control")
ruleModelAUCc$quality

### set fallback class ###
# for class: autism
ruleModelFBa <- rosetta(autcon, fallBack=TRUE, fallBackClass="autism")
ruleModelFBa$quality$accuracyMean

# for class: control
ruleModelFBc <- rosetta(autcon, fallBack=TRUE, fallBackClass="control")
ruleModelFBc$quality$accuracyMean

### rules filtration ###
# accuracy
ruleModelFiltAcc <- rosetta(autcon, ruleFiltration=TRUE, ruleFiltrAccuracy=c(0, 0.85))
ruleModelFiltAcc$quality$accuracyMean

# support 
ruleModelFiltSupp <- rosetta(autcon, ruleFiltration=TRUE, ruleFiltrSupport=c(1, 10))
ruleModelFiltSupp$quality$accuracyMean

# coverage 
ruleModelFiltCov <- rosetta(autcon, ruleFiltration=TRUE, ruleFiltrCoverage=c(0, 0.1))
ruleModelFiltCov$quality$accuracyMean

# stability
ruleModelFiltStab <- rosetta(autcon, ruleFiltration=TRUE, ruleFiltrStability=c(1, 5))
dim(ruleModelFiltStab$main)[1]

### mask features ###
ruleModelMaskFs2 <- rosetta(autcon, maskFeatures=TRUE, maskFeaturesNames=c("MAP7", "COX2"))
ruleModelMaskFs2$quality$accuracyMean

# remove first 10 features from decision table
ruleModelMaskFs10 <- rosetta(autcon, maskFeatures=TRUE, maskFeaturesNames=colnames(autcon)[1:10])
ruleModelMaskFs10$quality$accuracyMean

### fill NA values ###
autcon2 <- autcon
# introduce 3 NA values
autcon2[2,2] <- NA
autcon2[3,3] <- NA
autcon2[4,4] <- NA
ruleModelFillNA <- rosetta(autcon2, fillNA=TRUE, fillNAmethod="meanOrMode")
ruleModelFillNA$quality$accuracyMean
  
### perform permutation test ###

# original data
out <- rosetta(autcon)
acc0 <- out$quality$accuracyMean

# permuted data
n_perm <- 20 # number of iterations
autcon_perm <- autcon
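acc <- numeric(n_perm) # accuracies of the models built on permuted labels

for(i in seq_len(n_perm)){
  # shuffle the decision column to break the feature-decision association
  autcon_perm$decision <- sample(autcon_perm$decision)
  out_perm <- rosetta(autcon_perm)
  acc[i] <- out_perm$quality$accuracyMean
}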

# visualization
hist(acc, col="lightpink", xlim=c(0,1), main="permutation test", xlab="accuracy")
abline(v = acc0, col="mediumslateblue", lwd=3, lty=2)
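
The permutation test above is only visualized. A common way to summarize it is an empirical p-value: the share of permuted accuracies at least as high as the original one, with 1 added to numerator and denominator so that the estimate is never exactly zero. This is a minimal sketch, not part of R.ROSETTA; permPValue is a hypothetical helper and the accuracies are made-up stand-ins for acc and acc0, so the block runs without the package.

```r
# Hypothetical helper, not part of R.ROSETTA.
# acc_perm: accuracies obtained on permuted decision labels
# acc_obs:  accuracy obtained on the original data
permPValue <- function(acc_perm, acc_obs) {
  (sum(acc_perm >= acc_obs) + 1) / (length(acc_perm) + 1)
}

# Made-up values standing in for `acc` and `acc0`.
acc_toy <- c(0.48, 0.52, 0.55, 0.50, 0.61, 0.47, 0.53, 0.49, 0.58, 0.51)
acc0_toy <- 0.60

p_emp <- permPValue(acc_toy, acc0_toy)
print(p_emp)
```

With real output one would call permPValue(acc, acc0). The smallest attainable value is 1 / (n_perm + 1), so n_perm = 20 cannot give a p-value below about 0.048.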

mategarb/R.ROSETTA documentation built on Aug. 20, 2020, 5:21 a.m.