Performs a rule-based classification.
rosetta(dt, classifier = "StandardVoter", cvNum = 10, discrete = FALSE, discreteMethod = "EqualFrequency",
discreteParam = 3, discreteMask = TRUE, reducer = "Johnson", reducerDiscernibility = "Object",
roc = FALSE, clroc = "autism", fallBack = TRUE, fallBackClass = "autism", maskFeatures = FALSE, maskFeaturesNames = c(),
underSample = FALSE, underSampleNum = 0, underSampleSize = 0, ruleFiltration = FALSE, ruleFiltrSupport = c(1, 3),
ruleFiltrAccuracy = c(0, 0.5), ruleFiltrCoverage = c(0, 0), ruleFiltrStability = c(0, 0),
JohnsonParam = list(Modulo=TRUE, BRT=FALSE, BRTprec=0.9, Precompute=FALSE, Approximate=TRUE, Fraction=0.95),
GeneticParam = list(Modulo=TRUE, BRT=FALSE, BRTprec=0.9, Precompute=FALSE, Approximate=TRUE, Fraction=0.95, Algorithm="Simple"),
ManualNames = c(), pAdjust = TRUE, pAdjustMethod = "bonferroni", seed = 1, invert = FALSE, fraction=0.5, calibration = FALSE,
fillNA = FALSE, fillNAmethod = "meanOrMode", remSpChars = FALSE)
dt |
A data frame containing the decision table. The last column is the decision. |
classifier |
A character containing the classifier type: StandardVoter, ObjectTrackingVoter or NaiveBayesClassifier. Default is StandardVoter. |
cvNum |
A numeric value giving the number of cross-validation folds. Default is 10. |
discrete |
Logical. Set TRUE for discrete data. Default is FALSE. |
discreteMethod |
A character containing discretization method: EqualFrequency, MDL, Naive, SemiNaive or BROrthogonal. Default is EqualFrequency. |
discreteParam |
A vector containing discretization parameters. Its length and values depend on the discretization method. Default is 3. See examples. |
discreteMask |
Logical. Set FALSE to disable discretization mask. Default is TRUE. |
reducer |
A character containing name of reducer method: Johnson or Genetic. Default is Johnson. |
reducerDiscernibility |
A character containing reducer discernibility option: Full or Object. Default is Object. |
roc |
Logical. Set TRUE to calculate the AUC and ROC values. Default is FALSE. |
clroc |
A character containing the name of the decision class for which the ROC and AUC values are calculated. Default is "autism". |
fallBack |
Logical. Set TRUE to support classifier with fallback class. Default is TRUE. |
fallBackClass |
A character containing the name of the fallback class. Default is "autism". |
maskFeatures |
Logical. Set TRUE to mask features during the classification process. Default is FALSE. |
maskFeaturesNames |
A character vector of the feature names to mask. Names shall correspond to the column names. |
underSample |
Logical. Set TRUE to perform undersampling. Default is FALSE. |
underSampleNum |
The number of subsets for undersampling. For 0, the minimum number of subsets that cover all the objects is selected. Default is 0. |
underSampleSize |
The size of each subset for undersampling. For 0, the size is taken from the smallest decision class. Default is 0. |
ruleFiltration |
Logical. Set TRUE to filter out rules. Default is FALSE. |
ruleFiltrSupport |
A vector of two integers giving the interval of support values to filter out. Default is c(1,3). |
ruleFiltrAccuracy |
A vector of two numbers giving the interval of accuracy values to filter out. Default is c(0,0.5). |
ruleFiltrCoverage |
A vector of two numbers giving the interval of coverage values to filter out. Default is c(0,0). |
ruleFiltrStability |
A vector of two numbers giving the interval of stability values to filter out. Default is c(0,0). |
JohnsonParam |
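A list containing Johnson reducer parameters: Modulo, BRT, BRTprec, Precompute, Approximate and Fraction. |
GeneticParam |
A list containing Genetic reducer parameters: Modulo, BRT, BRTprec, Precompute, Approximate, Fraction and Algorithm. |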
ManualNames |
A vector containing manual names for manual reducer. |
pAdjust |
Logical. Set TRUE to apply rule p-value and relative risk p-value adjustment. Default is TRUE. |
pAdjustMethod |
A character containing the name of the method: holm, hochberg, hommel, bonferroni, BH, BY, fdr or none. Default is bonferroni. |
seed |
An integer. Seed to the random number generator. Default is 1. |
invert |
Logical. Set TRUE to swap training for test set. Default is FALSE. |
fraction |
Numeric. Hitting fraction for the classifier. Default is 0.5. |
calibration |
Logical. Set TRUE for calibration. Default is FALSE. |
fillNA |
Logical. Set TRUE to fill NA values. Default is FALSE. |
fillNAmethod |
Character. Method of filling NA values: meanOrMode or combinatorial. Default is meanOrMode. |
remSpChars |
Logical. Remove special characters from feature names. Default is FALSE. |
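The values accepted by pAdjustMethod coincide with the method names of base R's stats::p.adjust. The sketch below shows how two of those corrections behave on a hypothetical vector of rule p-values. It assumes, without confirmation from this page, that rosetta applies an equivalent correction internally; the numbers are made up for illustration.

```r
# Hypothetical raw p-values for five rules (illustrative only, not rosetta output)
p_raw <- c(0.001, 0.01, 0.02, 0.04, 0.2)

# Bonferroni multiplies each p-value by the number of tests and caps the result at 1
p_bonf <- p.adjust(p_raw, method = "bonferroni")

# Benjamini-Hochberg ("BH", with "fdr" as an alias) controls the false discovery rate
p_bh <- p.adjust(p_raw, method = "BH")

print(data.frame(p_raw, p_bonf, p_bh))
```

Bonferroni is the most conservative of the listed methods, so it never yields a smaller adjusted p-value than BH for the same input.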
main |
A data frame containing rule information: features, discretization levels, decision, accuracy, support, coverage, stability, p-value and other statistics. The table is decreasingly sorted according to the p-value. |
quality |
A table of model quality: accuracy statistics, ROC and AUC measures. |
usMeanAccs |
A vector containing accuracies of the models from undersampling. Only if underSample = TRUE. |
usn |
An integer indicating the minimum number of required subsets for undersampling. Only if underSample = TRUE. |
ROCstats |
A data frame containing statistics of the model: 1 - specificity, sensitivity, specificity, PPV, NPV, accuracy and threshold. Only if roc = TRUE. |
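The measures listed for ROCstats all follow from a confusion matrix at a given threshold. The base R sketch below computes them for one threshold on hypothetical scores and labels. The variable names and data are assumptions made for illustration; this is not the package's internal code.

```r
# Hypothetical true classes and classifier scores for class "autism" (illustrative only)
truth <- c("autism", "autism", "autism", "control", "control", "control")
score <- c(0.9, 0.7, 0.4, 0.6, 0.3, 0.1)
threshold <- 0.5

pred_pos <- score >= threshold   # objects predicted as "autism"
is_pos   <- truth == "autism"    # objects that truly are "autism"

tp <- sum(pred_pos & is_pos)     # true positives
fp <- sum(pred_pos & !is_pos)    # false positives
tn <- sum(!pred_pos & !is_pos)   # true negatives
fn <- sum(!pred_pos & is_pos)    # false negatives

roc_point <- c(
  one_minus_specificity = fp / (fp + tn),
  sensitivity           = tp / (tp + fn),
  specificity           = tn / (tn + fp),
  PPV                   = tp / (tp + fp),
  NPV                   = tn / (tn + fn),
  accuracy              = (tp + tn) / length(truth)
)
print(roc_point)
```

Repeating the calculation over a grid of thresholds gives the points of a ROC curve (sensitivity against 1 - specificity).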
Mateusz Garbulowski, Karolina Smolinska
library(R.ROSETTA)
set.seed(1)
### default settings ###
ruleModel <- rosetta(autcon)
ruleModel$quality$accuracyMean
### undersampling ###
ruleModelUS <- rosetta(autcon, underSample=TRUE, underSampleNum=10, underSampleSize=50)
ruleModelUS$quality$accuracyMean
### classifiers ###
# StandardVoter
ruleModelSV <- rosetta(autcon, classifier="StandardVoter")
ruleModelSV$quality$accuracyMean
# ObjectTrackingVoter
ruleModelOTV <- rosetta(autcon, classifier="ObjectTrackingVoter")
ruleModelOTV$quality$accuracyMean
# NaiveBayesClassifier
ruleModelNBC <- rosetta(autcon, classifier="NaiveBayesClassifier")
ruleModelNBC$quality$accuracyMean
### reducers ###
# Johnson
ruleModelJohnson <- rosetta(autcon, reducer="Johnson", JohnsonParam=list(Modulo=TRUE, BRT=TRUE, BRTprec=0.1, Precompute=FALSE, Approximate=TRUE, Fraction=0.8))
ruleModelJohnson$quality$accuracyMean
# Genetic
ruleModelGenetic <- rosetta(autcon, reducer="Genetic", GeneticParam=list(Modulo=TRUE, BRT=TRUE, BRTprec=0.1, Precompute=FALSE, Approximate=TRUE, Fraction=0.8, Algorithm="Simple"))
ruleModelGenetic$quality$accuracyMean
### discernibility ###
# Full
ruleModelFull <- rosetta(autcon, reducerDiscernibility="Full")
ruleModelFull$quality$accuracyMean
# Object
ruleModelObject <- rosetta(autcon, reducerDiscernibility="Object")
ruleModelObject$quality$accuracyMean
### discretization ###
# EqualFrequencyScaler
ruleModelEF <- rosetta(autcon, discrete=FALSE, discreteMethod="EqualFrequency", discreteParam=3)
ruleModelEF$quality$accuracyMean
# MDL
ruleModelMDL <- rosetta(autcon, discrete=FALSE, discreteMethod="MDL")
ruleModelMDL$quality$accuracyMean
# Naive
ruleModelNaive <- rosetta(autcon, discrete=FALSE, discreteMethod="Naive")
ruleModelNaive$quality$accuracyMean
# SemiNaive
ruleModelSemiNaive <- rosetta(autcon, discrete=FALSE, discreteMethod="SemiNaive")
ruleModelSemiNaive$quality$accuracyMean
# BRO
ruleModelBRO <- rosetta(autcon, discrete=FALSE, discreteMethod="BROrthogonal", discreteParam=list(TRUE, 0.95))
ruleModelBRO$quality$accuracyMean
### for discrete data ###
# generate discrete synthetic data
dt <- synData(nFeatures=c(5,5,3,2,2), rf=c(0.2,0.3,0.2,0.4,0.4),
rd=c(0.2,0.3,0.4,0.5,0.6), discrete = TRUE, levels = 3, labels = c("low", "medium", "high"))
ruleModelDiscrete <- rosetta(dt, discrete = TRUE)
ruleModelDiscrete$quality$accuracyMean
### for mixed data (discrete and non-discrete), the data frame columns must have specific types: ###
# for discrete values: logical, character or factor
# for non-discrete values: float, numeric or integer
# generate continuous synthetic data
dt <- synData(nFeatures=c(20,2,2,3,3), rf=c(0.1,0.1,0.1,0.8,0.8), rd=c(0.5,0.1,0.7,0.8,0.2), nObjects=100, nOutcome=2, unbalanced=FALSE, seed=1)
# change two of the features from the group 5 to discrete
dt$f5.2_rf0.8_rd0.2 <- as.factor(cut(dt$f5.2_rf0.8_rd0.2, 3, labels = c("low", "medium", "high")))
dt$f5.3_rf0.8_rd0.2 <- as.factor(cut(dt$f5.3_rf0.8_rd0.2, 3, labels = c("low", "medium", "high")))
ruleModelMixed <- rosetta(dt, discrete=FALSE)
ruleModelMixed$quality$accuracyMean
### calculate AUC ###
# for class: autism
ruleModelAUCa <- rosetta(autcon, roc=TRUE, clroc="autism")
ruleModelAUCa$quality
# for class: control
ruleModelAUCc <- rosetta(autcon, roc=TRUE, clroc="control")
ruleModelAUCc$quality
### set fallback class ###
#for class: autism
ruleModelFBa <- rosetta(autcon, fallBack=TRUE, fallBackClass="autism")
ruleModelFBa$quality$accuracyMean
#for class: control
ruleModelFBc <- rosetta(autcon, fallBack=TRUE, fallBackClass="control")
ruleModelFBc$quality$accuracyMean
### rules filtration ###
# accuracy
ruleModelFiltAcc <- rosetta(autcon, ruleFiltration=TRUE, ruleFiltrAccuracy=c(0, 0.85))
ruleModelFiltAcc$quality$accuracyMean
# support
ruleModelFiltSupp <- rosetta(autcon, ruleFiltration=TRUE, ruleFiltrSupport=c(1, 10))
ruleModelFiltSupp$quality$accuracyMean
# coverage
ruleModelFiltCov <- rosetta(autcon, ruleFiltration=TRUE, ruleFiltrCoverage=c(0, 0.1))
ruleModelFiltCov$quality$accuracyMean
# stability
ruleModelFiltStab <- rosetta(autcon, ruleFiltration=TRUE, ruleFiltrStability=c(1, 5))
dim(ruleModelFiltStab$main)[1]
### mask features ###
ruleModelMaskFs2 <- rosetta(autcon, maskFeatures=TRUE, maskFeaturesNames=c("MAP7", "COX2"))
ruleModelMaskFs2$quality$accuracyMean
# remove first 10 features from decision table
ruleModelMaskFs10 <- rosetta(autcon, maskFeatures=TRUE, maskFeaturesNames=colnames(autcon)[1:10])
ruleModelMaskFs10$quality$accuracyMean
### fill NA values ###
autcon2 <- autcon
#introduce 3 NA values
autcon2[2,2] <- NA
autcon2[3,3] <- NA
autcon2[4,4] <- NA
ruleModelFillNA <- rosetta(autcon2, fillNA=TRUE, fillNAmethod="meanOrMode")
ruleModelFillNA$quality$accuracyMean
### perform permutation test ###
# original data
out <- rosetta(autcon)
acc0 <- out$quality$accuracyMean
# permuted data
n_perm <- 20 # number of iterations
autcon_perm <- autcon
acc <- numeric(n_perm) # preallocate the accuracies of the permuted models
for(i in seq_len(n_perm)){
autcon_perm$decision <- sample(autcon_perm$decision)
out_perm <- rosetta(autcon_perm)
acc[i] <- out_perm$quality$accuracyMean
}
# visualization
hist(acc, col="lightpink", xlim=c(0,1), main="permutation test", xlab="accuracy")
abline(v = acc0, col="mediumslateblue", lwd=3, lty=2)
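The histogram above compares the observed accuracy with the permuted ones visually. A common way to summarize the same comparison as a number is an empirical p-value; the sketch below is one such estimate. The helper perm_pvalue and the demo values are hypothetical and not part of R.ROSETTA.

```r
# Empirical p-value: share of permuted accuracies at least as high as the observed one.
# The +1 terms count the observed run among the permutations, so the estimate is never exactly 0.
perm_pvalue <- function(observed, permuted) {
  (sum(permuted >= observed) + 1) / (length(permuted) + 1)
}

# Hypothetical accuracies (illustrative only, not rosetta output)
acc0_demo <- 0.82
acc_demo  <- c(0.48, 0.55, 0.51, 0.60, 0.47, 0.53, 0.58, 0.50, 0.49)

p_perm <- perm_pvalue(acc0_demo, acc_demo)
print(p_perm)
```

With the objects from the permutation example, the call would be perm_pvalue(acc0, acc). With only 20 permutations the smallest attainable value is 1/21, so more iterations are needed for a finer estimate.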