Classyfire Cheat Sheet

General Package Handling

Install from CRAN

install.packages("classyfire")

Load the classyfire package within R

library(classyfire)

Get the classyfire help overview

??classyfire

Building a classification ensemble

Loading some test data, for instance the iris dataset

data(iris)

irisClass <- iris[,5]
irisData  <- iris[,-5]
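A quick sanity check of the split can catch mistakes before a long cfBuild run. This is a minimal base-R sketch. The checks (one class label per row, numeric predictors) reflect what a classifier generally needs and are our assumption, not a documented cfBuild requirement.

```r
data(iris)
irisClass <- iris[, 5]    # factor of species labels (the classes)
irisData  <- iris[, -5]   # four numeric measurement columns (the predictors)

# Assumed requirements: one class label per row and only numeric predictors
stopifnot(length(irisClass) == nrow(irisData),
          all(sapply(irisData, is.numeric)))

dim(irisData)      # 150 rows, 4 columns
table(irisClass)   # 50 samples in each of the 3 species
```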

Construct a classification ensemble in parallel (using 4 CPUs in this instance) that consists of 10 independent classification models (classifiers), each optimised using 10 bootstrap iterations

ens <- cfBuild(inputData = irisData, inputClass = irisClass, bootNum = 10, ensNum = 10,
               parallel = TRUE, cpus = 4, type = "SOCK")

Similarly, in sequence:

ens <- cfBuild(inputData = irisData, inputClass = irisClass, bootNum = 10, ensNum = 10,
               parallel = FALSE)
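Rather than hard-coding cpus = 4, the CPU count can be derived from the machine. This sketch uses detectCores() from the parallel package that ships with R. Leaving one core free is our own convention, and the cfBuild call is kept as a comment so the snippet runs without classyfire loaded.

```r
# detectCores() can return NA on some platforms, so fall back to a single CPU
nCores <- parallel::detectCores()
cpus   <- if (is.na(nCores)) 1L else max(1L, nCores - 1L)
cpus

# Then pass it on, e.g.:
# ens <- cfBuild(inputData = irisData, inputClass = irisClass, bootNum = 10,
#                ensNum = 10, parallel = TRUE, cpus = cpus, type = "SOCK")
```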

The components stored in the ensemble object (such as testAcc and trainAcc) can be listed with:

attributes(ens)

Get the overall average test and train accuracy

getAvgAcc(ens)$Test
getAvgAcc(ens)$Train

Get the individual test and train accuracies in the ensemble

ens$testAcc  
ens$trainAcc

# Alternatively

getAcc(ens)$Test
getAcc(ens)$Train

Testing new unknown data

In this instance, we randomly generate test data, representing a new input dataset of unknown classes, and use the ensemble to predict their classes. The new dataset must have exactly the same number of columns as the inputData argument passed to cfBuild. In the following example, 400 random values are arranged into a matrix with ncol(irisData) = 4 columns, which results in 100 samples (rows).

testMatr <- matrix(runif(400)*100, ncol = ncol(irisData))           
predRes  <- cfPredict(ens, testMatr)
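Uniform values in [0, 100] fall far outside the iris measurement ranges. As a hedged alternative, this base-R sketch draws each new column uniformly within the observed range of the corresponding irisData column, keeping the same number of columns and the same column names. Whether cfPredict requires matching column names is not stated here, so matching them is only a precaution.

```r
data(iris)
irisData <- iris[, -5]

set.seed(42)
# 100 new samples; each feature is drawn uniformly within its observed range
testMatr <- sapply(irisData, function(col) runif(100, min = min(col), max = max(col)))
dim(testMatr)   # 100 rows, 4 columns

# predRes <- cfPredict(ens, testMatr)
```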

Determining statistical significance by permutation testing

Execute five permutation rounds; in each round, an ensemble of 10 classifiers is constructed, each running 10 bootstrap iterations during the optimization process. By default, ensNum, bootNum and permNum are all set to 100.

permObj <- cfPermute(irisData, irisClass, bootNum = 10, ensNum = 10, permNum = 5, 
                     parallel = TRUE, cpus = 4, type = "SOCK")

Get the vector of averaged accuracies, one for each permutation (each permutation is an independent classification ensemble)

permObj$avgAcc

Get the overall elapsed time of the permutation process, followed by the vector of execution times for the individual permutations

permObj$totalTime[3]
permObj$execTime
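The [3] index suggests that totalTime is a "proc_time" object, as returned by system.time(). That is an assumption, but if it holds, the third element is the elapsed (wall-clock) time in seconds. The base-R snippet below shows the structure of such an object:

```r
# system.time() returns a "proc_time" object
t <- system.time(for (i in 1:1e5) sqrt(i))

names(t)[1:3]    # "user.self" "sys.self" "elapsed"
t[3]             # elapsed time, selected by position
t[["elapsed"]]   # the same value, selected by name
```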

Access the first ensemble in the permutation list

permObj$permList[[1]]

Evaluating the classification ensemble

All the functions for descriptive statistics within classyfire start with the prefix "get". For example:

Get the average test and/or train accuracy of the ensemble

getAvgAcc(ens)
getAvgAcc(ens)$Test
getAvgAcc(ens)$Train

Get the vectors of test and/or train accuracies of the classifiers in the ensemble

getAcc(ens)
getAcc(ens)$Test
getAcc(ens)$Train

Get the confusion matrix summarising the performance of the ensemble

getConfMatr(ens)
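Conceptually, a confusion matrix cross-tabulates true against predicted classes, and the overall accuracy is the share of samples on its diagonal. The sketch below builds one in base R from hypothetical labels; it illustrates the idea and is not how getConfMatr computes or lays out its result.

```r
# Hypothetical true and predicted labels for six samples
actual    <- factor(c("setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"))
predicted <- factor(c("setosa", "setosa", "versicolor", "virginica", "virginica", "virginica"),
                    levels = levels(actual))

# Rows: true classes; columns: predicted classes
confMatr <- table(Actual = actual, Predicted = predicted)
confMatr

# Overall accuracy (%): correctly classified samples lie on the diagonal
overallAcc <- 100 * sum(diag(confMatr)) / sum(confMatr)   # 5 of 6 correct
overallAcc
```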

Get the optimal SVM hyperparameters of the classification ensemble

optParam <- getOptParam(ens)
optParam

Return the "five number summary", a descriptive statistic consisting of the minimum, first (lower) quartile, median, third (upper) quartile and maximum of a given distribution. In this case, the function is applied directly to the output of permutation testing generated by the cfPermute function.

getPerm5Num(permObj)
getPerm5Num(permObj)$median      
getPerm5Num(permObj)$minimum
getPerm5Num(permObj)$maximum
getPerm5Num(permObj)$upperQ
getPerm5Num(permObj)$lowerQ
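For reference, base R's fivenum() returns the same five statistics. The permutation accuracies below are hypothetical, and classyfire may use a different quartile rule, so small differences from getPerm5Num are possible.

```r
# Hypothetical averaged accuracies (%), one per permutation
permAcc <- c(31, 28, 35, 33, 30, 29, 34, 32, 36)

# Tukey's five number summary, labelled like the getPerm5Num components
fiveNum <- fivenum(permAcc)
names(fiveNum) <- c("minimum", "lowerQ", "median", "upperQ", "maximum")
fiveNum
```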

Plotting functions within classyfire

All the plotting functions within classyfire start with the prefix "gg", since they are built on the ggplot2 library. For example:

The ggClassPred function generates a barplot of the per-class accuracies (%) for the correctly classified and misclassified samples in the classification ensemble.

# Show the percentages of correctly classified samples in
# a barplot without and with text labels respectively

ggClassPred(ens)
ggClassPred(ens, showText = TRUE)

# Show the percentages of correctly classified and misclassified
# samples together in a barplot, with and without text labels

ggClassPred(ens, displayAll = TRUE)
ggClassPred(ens, position = "stack", displayAll = TRUE)
ggClassPred(ens, position = "stack", displayAll = TRUE, showText = TRUE)

# Alternatively, using a dodge position
ggClassPred(ens, position = "dodge", displayAll = TRUE)
ggClassPred(ens, position = "dodge", displayAll = TRUE, showText = TRUE)
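The bars can be read as row-wise percentages of a confusion matrix. Below is a sketch with a hypothetical matrix; the orientation (rows are true classes, columns are predictions) is our assumption and is not taken from classyfire.

```r
classes  <- c("setosa", "versicolor", "virginica")
# Hypothetical counts: rows = true class, columns = predicted class
confMatr <- matrix(c(50,  0,  0,
                      0, 47,  3,
                      0,  4, 46),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(Actual = classes, Predicted = classes))

# Per-class percentage of correctly classified samples, and its complement
correctPct  <- 100 * diag(confMatr) / rowSums(confMatr)
misclassPct <- 100 - correctPct
rbind(correct = correctPct, misclassified = misclassPct)
```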

The ggEnsTrend function displays how the average test accuracy evolves as each new classifier is added to the ensemble constructed by the cfBuild function.

ggEnsTrend(ens)

# Plot with text
ggEnsTrend(ens, showText = TRUE)

# Plot with text; set different limits on the y axis
ggEnsTrend(ens, showText = TRUE, ylims = c(90, 100))
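Assuming the trend is the running mean of the individual test accuracies (the k-th point averaging the first k classifiers), it can be reproduced in base R. The accuracies below are hypothetical.

```r
# Hypothetical test accuracies (%) of four classifiers, in the order added
testAcc <- c(90, 100, 95, 95)

# Running mean: average accuracy after each classifier is added
trend <- cumsum(testAcc) / seq_along(testAcc)
trend   # 90 95 95 95
```

The last point of the running mean equals the mean over the whole ensemble.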

The ggEnsHist function generates a histogram of the ensemble results as generated by cfBuild.

ggEnsHist(ens)

# Density plot of the test accuracies in the ensemble
ggEnsHist(ens, density = TRUE)

# Density plot that highlights additional descriptive statistics
ggEnsHist(ens, density = TRUE, percentiles=TRUE)
ggEnsHist(ens, density = TRUE, percentiles=TRUE, mean=TRUE)
ggEnsHist(ens, density = TRUE, percentiles=TRUE, median=TRUE)

The ggPermHist function generates a histogram of the permutation results as generated by cfPermute.

ggPermHist(permObj)

# Density plot 
ggPermHist(permObj, density=TRUE)

# Density plot that highlights additional descriptive statistics
ggPermHist(permObj, density=TRUE, percentiles = TRUE, mean = TRUE)
ggPermHist(permObj, density=TRUE, percentiles = TRUE, median = TRUE)

Finally, the ggFusedHist function generates a histogram for simultaneous visual comparison of the classification and permutation distributions.

ggFusedHist(ens, permObj)
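The visual comparison can be complemented by an empirical p-value: the fraction of permuted ensembles whose averaged accuracy reaches the observed one. This is a common permutation-test convention, not a classyfire function, and the numbers below are hypothetical; with real objects they would presumably come from getAvgAcc(ens)$Test and permObj$avgAcc.

```r
# Hypothetical values: observed ensemble test accuracy (%) and the
# averaged accuracies of 5 permuted ensembles
obsAcc  <- 95.3
permAcc <- c(30.2, 35.1, 28.7, 33.3, 31.0)

# One-sided empirical p-value; the +1 terms keep it from being exactly 0
pValue <- (sum(permAcc >= obsAcc) + 1) / (length(permAcc) + 1)
pValue
```

With permNum = 5 the smallest attainable p-value is 1/6; the default of 100 permutations lowers that floor to 1/101.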


classyfire documentation built on May 29, 2017, 11:05 p.m.