cfPermute: Permutation testing to indicate statistical significance of...


View source: R/cfPermute.R

Description

The cfPermute function performs permutation testing on a classification ensemble produced by cfBuild. This is essentially a comparison between the classification performance achieved for a given dataset and the performance that would be achieved by random chance. It therefore provides an indication of significance of the performance of a classifier.

Usage

cfPermute(inputData, inputClass, bootNum = 100, ensNum = 100, permNum = 100, 
          parallel = TRUE, cpus = NULL, type = "SOCK", socketHosts = NULL, 
          progressBar = TRUE, scaling = TRUE)

Arguments

inputData

The input data matrix as provided by the user (mandatory field).

inputClass

The input class vector as provided by the user (mandatory field).

bootNum

The number of bootstrap iterations during the optimisation process. By default, the value is set to 100.

ensNum

The number of classifiers that constitute the ensemble for each permutation. By default, the value is set to 100.

permNum

The number of permutations to be executed. By default, the value is set to 100.

parallel

Boolean value that selects parallel (TRUE) or sequential (FALSE) execution. By default set to TRUE. For more details, see sfInit.

cpus

Numeric value that provides the number of CPUs requested for the cluster. For more details, see sfInit.

type

The type of cluster. It can take the values ‘SOCK’, ‘MPI’, ‘PVM’ or ‘NWS’. By default, type is equal to ‘SOCK’. For more details, see sfInit.

socketHosts

Host list for socket clusters. Only needed in socket mode (SOCK) when using more than one machine; if you use only your local machine (localhost), no list is needed. For more details, see sfInit.

progressBar

Boolean value that determines whether a progress bar should be displayed. By default set to TRUE.

scaling

Boolean value that determines whether scaling should be applied (by default set to TRUE). Data are scaled internally, usually yielding better results. The parameters of SVM-models usually must be tuned to yield sensible results. For more information, see function svm.
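As a rough illustration of what scaling does, the sketch below standardises the iris features with base R. It assumes that scaling here means the zero-mean, unit-variance standardisation that svm applies by default; it is not the package's own code, and the variable name scaled is chosen only for this example.

```r
# Base R sketch of zero-mean, unit-variance scaling (an assumption about
# what 'scaling = TRUE' amounts to; not classyfire's internal code)
irisData <- iris[, -5]        # numeric feature columns only
scaled   <- scale(irisData)   # centre each column, then divide by its sd

round(colMeans(scaled), 10)   # column means are numerically zero
apply(scaled, 2, sd)          # column standard deviations are one
```

Without scaling, features measured on larger numeric ranges tend to dominate distance-based kernels, which is one reason scaled data often gives better results.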

Details

Permutation testing is widely used to indicate the statistical significance of classification results. In a permutation test, the entries of the original class vector (inputClass) are randomly shuffled, which preserves the class distribution. The shuffling destroys all sample membership information, since the samples of a permuted dataset are assigned to classes at random. The whole model-building process described in cfBuild is then repeated for the "false" (permuted) classes. In general, at least 100 permutations (the default value of permNum) should be run so that a stable distribution of results is obtained.
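The label shuffling described above can be sketched in base R. This is a minimal illustration under the assumption that a plain random reordering is used; the way cfPermute shuffles internally may differ, and permClass is a name introduced only for this example.

```r
# Base R only; 'iris' ships with R. Illustrative sketch, not cfPermute's code.
set.seed(42)                    # arbitrary seed, for reproducibility only
inputClass <- iris$Species      # original class vector

# A permutation only reorders the entries, so the class counts cannot change
permClass <- sample(inputClass)

# The class distribution is preserved ...
all(table(inputClass) == table(permClass))

# ... but the pairing between samples and classes is broken: only a fraction
# of the samples keep their original label, and only by chance
mean(permClass == inputClass)
```

Because the features are left untouched and only the labels move, any accuracy a classifier reaches on permClass reflects chance-level structure rather than a real relationship between features and classes.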

Value

The cfPermute function returns an object in the form of an R list. The attributes of the list can be accessed by executing the attributes command. More specifically, the list of attributes includes:

avgAcc

The average test accuracy across all ensembles within each permutation iteration.

totalTime

The overall execution time of permutation testing.

execTime

The individual execution times for each permutation round.

permList

For each permutation iteration, a new object (list) is generated by the function cfBuild, using the initial data and the permuted class as input. This list has as many elements as the value of the permNum argument passed to cfPermute. For more information on the contents of each object, see cfBuild.
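One common way to summarise such results is an empirical p-value: the proportion of permutations whose accuracy is at least as high as the accuracy obtained with the true labels. The sketch below is an assumption about how the returned accuracies might be used, not something cfPermute computes. The names obsAcc, permAcc and pValue are hypothetical, and the numbers are made up for illustration.

```r
# Hypothetical values for illustration only. In practice permAcc would be
# permObj$avgAcc and obsAcc the test accuracy of the ensemble built on the
# true labels, both on the same scale.
obsAcc  <- 95                                # made-up accuracy on true labels
permAcc <- c(31.2, 35.8, 29.4, 38.1, 33.0)   # made-up accuracies on permuted labels

# Empirical p-value with the usual +1 correction, so it is never exactly zero
pValue <- (sum(permAcc >= obsAcc) + 1) / (length(permAcc) + 1)
pValue   # 1/6 here, the smallest value attainable with 5 permutations
```

The smallest attainable value is 1 / (number of permutations + 1), which is one practical reason to run many permutations rather than a handful.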

References

Good, P. I.
Permutation, Parametric and Bootstrap Tests of Hypotheses
3rd ed, Springer-Verlag New York Inc, Dordrecht, 2006

Hesterberg, T., Moore, D. S., Monaghan, S., Clipson, A. and Epstein, R.
Bootstrap methods and permutation tests
Introduction to the Practice of Statistics, vol. 5, pp. 1-70, 2005

See Also

getPerm5Num, ggPermHist

Examples

## Not run: 
library(classyfire)
data(iris)

irisClass <- iris[,5]
irisData  <- iris[,-5]
            
ens <- cfBuild(irisData, irisClass, bootNum = 100, ensNum = 100, parallel = TRUE, 
               cpus = 4, type = "SOCK")

# Execute 5 permutation rounds; in each permutation test, an ensemble of 20 classifiers 
# is constructed, each running 10 bootstrap iterations during the optimization process
# The default values for permutation testing are ensNum = bootNum = permNum = 100

permObj <- cfPermute(irisData, irisClass, bootNum = 10, ensNum = 20, permNum = 5, parallel = TRUE, 
                     cpus = 4, type = "SOCK")

# List the attributes of the permutation object
attributes(permObj)

# Get the vector of averaged accuracies, one for each permutation 
# (each permutation is an independent classification ensemble)
permObj$avgAcc

# Get the overall elapsed time for the permutation process 
permObj$totalTime[3]

# Get the vector of individual execution times for each permutation
permObj$execTime

# Access the first ensemble in the permutation list
permObj$permList[[1]]

## End(Not run)

classyfire documentation built on May 29, 2017, 11:05 p.m.