The cfPermute function performs permutation testing on a classification ensemble produced by cfBuild. This is essentially a comparison between the classification performance achieved on a given dataset and the performance that would be expected by random chance. It therefore provides an indication of the statistical significance of a classifier's performance.
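This page does not say how the comparison with chance is turned into a single number. One common convention, sketched below under that assumption, is the empirical permutation p-value: the fraction of permutations whose accuracy is at least as high as the accuracy obtained with the true labels, with one added to the numerator and the denominator. The helper permPValue and the accuracy values are hypothetical and for illustration only. In practice, the permuted accuracies could be taken from permObj$avgAcc in the Examples.

```r
# Hypothetical helper (not part of the package): empirical permutation p-value.
# obsAcc  : accuracy obtained with the true class labels
# permAcc : accuracies obtained with permuted labels, on the same scale
permPValue <- function(obsAcc, permAcc) {
  stopifnot(is.numeric(obsAcc), length(obsAcc) == 1, is.numeric(permAcc))
  # Permutations that did at least as well as the true labels; the +1 terms
  # count the observed labelling itself, so the estimate is never exactly 0.
  (1 + sum(permAcc >= obsAcc)) / (1 + length(permAcc))
}

# Made-up accuracies, for illustration only
obsAcc  <- 95
permAcc <- c(31, 36, 29, 40, 34, 33, 38, 30, 35, 32)
permPValue(obsAcc, permAcc)   # (1 + 0) / (1 + 10), about 0.0909
```

With permNum = 100, the smallest value this estimate can take is 1/101, so resolving smaller p-values requires more permutations.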
inputData: The input data matrix as provided by the user (mandatory).
inputClass: The input class vector as provided by the user (mandatory).
bootNum: The number of bootstrap iterations during the optimisation process. By default, the value is set to 100.
ensNum: The number of classifiers that constitute the ensemble for each permutation. By default, the value is set to 100.
permNum: The number of permutations to be executed. By default, the value is set to 100.
parallel: Boolean value that determines parallel or sequential execution. By default set to
cpus: Numeric value that provides the number of CPUs requested for the cluster. For more details, see sfInit.
type: The type of cluster. It can take the values ‘SOCK’, ‘MPI’, ‘PVM’ or ‘NWS’. By default, type is equal to ‘SOCK’. For more details, see sfInit.
socketHosts: Host list for socket clusters. Only needed in socket mode (SOCK) when using more than one machine; if only your local machine (localhost) is used, no list is needed. For more details, see sfInit.
progressBar: Boolean value that determines whether a progress bar should be displayed. By default set to
scaling: Boolean value that determines whether scaling should be applied (by default set to TRUE). Data are scaled internally, which usually yields better results. The parameters of SVM models usually must be tuned to yield sensible results. For more information, see function svm.
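Because every permutation rebuilds a full ensemble, the amount of work grows with the product of permNum, ensNum and bootNum. The short sketch below counts bootstrap optimisation iterations under the structure described in the Examples comments (each permutation builds an ensemble of ensNum classifiers, each running bootNum bootstrap iterations). bootIterations is a hypothetical helper, and the count ignores whatever work each bootstrap iteration of the SVM tuning involves.

```r
# Hypothetical helper: total bootstrap optimisation iterations per cfPermute call,
# assuming permNum ensembles of ensNum classifiers, each with bootNum iterations.
bootIterations <- function(permNum = 100, ensNum = 100, bootNum = 100) {
  permNum * ensNum * bootNum
}

bootIterations()                                        # defaults: 1e+06
bootIterations(permNum = 5, ensNum = 20, bootNum = 10)  # Examples call: 1000
```

A thousandfold difference of this kind is why a quick demonstration uses much smaller values than the defaults.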
Permutation testing is a widely applied procedure for assessing the statistical significance of classification results. In a permutation test, the entries of the original class vector (inputClass) are randomly shuffled, so the number of samples in each class is preserved. This destroys any association between the samples and their class labels, since the samples of a permuted dataset are assigned to classes at random. The entire model-building process described in cfBuild is then repeated for the "false" (permuted) classes. In general, permutation testing should be performed at least 100 times (the default value of permNum) so that a stable distribution of results is obtained.
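The procedure above can be illustrated in base R. This is a minimal sketch, not the cfPermute implementation: a simple nearest-centroid classifier (swapped in as a stand-in for the SVM ensemble built by cfBuild) is trained on a fixed random split of iris, once with the true labels and then 100 times with shuffled labels. The helper names nearestCentroidAcc and runOnce are hypothetical.

```r
# Minimal sketch of a permutation test; NOT the cfPermute implementation.
# A nearest-centroid classifier stands in for the SVM ensemble built by cfBuild.

# Hypothetical helper: compute class centroids from the training rows and return
# the test accuracy of assigning each test row to the closest centroid.
nearestCentroidAcc <- function(trainX, trainY, testX, testY) {
  classes <- levels(trainY)
  # Feature-by-class matrix of class means
  centroids <- sapply(classes, function(k)
    colMeans(trainX[trainY == k, , drop = FALSE]))
  # Squared Euclidean distance from each test row to every centroid
  pred <- apply(testX, 1, function(x)
    classes[which.min(colSums((centroids - x)^2))])
  mean(pred == as.character(testY))
}

set.seed(42)
X <- as.matrix(iris[, 1:4])
y <- iris[, 5]
trainIdx <- sample(nrow(X), 100)   # fixed split reused for every permutation

# Hypothetical helper: build and evaluate the model for a given label vector
runOnce <- function(labels) {
  nearestCentroidAcc(X[trainIdx, ], labels[trainIdx],
                     X[-trainIdx, ], labels[-trainIdx])
}

# Shuffling the class vector keeps the number of samples per class
permClass <- y[sample(length(y))]
stopifnot(all(table(permClass) == table(y)))

obsAcc  <- runOnce(y)                                     # true labels
permAcc <- replicate(100, runOnce(y[sample(length(y))]))  # 100 permutations

obsAcc          # accuracy with the true labels
mean(permAcc)   # accuracy expected by chance, typically close to 1/3 here
```

With three balanced classes, the permuted accuracies cluster around 1/3, while the true labels give a much higher accuracy. That gap is what a permutation test is designed to reveal.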
The cfPermute function returns an object in the form of an R list. The names of its components can be listed with the attributes command, and each component can be accessed with $ (for example, permObj$avgAcc). More specifically, the list includes:
avgAcc: The average test accuracy across the classifiers of the ensemble built in each permutation iteration, giving one value per permutation.
totalTime: The overall execution time of permutation testing.
execTime: The individual execution times for each permutation round.
permList: For each permutation iteration, a new object (list) is generated by the function
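The Examples read the overall elapsed time as permObj$totalTime[3], which suggests (although this page does not state it) that the timing components are proc_time objects like those returned by system.time(). Assuming so, element 3 is the elapsed wall-clock time in seconds, as the following base R snippet shows.

```r
# Assumption: totalTime/execTime in the cfPermute result are "proc_time" objects,
# as returned by system.time(); the third element is the elapsed wall-clock time.
timing <- system.time(for (i in 1:5) Sys.sleep(0.01))

class(timing)          # "proc_time"
names(timing)[1:3]     # "user.self" "sys.self" "elapsed"
timing[["elapsed"]]    # elapsed seconds; same value as timing[3]
```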
Good, P. I. (2006). Permutation, Parametric and Bootstrap Tests of Hypotheses, 3rd ed. Springer-Verlag New York Inc, Dordrecht.
Hesterberg, T., Moore, D. S., Monaghan, S., Clipson, A. and Epstein, R. (2005). Bootstrap methods and permutation tests. Introduction to the Practice of Statistics, vol. 5, pp. 1-70.
## Not run:
library(classyfire)
data(iris)
irisClass <- iris[,5]
irisData <- iris[,-5]
ens <- cfBuild(irisData, irisClass, bootNum = 100, ensNum = 100, parallel = TRUE,
cpus = 4, type = "SOCK")
# Execute 5 permutation rounds; in each permutation test, an ensemble of 20 classifiers
# is constructed, each running 10 bootstrap iterations during the optimization process
# The default values for permutation testing are ensNum = bootNum = permNum = 100
permObj <- cfPermute(irisData, irisClass, bootNum = 10, ensNum = 20, permNum = 5, parallel = TRUE,
cpus = 4, type = "SOCK")
# List the components (attributes) of the returned object
attributes(permObj)
# Get the vector of averaged accuracies, one for each permutation
# (each permutation is an independent classification ensemble)
permObj$avgAcc
# Get the overall elapsed time for the permutation process
permObj$totalTime[3]
# Get the vector of individual execution times for each permutation
permObj$execTime
# Access the first ensemble in the permutation list
permObj$permList[[1]]
## End(Not run)