
View source: R/adaSvmBenchmark.R

`adaSvmBenchmark()` compares the performance of an AdaSampling-enhanced
SVM (support vector machine) classifier against that of the SVM classifier
on its own. It requires a matrix of features (extracted from a labelled dataset)
and two vectors of labels: the true labels, and labels with noise added as desired.
It runs an SVM classifier and returns a matrix reporting the specificity
(Sp), sensitivity (Se) and F1 score for each of four conditions:
"Original" (classifying with true labels), "Baseline" (classifying with
noisy labels), "AdaSingle" (classifying using AdaSampling) and
"AdaEnsemble" (classifying using AdaSampling in conjunction with
an ensemble of models).
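
The three reported measures can be illustrated with a minimal sketch that uses the standard confusion-matrix definitions. `binaryMetrics()` is a hypothetical helper written for this illustration, not a function of the package, and the package's internal calculation may differ in detail. It assumes 0/1 labels with 1 as the positive class.

```r
# Hypothetical helper (not part of AdaSampling): compute Se, Sp and F1
# from true and predicted 0/1 labels, treating 1 as the positive class.
binaryMetrics <- function(truth, pred) {
  stopifnot(length(truth) == length(pred))
  tp <- sum(truth == 1 & pred == 1)  # true positives
  tn <- sum(truth == 0 & pred == 0)  # true negatives
  fp <- sum(truth == 0 & pred == 1)  # false positives
  fn <- sum(truth == 1 & pred == 0)  # false negatives
  c(Se = tp / (tp + fn),             # sensitivity: recall on the positive class
    Sp = tn / (tn + fp),             # specificity: recall on the negative class
    F1 = 2 * tp / (2 * tp + fp + fn))
}

# Toy labels, made up for this illustration
truth <- c(1, 1, 1, 0, 0, 0, 0, 1)
pred  <- c(1, 0, 1, 0, 1, 0, 0, 1)
metrics <- binaryMetrics(truth, pred)
print(metrics)
```

In this toy case there are three true positives, three true negatives, one false positive and one false negative, so all three measures equal 0.75.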

```
adaSvmBenchmark(data.mat, data.cls, data.cls.truth, cvSeed, C = 50,
  sampleFactor = 1)
```

| Argument | Description |
|---|---|
| `data.mat` | a rectangular matrix, or a data frame that can be coerced to a matrix, containing the features of the dataset without class labels. Row names (possibly containing unique identifiers) will be ignored. |
| `data.cls` | a numeric vector containing class labels for the dataset with added noise. Must be in the same order and of the same length as |
| `data.cls.truth` | a numeric vector of true class labels for the dataset. Must be in the same order and of the same length as |
| `cvSeed` | sets the seed for cross-validation. |
| `C` | sets how many times to run the classifier for the AdaEnsemble condition. See Description above. |
| `sampleFactor` | provides a control on the sample size for resampling. |

AdaSampling is an adaptive sampling-based noise reduction method
for data with noisy class labels. It acts as a wrapper for
traditional classifiers such as support vector machines,
k-nearest neighbours, logistic regression, and linear discriminant
analysis. For more details see `?adaSample()`.
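
As a rough illustration of the wrapper idea, the sketch below resamples the training data on each iteration, favouring rows whose given label agrees with the current model, and then refits the classifier. This is a simplified sketch under stated assumptions and is not the package's implementation: `adaSketch()` is a hypothetical name, base R logistic regression (`glm`) stands in for the SVM, and the data are simulated. Use `adaSample()` for the real procedure.

```r
# Simplified sketch of the adaptive-sampling idea; NOT the AdaSampling
# package's implementation. Logistic regression stands in for the SVM.
adaSketch <- function(x, cls, iter = 5, seed = 1) {
  set.seed(seed)
  dat <- data.frame(x)
  pos <- which(cls == 1)
  neg <- which(cls == 0)
  # Start by trusting every given label equally
  pPos <- rep(1, length(pos))
  pNeg <- rep(1, length(neg))
  for (i in seq_len(iter)) {
    # Resample each class, favouring rows whose label currently looks reliable
    sPos <- pos[sample.int(length(pos), replace = TRUE, prob = pPos)]
    sNeg <- neg[sample.int(length(neg), replace = TRUE, prob = pNeg)]
    idx <- c(sPos, sNeg)
    train <- cbind(dat[idx, , drop = FALSE], y = cls[idx])
    fit <- glm(y ~ ., data = train, family = binomial())
    # Predicted probability of class 1 for every row of the full dataset
    p1 <- predict(fit, newdata = dat, type = "response")
    # Update the sampling weights; the floor keeps every weight positive
    pPos <- pmax(p1[pos], 1e-6)
    pNeg <- pmax(1 - p1[neg], 1e-6)
  }
  unname(p1)
}

# Simulated two-class data with overlapping classes (illustration only)
set.seed(42)
n <- 100
x <- rbind(matrix(rnorm(2 * n, mean = 1), ncol = 2),
           matrix(rnorm(2 * n, mean = -1), ncol = 2))
truth <- rep(c(1, 0), each = n)
noisy <- truth
noisy[sample(which(truth == 1), 30)] <- 0  # flip 30 positives
noisy[sample(which(truth == 0), 20)] <- 1  # flip 20 negatives

prob <- adaSketch(x, noisy)
# Agreement between thresholded predictions and the true labels
print(mean((prob > 0.5) == truth))
```

Each class is resampled to its original size, so both classes are always present in the training sample; only the choice of rows within each class adapts between iterations.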

This function evaluates the AdaSampling procedure by adding noise
to a labelled dataset, and then running support vector machines on
the original and the noisy dataset. Note that this function is for
benchmarking AdaSampling performance using what is assumed to be
a well-labelled dataset. In order to run AdaSampling on a noisy dataset,
please see `adaSample()`.

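A performance matrix giving the specificity (Sp), sensitivity (Se) and F1 score for each of the four conditions listed in the Description.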

Yang, P., Liu, W., Yang, J. (2017) Positive unlabeled learning via wrapper-based
adaptive sampling. *International Joint Conferences on Artificial Intelligence (IJCAI)*, 3272-3279

Yang, P., Ormerod, J., Liu, W., Ma, C., Zomaya, A., Yang, J. (2018)
AdaSampling for positive-unlabeled and label noise learning with bioinformatics applications.
*IEEE Transactions on Cybernetics*, doi:10.1109/TCYB.2018.2816984

```
# Load the package and the example dataset
library(AdaSampling)
data(brca)
head(brca)

# First, clean up the dataset to transform it into the required format.
brca.mat <- apply(X = brca[, -10], MARGIN = 2, FUN = as.numeric)
brca.cls <- ifelse(brca$cla == "malignant", 1, 0)
rownames(brca.mat) <- paste("p", 1:nrow(brca.mat), sep = "_")

# Introduce 40% noise to the positive class and 30% noise to the negative class
set.seed(1)
pos <- which(brca.cls == 1)
neg <- which(brca.cls == 0)
brca.cls.noisy <- brca.cls
brca.cls.noisy[sample(pos, floor(length(pos) * 0.4))] <- 0
brca.cls.noisy[sample(neg, floor(length(neg) * 0.3))] <- 1

# Benchmark classification performance with the different approaches
adaSvmBenchmark(data.mat = brca.mat, data.cls = brca.cls.noisy,
                data.cls.truth = brca.cls, cvSeed = 1)
```
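
The noise-injection step in the example can be wrapped in a small reusable function when several noise levels are to be compared. The sketch below is one possible way to do this. `addLabelNoise()` and its arguments are hypothetical names introduced here, not part of the package, and it assumes 0/1 labels with 1 as the positive class.

```r
# Hypothetical helper (not part of AdaSampling): flip a fraction of the
# positive (1) and negative (0) labels to simulate label noise.
addLabelNoise <- function(cls, posRate, negRate, seed = 1) {
  set.seed(seed)
  pos <- which(cls == 1)
  neg <- which(cls == 0)
  nPos <- floor(length(pos) * posRate)  # number of positives to flip
  nNeg <- floor(length(neg) * negRate)  # number of negatives to flip
  noisy <- cls
  # Sample positions within pos/neg so that a length-one index vector
  # is not expanded to 1:n, as sample() would do
  noisy[pos[sample.int(length(pos), nPos)]] <- 0
  noisy[neg[sample.int(length(neg), nNeg)]] <- 1
  noisy
}

# Toy labels: 10 positives followed by 20 negatives
cls <- rep(c(1, 0), times = c(10, 20))
noisy <- addLabelNoise(cls, posRate = 0.5, negRate = 0.25)
print(table(truth = cls, noisy = noisy))
```

The returned vector has the same order and length as the input, so it can be passed as `data.cls` alongside the untouched vector as `data.cls.truth`.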
