| gam_bor_ln | R Documentation | 
Introduction of Gamma borderline label noise into a classification dataset.
## Default S3 method:
gam_bor_ln(x, y, level, shape = 1, rate = 0.5, k = 1, sortid = TRUE, ...)

## S3 method for class 'formula'
gam_bor_ln(formula, data, ...)
x | 
 a data frame of input attributes.  | 
y | 
 a factor vector with the output class of each sample.  | 
level | 
 a double in [0,1] with the noise level to be introduced.  | 
shape | 
 a double with the shape for the gamma distribution (default: 1).  | 
rate | 
 a double with the rate for the gamma distribution (default: 0.5).  | 
k | 
 an integer with the number of nearest neighbors to be used (default: 1).  | 
sortid | 
 a logical indicating if the indices must be sorted at the output (default: TRUE).  | 
... | 
 other options to pass to the function.  | 
formula | 
 a formula with the output class and, at least, one input attribute.  | 
data | 
 a data frame in which to interpret the variables in the formula.  | 
Gamma borderline label noise uses an SVM to induce the decision border 
in the dataset. For each sample, its distance
to the decision border is computed. 
Then, a gamma distribution with parameters (shape, rate) is used to compute the
value of the probability density function associated with each distance. 
Finally, (level·100)% of the samples in the dataset are randomly selected to be mislabeled
according to their values of the probability density function. For each noisy sample, the 
majority class among its k-nearest neighbors of a different class 
is chosen as the new label.
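The selection step described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's internal code: the distances in `dist_border` are simulated rather than obtained from an SVM, the use of `round` for the number of noisy samples is an assumption, and sampling with weights proportional to the density is one plausible reading of "according to their values of the probability density function".

```r
# Illustrative sketch only: not the internal code of gam_bor_ln.
set.seed(1)

# Hypothetical distances of 20 samples to the decision border
# (in gam_bor_ln these would be derived from the SVM).
dist_border <- abs(rnorm(20, mean = 0, sd = 2))

level <- 0.1
shape <- 1
rate  <- 0.5

# Gamma density evaluated at each distance. With shape = 1 this is an
# exponential density, so samples closer to the border get larger weights.
w <- dgamma(dist_border, shape = shape, rate = rate)

# Number of samples to mislabel: (level * 100)% of the dataset
# (rounding rule assumed for this sketch).
n_noise <- round(level * length(dist_border))

# Draw the noisy indices without replacement, weighted by the density,
# and sort them (mirroring sortid = TRUE).
id_noise <- sort(sample(seq_along(dist_border), size = n_noise, prob = w))
id_noise
```

With the default shape = 1, the density decreases with the distance, which is why the noise concentrates near the decision border; other shape values move the peak of the density away from zero.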
An object of class ndmodel with elements:
xnoise | 
 a data frame with the noisy input attributes.  | 
ynoise | 
 a factor vector with the noisy output class.  | 
numnoise | 
 an integer vector with the number of noisy samples per class.  | 
idnoise | 
 an integer vector list with the indices of noisy samples.  | 
numclean | 
 an integer vector with the number of clean samples per class.  | 
idclean | 
 an integer vector list with the indices of clean samples.  | 
distr | 
 an integer vector with the samples per class in the original data.  | 
model | 
 the full name of the noise introduction model used.  | 
param | 
 a list of the argument values.  | 
call | 
 the function call.  | 
Noise model adapted from the paper in References, considering an SVM with a linear kernel as the classifier, a mislabeling process that uses the neighborhood of each noisy sample, and a noise level that controls the number of errors in the data.
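The neighborhood-based mislabeling mentioned in this note can be sketched in base R as follows. The helper `relabel_one`, the toy data, the Euclidean distance and the tie-breaking rule are all assumptions made for illustration; the package may implement this step differently.

```r
# Illustrative sketch only: not the internal code of gam_bor_ln.
# Toy data: two numeric attributes and a three-class label.
x <- data.frame(a = c(0.0, 0.1, 1.0, 1.1, 1.2, 3.0),
                b = c(0.0, 0.2, 1.0, 0.9, 1.1, 3.0))
y <- factor(c("A", "A", "B", "B", "C", "C"))

# Hypothetical helper: new label for sample i, chosen as the majority class
# among its k nearest neighbors (Euclidean distance) of a different class.
relabel_one <- function(i, x, y, k = 1) {
  d <- as.matrix(dist(x))[i, ]      # distances from sample i to all samples
  cand <- which(y != y[i])          # keep only samples of a different class
  nn <- cand[order(d[cand])][seq_len(min(k, length(cand)))]
  votes <- table(droplevels(y[nn]))
  names(votes)[which.max(votes)]    # ties go to the first class in level order
}

relabel_one(2, x, y, k = 1)  # "B"
relabel_one(2, x, y, k = 3)  # "B"
```

Restricting the neighbors to samples of a different class guarantees that the new label always differs from the original one, so every selected sample actually becomes noisy.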
J. Bootkrajang. A generalised label noise model for classification. In Proc. 23rd European Symposium on Artificial Neural Networks, pages 349-354, 2015. url:https://dblp.org/rec/conf/esann/Bootkrajang15.html?view=bibtex.
exp_bor_ln, pmd_con_ln, print.ndmodel, summary.ndmodel, plot.ndmodel
# load the dataset
data(iris2D)

# usage of the default method
set.seed(9)
outdef <- gam_bor_ln(x = iris2D[,-ncol(iris2D)], y = iris2D[,ncol(iris2D)], level = 0.1)

# show results
summary(outdef, showid = TRUE)
plot(outdef)

# usage of the method for class formula
set.seed(9)
outfrm <- gam_bor_ln(formula = Species ~ ., data = iris2D, level = 0.1)

# check the match of noisy indices
identical(outdef$idnoise, outfrm$idnoise)