View source: R/012_gaum_bor_ln.R
gaum_bor_ln | R Documentation |
Introduction of Gaussian-mixture borderline label noise into a classification dataset.
## Default S3 method:
gaum_bor_ln(
  x,
  y,
  level,
  mean = c(0, 2),
  sd = c(sqrt(0.5), sqrt(0.5)),
  w = c(0.5, 0.5),
  k = 1,
  sortid = TRUE,
  ...
)

## S3 method for class 'formula'
gaum_bor_ln(formula, data, ...)
x |
a data frame of input attributes. |
y |
a factor vector with the output class of each sample. |
level |
a double in [0,1] with the noise level to be introduced. |
mean |
a double vector with the mean for each Gaussian distribution (default: c(0, 2)). |
sd |
a double vector with the standard deviation for each Gaussian distribution (default: c(sqrt(0.5), sqrt(0.5))). |
w |
a double vector with the weight for each Gaussian distribution (default: c(0.5, 0.5)). |
k |
an integer with the number of nearest neighbors to be used (default: 1). |
sortid |
a logical indicating whether the indices must be sorted at the output (default: TRUE). |
... |
other options to pass to the function. |
formula |
a formula with the output class and, at least, one input attribute. |
data |
a data frame in which to interpret the variables in the formula. |
Gaussian-mixture borderline label noise uses an SVM to induce the decision border
in the dataset. For each sample, its distance to the decision border is computed.
Then, a Gaussian mixture distribution with parameters (mean, sd) and weights w
is used to compute the value of the probability density function associated with
each distance. Finally, (level·100)% of the samples in the dataset are randomly
selected to be mislabeled according to their values of the probability density
function. For each noisy sample, the majority class among its k-nearest neighbors
of a different class is chosen as the new label.
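The density and selection steps described above can be sketched in base R. This is only an illustrative sketch, not the package's implementation: the helper `mixture_pdf` is hypothetical, the distances are made-up numbers rather than SVM outputs, and both the rounding of the number of noisy samples and the use of the density values as sampling weights are assumptions about how "according to their values" is applied.

```r
# Hypothetical helper (not part of the package): density of a univariate
# Gaussian mixture, evaluated at each distance in `d`.
mixture_pdf <- function(d, mean = c(0, 2),
                        sd = c(sqrt(0.5), sqrt(0.5)),
                        w = c(0.5, 0.5)) {
  stopifnot(length(mean) == length(sd), length(sd) == length(w))
  # Weighted density of each component, one column per component.
  dens <- sapply(seq_along(w), function(j) {
    w[j] * dnorm(d, mean = mean[j], sd = sd[j])
  })
  # Sum the weighted components for every distance.
  rowSums(matrix(dens, nrow = length(d)))
}

set.seed(9)
d <- abs(rnorm(20))   # made-up distances to a decision border
p <- mixture_pdf(d)   # density value associated with each distance

level <- 0.1
n_noisy <- round(level * length(d))   # assumption: simple rounding
# Assumption: density values act as sampling weights, without replacement.
noisy_id <- sort(sample(seq_along(d), size = n_noisy, prob = p))
```

With the default parameters, the component centered at 0 favors samples lying on the border, while the component centered at 2 also gives noticeable weight to samples at a distance of about 2 from it.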
An object of class ndmodel with elements:
xnoise |
a data frame with the noisy input attributes. |
ynoise |
a factor vector with the noisy output class. |
numnoise |
an integer vector with the number of noisy samples per class. |
idnoise |
an integer vector list with the indices of noisy samples. |
numclean |
an integer vector with the number of clean samples per class. |
idclean |
an integer vector list with the indices of clean samples. |
distr |
an integer vector with the number of samples per class in the original data. |
model |
the full name of the noise introduction model used. |
param |
a list of the argument values. |
call |
the function call. |
Noise model adapted from the papers in References, considering SVM with linear kernel as classifier, a mislabeling process using the neighborhood of noisy samples and a noise level to control the number of errors in the data.
J. Bootkrajang and J. Chaijaruwanich. Towards instance-dependent label noise-tolerant classification: a probabilistic approach. Pattern Analysis and Applications, 23(1):95-111, 2020. doi: 10.1007/s10044-018-0750-z.
gau_bor_ln, sigb_uni_ln, print.ndmodel, summary.ndmodel, plot.ndmodel
# load the dataset
data(iris2D)

# usage of the default method
set.seed(9)
outdef <- gaum_bor_ln(x = iris2D[,-ncol(iris2D)], y = iris2D[,ncol(iris2D)], level = 0.1)

# show results
summary(outdef, showid = TRUE)
plot(outdef)

# usage of the method for class formula
set.seed(9)
outfrm <- gaum_bor_ln(formula = Species ~ ., data = iris2D, level = 0.1)

# check the match of noisy indices
identical(outdef$idnoise, outfrm$idnoise)