View source: R/063_sym_sgau_an.R
sym_sgau_an | R Documentation |
Introduction of Symmetric scaled-Gaussian attribute noise into a classification dataset.
## Default S3 method:
sym_sgau_an(x, y, level, k = 0.2, sortid = TRUE, ...)

## S3 method for class 'formula'
sym_sgau_an(formula, data, ...)
x |
a data frame of input attributes. |
y |
a factor vector with the output class of each sample. |
level |
a double in [0,1] with the noise level to be introduced. |
k |
a double in [0,1] with the scale used for the standard deviation (default: 0.2). |
sortid |
a logical indicating if the indices must be sorted at the output (default: TRUE). |
... |
other options to pass to the function. |
formula |
a formula with the output class and, at least, one input attribute. |
data |
a data frame in which to interpret the variables in the formula. |
Symmetric scaled-Gaussian attribute noise corrupts (level·100)% of the values of each attribute in the dataset. To corrupt an attribute A, (level·100)% of the samples in the dataset are chosen. Their values for A are then modified by adding a random value drawn from a Gaussian distribution with mean = 0 and standard deviation = (max-min)·k·level, where max and min are the limits of the attribute domain. For nominal attributes, a random value is chosen.
An object of class ndmodel with elements:
xnoise |
a data frame with the noisy input attributes. |
ynoise |
a factor vector with the noisy output class. |
numnoise |
an integer vector with the number of noisy samples per attribute. |
idnoise |
an integer vector list with the indices of noisy samples per attribute. |
numclean |
an integer vector with the number of clean samples per attribute. |
idclean |
an integer vector list with the indices of clean samples per attribute. |
distr |
an integer vector with the samples per class in the original data. |
model |
the full name of the noise introduction model used. |
param |
a list of the argument values. |
call |
the function call. |
Noise model adapted from the papers in References.
M. Koziarski, B. Krawczyk, and M. Wozniak. Radial-based oversampling for noisy imbalanced data classification. Neurocomputing, 343:19–33, 2019. doi: 10.1016/j.neucom.2018.04.089.
sym_sgau_an, sym_gau_an, print.ndmodel, summary.ndmodel, plot.ndmodel
# load the dataset
data(iris2D)

# usage of the default method
set.seed(9)
outdef <- sym_sgau_an(x = iris2D[,-ncol(iris2D)], y = iris2D[,ncol(iris2D)], level = 0.1)

# show results
summary(outdef, showid = TRUE)
plot(outdef)

# usage of the method for class formula
set.seed(9)
outfrm <- sym_sgau_an(formula = Species ~ ., data = iris2D, level = 0.1)

# check the match of noisy indices
identical(outdef$idnoise, outfrm$idnoise)