View source: R/information_gain.R
information_gain | R Documentation |
Algorithms that rank the importance of discrete attributes, based on their entropy with a continuous class attribute. This function is a reimplementation of FSelector's information.gain, gain.ratio and symmetrical.uncertainty.
information_gain(
formula,
data,
x,
y,
type = c("infogain", "gainratio", "symuncert"),
equal = FALSE,
discIntegers = TRUE,
nbins = 5,
threads = 1
)
formula |
An object of class formula with model description. |
data |
A data.frame accompanying formula. |
x |
A data.frame or sparse matrix with attributes. |
y |
A vector with response variable. |
type |
Method name. |
equal |
A logical. Whether to discretize dependent variable with the
|
discIntegers |
A logical value. If TRUE (default), integers are treated as numeric vectors and are discretized. If FALSE, integers are treated as factors and are left as is. |
nbins |
Number of bins used for discretization. Only used if 'equal = TRUE' and the response is numeric. |
threads |
Defunct. Number of threads for the parallel backend; now turned off for safety reasons. |
type = "infogain" is
H(Class) + H(Attribute) - H(Class, Attribute)
type = "gainratio" is
\frac{H(Class) + H(Attribute) - H(Class, Attribute)}{H(Attribute)}
type = "symuncert" is
2\frac{H(Class) + H(Attribute) - H(Class, Attribute)}{H(Attribute) + H(Class)}
where H(X) is the Shannon entropy of a variable X and H(X, Y) is the joint Shannon entropy of the variables X and Y.
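The three formulas can be checked by hand on a small example. The sketch below is not the package's implementation: `entropy_of_counts` and `entropy_scores` are hypothetical helpers written for illustration, and the natural logarithm is an assumption (the base only rescales "infogain"; the two ratios do not depend on it).

```r
# Shannon entropy (natural log, an assumption) from a table of counts.
entropy_of_counts <- function(counts) {
  p <- counts[counts > 0] / sum(counts)  # drop empty cells, normalise
  -sum(p * log(p))
}

# Hypothetical helper: the three measures for one discrete attribute
# and a discrete class, following the formulas in this section.
entropy_scores <- function(attribute, cls) {
  h_class <- entropy_of_counts(table(cls))
  h_attr  <- entropy_of_counts(table(attribute))
  h_joint <- entropy_of_counts(table(attribute, cls))  # joint entropy
  gain <- h_class + h_attr - h_joint
  c(infogain  = gain,
    gainratio = gain / h_attr,
    symuncert = 2 * gain / (h_attr + h_class))
}

# Toy data: the attribute determines the class exactly.
attribute <- c("a", "a", "b", "b")
cls       <- c("x", "x", "y", "y")
scores <- entropy_scores(attribute, cls)
print(scores)
```

Because the attribute fully determines the class here, the information gain equals H(Class) = log(2), and both normalised measures reach their maximum of 1.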
data.frame with the following columns:
attributes - variable names.
importance - worth of the attributes.
Zygmunt Zawadzki zygmunt@zstat.pl
irisX <- iris[-5]
y <- iris$Species
## data.frame interface
information_gain(x = irisX, y = y)
# formula interface
information_gain(formula = Species ~ ., data = iris)
information_gain(formula = Species ~ ., data = iris, type = "gainratio")
information_gain(formula = Species ~ ., data = iris, type = "symuncert")
# sparse matrix interface
if(require("Matrix")) {
library(Matrix)
i <- c(1, 3:8); j <- c(2, 9, 6:10); x <- 7 * (1:7)
x <- sparseMatrix(i, j, x = x)
y <- c(1, 1, 1, 1, 2, 2, 2, 2)
information_gain(x = x, y = y)
information_gain(x = x, y = y, type = "gainratio")
information_gain(x = x, y = y, type = "symuncert")
}