information_gain: Entropy-based Filters

information_gain R Documentation

Entropy-based Filters

Description

Algorithms that rank discrete attributes by importance, based on their entropy with a continuous class attribute. This function is a reimplementation of FSelector's information.gain, gain.ratio and symmetrical.uncertainty.

Usage

information_gain(
  formula,
  data,
  x,
  y,
  type = c("infogain", "gainratio", "symuncert"),
  equal = FALSE,
  discIntegers = TRUE,
  nbins = 5,
  threads = 1
)

Arguments

formula

An object of class formula with model description.

data

A data.frame accompanying formula.

x

A data.frame or sparse matrix with attributes.

y

A vector with response variable.

type

Method name: one of "infogain", "gainratio" or "symuncert" (see Details).

equal

A logical. Whether to discretize the dependent variable using equal-frequency binning.

discIntegers

A logical. If TRUE (default), integers are treated as numeric vectors and are discretized. If FALSE, integers are treated as factors and are left as is.

nbins

Number of bins used for discretization. Only used if 'equal = TRUE' and the response is numeric.

threads

Defunct. Number of threads for the parallel backend; now turned off for safety reasons.
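
To make the equal and nbins arguments more concrete, here is a minimal base R sketch of equal-frequency binning. The helper name equal_freq_bins is hypothetical and is not part of FSelectorRcpp; the package's internal discretization may differ in how it places cut points and handles ties.

```r
# Hypothetical helper (not part of FSelectorRcpp): split a numeric vector
# into `nbins` bins that each hold roughly the same number of observations.
equal_freq_bins <- function(v, nbins = 5) {
  # Cut points at the 0, 1/nbins, ..., 1 quantiles of the data
  breaks <- quantile(v, probs = seq(0, 1, length.out = nbins + 1),
                     names = FALSE)
  # unique() guards against duplicated breaks when values are heavily tied
  cut(v, breaks = unique(breaks), include.lowest = TRUE)
}

# Illustrative numeric response with ten distinct values
y_num <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
bins <- equal_freq_bins(y_num, nbins = 5)

# Number of observations that fall into each bin
table(bins)
```

With ten distinct values and nbins = 5, each bin receives two observations. When many values are tied, fewer than nbins bins may result.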

Details

type = "infogain" is

H(Class) + H(Attribute) - H(Class, Attribute)

type = "gainratio" is

\frac{H(Class) + H(Attribute) - H(Class, Attribute)}{H(Attribute)}

type = "symuncert" is

2\frac{H(Class) + H(Attribute) - H(Class, Attribute)}{H(Attribute) + H(Class)}

where H(X) is the Shannon entropy of a variable X and H(X, Y) is the joint Shannon entropy of the variables X and Y.
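
The three formulas can be sketched in base R for two already-discrete vectors. This is an illustration of the definitions above, not the package's implementation: the helper names entropy and entropy_score are hypothetical, and the use of the natural logarithm is an assumption.

```r
# Shannon entropy of one or more discrete vectors (natural log assumed).
# Passing two vectors yields the joint entropy H(X, Y).
entropy <- function(...) {
  counts <- table(...)                     # contingency table of levels
  p <- counts[counts > 0] / sum(counts)    # drop empty cells, normalise
  -sum(p * log(p))
}

# Hypothetical helper mirroring the three `type` values.
entropy_score <- function(attribute, cls,
                          type = c("infogain", "gainratio", "symuncert")) {
  type <- match.arg(type)
  h_class <- entropy(cls)
  h_attr  <- entropy(attribute)
  # H(Class) + H(Attribute) - H(Class, Attribute)
  gain <- h_class + h_attr - entropy(cls, attribute)
  switch(type,
    infogain  = gain,
    gainratio = gain / h_attr,
    symuncert = 2 * gain / (h_attr + h_class)
  )
}

# Toy data in which the attribute determines the class exactly
attribute <- c("a", "a", "b", "b")
cls       <- c("x", "x", "y", "y")

entropy_score(attribute, cls, "infogain")   # equals log(2)
entropy_score(attribute, cls, "gainratio")  # equals 1
entropy_score(attribute, cls, "symuncert")  # equals 1
```

In this toy case H(Class), H(Attribute) and H(Class, Attribute) all equal log(2), so the information gain is log(2) and both normalised scores are 1.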

Value

data.frame with the following columns:

  • attributes - variable names.

  • importance - worth of the attributes.
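
A result of this shape can be ranked with base R. The sketch below uses a hand-made data.frame with made-up attribute names and scores as a stand-in for real output; only the two column names come from the description above.

```r
# Made-up scores standing in for the output of information_gain();
# the attribute names and values are purely illustrative.
scores <- data.frame(
  attributes = c("a1", "a2", "a3"),
  importance = c(0.10, 0.75, 0.40),
  stringsAsFactors = FALSE
)

# Sort from most to least important and keep the top two attributes
ranked <- scores[order(scores$importance, decreasing = TRUE), ]
top_attrs <- head(ranked$attributes, 2)
top_attrs
```

Here top_attrs holds "a2" and "a3", the two attributes with the highest importance.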

Author(s)

Zygmunt Zawadzki zygmunt@zstat.pl

Examples


library(FSelectorRcpp)

irisX <- iris[-5]   # the four numeric attributes, without Species
y <- iris$Species   # class attribute

# data.frame interface
information_gain(x = irisX, y = y)

# formula interface
information_gain(formula = Species ~ ., data = iris)
information_gain(formula = Species ~ ., data = iris, type = "gainratio")
information_gain(formula = Species ~ ., data = iris, type = "symuncert")

# sparse matrix interface
if (require("Matrix")) {  # require() attaches Matrix when it is installed
  i <- c(1, 3:8); j <- c(2, 9, 6:10); x <- 7 * (1:7)
  x <- sparseMatrix(i, j, x = x)
  y <- c(1, 1, 1, 1, 2, 2, 2, 2)

  information_gain(x = x, y = y)
  information_gain(x = x, y = y, type = "gainratio")
  information_gain(x = x, y = y, type = "symuncert")
}



FSelectorRcpp documentation built on Oct. 3, 2024, 1:08 a.m.