ComputeInterestingTuplesDiscrete: Interesting tuples (discrete)

View source: R/tuples.R

ComputeInterestingTuplesDiscrete    R Documentation


Description

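Finds interesting tuples of discrete variables by computing, for each variable within each tuple, the value selected by stat_mode (by default mutual information, which becomes information gain when decision is given).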

Usage

ComputeInterestingTuplesDiscrete(
  data,
  decision = NULL,
  dimensions = 2,
  pc.xi = 0.25,
  ig.thr = 0,
  I.lower = NULL,
  interesting.vars = vector(mode = "integer"),
  require.all.vars = FALSE,
  return.matrix = FALSE,
  stat_mode = "MI"
)

Arguments

data

input data in which columns are variables and rows are observations; all variables must be discrete with the same number of categories

decision

optional decision variable as a binary sequence whose length equals the number of observations (see Details for how it changes the computed value)

dimensions

number of dimensions (a positive integer; 5 max)

pc.xi

parameter xi used to compute pseudocounts (changing the default is not recommended)

ig.thr

IG threshold above which a tuple is considered interesting (0 or a negative value disables filtering)

I.lower

IG values computed for the lower dimension (1D for 2D, 2D for 3D, etc.)

interesting.vars

variables for which to check the IGs (an empty vector, the default, means all variables)

require.all.vars

boolean; whether to require each tuple to consist only of interesting.vars

return.matrix

boolean; whether to return a matrix instead of a list (ignored unless the optimised variant of the method is used)

stat_mode

character; decides which value is computed. One of:

  • "MI" – mutual information (the default); becomes information gain when decision is given

  • "H" – entropy; becomes conditional entropy when decision is given

  • "VI" – variation of information; becomes target information difference when decision is given

Details

When running in 2D with no filtering applied, this function can use an optimised variant of the method. It is therefore recommended to avoid filtering in 2D whenever feasible.

This function computes the value selected by stat_mode. When decision is omitted, that value is computed on the descriptive variables alone. When decision is given, it is computed on the decision variable, conditional on the other variables. Throughout the rest of this page, read "IG" as whichever value stat_mode selects.
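To make the quantities named by stat_mode concrete, the sketch below computes textbook plug-in (maximum-likelihood) estimates in base R: joint entropy, conditional entropy, mutual information, variation of information, and an information gain defined as IG = H(d | z) - H(d | z, x), i.e. the drop in uncertainty about the decision d when variable x is added to the rest of its tuple z. This is an illustration under stated assumptions, not the MDFS implementation: all helper names are hypothetical, no pseudocounts are applied (MDFS uses the pc.xi scheme, which is not reproduced here), "target information difference" is not covered, and treating IG as this conditional gain is an assumption about what "IG achieved by Var in Tuple" means.

```r
# Hypothetical helpers, NOT part of MDFS. Plug-in estimates, no pseudocounts.

# Joint entropy (in bits) of one or more discrete vectors of equal length.
joint_entropy <- function(...) {
  counts <- as.vector(table(...))
  p <- counts[counts > 0] / sum(counts)  # empirical cell probabilities
  -sum(p * log2(p))
}

# Conditional entropy H(y | x1, ..., xk) = H(y, x1, ..., xk) - H(x1, ..., xk)
cond_entropy <- function(y, ...) {
  joint_entropy(y, ...) - joint_entropy(...)
}

# Mutual information I(x; y) = H(x) + H(y) - H(x, y)
mutual_info <- function(x, y) {
  joint_entropy(x) + joint_entropy(y) - joint_entropy(x, y)
}

# Variation of information VI(x; y) = H(x, y) - I(x; y)
variation_of_info <- function(x, y) {
  joint_entropy(x, y) - mutual_info(x, y)
}

# Assumed IG of x about decision d, given the rest of the tuple z:
# IG = H(d | z) - H(d | z, x)
info_gain <- function(d, x, z) {
  cond_entropy(d, z) - cond_entropy(d, z, x)
}

# Toy XOR example: d depends on x and z jointly, but on neither alone.
x <- c(0, 1, 0, 1)
z <- c(0, 1, 1, 0)
d <- as.integer(xor(x, z))
mutual_info(x, d)   # 0: x alone carries no information about d
info_gain(d, x, z)  # 1: given z, x fully determines d
```

The XOR case shows why tuples matter: in 1D, x looks useless, while in 2D (paired with z) it yields one full bit of information about the decision.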

Value

A data.frame, or NULL (with a warning) if no tuples are found.

The following columns are present in the data.frame:

  • Var – interesting variable index

  • Tuple.1, Tuple.2, ... – corresponding tuple (up to dimensions columns)

  • IG – information gain (in general, the value selected by stat_mode) achieved by Var in Tuple.*

Additionally, an attribute named run.params containing the run parameters is set on the result.
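A result with the shape described above can be post-processed with ordinary data.frame operations. The sketch below is a hypothetical helper (top_tuples is not part of MDFS) that handles the NULL case, sorts rows by IG, and carries the run.params attribute over, since subsetting a data.frame in R drops custom attributes. The toy input uses made-up values and placeholder run.params contents, not real MDFS output.

```r
# Hypothetical helper, NOT part of MDFS: return the n rows with the highest IG
# from a result shaped like the documented return value (or NULL).
top_tuples <- function(res, n = 5) {
  if (is.null(res)) return(NULL)            # no tuples were found
  params <- attr(res, "run.params")         # subsetting drops custom attributes
  out <- head(res[order(res$IG, decreasing = TRUE), , drop = FALSE], n)
  attr(out, "run.params") <- params         # restore the run parameters
  out
}

# Toy input with made-up values, shaped like a 2D result.
toy <- data.frame(Var = c(1L, 2L, 3L),
                  Tuple.1 = c(1L, 1L, 2L),
                  Tuple.2 = c(2L, 2L, 3L),
                  IG = c(0.4, 0.4, 1.2))
attr(toy, "run.params") <- list(dimensions = 2)  # placeholder contents
top_tuples(toy, n = 2)
```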

Examples


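# Binarise the data (threshold 500) and compute 1D IGs as the lower-dimension reference
ig.1d <- ComputeMaxInfoGainsDiscrete(madelon$data > 500, madelon$decision, dimensions = 1)
# Report 2D tuples whose IG exceeds 100, passing the 1D IGs as I.lower
ComputeInterestingTuplesDiscrete(madelon$data > 500, madelon$decision, dimensions = 2,
                                 ig.thr = 100, I.lower = ig.1d$IG)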


MDFS documentation built on May 31, 2023, 7:31 p.m.