criteria.after.split.calculator: Calculates Entropy or Gini Index of a node after a given...
In forestRK: Implements the Forest-R.K. Algorithm for Classification Problems


View source: R/criteria.after.split.calculator.R

Calculates the Entropy or Gini Index of a particular node after a particular split. This function is called within the construct.treeRK function.

The argument split.record is the output of the kidids_split method from the package partykit. kidids_split splits the data according to criteria specified by the user ahead of time, and returns a vector storing, for each observation in the original data, the index of the split group (group "1" or "2") that the observation belongs to after the split.

For more information about kidids_split, please see the partykit documentation.
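
To make the shape of split.record concrete, here is a minimal sketch of what kidids_split returns for a simple numeric split. The choice of column (the first column of iris) and the threshold of 5 are arbitrary assumptions made for illustration; they are not taken from forestRK.

```r
library(partykit)

# Hypothetical toy split: first covariate of iris (Sepal.Length),
# threshold 5, with right = TRUE so values <= 5 fall in the first group.
sp <- partysplit(varid = 1L, breaks = 5, right = TRUE)

# One group index per observation in the data that is being split
groups <- kidids_split(sp, data = iris[1:6, 1:4])

groups          # group index of each of the six rows
table(groups)   # number of observations in each child node
```

A vector of this form, with one entry per row of x.node, is what the split.record argument is expected to hold.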


 criteria.after.split.calculator(x.node = data.frame(), y.new.node = c(),
                                 split.record = kidids_split(),
                                 entropy = TRUE)

`x.node`	numericized data frame of covariates (obtained via `x.organizer()`) from the node that is to be split; `x.node` should contain no `NA` or `NaN` values.
`y.new.node`	numericized class type of each observation from the node that is to be split; `y.new.node` should contain no `NA` or `NaN` values.
`split.record`	output of the `kidids_split` function from the `partykit` package that describes a particular split.
`entropy`	`TRUE` if Entropy is used as the splitting criterion; `FALSE` if the Gini Index is used instead. Default is `TRUE`.

The value of Entropy or Gini Index of a particular node after a particular split.
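
For intuition, the sketch below computes a post-split criterion in base R as the size-weighted average of the child-node impurities. This is one common textbook definition and is an assumption here: the helper names `node.impurity` and `impurity.after.split` are hypothetical, the logarithm base (2) is a choice made for the example, and the exact formula used inside forestRK may differ.

```r
# Impurity of a single node: Shannon entropy (base 2, an assumption)
# or Gini index, computed from the class proportions in the node.
node.impurity <- function(y, entropy = TRUE) {
  p <- table(y) / length(y)   # class proportions
  p <- p[p > 0]               # drop empty classes to avoid log2(0)
  if (entropy) -sum(p * log2(p)) else 1 - sum(p^2)
}

# Size-weighted average impurity of the child nodes produced by a split.
# `group` plays the role of split.record: one group index per observation.
impurity.after.split <- function(y, group, entropy = TRUE) {
  stopifnot(length(y) == length(group))
  children <- split(y, group)                # class labels per child node
  weights <- lengths(children) / length(y)   # share of observations per child
  sum(weights * vapply(children, node.impurity, numeric(1), entropy = entropy))
}

y <- c(1, 1, 1, 2, 2, 2)

# A split that separates the two classes perfectly
pure.split <- impurity.after.split(y, c(1L, 1L, 1L, 2L, 2L, 2L))

# A split that leaves both children mixed, scored with the Gini index
mixed.split <- impurity.after.split(y, c(1L, 2L, 1L, 2L, 1L, 2L),
                                    entropy = FALSE)
```

Under this definition a lower value indicates purer child nodes, so a split that separates the classes completely scores zero.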

Hyunjin Cho, h56cho@uwaterloo.ca; Rebecca Su, y57su@uwaterloo.ca

criteria.calculator

  ## example: iris dataset
  library(forestRK) # load the package forestRK
  library(partykit)

  # covariates of training data set
  x.train <- x.organizer(iris[,1:4], encoding = "num")[c(1:25,51:75,101:125),]
  # numericized class types of observations of training dataset
  y.train <- y.organizer(iris[c(1:25,51:75,101:125),5])$y.new
  ## criteria.after.split.calculator() example in the implementation
  ## of the forestRK algorithm

  ent.status <- TRUE

  # number.of.columns.of.x.node
  # = total number of covariates that we consider
  number.of.columns.of.x.node <- dim(x.train)[2]
  # m.try = the randomly chosen number of covariates that we consider
  # at the time of split
  m.try <- sample(1:(number.of.columns.of.x.node),1)
  ## sample m.try number of covariates from the list of all covariates
  K <- sample(1:(number.of.columns.of.x.node), m.try)

  # split the data
  # (the split used here is arbitrary and chosen only for illustration:
  # the first sampled covariate, with its first observed value as the break)
  # for more information about partysplit and kidids_split,
  # please refer to the documentation for the package 'partykit'
  sp <- partysplit(varid=K[1], breaks = x.train[1,K[1]], index = NULL,
                   right = TRUE, prob = NULL, info = NULL)
  split.record <- kidids_split(sp, data=x.train)

  # apply criteria.after.split.calculator() to the kidids_split output
  criteria.after.split <- criteria.after.split.calculator(x.train,
                                    y.train, split.record, ent.status)
  criteria.after.split