chaid: CHi-squared Automated Interaction Detection

Fits a classification tree by the CHAID algorithm.

chaid(formula, data, subset, weights, na.action = na.omit, 
      control = chaid_control())
chaid_control(alpha2 = 0.05, alpha3 = -1, alpha4 = 0.05,
              minsplit = 20, minbucket = 7, minprob = 0.01,
              stump = FALSE, maxheight = -1)

`formula`	an object of class `formula` (or one that can be coerced to that class): a symbolic description of the model to be fitted. Both the response and all covariates are assumed to be categorical (either ordered or not).
`data`	an optional data frame containing the variables in the model. If not found in `data`, the variables are taken from `environment(formula)`, typically the environment from which `chaid` is called.
`subset`	an optional vector specifying a subset of observations to be used in the fitting process.
`weights`	an optional vector of weights to be used in the fitting process. Should be `NULL` or a numeric vector.
`na.action`	a function which indicates what should happen when the data contain `NA`s. The default is `na.omit`.
`control`	hyperparameters of the algorithm as returned by `chaid_control`.
`alpha2`	Level of significance used for merging of predictor categories (step 2).
`alpha3`	If set to a positive value $< 1$, level of significance used for the splitting of formerly merged categories of the predictor (step 3). Otherwise, step 3 is omitted (the default).
`alpha4`	Level of significance used for splitting of a node in the most significant predictor (step 5).
`minsplit`	Number of observations in a node below which no further split is attempted.
`minbucket`	Minimum number of observations in terminal nodes.
`minprob`	Minimum relative frequency of observations in terminal nodes.
`stump`	If `TRUE`, only the root node is split. The default is `FALSE`.
`maxheight`	Maximum height for the tree.

The current implementation only accepts nominal or ordinal categorical predictors. When predictors are continuous, they have to be transformed into ordinal predictors before using the following algorithm.
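As a minimal sketch of that preprocessing step, base R's `cut()` can bin a continuous variable into an ordered factor. The variable `age`, its values, and the break points are made up for illustration; suitable breaks depend on the data at hand.

```r
# Hypothetical continuous predictor: ages of 8 respondents (made-up values).
age <- c(19, 23, 35, 41, 52, 58, 67, 74)

# cut() with ordered_result = TRUE returns an ordered factor.
# The break points are arbitrary and chosen only for illustration.
age_group <- cut(age,
                 breaks = c(-Inf, 30, 45, 60, Inf),
                 labels = c("<=30", "31-45", "46-60", ">60"),
                 ordered_result = TRUE)

is.ordered(age_group)  # check that the result is an ordinal predictor
table(age_group)       # number of observations per category
```

The resulting `age_group` can then be used in the model formula in place of `age`.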

Merging: For each predictor variable X in turn, merge non-significant categories. Each final category of X will result in one child node if X is used to split the node. The merging step also calculates the adjusted p-value that is to be used in the splitting step.

1. If X has 1 category only, stop and set the adjusted p-value to be 1.

2. If X has 2 categories, go to step 8.

3. Otherwise, find the allowable pair of categories of X that is least significantly different (i.e., most similar). For an ordinal predictor, an allowable pair is two adjacent categories; for a nominal predictor, it is any two categories. The most similar pair is the pair whose test statistic gives the largest p-value with respect to the dependent variable Y. How to calculate the p-value under various situations will be described in later sections.

4. For the pair having the largest p-value, check whether its p-value is larger than the user-specified alpha-level alpha2. If it is, this pair is merged into a single compound category and a new set of categories of X is formed. If it is not, go to step 7.

5. (Optional) If the newly formed compound category consists of three or more original categories, find the best binary split within the compound category, i.e., the one whose p-value is the smallest. Perform this binary split if its p-value is not larger than the alpha-level alpha3.

6. Go to step 2.

7. (Optional) Any category having too few observations (as compared with a user-specified minimum segment size) is merged with the most similar other category as measured by the largest of the p-values.

8. The adjusted p-value is computed for the merged categories by applying Bonferroni adjustments that are to be discussed later.
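The merging loop can be sketched in base R for a single nominal predictor. This is an illustration under stated assumptions, not the package's implementation: `merge_categories()` is a hypothetical helper, the p-values come from a plain Pearson chi-squared test via `chisq.test()`, the optional steps 5 and 7 are left out, and the Bonferroni multiplier is the one given for nominal predictors in Kass (1980).

```r
# Sketch of the merging loop for one NOMINAL predictor x and response y.
# merge_categories() is a hypothetical helper, not part of the CHAID package.
merge_categories <- function(x, y, alpha2 = 0.05) {
  x <- factor(x)
  y <- factor(y)
  n_orig <- nlevels(x)                    # c: number of original categories

  while (nlevels(x) > 2) {                # step 2: stop merging at 2 categories
    pairs <- combn(levels(x), 2)          # step 3: any pair is allowable (nominal)
    pvals <- apply(pairs, 2, function(p) {
      keep <- x %in% p
      tab <- table(droplevels(x[keep]), y[keep])
      tab <- tab[, colSums(tab) > 0, drop = FALSE]
      if (ncol(tab) < 2) return(1)        # identical response: maximally similar
      suppressWarnings(chisq.test(tab, correct = FALSE)$p.value)
    })
    best <- which.max(pvals)              # least significantly different pair
    if (pvals[best] <= alpha2) break      # step 4: no pair left to merge
    lev <- levels(x)
    lev[lev %in% pairs[, best]] <- paste(pairs[, best], collapse = "+")
    levels(x) <- lev                      # merge the pair into a compound category
  }

  r <- nlevels(x)                         # r: number of merged categories
  if (r < 2) return(list(x = x, p_adj = 1))   # step 1: a single category

  # Step 8: unadjusted p-value of the merged table times the Bonferroni
  # multiplier for a nominal predictor (assumed here, following Kass, 1980).
  p <- suppressWarnings(chisq.test(table(x, y), correct = FALSE)$p.value)
  i <- 0:(r - 1)
  mult <- sum((-1)^i * (r - i)^n_orig / (factorial(i) * factorial(r - i)))
  list(x = x, p_adj = min(1, p * mult))
}

# Made-up data: categories "a" and "b" have the same response distribution.
x <- rep(c("a", "b", "c"), each = 40)
y <- c(rep(c("yes", "no"), times = c(20, 20)),
       rep(c("yes", "no"), times = c(20, 20)),
       rep(c("yes", "no"), times = c(38, 2)))
res <- merge_categories(x, y)
levels(res$x)   # merged categories
res$p_adj       # Bonferroni-adjusted p-value
```

For an ordinal predictor, the same loop would consider only adjacent pairs of categories, and Kass (1980) gives the multiplier `choose(c - 1, r - 1)` for c original and r merged categories.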

Splitting: The best split for each predictor is found in the merging step. The splitting step selects which predictor is used to split the node, by comparing the adjusted p-values obtained for each predictor in the merging step.

1. Select the predictor that has the smallest adjusted p-value (i.e., most significant).

2. If this adjusted p-value is less than or equal to a user-specified alpha-level alpha4, split the node using this predictor. Otherwise, do not split; the node is considered a terminal node.
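The selection rule of the splitting step amounts to a minimum followed by a threshold comparison. A minimal sketch, in which the predictor names and adjusted p-values are made up and stand in for the output of the merging step:

```r
# Hypothetical adjusted p-values, one per predictor (made-up numbers).
p_adj <- c(gender = 0.31, educ = 0.004, marital = 0.02)
alpha4 <- 0.05

best <- names(which.min(p_adj))      # step 1: most significant predictor
do_split <- p_adj[[best]] <= alpha4  # step 2: compare with alpha4

if (do_split) {
  message("split node on ", best)
} else {
  message("terminal node")
}
```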

Stopping: The stopping step checks whether the tree growing process should be stopped according to the following stopping rules.

a) If a node becomes pure, that is, all cases in a node have identical values of the dependent variable, the node will not be split.

b) If all cases in a node have identical values for each predictor, the node will not be split.

c) If the current tree depth reaches the user-specified maximum tree depth limit value, the tree growing process will stop.

d) If the size of a node is less than the user-specified minimum node size value, the node will not be split.

e) If the split of a node results in a child node whose node size is less than the user-specified minimum child node size value, child nodes that have too few cases (as compared with this minimum) will merge with the most similar child node as measured by the largest of the p-values. However, if the resulting number of child nodes is 1, the node will not be split.

f) If maxheight is a positive value and the tree's height equals maxheight, the tree growing process will stop.
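The rules that can be checked before any split is attempted, that is a), b), c)/f), and d), can be sketched as a single predicate. `should_stop()` is a hypothetical helper, not a function of the CHAID package; rule e) is omitted because it depends on the candidate split.

```r
# should_stop() is a hypothetical helper, not part of the CHAID package.
# y: response values in the node; X: data frame of predictors in the node;
# depth: current depth of the node (assumed to be 0 for the root).
should_stop <- function(y, X, depth, minsplit = 20, maxheight = -1) {
  pure <- length(unique(y)) <= 1                       # rule a)
  constant_x <- all(vapply(X, function(v) length(unique(v)) <= 1,
                           logical(1)))                # rule b)
  too_deep <- maxheight > 0 && depth >= maxheight      # rules c) and f)
  too_small <- length(y) < minsplit                    # rule d)
  pure || constant_x || too_deep || too_small
}

# Made-up node with four cases and two predictors.
X <- data.frame(a = c("u", "v", "u", "v"), b = c("p", "p", "q", "q"))
y_mixed <- factor(c("yes", "no", "yes", "no"))

should_stop(y_mixed, X, depth = 0, minsplit = 2)                 # may be split
should_stop(y_mixed, X, depth = 3, minsplit = 2, maxheight = 3)  # height reached
```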

An object of class constparty, see package partykit.

G. V. Kass (1980). An Exploratory Technique for Investigating Large Quantities of Categorical Data. Applied Statistics, 29(2), 119–127.

  library("CHAID")

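  ### fit tree to subsample
  set.seed(290875)  # make the random subsample reproducible
  USvoteS <- USvote[sample(seq_len(nrow(USvote)), 1000), ]

  ### stricter stopping parameters than the defaults (see Arguments)
  ctrl <- chaid_control(minsplit = 200, minprob = 0.1)
  chaidUS <- chaid(vote3 ~ ., data = USvoteS, control = ctrl)

  ### inspect the fitted tree as text and as a plot
  print(chaidUS)
  plot(chaidUS)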