mergeFactors.formula: mergeFactors.formula



Method for mergeFactors() when the first argument is a formula.


## S3 method for class 'formula'
mergeFactors(response, factor, ..., data = NULL,
  weights = NULL, family = "gaussian", method = "fast-adaptive",
  abbreviate = TRUE)



response: A formula containing column names from the data argument.


factor: A factor vector when the response argument is used; otherwise, the name of the column in the data argument whose levels should be merged.


...: Other arguments corresponding to the type of the first argument.


data: A data frame to be used for modeling.


weights: A vector of weights, optional when the response argument is used. For more information, see lm, glm, coxph.


family: Model family to be used in merging. Available models are: "gaussian", "survival", "binomial". By default, mergeFactors uses the "gaussian" model.


method: A string specifying the method used during merging. Four methods are available:

  • method = "adaptive". The objective function maximized throughout the procedure is the log-likelihood. The set of candidate pairs for merging contains all possible pairs of groups available in a given step. Pairwise LRT distances are recalculated at every step. This option is the slowest, since it requires the largest number of comparisons: O(k^3) model evaluations, where k is the initial number of groups.

  • method = "fast-adaptive". For the Gaussian family, the groups are first ordered by increasing averages, and the set of compared pairs then contains only pairs of closest (adjacent) groups. For other families, the order corresponds to the beta coefficients in a regression model. This option is much faster than method = "adaptive" and requires O(k^2) model evaluations.

  • method = "fixed". This option is based on the DMR algorithm introduced in Proch. It was extended to cover survival models. The main difference from method = "adaptive" is that the pairwise distances between all groups, based on the LRT statistic, are calculated only once, in the first step. An agglomerative clustering algorithm is then used to merge consecutive pairs. This means that pairwise model differences are not recalculated as LRT statistics at every step; complete linkage is used instead. This option is very fast and requires O(k^2) comparisons.

  • method = "fast-fixed". This option may be considered a modification of method = "fixed". As in the fast-adaptive version, we assume that if groups A, B and C are sorted by increasing beta coefficients, then the distance between A and B and the distance between B and C are not greater than the distance between A and C. This assumption makes it possible to implement complete linkage clustering more efficiently, in a dynamic manner. The biggest difference is that in the first step we do not calculate the whole matrix of pairwise differences, only the differences between consecutive groups. Then, in each step, only a single distance is calculated. This reduces the number of model evaluations to O(k).

The default option is "fast-adaptive".
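
The differences between the candidate-pair sets and the complete linkage step can be illustrated with a small base R sketch. This is not the package's implementation: the simulated data, the helper lrt_merge() and all variable names are hypothetical, and only the Gaussian case is covered. It computes the LRT statistic for merging two levels, compares the number of candidate pairs when all pairs are considered (as in "adaptive") with the number when only neighbours in the ordering by group means are considered (as in "fast-adaptive"), and then applies complete linkage to the one-off distance matrix (the idea behind "fixed").

```r
# Simulated data (hypothetical): levels "a" and "b" share one mean,
# levels "c" and "d" share another.
set.seed(1)
df <- data.frame(
  g = factor(rep(c("a", "b", "c", "d"), each = 30)),
  y = rnorm(120, mean = rep(c(0, 0, 2, 2), each = 30))
)

# LRT statistic for merging levels g1 and g2 in a Gaussian model:
# twice the drop in log-likelihood when both levels share one mean.
lrt_merge <- function(data, g1, g2) {
  merged <- data
  # Assigning the same label to two levels merges them.
  levels(merged$g)[levels(merged$g) %in% c(g1, g2)] <- paste0(g1, g2)
  full <- lm(y ~ g, data = data)
  reduced <- lm(y ~ g, data = merged)
  2 * (as.numeric(logLik(full)) - as.numeric(logLik(reduced)))
}

# Candidate pairs when every pair is allowed: k * (k - 1) / 2 of them.
all_pairs <- t(combn(levels(df$g), 2))

# Candidate pairs after sorting groups by mean: only k - 1 neighbours.
ord <- names(sort(tapply(df$y, df$g, mean)))
adjacent_pairs <- cbind(head(ord, -1), tail(ord, -1))

stat_all <- apply(all_pairs, 1, function(p) lrt_merge(df, p[1], p[2]))
names(stat_all) <- paste(all_pairs[, 1], all_pairs[, 2], sep = "-")
stat_adj <- apply(adjacent_pairs, 1, function(p) lrt_merge(df, p[1], p[2]))

# The pair with the smallest statistic is the cheapest one to merge.
best_pair <- names(which.min(stat_all))

# One-off distance matrix of LRT statistics, then complete linkage.
k <- nlevels(df$g)
d <- matrix(0, k, k, dimnames = list(levels(df$g), levels(df$g)))
for (i in seq_len(nrow(all_pairs))) {
  p <- all_pairs[i, ]
  d[p[1], p[2]] <- d[p[2], p[1]] <- stat_all[i]
}
tree <- hclust(as.dist(d), method = "complete")
clusters <- cutree(tree, k = 2)

print(stat_all)
print(best_pair)
print(clusters)
```

In this sketch the distance matrix is computed once and never refreshed, whereas an adaptive scheme would call lrt_merge() again on the merged data after every step.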


abbreviate: Logical. If TRUE (the default), factor level names are abbreviated.
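
As a rough illustration of what abbreviating level names looks like, the sketch below uses base R's abbreviate(). Whether the package relies on this function, and with which minlength, is an assumption here; the country names are made up for the example.

```r
# Hypothetical factor with long level names.
countries <- factor(c("Switzerland", "Sweden", "Slovenia", "Slovakia"))

# abbreviate() shortens each string to at least `minlength` characters
# and, by default, keeps the results distinct for distinct inputs.
short <- abbreviate(levels(countries), minlength = 4)

# The result is a named character vector: names are the original levels.
print(short)
```

Shorter labels mainly help readability when the merged levels are printed or plotted.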

factorMerger documentation built on July 4, 2019, 1:02 a.m.