gamtree: Recursively partition a dataset based on penalized GAMs.

gamtree

R Documentation

Description

gamtree recursively partitions a dataset into subgroups and fits a penalized GAM in each subgroup. The subgroups are characterized by differences in the parameter estimates of these GAMs.

Usage

gamtree(
  formula,
  data,
  weights = NULL,
  REML = TRUE,
  method = "mob",
  cluster = NULL,
  offset = NULL,
  verbose = FALSE,
  parm = c(1, 2, 4),
  gam_ctrl = list(),
  tree_ctrl = list(),
  alt_formula = NULL,
  ...
)

Arguments

`formula`	specifies the model formula, consisting of three parts: the response variable, followed by a tilde ('~'); the terms for the node-specific GAMs; and, after a vertical bar ('|'), the potential partitioning variables (separated by '+'). The 'by' argument of function `s` may NOT be used in the node-specific GAM formulation. Do not use the dot ('.') to specify all remaining variables in `data`, as this may yield unexpected results; specify each variable explicitly in the corresponding part of the model formula. See Examples.
`data`	`data.frame` containing the variables specified in `formula`.
`weights`	numeric vector of length `nrow(data)`; optional case weights. A weight of 2, for example, is equivalent to having made exactly the same observation twice.
`REML`	logical, defaults to `TRUE`. Passed on to 'gamm4' and in turn 'lmer' (but not 'glmer') fitting routines to control whether REML or ML estimation is used.
`method`	character, one of `"ctree"` or `"mob"`, indicates which partitioning algorithm should be used. See details below.
`cluster`	optional, a name referring to a column of `data`, or a numeric or factor vector with a cluster ID, to be employed for clustered covariances in the parameter stability tests. Most useful if `method = "mob"`; probably less so for `method = "ctree"`, where it may yield overly conservative splitting. This argument should be used when the partitioning variables are measured not at the level of individual observations but at a higher level, e.g., when the response variable consists of repeated measurements on the same respondents.
`offset`	numeric vector of length `nrow(data)`. Supplies model offset for use in fitting. Note that this offset will always be completely ignored when predicting.
`verbose`	logical. Should progress be printed to the command line in every iteration? If `TRUE`, the iteration number, information on the splitting procedure, and the log-likelihood (with df) of the fitted full mixed-effects GAM are printed.
`parm`	vector of one or more integers, indicating which parameters should be included in the parameter stability tests. The default `c(1, 2, 4)` includes the intercept, linear slope and error variance of the smoothing spline. The 3rd parameter is the variance of the smooth term; it is excluded by default because its inclusion yields excessive power in many situations.
`gam_ctrl`	a list of fit control parameters to replace defaults returned by `gam.control`.
`tree_ctrl`	a `list` of one or more control parameters as accepted by `mob_control` (to be passed to function `mob` if `method = "mob"`), or `ctree_control` (to be passed to function `ctree` if `method = "ctree"`). Note: arguments `xtype` and `ytype` of `mob_control` are set to `"data.frame"` by default; this cannot be changed. Argument `parm` of `mob_control` will be overruled by the argument of the same name of the current function.
`alt_formula`	list with two elements, for specifying non-standard model formulae for the GAM, e.g., the formula list required for use of the `multinom` family.
`...`	additional arguments to be passed to function `gamm4`.
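
The statement under `weights` that a weight of 2 is equivalent to observing the same case twice can be illustrated with a minimal sketch. Note that this sketch uses base R's `lm` rather than `gamtree`, purely because the equivalence is easy to verify there; the data frame `d` and the weight vector `w` are made up for the illustration.

```r
# Illustration with base R's lm(), NOT gamtree: a case weight of 2 gives
# the same coefficient estimates as including that row twice.
set.seed(1)
d <- data.frame(x = 1:10)
d$y <- 2 + 0.5 * d$x + rnorm(10)   # simulated (hypothetical) data

w <- rep(1, nrow(d))               # one weight per row, as required for `weights`
w[3] <- 2                          # observation 3 counts twice

fit_weighted   <- lm(y ~ x, data = d, weights = w)
fit_duplicated <- lm(y ~ x, data = rbind(d, d[3, ]))

# Compare the two sets of coefficient estimates
all.equal(coef(fit_weighted), coef(fit_duplicated))
```

Only the coefficient estimates are compared here; quantities that depend on the number of rows, such as residual degrees of freedom, will differ between the two fits.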

Details

MOB is short for model-based recursive partitioning; ctree is short for conditional inference tree. MOB is based more strongly on parametric theory, which allows for easy inclusion of clustering structures in the estimation procedure (see also argument `cluster`) and yields an approach similar to GEE-type estimation for multilevel and longitudinal data structures. However, computation time for MOB is much longer than for ctree, mostly because of how it searches for the optimal splitting value after the splitting variable has been selected. ctree uses tests based on permutation theory and thereby offers a less parametrically oriented approach. It is much faster than MOB, but does not provide a natural way of accounting for multilevel or longitudinal data structures.

Value

Returns an object of class "gamtree". This is a list, containing (amongst others) the GAM-based recursive partition (in $tree). The following methods are available to extract information from the fitted object: predict.gamtree, for obtaining predicted values for training and new observations; plot.gamtree, for plotting the tree and the variables' effects; coef.gamtree, fixef.gamtree and ranef.gamtree, for extracting estimated coefficients; VarCorr.gamtree, for extracting random-effects (co)variances; and summary.gamtree, for a summary of the fitted models.
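
As a rough sketch of how the returned object might be inspected, the methods listed above can be called through their generics. This assumes that the gamtree package is installed and that the `eco` data used in the Examples is available after loading it; the object name `gt` is arbitrary.

```r
library(gamtree)   # assumption: installed, and provides the `eco` data

# Fit a GAM tree as in the Examples section
gt <- gamtree(Pn ~ s(PAR, k = 5L) | Species, data = eco, cluster = Specimen)

class(gt)          # the Value section states this is "gamtree"
gt$tree            # the GAM-based recursive partition

summary(gt)        # summary of the fitted models
coef(gt)           # estimated coefficients
plot(gt)           # the tree and the variables' effects
head(predict(gt))  # predicted values for the training observations
```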

Examples

gt_m <- gamtree(Pn ~ s(PAR, k = 5L) | Species, data = eco, cluster = Specimen)
summary(gt_m)
gt_c <- gamtree(Pn ~ s(PAR, k = 5L) | Species, data = eco, method = "ctree")
summary(gt_c)
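
A further sketch shows how control parameters might be supplied through `tree_ctrl`, as described under Arguments. It assumes the gamtree package is installed and the `eco` data is available; `alpha` and `maxdepth` are arguments of `mob_control` in partykit, and the values chosen here are arbitrary illustrations rather than recommendations.

```r
library(gamtree)   # assumption: installed, and provides the `eco` data

# Use a stricter significance level for splitting and limit the tree depth.
# The list is passed on to mob_control() because method = "mob".
gt_ctrl <- gamtree(
  Pn ~ s(PAR, k = 5L) | Species,
  data = eco,
  method = "mob",
  tree_ctrl = list(alpha = 0.01, maxdepth = 3L),  # illustrative values
  verbose = TRUE                                  # print progress per iteration
)
summary(gt_ctrl)
```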

marjoleinF/gamtree documentation built on June 10, 2025, 1:04 p.m.