gamtree: Recursively partition a dataset based on penalized GAMs.

View source: R/gamtree.R

gamtree R Documentation

Recursively partition a dataset based on penalized GAMs.

Description

gamtree recursively partitions a dataset into subgroups, fitting a penalized GAM in each; the subgroups are characterized by differences in the parameter estimates.

Usage

gamtree(
  formula,
  data,
  weights = NULL,
  REML = TRUE,
  method = "mob",
  cluster = NULL,
  offset = NULL,
  verbose = FALSE,
  parm = c(1, 2, 4),
  gam_ctrl = list(),
  tree_ctrl = list(),
  alt_formula = NULL,
  ...
)

Arguments

formula

specifies the model formula, consisting of three parts: the response variable, followed by a tilde ('~'); the terms for the node-specific GAMs, followed by a vertical bar ('|'); and the potential partitioning variables (separated by '+'). The 'by' argument of function s may NOT be used in the node-specific GAM formulation. Do not use the dot ('.') to specify all remaining variables in data, as this may yield unexpected results; instead, specify each variable explicitly in the corresponding part of the model formula. See Examples.

data

data.frame containing the variables specified in formula.

weights

numeric vector of length nrow(data); optional case weights. A weight of 2, for example, is equivalent to having made exactly the same observation twice.

REML

logical, defaults to TRUE. Passed on to 'gamm4' and in turn 'lmer' (but not 'glmer') fitting routines to control whether REML or ML estimation is used.

method

character, one of "ctree" or "mob", indicates which partitioning algorithm should be used. See details below.

cluster

optional; a name referring to a column of data, or a numeric or factor vector with cluster IDs, to be employed for clustered covariances in the parameter stability tests. Most useful if method = "mob"; probably less so for method = "ctree", where it may yield overly conservative splitting. This argument should be used when the partitioning variables are measured not at the level of individual observations but at a higher level, e.g., when the response variable consists of repeated measurements of the same respondents.

offset

numeric vector of length nrow(data). Supplies model offset for use in fitting. Note that this offset will always be completely ignored when predicting.

verbose

logical. Should progress be printed to the command line in every iteration? If TRUE, the iteration number, information on the splitting procedure, and the log-likelihood value (with df) of the fitted full mixed-effects GAM are printed.

parm

vector of one or more integers, indicating which parameters should be included in the parameter stability tests. The default c(1, 2, 4) includes the intercept, linear slope and error variance of the smoothing spline. The third parameter is the variance of the smooth term; it is excluded by default because its inclusion yields too much power in many situations.

gam_ctrl

a list of fit control parameters to replace defaults returned by gam.control.

tree_ctrl

a list of one or more control parameters as accepted by mob_control (to be passed to function mob if method = "mob"), or ctree_control (to be passed to function ctree if method = "ctree"). Note: arguments xtype and ytype of mob_control are set to "data.frame" by default; this cannot be changed. Argument parm of mob_control will be overruled by the argument of the same name of the current function.

alt_formula

list with two elements, for specifying non-standard model formulae for GAM. E.g., the formula list required for use of the multinom family.

...

additional arguments to be passed to function gamm4.
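
To make the three-part structure of the formula argument concrete, the following base R sketch takes such a formula apart using ordinary language-object indexing. The variable names y, x, z1 and z2 are hypothetical, and the code only illustrates how the three parts relate to each other; it is not a description of how gamtree processes the formula internally.

```r
# Hypothetical variable names; the formula is only parsed, never evaluated.
f <- y ~ s(x, k = 5L) | z1 + z2

response   <- f[[2]]    # part 1: response variable, left of '~'
rhs        <- f[[3]]    # everything right of '~', a call to `|`
gam_part   <- rhs[[2]]  # part 2: node-specific GAM terms, left of '|'
split_part <- rhs[[3]]  # part 3: partitioning variables, right of '|'

deparse(response)       # name of the response
all.vars(gam_part)      # covariates used in the node-specific GAM
all.vars(split_part)    # potential partitioning variables
```

Here the response is y, the node-specific GAM uses x, and z1 and z2 are the candidate partitioning variables. Listing every variable explicitly like this is what the description of formula recommends over the dot shorthand.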

Details

MOB is short for model-based recursive partitioning; ctree is short for conditional inference tree. MOB is based more strongly on parametric theory, which allows for easy inclusion of clustering structures in the estimation procedure (see also argument cluster) and yields something similar to a GEE-type approach for the estimation of multilevel and longitudinal data structures. However, computation time for MOB is much longer than for ctree, mostly because of how it searches for the optimal splitting value after the splitting variable has been selected. ctree uses tests based on permutation theory and thereby offers a less parametrically oriented approach. It is much faster than MOB, but does not provide a natural way of accounting for multilevel or longitudinal data structures.

Value

Returns an object of class "gamtree". This is a list, containing (amongst others) the GAM-based recursive partition (in $tree). The following methods are available to extract information from the fitted object: predict.gamtree, for obtaining predicted values for training and new observations; plot.gamtree, for plotting the tree and the variables' effects; coef.gamtree, fixef.gamtree and ranef.gamtree, for extracting estimated coefficients; VarCorr.gamtree, for extracting random-effects (co)variances; and summary.gamtree, for a summary of the fitted models.
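
A minimal sketch of working with the returned object, assuming the gamtree package is installed and that the eco data used in the Examples can be loaded from it with data(). The model call is copied from the Examples; the call predict(fit) without new data is an assumption based on the statement that predictions are available for training observations. The code is guarded so that it does nothing when the package is missing.

```r
fit <- NULL  # stays NULL when the gamtree package is not installed

if (requireNamespace("gamtree", quietly = TRUE)) {
  library(gamtree)
  # Assumption: eco is the example dataset shipped with the package.
  data("eco", package = "gamtree")

  fit <- gamtree(Pn ~ s(PAR, k = 5L) | Species, data = eco,
                 method = "ctree")

  print(class(fit))          # documented to include "gamtree"
  print(fit$tree)            # the GAM-based recursive partition
  print(coef(fit))           # estimated coefficients
  print(head(predict(fit)))  # predictions for the training observations
}
```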

See Also

predict.gamtree, plot.gamtree, coef.gamtree, summary.gamtree

Examples

gt_m <- gamtree(Pn ~ s(PAR, k = 5L) | Species, data = eco, cluster = Specimen)
summary(gt_m)
gt_c <- gamtree(Pn ~ s(PAR, k = 5L) | Species, data = eco, method = "ctree")
summary(gt_c)
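
As a further illustration, the sketch below passes control parameters through tree_ctrl. It assumes the gamtree package is installed and that eco can be loaded from it with data(); alpha and maxdepth are arguments of ctree_control in partykit, and the values chosen here are arbitrary. The object names ctrl and gt_small are introduced only for this example.

```r
# Control parameters forwarded to ctree_control(); values are arbitrary.
ctrl <- list(alpha = 0.01, maxdepth = 2L)

gt_small <- NULL  # stays NULL when the gamtree package is not installed
if (requireNamespace("gamtree", quietly = TRUE)) {
  library(gamtree)
  # Assumption: eco is the example dataset shipped with the package.
  data("eco", package = "gamtree")

  # Stricter significance level and a cap on tree depth
  gt_small <- gamtree(Pn ~ s(PAR, k = 5L) | Species, data = eco,
                      method = "ctree", tree_ctrl = ctrl)
  print(summary(gt_small))
}
```

A smaller alpha makes splits harder to accept, and maxdepth limits how deep the tree can grow, so this combination tends to produce a smaller tree than the defaults.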


marjoleinF/gamtree documentation built on July 3, 2024, 9:18 a.m.