Description
The tvcglm function implements the tree-based varying coefficient
regression algorithm for generalized linear models introduced by
Buergin and Ritschard (2017). The algorithm approximates varying
coefficients by piecewise constant functions using recursive
partitioning, i.e., it estimates the selected coefficients
individually by strata of the value space of the partitioning
variables. The special feature of the provided algorithm is that it
builds an individual partition for each varying coefficient, which
enhances the possibilities for model specification and allows
partitioning variables to be selected individually by coefficient.
Usage

tvcglm(formula, data, family,
       weights, subset, offset, na.action = na.omit,
       control = tvcglm_control(), ...)

tvcglm_control(minsize = 30, mindev = 2.0,
               maxnomsplit = 5, maxordsplit = 9, maxnumsplit = 9,
               cv = TRUE, folds = folds_control("kfold", 5),
               prune = cv, fast = TRUE, center = fast,
               maxstep = 1e3, verbose = FALSE, ...)

Arguments

formula 
a symbolic description of the model to fit, e.g.,
y ~ vc(z1) + vc(z1, by = x1), where the vc terms specify
the varying coefficients; see the examples below. 
family 
the model family, e.g., binomial() or gaussian(). 
data 
a data frame containing the variables in the model. 
weights 
an optional numeric vector of weights to be used in the fitting process. 
subset 
an optional logical or integer vector specifying a
subset of the observations to be used in the fitting process. 
offset 
this can be used to specify an a priori known component to be included in the linear predictor during fitting. 
na.action 
a function that indicates what should happen if data
contain missing values; the default is na.omit. 
control 
a list with control parameters as returned by
tvcglm_control. 
minsize 
numeric (vector). The minimum sum of weights in terminal nodes. 
mindev 
numeric scalar. The minimum training error reduction a split must achieve to be considered. The main role of this parameter is to save computing time by early stopping. It may be set lower when there are very few partitioning variables and higher when there are many. 
maxnomsplit, maxordsplit, maxnumsplit 
integer scalars that bound the number of split
candidates per partitioning variable; see Details. 
cv 
logical scalar. Whether or not the pruning parameter should be chosen by cross-validation. 
folds 
a list of parameters for creating the cross-validation folds, as produced by
folds_control. Used only if cv = TRUE. 
prune 
logical scalar. Whether or not the initially grown tree should be
pruned based on the cross-validation results. Requires cv = TRUE. 
fast 
logical scalar. Whether the approximate model should be
used to search for the next split. The approximate search model
uses only the observations of the node to split and incorporates the
fitted values of the current model as offsets; thereby the
estimation reduces to the coefficients of the added split. If
FALSE, the full model is re-estimated for each candidate split,
which is slower but exact. 
center 
logical scalar. Whether the predictor variables of the
update models during the grid search should be centered. Defaults
to the value of fast. 
maxstep 
integer. The maximum number of iterations, i.e., the number of splits to be processed. 
verbose 
logical. Should information about the fitting process be printed to the screen? 
... 
additional arguments passed to the underlying fitting function. 
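For illustration, a short sketch of customizing these control parameters (the specific values are arbitrary and assume the vcrpart package is installed):

```r
library(vcrpart)

## Hypothetical settings: require larger terminal nodes, stop splitting
## earlier (larger mindev), and skip cross-validation for a quick
## exploratory fit.
ctrl <- tvcglm_control(minsize = 50, mindev = 4, cv = FALSE)
```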
Details

tvcglm proceeds in two stages. The first stage, called the
partitioning stage, builds overly fine partitions for each vc
term; the second stage, called the pruning stage, selects the best-sized
partitions by collapsing inner nodes. For details on the pruning
stage, see tvcm-assessment. The partitioning stage
iterates the following steps:
1. Fit the current generalized linear model

   y ~ NodeA:x1 + ... + NodeK:xK

   with glm, where Nodek is a categorical variable with terminal
   node labels for the k-th varying coefficient.

2. Search for the globally best split among the candidate splits by an
   exhaustive -2 log-likelihood training error search that cycles
   through all possible splits.

3. If the -2 log-likelihood training error reduction of the best split
   is smaller than mindev, or if no candidate split satisfies the
   minimum node size minsize, stop the algorithm.

4. Otherwise, incorporate the best split and repeat the procedure.
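Step 1 above can be mimicked directly with glm. The following self-contained sketch (simulated data; the node factor NodeA is hypothetical) shows how interacting a predictor with the terminal node labels yields one coefficient per node:

```r
set.seed(1)
d <- data.frame(
  y     = rbinom(100, 1, 0.5),                       # binary response
  x1    = rnorm(100),                                # predictor with varying coefficient
  NodeA = factor(sample(1:2, 100, replace = TRUE))   # current terminal node labels
)

## the coefficient of 'x1' is estimated separately within each node
fit <- glm(y ~ NodeA:x1, family = binomial(), data = d)
coef(fit)  # intercept plus one 'x1' slope per node label
```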
The partitioning stage selects, in each iteration, the split that
maximizes the -2 log-likelihood training error reduction relative to the
current model. The default stopping parameters are minsize = 30
(a minimum node size of 30) and mindev = 2 (the training error
reduction of the best split must be larger than 2 to continue).
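For a binomial GLM, differences in deviance equal differences in -2 log-likelihood, so the stopping rule can be sketched as a deviance comparison (simulated data; a real search would cycle through all candidate splits):

```r
set.seed(2)
d <- data.frame(y = rbinom(80, 1, 0.5),
                x = rnorm(80),
                z = gl(2, 40))  # candidate partitioning variable

fit0 <- glm(y ~ x, family = binomial(), data = d)    # current model
fit1 <- glm(y ~ z:x, family = binomial(), data = d)  # with the candidate split

reduction <- deviance(fit0) - deviance(fit1)
## the split is incorporated only if the reduction exceeds mindev (default 2)
reduction > 2
```

Because fit0 is nested in fit1, the training error reduction is always non-negative.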
The algorithm implements a number of split point reduction methods to
decrease the computational complexity; see the arguments
maxnomsplit, maxordsplit, and maxnumsplit.
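For illustration (a sketch of the general idea, not necessarily the package's exact internal procedure), the candidate cut points of a numeric partitioning variable can be thinned to at most maxnumsplit values, e.g., via quantiles:

```r
set.seed(3)
z <- rnorm(500)   # numeric partitioning variable with ~500 unique values
maxnumsplit <- 9

## keep at most 'maxnumsplit' equally spaced quantiles as cut points
cands <- unique(quantile(z, probs = seq_len(maxnumsplit) / (maxnumsplit + 1)))
length(cands)     # at most 9 candidate cut points instead of ~499
```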
The algorithm can be seen as an extension of CART (Breiman et al., 1984) and PartReg (Wang and Hastie, 2014), with the new feature that partitioning can be processed coefficient-wise.
Value

An object of class tvcm.
Author(s)

Reto Buergin
References

Breiman, L., J. H. Friedman, R. A. Olshen and C. J. Stone (1984). Classification and Regression Trees. New York, USA: Wadsworth.

Wang, J. C. and T. Hastie (2014). Boosted Varying-Coefficient Regression Models for Product Demand Prediction. Journal of Computational and Graphical Statistics, 23(2), 361-382.

Buergin, R. and G. Ritschard (2017). Coefficient-Wise Tree-Based Varying Coefficient Regression with vcrpart. Journal of Statistical Software, 80(6), 1-33.
See Also

tvcm_control, tvcm-methods, tvcm-plot, tvcm-assessment, fvcglm, glm
Examples

## ------------------------------------------------------------------- #
## Example: Moderated effect of education on poverty
##
## The algorithm is used to find out whether the effect of high
## education 'EduHigh' on poverty 'Poor' is moderated by the civil
## status 'CivStat'. We specify two 'vc' terms in the logistic
## regression model for 'Poor': a first that accounts for the direct
## effect of 'CivStat' and a second that accounts for the moderation of
## 'CivStat' on the relation between 'EduHigh' and 'Poor'. We use here
## the 2stage procedure with a partitioning and a pruning stage as
## described in Buergin and Ritschard (2017).
## ------------------------------------------------------------------- #
data(poverty)
poverty$EduHigh <- 1 * (poverty$Edu == "high")
## fit the model
model.Pov <-
tvcglm(Poor ~ 1 + vc(CivStat) + vc(CivStat, by = EduHigh) + NChild,
family = binomial(), data = poverty, subset = 1:200,
control = tvcm_control(verbose = TRUE, papply = lapply,
folds = folds_control(K = 1, type = "subsampling", seed = 7)))
## diagnosis
plot(model.Pov, "cv")
plot(model.Pov, "coef")
summary(model.Pov)
splitpath(model.Pov, steps = 1:3)
prunepath(model.Pov, steps = 1)
