stima: Simultaneous Threshold Interaction Modeling Algorithm


Description

This function fits a regression trunk model (default option) using the simultaneous threshold interaction modeling algorithm. The algorithm fits a regression tree and a multiple regression model simultaneously.

Usage

stima(data, maxsplit, model = "regtrunk", first = NULL, vfold = 10, 
CV = 1, Save = FALSE, control = NULL, printoutput = TRUE)

Arguments

data

a data frame with one continuous response variable and multiple predictors (categorical or continuous). IMPORTANT: The first column is treated as the response variable, the remaining columns as predictors.

maxsplit

the maximum number of splits.

model

the type of model to fit. The default, "regtrunk", fits a regression trunk model. The classification trunk model is under development.

first

the column number in the data frame of the predictor that is used for the first split of the regression trunk. The default option automatically selects the predictor for the first split.

vfold

the number of sets to be used in the cross-validation. The default value is 10, which means 10-fold cross-validation. If vfold = 0, no cross-validation is performed.

CV

the number of times the cross-validation procedure is performed. The default is 1. If CV = 5 and vfold = 10, a tenfold cross-validation is performed five times.

Save

if Save = TRUE, the new data are saved and added to the output (the rt object) as the component newdata. These data include indicator variables for the terminal nodes (regions) of the regression trunk.

control

options controlling details of the algorithm. For default options see stima.control.

printoutput

if TRUE, output will be printed while running the function.
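
Because the first column of data is treated as the response and first is given as a column number, it can help to arrange the data frame before calling stima. The following is a minimal sketch in base R; the data frame dat and its column names are hypothetical and serve only as an illustration.

```r
# Hypothetical data frame: 'score' is the response, but it is not the first column.
dat <- data.frame(
  age   = c(23, 35, 41, 52, 29),
  group = factor(c("a", "b", "a", "b", "a")),
  score = c(10.2, 11.5, 9.8, 12.1, 10.9)
)

# Move the response to the first column; predictors keep their original order.
response <- "score"
dat <- dat[, c(response, setdiff(names(dat), response))]

# Column number of a predictor after reordering, e.g. for the 'first' argument.
first_col <- which(names(dat) == "age")

print(names(dat))
print(first_col)
```

The column number must be looked up after reordering, because moving the response shifts the positions of the predictors. A data frame prepared this way could then be passed on as stima(dat, maxsplit, first = first_col); the toy data above are far too small for an actual analysis.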

Value

an object of class rt, which is a list containing at least the following components

call

the matched call.

trunk

the fitted regression trunk. MeanResponse is the mean response value of the observations in that particular node (this is not the predicted response value).

splitsequence

the numbers of the nodes that are split.

goffull

goodness-of-fit estimates for each regression trunk model in the sequence, from the model estimated after one split through the model estimated after the maximum number of splits.

full

the estimated full regression trunk model after the maximum number of splits. Coefficient = estimated unstandardized regression coefficient; Std. Coef. = standardized regression coefficient.
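
To illustrate the kind of model reported in full, the sketch below fits an ordinary linear model with main effects plus one indicator variable for a region, and derives standardized coefficients from the unstandardized ones. It uses only base R on simulated data. The threshold, the variable names, and the formula b * sd(x) / sd(y) for the standardized coefficient are assumptions made for illustration; stima selects the splits itself, and whether it computes Std. Coef. in exactly this way is not confirmed here.

```r
# Illustrative only: simulated data and a hand-picked region (not chosen by stima).
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)

# The response has main effects plus an extra effect inside one region.
y <- 1 + 0.5 * x1 + 0.3 * x2 + 2 * (x1 > 0 & x2 > 0.5) + rnorm(n, sd = 0.5)

# Indicator variable for the hypothetical terminal node (region).
region <- as.numeric(x1 > 0 & x2 > 0.5)

# Main effects and the region indicator are estimated in one linear model.
fit <- lm(y ~ x1 + x2 + region)

# Unstandardized coefficients (intercept dropped) and standardized counterparts.
b        <- coef(fit)[-1]
sds      <- c(x1 = sd(x1), x2 = sd(x2), region = sd(region))
std_coef <- b * sds / sd(y)

print(round(cbind(Coefficient = b, Std.Coef = std_coef), 2))
```

The standardized values obtained this way equal the slopes of the same model refitted on z-scored variables, which makes coefficients on different scales comparable.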

References

Dusseldorp, E. & Meulman, J. J. (2004). The regression trunk approach to discover treatment covariate interaction. Psychometrika, 69, 355-374.

Dusseldorp, E., Conversano, C., & Van Os, B. J. (2010). Combining an additive and tree-based regression model simultaneously: STIMA. Journal of Computational and Graphical Statistics, 19(3), 514-530.

See Also

stima.control, summary.rt, prune.rt, plot.rt, and help("stima-package")

Examples

# Example with the Boston Housing dataset from the JCGS paper (Dusseldorp et al., 2010)
library(stima)
data(boston)
# Grow a full regression trunk with automatic first-split selection
# and a maximum of 10 splits with: bostonrt <- stima(boston, 10)
# NB: this analysis takes a long time (about one hour).
# Inspect the output with: summary(bostonrt)
# Prune the tree with: prune(bostonrt, data = boston)
# The pruned regression trunk has 7 splits.
# To save time in this example, we select the splitting candidates beforehand
# and grow a tree with a maximum of 4 splits:
contr <- stima.control(predtrunk = c(8, 9, 16))
bostonrt_pr <- stima(boston, 4, first = 16, vfold = 0, Save = TRUE, control = contr)
summary(bostonrt_pr)
# Inspect the coefficients of the final regression trunk model
round(bostonrt_pr$full, digits = 2)
# Inspect the new data, including the indicator variables referring
# to the terminal nodes
bostonrt_pr$newdata

stima documentation built on June 3, 2019, 5:04 p.m.