pltr.glm: Partially tree-based regression model function

View source: R/pltr.glm.R

pltr.glmR Documentation

Partially tree-based regression model function

Description

The pltr.glm function is designed to fit an hybrid glm model with an additive tree part on a glm scale.

Usage

pltr.glm(data, Y.name, X.names, G.names, family = "binomial", 
    args.rpart = list(cp = 0, minbucket = 20, maxdepth = 10), 
    epsi = 0.001, iterMax = 5, iterMin = 3, verbose = TRUE)

Arguments

data

a data frame containing the variables in the model

Y.name

the name of the dependent variable

X.names

the names of independent variables to consider in the linear part of the glm

G.names

the names of independent variables to consider in the tree part of the hybrid glm.

family

the glm family considered depending on the type of the dependent variable.

args.rpart

a list of options that control details of the rpart algorithm. minbucket: the minimum number of observations in any terminal <leaf> node; cp: complexity parameter (Any split that does not decrease the overall lack of fit by a factor of cp is not attempted); maxdepth: the maximum depth of any node of the final tree, with the root node counted as depth 0. ... See rpart.control for further details

epsi

a treshold value to check the convergence of the algorithm

iterMax

the maximal number of iteration to consider

iterMin

the minimum number of iteration to consider

verbose

Logical; TRUE for printing progress during the computation (helpful for debugging)

Details

The pltr.glm function use an itterative procedure to fit the linear part of the glm and the tree part. The tree obtained at the convergence of the procedure is a maximal tree which overfits the data. It's then mandatory to prunned back this tree by using one of the proposed criteria (BIC, AIC and CV).

Value

A list with four elements:

fit

the glm fitted on the confounding factors at the end of the iterative algorithm

tree

the maximal tree obtained at the end of the algorithm

nber_iter

the number of iterations used by the algorithm

Timediff

The execution time of the iterative procedure

Note

The tree obtained at the end of these itterative procedure usually overfits the data. It's therefore mendatory to use either best.tree.BIC.AIC or best.tree.CV to prunne back the tree.

Author(s)

Cyprien Mbogning and Wilson Toussile

References

Mbogning, C., Perdry, H., Toussile, W., Broet, P.: A novel tree-based procedure for deciphering the genomic spectrum of clinical disease entities. Journal of Clinical Bioinformatics 4:6, (2014)

Terry M. Therneau, Elizabeth J. Atkinson (2013) An Introduction to Recursive Partitioning Using the RPART Routines. Mayo Foundation.

Chen, J., Yu, K., Hsing, A., Therneau, T.M.: A partially linear tree-based regression model for assessing complex joint gene-gene and gene-environment effects. Genetic Epidemiology 31, 238-251 (2007)

See Also

rpart

Examples

data(burn)

args.rpart <- list(minbucket = 10, maxdepth = 4, cp = 0, maxcompete = 0, 
                    maxsurrogate = 0)
 family <- "binomial"
 X.names = "Z2"
 Y.name = "D2"
 G.names = c('Z1','Z3','Z4','Z5','Z6','Z7','Z8','Z9','Z10','Z11')
 
pltr.burn <- pltr.glm(burn, Y.name, X.names, G.names, args.rpart = args.rpart,
                   family = family, iterMax = 4, iterMin = 3, verbose = FALSE)


## Not run: 
## load the data set

data(data_pltr)

## set the parameters 

args.rpart <- list(minbucket = 40, maxdepth = 10, cp = 0)
family <- "binomial"
Y.name <- "Y"
X.names <- "G1"
G.names <- paste("G", 2:15, sep="")

## build a maximal tree 

fit_pltr <- pltr.glm(data_pltr, Y.name, X.names, G.names, args.rpart = args.rpart, 
                    family = family,iterMax = 5, iterMin = 3)
                    
plot(fit_pltr$tree, main = 'MAXIMAL TREE')
text(fit_pltr$tree, minlength = 0L, xpd = TRUE, cex = .6)

## End(Not run)

GPLTR documentation built on Aug. 27, 2023, 1:06 a.m.