p.val.tree: Compute the p-value


p.val.tree    R Documentation

Compute the p-value

Description

Test whether the tree selected by the BIC, AIC or CV procedure is significantly associated with the dependent variable, while adjusting for a confounding effect.
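To make the idea of a resampled deviance difference concrete, here is a minimal base-R sketch. It is not the package's implementation: the simulated data, the fixed grouping `leaf` (standing in for the leaves of a selected tree), and the helper `dev_diff` are all assumptions made for illustration. In particular, the sketch keeps the grouping fixed across resamples, whereas the procedure in Mbogning et al. (2014) may handle the tree differently.

```r
set.seed(1)

# Hypothetical data: binary outcome Y, one confounder G1, and a binary
# grouping `leaf` that stands in for the leaves of a selected tree.
n    <- 200
G1   <- rbinom(n, 1, 0.5)
leaf <- factor(rbinom(n, 1, 0.5))
Y    <- rbinom(n, 1, plogis(-0.5 + 0.8 * G1))
dat  <- data.frame(Y, G1, leaf)

# Deviance difference between the confounder-only model and the model
# that also includes the tree-derived grouping (illustrative helper).
dev_diff <- function(d) {
  fit0 <- glm(Y ~ G1, data = d, family = binomial)
  fit1 <- glm(Y ~ G1 + leaf, data = d, family = binomial)
  deviance(fit0) - deviance(fit1)
}

obs <- dev_diff(dat)

# Parametric bootstrap under the null: simulate new outcomes from the
# confounder-only fit and recompute the deviance difference B times.
fit0 <- glm(Y ~ G1, data = dat, family = binomial)
p0   <- fitted(fit0)
B    <- 50
boot <- replicate(B, {
  d   <- dat
  d$Y <- rbinom(n, 1, p0)
  dev_diff(d)
})

# Share of resampled statistics at least as large as the observed one.
p_boot <- mean(boot >= obs)
```

Here `B` plays the role of the resampling size: a larger value gives a finer-grained p-value at the cost of more model fits.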

Usage

p.val.tree(xtree, xdata, Y.name, X.names, G.names, B = 10,
           args.rpart = list(minbucket = 40, maxdepth = 10, cp = 0),
           epsi = 0.001, iterMax = 5, iterMin = 3, family = "binomial",
           LB = FALSE, args.parallel = list(numWorkers = 1), index = 4,
           verbose = TRUE)

Arguments

xtree

the maximal tree obtained by the function pltr.glm

xdata

the data frame used to build xtree

Y.name

the name of the dependent variable

X.names

the names of independent confounding variables to consider in the linear part of the glm

G.names

the names of independent variables to consider in the tree part of the hybrid glm.

B

the resampling size, i.e. the number of resamples used to approximate the distribution of the deviance difference

args.rpart

a list of options that control details of the rpart algorithm. minbucket: the minimum number of observations in any terminal (leaf) node; cp: the complexity parameter (any split that does not decrease the overall lack of fit by a factor of cp is not attempted); maxdepth: the maximum depth of any node of the final tree, with the root node counted as depth 0. See rpart.control for further details.

epsi

a threshold value used to check the convergence of the algorithm

iterMax

the maximum number of iterations to consider

iterMin

the minimum number of iterations to consider

family

the glm family to use, chosen according to the type of the dependent variable (for example "binomial" for a binary outcome)

LB

a logical value (TRUE or FALSE) indicating whether the load is balanced across workers in the parallel computation

args.parallel

parameters of the parallelization. See mclapply for more details.

index

the size of the selected tree (by the functions best.tree.BIC.AIC or best.tree.CV) using one of the proposed criteria

verbose

Logical; TRUE for printing progress during the computation (helpful for debugging)
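As a rough illustration of what the args.rpart list describes, the sketch below turns the default list into an rpart control object with `rpart.control`. How the package forwards these options internally is an assumption here; the sketch only shows that the three named entries are valid `rpart.control` arguments and that unspecified options keep their rpart defaults.

```r
library(rpart)

# The default option list from the Usage section.
args.rpart <- list(minbucket = 40, maxdepth = 10, cp = 0)

# Build a control object from the list; options not named in the list
# (for example xval) fall back to the rpart defaults.
ctrl <- do.call(rpart.control, args.rpart)

ctrl$minbucket  # minimum number of observations in a leaf
ctrl$maxdepth   # maximum node depth, root counted as depth 0
ctrl$cp         # cp = 0 means splits are not rejected for small gains
```

Setting cp = 0 together with a fairly large minbucket lets the tree grow until leaf size or depth stops it, which fits the role of a maximal tree that is pruned back afterwards.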

Value

A list of three elements:

p.value

The P-value of the selected tree

Timediff

The execution time of the test procedure

Badj

The number of samples used inside the procedure

Author(s)

Cyprien Mbogning

References

Mbogning, C., Perdry, H., Toussile, W., Broet, P.: A novel tree-based procedure for deciphering the genomic spectrum of clinical disease entities. Journal of Clinical Bioinformatics 4:6, (2014)

Fan, J., Zhang, C., Zhang, J.: Generalized likelihood ratio statistics and Wilks phenomenon. Annals of Statistics 29(1), 153-193 (2001)

See Also

best.tree.bootstrap, best.tree.permute

Examples

## Not run: 
## load the data set

data(data_pltr)

## set the parameters 

args.rpart <- list(minbucket = 40, maxdepth = 10, cp = 0)
family <- "binomial"
Y.name <- "Y"
X.names <- "G1"
G.names <- paste("G", 2:15, sep="")

## build a maximal tree

fit_pltr <- pltr.glm(data_pltr, Y.name, X.names, G.names, args.rpart = args.rpart,
                     family = family, iterMax = 5, iterMin = 3)

## prune back the maximal tree using the BIC or AIC criterion

tree_select <- best.tree.BIC.AIC(xtree = fit_pltr$tree, data_pltr, Y.name,
                                 X.names, family = family)
                     
## Compute the p-value of the selected tree by BIC

args.parallel <- list(numWorkers = 10, type = "PSOCK")
index <- tree_select$best_index[[1]]
p_value <- p.val.tree(xtree = fit_pltr$tree, data_pltr, Y.name, X.names, G.names,
            B = 100, args.rpart = args.rpart, epsi = 1e-3, 
            iterMax = 5, iterMin = 3, family = family, LB = FALSE, 
            args.parallel = args.parallel, index = index)

## End(Not run)
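The example above requests 10 workers, which may exceed what a given machine offers. One possible way to pick the worker count from the available cores is sketched below with the base `parallel` package. Leaving one core free is a common convention rather than a package requirement, and the `type = "PSOCK"` entry is simply carried over from the example.

```r
library(parallel)

# detectCores() can return NA on some platforms, so fall back to 1.
n_cores <- detectCores()
if (is.na(n_cores)) n_cores <- 1L

# Keep one core free for the rest of the system, but never go below 1.
args.parallel <- list(numWorkers = max(1L, n_cores - 1L), type = "PSOCK")
```

The resulting list can be passed as the args.parallel argument in place of the fixed value used in the example.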

GPLTR documentation built on Aug. 27, 2023, 1:06 a.m.