Testing the additional predictive value of high-dimensional data

Description

The function globalboosttest implements a permutation-based testing procedure to globally test the (additional) predictive value of a large set of predictors given that a small set of predictors is already available.

Usage

globalboosttest(X, Y, Z = NULL, nperm = 1000, mstop = 1000, mstopAIC = FALSE, pvalueonly = TRUE, plot = FALSE, ...)

Arguments

X

An n x p matrix or data frame, with observations in rows and variables in columns, whose additional predictive value is to be tested.

Y

Either a factor of length n (if the prediction outcome is binary), a numeric vector of length n (if the prediction outcome is numeric and uncensored), or a Surv object (if the prediction outcome is a survival time).

Z

An n x q matrix or data frame, with observations in rows and variables in columns, on which the test is conditioned. Note that q should be smaller than n. If Z=NULL, the function globalboosttest simply assesses the predictive value of X without conditioning.

nperm

The number of permutations used to derive the p-value.

mstop

A numeric vector giving the number(s) of boosting steps at which the p-value has to be calculated.

mstopAIC

If TRUE, the best number of boosting steps is selected from the range 1:max(mstop) by AIC, using the non-permuted data.

pvalueonly

If TRUE, only the permutation p-value is returned. If FALSE, the risk for all numbers of boosting steps and all permutations is returned as well.

plot

If TRUE, a plot is produced showing the minimized criterion for the real data (in black) and the permuted data (in grey).

...

Further arguments to be passed to the plot function if plot=TRUE.
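As a rough illustration of the input shapes described above, the following sketch builds toy inputs in base R. All object names and simulated values (n, p, q, the columns of Z) are hypothetical and serve only to show the expected dimensions and types; they are not taken from the package.

```r
set.seed(42)
n <- 40   # observations
p <- 500  # candidate predictors (may exceed n)
q <- 3    # conditioning predictors (should be smaller than n)

# X: n x p matrix whose additional predictive value is to be tested
X <- matrix(rnorm(n * p), nrow = n, ncol = p)

# Z: n x q data frame on which the test is conditioned (hypothetical columns)
Z <- data.frame(age    = rnorm(n, mean = 60, sd = 10),
                stage  = rbinom(n, size = 1, prob = 0.5),
                marker = rnorm(n))

# Y: a two-level factor for a binary outcome, or a numeric vector
Y_binary  <- factor(sample(rep(c("no", "yes"), length.out = n)))
Y_numeric <- rnorm(n)

# A survival outcome would be a Surv object, if the survival package is installed
if (requireNamespace("survival", quietly = TRUE)) {
  Y_surv <- survival::Surv(time = rexp(n), event = rbinom(n, 1, 0.7))
}

# Basic consistency checks before calling the test
stopifnot(nrow(X) == n, nrow(Z) == n, ncol(Z) < n,
          is.factor(Y_binary), nlevels(Y_binary) == 2,
          length(Y_numeric) == n)
```

Objects of this form could then be passed as X, Y and Z, for instance globalboosttest(X = X, Y = Y_binary, Z = Z, nperm = 25, mstop = 100).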

Details

See Boulesteix and Hothorn (2010) for details on the methodology. If mstopAIC=TRUE, the number of boosting steps is chosen from 1 to max(mstop), independently of the specific values included in the vector mstop.
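The general idea of a permutation test for additional predictive value can be sketched in base R. This is a simplified stand-in, not the package's implementation: it assumes a numeric outcome, uses a hand-written componentwise least-squares boosting routine (the hypothetical helper boost_risk) started from the fitted values of a linear model on Z, and assumes the p-value is the proportion of permutations whose risk is at most the risk on the real data.

```r
set.seed(1)
n <- 60; p <- 200; q <- 2
Z <- matrix(rnorm(n * q), n, q)
X <- scale(matrix(rnorm(n * p), n, p))
Y <- drop(Z %*% c(1, -1)) + 2 * X[, 1] + rnorm(n)  # X[, 1] carries extra signal

# Step 1: fit the small model on Z and keep its fitted values as an offset
offset <- fitted(lm(Y ~ Z))

# Step 2: componentwise least-squares boosting on X, started from the offset.
# Returns the empirical squared-error risk after each boosting step.
boost_risk <- function(X, Y, offset, mstop, nu = 0.1) {
  f <- offset
  ss <- colSums(X^2)
  risk <- numeric(mstop)
  for (m in seq_len(mstop)) {
    r <- Y - f                          # current residuals
    b <- drop(crossprod(X, r)) / ss     # least-squares coefficient per column
    j <- which.max(b^2 * ss)            # column with the largest RSS reduction
    f <- f + nu * b[j] * X[, j]         # shrunken update of the fit
    risk[m] <- mean((Y - f)^2)
  }
  risk
}

mstop <- 50
nperm <- 20

# Risk on the original data
riskreal <- boost_risk(X, Y, offset, mstop)

# Step 3: permute the rows of X only (Y and the offset stay fixed) and refit
riskperm <- t(replicate(nperm, boost_risk(X[sample(n), ], Y, offset, mstop)))

# Assumed p-value definition: share of permutations doing at least as well
pvalue <- mean(riskperm[, mstop] <= riskreal[mstop])
```

Permuting only the rows of X breaks any association between X and the outcome while leaving the relationship between Z and Y intact, which is why the offset is computed once and reused.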

Value

A list with the following components:

riskreal

A numeric vector of length max(mstop) giving the risk computed from the original data set with mstop from 1 to max(mstop) (if pvalueonly=FALSE).

riskperm

An nperm x max(mstop) matrix giving the risk computed from the nperm permuted data sets with mstop from 1 to max(mstop) (if pvalueonly=FALSE).

mstopAIC

The number of boosting steps selected using the AIC-based procedure (if mstopAIC=TRUE).

pvalue

A numeric vector of length length(mstop) (if mstopAIC=FALSE) or length(mstop)+1 (if mstopAIC=TRUE) giving the permutation p-values obtained for each considered value of mstop.
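To make the relationship between these components concrete, the sketch below derives one p-value per requested mstop from objects shaped like riskreal and riskperm. The risk values are random placeholders rather than package output, and the p-value definition (proportion of permuted risks not exceeding the real risk) is an assumption made for illustration.

```r
set.seed(7)
mstop <- c(100, 500, 1000)
nperm <- 25

# Placeholder objects with the documented shapes (not real package output)
riskreal <- cumprod(rep(0.999, max(mstop)))             # length max(mstop)
riskperm <- matrix(runif(nperm * max(mstop)),
                   nrow = nperm, ncol = max(mstop))     # nperm x max(mstop)

# One p-value per requested number of boosting steps
pvalue <- sapply(mstop, function(m) mean(riskperm[, m] <= riskreal[m]))
names(pvalue) <- paste0("mstop=", mstop)
pvalue
```

With nperm permutations, each p-value computed this way is a multiple of 1/nperm, so a small nperm such as 25 gives only a coarse resolution.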

Author(s)

Anne-Laure Boulesteix (http://www.ibe.med.uni-muenchen.de/organisation/mitarbeiter/020_professuren/boulesteix/eng.html),

Torsten Hothorn (http://www.statistik.lmu.de/~hothorn/)

References

A.-L. Boulesteix and T. Hothorn (2010). Testing the additional predictive value of high-dimensional data. BMC Bioinformatics 11:78.

Examples

# load the globalboosttest package
library(globalboosttest)

# load the simulated data with binary outcome
data(simdatabin)
# test the additional predictive value of X given Z, using 25 permutations;
# with() avoids attaching the data set to the search path
test <- with(simdatabin,
             globalboosttest(X = X, Y = Y, Z = Z, nperm = 25,
                             mstop = c(100, 500, 1000)))

# load the simulated data with survival outcome
data(simdatasurv)
# test the predictive value of X without conditioning, using 25 permutations
test <- with(simdatasurv,
             globalboosttest(X = X, Y = Surv(time, status), Z = NULL, nperm = 25,
                             mstop = c(100, 500, 1000), mstopAIC = FALSE))