predBMA: Predictive accuracy measurement using Bayesian Model...

Description

This function measures the predictive accuracy of the selected models using Bayesian model averaging (BMA). Occam's window with cutoff threshold thr is applied: only models whose posterior probability is at least thr times the posterior probability of the highest posterior probability model are included in the model averaging. For survival time response data, the output is the predictive area under the curve (AUC) at each given time point, obtained using Uno's method for the observations in the test set. For binary outcome data, a single AUC is reported, computed from the ROC curve on the test set. The training set is used to find the selected models and the relevant probabilities.
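The Occam's window rule described above can be illustrated with a minimal sketch in base R. The posterior probabilities and predictions below are hypothetical values chosen for illustration, and the renormalization of the retained weights is an assumption about how the averaging is carried out; the internals of predBMA may differ.

```r
# Hypothetical posterior model probabilities for four candidate models
post_prob <- c(m1 = 0.50, m2 = 0.30, m3 = 0.15, m4 = 0.02)
thr <- 0.05

# Occam's window: keep models whose posterior probability is at least
# thr times that of the highest posterior probability model
keep <- post_prob >= thr * max(post_prob)

# Assumption: retained probabilities are renormalized to sum to one
w <- post_prob[keep] / sum(post_prob[keep])

# Hypothetical predicted probabilities
# (rows: test observations, columns: candidate models)
pred <- cbind(m1 = c(0.9, 0.2), m2 = c(0.8, 0.3),
              m3 = c(0.7, 0.1), m4 = c(0.1, 0.9))

# Model-averaged prediction: weighted sum over the retained models
bma_pred <- as.vector(pred[, keep, drop = FALSE] %*% w)
```

Here m4 falls outside the window because 0.02 is below 0.05 * 0.50 = 0.025, so it contributes nothing to the averaged prediction.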

Usage

predBMA(bvsobj, X, resp, nlptype = "piMOM", train_idx, test_idx, thr = 0.05,
  times = NULL, family = c("logistic", "survival"))

Arguments

bvsobj

An object that is generated by bvs function. It is the output of the Bayesian variable selection procedure.

X

The n by p design matrix. It should be on the same scale as the input to the bvs function. In particular, if the design matrix was preprocessed inside the bvs function, this input should be the output of the PreProcess function. Also note that for binary data, X should NOT contain a column of 1's.

resp

For logistic regression models, this is the binary response vector. For Cox proportional hazards models, this is a two-column matrix whose first column contains the survival times and whose second column contains the censoring status of each observation. The survival times must be in the same scale and unit (years, days, etc.) as the times argument at which the AUC is computed.

nlptype

Determines the type of nonlocal prior used in the analysis. It can be "piMOM" for the product inverse moment prior or "pMOM" for the product moment prior. The default is "piMOM".

train_idx

An integer vector containing the indices of the training set.

test_idx

An integer vector containing the indices of the test set, that is, the observations on which prediction is performed.

thr

The threshold used for Occam's window as explained in the description. The default value for this variable is 0.05.

times

A vector of times at which predictive AUC is to be computed. This input is only used for prediction in survival data analysis.

family

Determines the type of data analysis. "logistic" is for binary outcomes and the logistic regression model, whereas "survival" is for survival outcomes and the Cox proportional hazards model.
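As an illustration of how the arguments above fit together for survival data, the following base R sketch builds resp, times, train_idx and test_idx. The simulated data are hypothetical, and the coding of the status column (1 for an observed event, 0 for censored) is an assumption that should be checked against the bvs documentation. The predBMA call is left as a comment because it needs a fitted bvs object.

```r
set.seed(1)
n <- 100

# Hypothetical survival data, with time measured in years
time   <- rexp(n, rate = 0.2)
status <- rbinom(n, 1, 0.7)   # assumption: 1 = event observed, 0 = censored

# resp: first column survival time, second column censoring status
resp <- cbind(time, status)

# Time points for the predictive AUC, in the same unit (years) as resp[, 1]
times <- c(1, 3, 5)

# Disjoint training and test indices (80/20 split)
train_idx <- sample(seq_len(n), size = round(0.8 * n))
test_idx  <- setdiff(seq_len(n), train_idx)

# With a fitted object `bout` from bvs(..., family = "survival") and the
# matching design matrix X, the call would take the form:
# predBMA(bout, X, resp, train_idx = train_idx, test_idx = test_idx,
#         times = times, family = "survival")
```

Keeping times in the same unit as the first column of resp matters because the AUC is evaluated at those time points on the scale of the observed survival times.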

Value

The output depends on the family used for the analysis.

1) family = logistic

The output is a list with the following two objects:

auc

The area under the ROC curve for the test data, where the ROC curve is obtained after Bayesian model averaging.

roc_curve

A two-column matrix of points on the ROC curve, which can be used to plot the curve. The first column is the false positive rate (FPR, the x-axis of the ROC curve) and the second column is the true positive rate (TPR, the y-axis).

2) family = survival

auc

A vector of the same length as the times argument, giving the predictive area under the curve at each given time point using Bayesian model averaging.
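For the logistic family, the two-column layout of roc_curve (FPR, then TPR) makes it easy to cross-check the reported AUC with the trapezoidal rule. The sketch below uses a small hypothetical matrix in place of a real roc_curve, and sorts the points first because the ordering of the returned rows is not documented here. The value it produces is only an approximation and need not match the auc element exactly.

```r
# Hypothetical ROC points: first column FPR, second column TPR
roc <- cbind(FPR = c(0, 0.1, 0.4, 1), TPR = c(0, 0.6, 0.9, 1))

# Order the points by FPR (ties broken by TPR)
roc <- roc[order(roc[, 1], roc[, 2]), ]

# Trapezoidal rule: interval width times the mean height of its endpoints
auc_trap <- sum(diff(roc[, 1]) *
                (head(roc[, 2], -1) + tail(roc[, 2], -1)) / 2)
```

The same matrix can be passed directly to plot(roc, type = "l") to draw the curve.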

Author(s)

Amir Nikooienejad

References

Raftery, A. E., Madigan, D., & Hoeting, J. A. (1997). Bayesian model averaging for linear regression models. Journal of the American Statistical Association, 92(437), 179-191.

Nikooienejad, A., Wang, W., & Johnson, V. E. (2017). Bayesian Variable Selection in High Dimensional Survival Time Cancer Genomic Datasets using Nonlocal Priors. arXiv preprint, arXiv:1712.02964.

Uno, H., Cai, T., Tian, L., & Wei, L. J. (2007). Evaluating prediction rules for t-year survivors with censored regression models. Journal of the American Statistical Association, 102(478), 527-537.

Examples

library(BVSNLP)

### Simulating Logistic Regression Data
n <- 200
p <- 40
set.seed(123)
Sigma <- diag(p)
full <- matrix(c(rep(0.5, p*p)), ncol=p)
Sigma <- full + 0.5*Sigma
cholS <- chol(Sigma)
Beta <- c(-1.7,1.8,2.5)
X <- matrix(rnorm(n*p), ncol=p)
X <- X%*%cholS
colnames(X) <- c(paste("gene_",c(1:p),sep=""))
beta <- numeric(p)
beta[c(1:length(Beta))] <- Beta
Xout <- PreProcess(X)
X <- Xout$X
XB <- X%*%beta
probs <- as.vector(exp(XB)/(1+exp(XB)))
y <- rbinom(n,1,probs)
train_idx <- sample(1:n,0.8*n)
test_idx <- setdiff(1:n,train_idx)
X_train <- X[train_idx,]
y_train <- y[train_idx]
bout <- bvs(X_train, y_train, prep=FALSE, family = "logistic",
           mod_prior = "beta",niter = 50)
BMAout <- predBMA(bout, X, y, train_idx = train_idx, test_idx = test_idx,
                 family="logistic")
### AUC for the prediction:
BMAout$auc

### Plotting ROC Curve
roc <- BMAout$roc_curve
plot(roc,lwd=2,type='l',col='blue')

BVSNLP documentation built on May 17, 2018, 9:05 a.m.