R/predBMA.R
In BVSNLP: Bayesian Variable Selection in High Dimensional Settings using Nonlocal Priors

Documented in predBMA

#' Predictive accuracy measurement using Bayesian Model Averaging
#'
#' @description This function is used for predictive accuracy measurement of
#' the selected models using Bayesian Model Averaging methodology. The Occam's
#' window with cut out threshold of \code{thr} is used. That means only models
#' that have posterior probability of at least \code{thr} * posteior
#' probability of the highest posterior probability model are considered in
#' model averaging. For survival time response datasets, the predictive Area
#' Under Curve (AUC) at each given time point is computed as the output. In
#' this case, the predictive AUC is obtained using Uno's method for
#' observations in the test set. For binary outcome data, only one AUC is
#' reported which is from the ROC computed on the test set. The training set is
#' used to find the selected model and relevant probabilities.
#' 
#' @param bvsobj An object that is generated by \code{\link{bvs}} function. It
#' is the output of the Bayesian variable selection procedure.
#' @param X The same \code{n} times \code{p} data frame initially used in
#' \code{bvs} function. This is the full data frame containing all test and
#' train data.
#' @param prep A boolean variable determining if the preprocessing step has
#' been done on the original data frame in \code{bvs} function. This should
#' have the same value as \code{prep} variable in the \code{bvs} function.
#' @param logT A boolean variable determining if log transform should has been
#' done on continuous columns of the data frame in \code{bvs} function. This
#' should have the same value as \code{logT} variable in the \code{bvs}
#' function.
#' @param resp For logistic regression models, this variable is the binary
#' response vector. For the Cox proportional hazard models this is a two column
#' matrix where the first column contains survival times and the second column
#' is the censoring status for each observation. Note that for survival times,
#' the time section of this variable should be in the same scale and unit
#' (year, days, etc.) as \code{times} variable for which the AUC has to be
#' computed.
#' @param nlptype Determines the type of nonlocal prior that is used in the
#' analyses. It can be "piMOM" for product inverse moment prior, or "pMOM" for
#' product moment prior. The default is set to piMOM prior.
#' @param train_idx An integer vector containing the indices of the training
#' set.
#' @param test_idx An integer vector containing the indices of the test set.
#' The set of observations that prediction will be performed on.
#' @param thr The threshold used for Occam's window as explained in the
#' description. The default value for this variable is 0.05.
#' @param times A vector of times at which predictive AUC is to be computed.
#' This input is only used for prediction in survival data analysis.
#' @param family Determines the type of data analysis. \code{logistic} is for
#' binary outcome and logistic regression model whereas, \code{survival}
#' represents survival outcomes and the Cox proportional hazard model.
#' @return The output is different based on the family for the anlysis of data
#' \strong{1) } \code{family = logistic}
#' The output is a list with the two following objects:
#' \item{auc}{This is the area under the ROC curve after Bayesian model
#' averaging is used to obtain ROC for the test data.}
#' \item{roc_curve}{This is a two column matrix representing points on the ROC
#' curve and can be used to plot the curve. The first column is FPR and the
#' second column is TPR which represent x-axis and y-axis in the ROC curve,
#' respectively.}
#' \strong{2) } \code{family = survival}
#' \item{auc}{A vector with the same length as \code{times} variable showing
#' predictive area under the curve at each given time point using Bayesian
#' Model averaging.}
#' @author Amir Nikooienejad
#' @references Raftery, A. E., Madigan, D., & Hoeting, J. A. (1997). Bayesian
#' model averaging for linear regression models. Journal of the American
#' Statistical Association, 92(437), 179-191.\cr\cr
#' Nikooienejad, A., Wang, W., & Johnson, V. E. (2020). Bayesian variable
#' selection for survival data using inverse moment priors. Annals of Applied
#' Statistics, 14(2), 809-828. \cr\cr
#' Uno, H., Cai, T., Tian, L., & Wei, L. J. (2007). Evaluating
#' prediction rules for t-year survivors with censored regression models.
#' Journal of the American Statistical Association, 102(478), 527-537.
#' @examples
#' ### Simulating Logistic Regression Data
#'n <- 200
#'p <- 40
#'set.seed(123)
#'Sigma <- diag(p)
#'full <- matrix(c(rep(0.5, p*p)), ncol=p)
#'Sigma <- full + 0.5*Sigma
#'cholS <- chol(Sigma)
#'Beta <- c(-1.7,1.8,2.5)
#'X <- matrix(rnorm(n*p), ncol=p)
#'X <- X%*%cholS
#'colnames(X) <- c(paste("gene_",c(1:p),sep=""))
#'beta <- numeric(p)
#'beta[c(1:length(Beta))] <- Beta
#'Xout <- PreProcess(X)
#'X <- Xout$X
#'XB <- X%*%beta
#'probs <- as.vector(exp(XB)/(1+exp(XB)))
#'y <- rbinom(n,1,probs)
#'X <- as.data.frame(X)
#'train_idx <- sample(1:n,0.8*n)
#'test_idx <- setdiff(1:n,train_idx)
#'X_train <- X[train_idx,]
#'y_train <- y[train_idx]
#'bout <- bvs(X_train, y_train, prep=FALSE, family = "logistic",
#'            mod_prior = "beta",niter = 50)
#'BMAout <- predBMA(bout, X, y, prep = FALSE, logT = FALSE, 
#'                  train_idx = train_idx, test_idx = test_idx,
#'                  family="logistic")
#' ### AUC for the prediction:
#'BMAout$auc
#'
#'### Plotting ROC Curve
#'roc <- BMAout$roc_curve
#'plot(roc,lwd=2,type='l',col='blue')
predBMA <- function(bvsobj, X, resp, prep, logT, nlptype = "piMOM", train_idx,
                    test_idx, thr = 0.05, times = NULL,
                    family = c("logistic", "survival")){
  
  if(!class(X)=="data.frame") stop("input X should be a data frame!") 
  X <- predmat(X)

  X_tr <- bvsobj$des_mat
  gn <- bvsobj$gene_names
  xname <- colnames(X)
  ind <- match(gn,xname)
  X <- X[,ind]
  X_te <- X[test_idx,]
  
  if(prep){
  Xout <- PreProcess(X_te, logT)
  X_te <- Xout$X
  }

  r <- bvsobj$r
  tau <- bvsobj$tau
  probs <- bvsobj$max_prob_vec
  models <- bvsobj$max_models
  maxprob <- bvsobj$max_prob
  thresh <- maxprob + log(thr)
  passinds <- which(probs >= thresh)
  
  k <- length(passinds)
  oprobs <- probs[passinds]; sh <- ceiling(max(oprobs)); oprobs <- exp(oprobs - sh);
  oprobs <- oprobs/sum(oprobs)
  omodels <- models[passinds]
  if(nlptype=="piMOM") nlptype_int <- 0
  if(nlptype=="pMOM") nlptype_int <- 1  
  # =======================================
  
  if(family=="logistic"){
    y <- resp
    y_tr <- y[train_idx]
    y_te <- y[test_idx]
    X_tr <- cbind(rep(1,length(y_tr)),X_tr); X_te <- cbind(rep(1,length(y_te)),X_te);
    
    aucout <- aucBMA_logistic(X_tr,y_tr,X_te,y_te,tau,r,nlptype_int,oprobs,omodels,k)
    roc <- aucout$roc; colnames(roc) <- c("TPR","FPR")
    return(list(auc = aucout$auc, roc_curve = roc))
  }
  
  # ====================
  
  if(family=="survival"){
    TS <- resp;
    TS_tr <- TS[train_idx,]
    TS_te <- TS[test_idx,]
    if (!length(times)) stop("No times vector is set!")
    aucout <- aucBMA_survival(X_tr,TS_tr,X_te,TS_te,tau,r,nlptype_int,times,oprobs,omodels,k)
    return(aucout)
  }
}

Any scripts or data that you put into this service are public.

BVSNLP documentation built on Aug. 28, 2020, 5:08 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

BVSNLP
Bayesian Variable Selection in High Dimensional Settings using Nonlocal Priors

R/predBMA.R
In BVSNLP: Bayesian Variable Selection in High Dimensional Settings using Nonlocal Priors

Defines functions predBMA

Documented in predBMA

Try the BVSNLP package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

BVSNLP Bayesian Variable Selection in High Dimensional Settings using Nonlocal Priors

R/predBMA.R In BVSNLP: Bayesian Variable Selection in High Dimensional Settings using Nonlocal Priors

Defines functions predBMA

Documented in predBMA

Try the BVSNLP package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

BVSNLP
Bayesian Variable Selection in High Dimensional Settings using Nonlocal Priors

R/predBMA.R
In BVSNLP: Bayesian Variable Selection in High Dimensional Settings using Nonlocal Priors