R/BcorSIS.R

Defines functions BcorSIS

Documented in BcorSIS

#' A Generic Sure Independence Screening Procedure
#'
#' A generic nonparametric sure independence screening procedure,
#' called BCor-SIS, on the basis of a recently developed universal
#' dependence measure: Ball correlation.
#' We show that the proposed procedure has strong screening consistency even
#' when the dimensionality is an exponential order of the sample size
#' without imposing sub-exponential moment assumptions on the data.
#'
#' @param X The design matrix of dimensions n * p. Each row is an observation vector.
#' @param Y The response vector of dimension n * 1. For survival models, Y should
#' be an object of class Surv, as provided by the function
#' Surv() in the package survival.
#' @param nsis Number of predictors recruited by BcorSIS. The default is n/log(n).
#'
#' @return the labels of first nsis largest active set of all predictors
#'
#' @importFrom Ball bcorsis
#' @importFrom survival Surv
#' @export
#' @author Xuewei Cheng \email{xwcheng@hunnu.edu.cn}
#' @examples
#'
#' ## Scenario 1  generate complete data
#' n <- 100
#' p <- 200
#' rho <- 0.5
#' data <- GendataLM(n, p, rho, error = "gaussian")
#' data <- cbind(data[[1]], data[[2]])
#' colnames(data)[1:ncol(data)] <- c(paste0("X", 1:(ncol(data) - 1)), "Y")
#' data <- as.matrix(data)
#' X <- data[, 1:(ncol(data) - 1)]
#' Y <- data[, ncol(data)]
#' A1 <- BcorSIS(X, Y, n / log(n))
#' A1
#'
#' ## Scenario 2  generate survival data
#' library(survival)
#' n <- 100
#' p <- 200
#' rho <- 0.5
#' data <- GendataCox(n, p, rho)
#' data <- cbind(data[[1]], data[[2]], data[[3]])
#' colnames(data)[ncol(data)] <- c("status")
#' colnames(data)[(ncol(data) - 1)] <- c("time")
#' colnames(data)[(1:(ncol(data) - 2))] <- c(paste0("X", 1:(ncol(data) - 2)))
#' data <- as.matrix(data)
#' X <- data[, 1:(ncol(data) - 2)]
#' Y <- Surv(data[, (ncol(data) - 1)], data[, ncol(data)])
#' A2 <- BcorSIS(X, Y, n / log(n))
#' A2
#' @references
#'
#' Pan, W., X. Wang, H. Zhang, H. Zhu, and J. Zhu (2020). Ball covariance: A generic measure of dependence in banach space. Journal of the American Statistical Association 115(529),307–317.
#'
#' Pan, W., X. Wang, W. Xiao, and H. Zhu (2019). A generic sure independence screening procedure. Journal of the American Statistical Association 114(526), 928–937.
BcorSIS <- function(X, Y, nsis = (dim(X)[1]) / log(dim(X)[1])) {
  if (dim(X)[1] != length(Y)) {
    stop("X and Y should have same number of rows!")
  }
  if (missing(X) | missing(Y)) {
    stop("The data is missing!")
  }
  if (TRUE %in% (is.na(X) | is.na(Y) | is.na(nsis))) {
    stop("The input vector or matrix cannot have NA!")
  }
  p <- dim(X)[2] ## dimension
  if (class(Y) == c("Surv")) {
    model <- bcorsis(x = X, y = Y, method = "survival", d = p)
    A <- model$ix
  } else {
    model <- bcorsis(x = X, y = Y, method = "standard", d = p)
    A <- model$ix
  }
  return(A[1:nsis])
}

Try the MFSIS package in your browser

Any scripts or data that you put into this service are public.

MFSIS documentation built on June 22, 2024, 9:42 a.m.