| SIS | R Documentation |
This function first performs (Iterative) Sure Independence Screening, in one of several (I)SIS variants, and then fits the final regression model on the variables picked by (I)SIS. The final fit maximizes the SCAD/MCP/LASSO/ENET/AENET regularized loglikelihood using the R packages ncvreg, glmnet, and msaenet, plus an internal Cox adaptive elastic-net implementation.
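The screening step can be illustrated with a minimal sketch for the Gaussian case, assuming predictors are ranked by absolute marginal correlation with the response, as in Fan and Lv (2008). The helper `marginal_screen` and the cutoff `d` are hypothetical names used only for illustration; they are not part of the SIS package, and the package's own screening for other families may use different marginal utilities.

```r
# Hypothetical helper (not part of the SIS package): rank predictors by
# absolute marginal correlation with the response and keep the top d.
marginal_screen <- function(x, y, d) {
  score <- abs(drop(cor(x, y)))            # one marginal utility per column
  order(score, decreasing = TRUE)[seq_len(d)]
}

set.seed(0)
n <- 200
p <- 50
x <- matrix(rnorm(n * p), n, p)
y <- 3 * x[, 1] - 3 * x[, 2] + rnorm(n)    # only columns 1 and 2 carry signal

kept <- marginal_screen(x, y, d = 5)
kept
```

The regularized fit described above would then be run only on the columns in `kept`, which is what keeps the second stage tractable when p is much larger than n.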
SIS(
x,
y,
family = c("gaussian", "binomial", "poisson", "cox", "multinom"),
penalty = c("SCAD", "MCP", "lasso", "enet", "aenet", "msaenet"),
concavity.parameter = switch(penalty, SCAD = 3.7, 3),
tune = c("bic", "ebic", "aic", "cv"),
nfolds = 10,
type.measure = c("deviance", "class", "auc", "mse", "mae"),
gamma.ebic = 1,
nsis = NULL,
iter = TRUE,
iter.max = ifelse(greedy == FALSE, 10, floor(nrow(x)/log(nrow(x)))),
varISIS = c("vanilla", "aggr", "cons"),
perm = FALSE,
q = 1,
greedy = FALSE,
greedy.size = 1,
seed = NULL,
standardize = TRUE,
covars = NULL,
boot_ci = FALSE,
parallel = TRUE
)
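Two of the defaults in the usage above are expressions rather than constants. The sketch below mirrors those expressions in plain R to show what they evaluate to; `default_concavity` and `default_iter_max` are hypothetical names introduced here, not functions exported by the package.

```r
# Illustrative only: reproduce the default expressions from the usage block.
default_concavity <- function(penalty) {
  switch(penalty, SCAD = 3.7, 3)           # 3.7 for SCAD, 3 for anything else
}

default_iter_max <- function(n, greedy = FALSE) {
  # 10 iterations unless the greedy modification is requested
  if (!greedy) 10 else floor(n / log(n))
}

default_concavity("SCAD")                  # 3.7
default_concavity("MCP")                   # 3
default_iter_max(400)                      # 10
default_iter_max(400, greedy = TRUE)       # 66
```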
x |
The design matrix, of dimensions n * p, without an intercept. Each
row is an observation vector. |
y |
The response vector of dimension n * 1. Quantitative for
|
family |
Response type (see above). |
penalty |
The penalty to be applied in the regularized likelihood subproblems. 'SCAD', 'MCP', or 'lasso' are provided. 'lasso' is the default for family = 'multinom' or 'cox', 'SCAD' is the default for other families. |
concavity.parameter |
The tuning parameter used to adjust the concavity of the SCAD/MCP penalty. Default is 3.7 for SCAD and 3 for MCP. |
tune |
Method for tuning the regularization parameter of the penalized
likelihood subproblems and of the final model selected by (I)SIS. Options
include |
nfolds |
Number of folds used in cross-validation. The default is 10. |
type.measure |
Loss to use for cross-validation. Currently five
options, not all available for all models. The default is
|
gamma.ebic |
Specifies the parameter in the Extended BIC criterion
penalizing the size of the corresponding model space. The default is
|
nsis |
Number of predictors recruited by (I)SIS. |
iter |
Specifies whether to perform iterative SIS. The default is
|
iter.max |
Maximum number of iterations for (I)SIS and its variants. |
varISIS |
Specifies whether to perform either of the two ISIS variants
based on randomly splitting the sample into two groups. The variant
|
perm |
Specifies whether to impose a data-driven threshold in the size
of the active sets calculated during the ISIS procedures. The threshold is
calculated by first decoupling the predictors |
q |
Quantile for calculating the data-driven threshold in the
permutation-based ISIS. The default is |
greedy |
Specifies whether to run the greedy modification of the
permutation-based ISIS. The default is |
greedy.size |
Maximum size of the active sets in the greedy
modification of the permutation-based ISIS. The default is
|
seed |
Random seed used for sample splitting, random permutation, and cross-validation sampling of training and test sets. |
standardize |
Logical flag for x variable standardization, prior to
performing (iterative) variable screening. The resulting coefficients are
always returned on the original scale. Default is |
covars |
Names of the factor variables. |
boot_ci |
Logical flag for computing bootstrap confidence intervals. Default = FALSE. |
parallel |
Specifies whether to use parallel computing. Default is TRUE. |
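The gamma.ebic argument refers to the Extended BIC of Chen and Chen (2008). A minimal sketch of that criterion follows, assuming a fitted model with log-likelihood `loglik` and `k` selected predictors out of `p` candidates. The helper name `ebic` is hypothetical, and the package's internal computation may use a different (for example, approximated) form.

```r
# Hypothetical helper: Extended BIC as defined by Chen and Chen (2008).
# gamma = 0 reduces to the ordinary BIC; larger gamma penalizes large
# model spaces more heavily.
ebic <- function(loglik, k, n, p, gamma = 1) {
  -2 * loglik + k * log(n) + 2 * gamma * lchoose(p, k)
}

bic_val  <- ebic(loglik = -100, k = 3, n = 400, p = 50, gamma = 0)
ebic_val <- ebic(loglik = -100, k = 3, n = 400, p = 50, gamma = 1)
c(bic = bic_val, ebic = ebic_val)
```

Because `lchoose(p, k)` grows with the number of candidate predictors, the extra term matters most in the ultrahigh-dimensional settings that (I)SIS targets.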
A list with components:
sis.ix0 |
The vector of indices selected by only SIS. |
ix |
The vector of indices selected by (I)SIS with the regularization step. |
The vector of coefficients of the final model selected by (I)SIS.
fit |
A fitted object of type ncvreg, cv.ncvreg,
glmnet, or cv.glmnet for the final model selected by the
(I)SIS procedure. If tune='cv', the returned fitted object is of
type cv.ncvreg if penalty='SCAD' or penalty='MCP';
otherwise, the returned fitted object is of type cv.glmnet. For
the remaining options of tune, the returned object is of type
glmnet if penalty='lasso', and ncvreg otherwise. |
The index along the solution path of fit for
which the criterion specified in tune is minimized.
ix0 |
The vector of indices ordered by decreasing importance. |
ix_list |
The list of vectors of indices ordered by decreasing importance, for each screening step. |
cis |
A data frame with columns coef, CI_low,
CI_up, Est, CI_low_perc, and CI_up_perc. |
Jianqing Fan, Yang Feng, Diego Franco Saldana, Richard Samworth, Arce Domingo-Relloso and Yichao Wu
Diego Franco Saldana and Yang Feng (2018) SIS: An R package for Sure Independence Screening in Ultrahigh Dimensional Statistical Models, Journal of Statistical Software, 83, 2, 1-25.
Jianqing Fan and Jinchi Lv (2008) Sure Independence Screening for Ultrahigh Dimensional Feature Space (with discussion). Journal of the Royal Statistical Society: Series B, 70, 849-911.
Jianqing Fan and Rui Song (2010) Sure Independence Screening in Generalized Linear Models with NP-Dimensionality. The Annals of Statistics, 38, 3567-3604.
Jianqing Fan, Richard Samworth, and Yichao Wu (2009) Ultrahigh Dimensional Feature Selection: Beyond the Linear Model. Journal of Machine Learning Research, 10, 2013-2038.
Jianqing Fan, Yang Feng, and Yichao Wu (2010) High-dimensional Variable Selection for Cox Proportional Hazards Model. IMS Collections, 6, 70-86.
Jianqing Fan, Yang Feng, and Rui Song (2011) Nonparametric Independence Screening in Sparse Ultrahigh Dimensional Additive Models. Journal of the American Statistical Association, 106, 544-557.
Jiahua Chen and Zehua Chen (2008) Extended Bayesian Information Criteria for Model Selection with Large Model Spaces. Biometrika, 95, 759-771.
Arce Domingo-Relloso, Yang Feng, Zulema Rodriguez-Hernandez, Karin Haack, Shelley A. Cole, Ana Navas-Acien, Maria Tellez-Plaza, and Jose D. Bermudez (2024) Omics Feature Selection with the Extended SIS R Package: Identification of a Body Mass Index Epigenetic Multimarker in the Strong Heart Study. American Journal of Epidemiology, 193, 7, 1010-1018.
predict.SIS
set.seed(0)
n <- 400
p <- 50
rho <- 0.5
corrmat <- diag(rep(1 - rho, p)) + matrix(rho, p, p)
corrmat[, 4] <- sqrt(rho)
corrmat[4, ] <- sqrt(rho)
corrmat[4, 4] <- 1
corrmat[, 5] <- 0
corrmat[5, ] <- 0
corrmat[5, 5] <- 1
cholmat <- chol(corrmat)
x <- matrix(rnorm(n * p, mean = 0, sd = 1), n, p)
x <- x %*% cholmat
# gaussian response
set.seed(1)
b <- c(4, 4, 4, -6 * sqrt(2), 4 / 3)
y <- x[, 1:5] %*% b + rnorm(n)
# SIS without regularization
model10 <- SIS(x, y, family = "gaussian", iter = FALSE)
model10$sis.ix0
# The top 10 selected variables
model10$ix0[1:10]
# The top 10 selected variables for each step
lapply(model10$ix_list, function(idx) {
  # idx is the ranked index vector for one screening step
  idx[1:10]
})
# ISIS with regularization
model11 <- SIS(x, y, family = "gaussian", tune = "bic")
model12 <- SIS(x, y, family = "gaussian", tune = "bic", varISIS = "aggr", seed = 11)
model11$ix
model12$ix
## Not run:
# binary response
set.seed(2)
feta <- x[, 1:5] %*% b
fprob <- exp(feta) / (1 + exp(feta))
y <- rbinom(n, 1, fprob)
model21 <- SIS(x, y, family = "binomial", tune = "bic")
model22 <- SIS(x, y, family = "binomial", tune = "bic", varISIS = "aggr", seed = 21)
model21$ix
model22$ix
# poisson response
set.seed(3)
b <- c(0.6, 0.6, 0.6, -0.9 * sqrt(2))
myrates <- exp(x[, 1:4] %*% b)
y <- rpois(n, myrates)
model31 <- SIS(x, y,
  family = "poisson", penalty = "lasso", tune = "bic", perm = TRUE, q = 0.9,
  greedy = TRUE, seed = 31
)
model32 <- SIS(x, y,
  family = "poisson", penalty = "lasso", tune = "bic", varISIS = "aggr",
  perm = TRUE, q = 0.9, seed = 32
)
model31$ix
model32$ix
# Cox model
set.seed(4)
b <- c(4, 4, 4, -6 * sqrt(2), 4 / 3)
myrates <- exp(x[, 1:5] %*% b)
Sur <- rexp(n, myrates)
CT <- rexp(n, 0.1)
Z <- pmin(Sur, CT)
ind <- as.numeric(Sur <= CT)
y <- survival::Surv(Z, ind)
model41 <- SIS(x, y,
  family = "cox", penalty = "lasso", tune = "bic",
  varISIS = "aggr", seed = 41
)
model42 <- SIS(x, y,
  family = "cox", penalty = "lasso", tune = "bic",
  varISIS = "cons", seed = 41
)
model41$ix
model42$ix
# SIS with bootstrap confidence intervals
# boot_ci must be TRUE for the cis component to be computed
sis <- SIS(x, y,
  family = "cox", penalty = "aenet", tune = "cv", varISIS = "cons",
  seed = 41, boot_ci = TRUE
)
sis$cis
## End(Not run)