(Iterative) Sure Independence Screening ((I)SIS) and Fitting in Generalized Linear Models and Cox's Proportional Hazards Models
Description
This function first performs (iterative) sure independence screening, using one of several (I)SIS variants, and then fits the final regression model on the variables selected by (I)SIS by maximizing the SCAD-, MCP-, or LASSO-regularized log-likelihood with the R packages ncvreg and glmnet.
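The screening step at the heart of SIS can be sketched in a few lines of base R. This is a simplified illustration, assuming a Gaussian response and using the absolute marginal correlation as the ranking utility; the helper name `sis_screen` is hypothetical, and the iteration, the marginal GLM fits for other families, and the final ncvreg/glmnet fit performed by SIS() are not reproduced here.

```r
# Hypothetical helper: one vanilla (non-iterative) SIS step for a Gaussian
# response. Predictors are ranked by |cor(x_j, y)| and the top d are kept.
sis_screen <- function(x, y, d) {
  # One absolute marginal correlation per column of x
  omega <- abs(drop(cor(x, y)))
  # Indices of the d largest marginal utilities, in decreasing order
  order(omega, decreasing = TRUE)[seq_len(d)]
}

set.seed(0)
n <- 200; p <- 1000
x <- matrix(rnorm(n * p), n, p)
y <- 3 * x[, 1] - 2 * x[, 2] + rnorm(n)

d <- floor(n / log(n))   # screening size n / log(n), as suggested by Fan and Lv (2008)
keep <- sis_screen(x, y, d)
c(1, 2) %in% keep        # are both true predictors retained?
```

Screening reduces p = 1000 candidates to d = 37 before any penalized fit is attempted, which is what makes the subsequent SCAD/MCP/lasso step tractable in ultrahigh dimensions.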
Usage
SIS(x, y, family = c("gaussian", "binomial", "poisson", "cox"),
penalty = c("SCAD", "MCP", "lasso"), concavity.parameter = switch(penalty,
SCAD = 3.7, 3), tune = c("bic", "ebic", "aic", "cv"), nfolds = 10,
type.measure = c("deviance", "class", "auc", "mse", "mae"),
gamma.ebic = 1, nsis = NULL, iter = TRUE, iter.max = ifelse(greedy ==
FALSE, 10, floor(nrow(x)/log(nrow(x)))), varISIS = c("vanilla", "aggr",
"cons"), perm = FALSE, q = 1, greedy = FALSE, greedy.size = 1,
seed = 0, standardize = TRUE)

Arguments
x 
The design matrix, of dimensions n * p, without an intercept. Each
row is an observation vector. 
y 
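The response vector of dimension n * 1. Quantitative for family = 'gaussian', non-negative counts for family = 'poisson', and binary (0-1) for family = 'binomial'. For family = 'cox', y should be a survival object created with survival::Surv(), as in the Examples.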
family 
Response type: one of 'gaussian', 'binomial', 'poisson', or 'cox' (see y above).
penalty 
The penalty to be applied in the regularized likelihood subproblems. 'SCAD' (the default), 'MCP', or 'lasso' are provided. 
concavity.parameter 
The tuning parameter used to adjust the concavity of the SCAD/MCP penalty. Default is 3.7 for SCAD and 3 for MCP. 
tune 
Method for tuning the regularization parameter of the penalized
likelihood subproblems and of the final model selected by (I)SIS. Options
are 'bic' (the default), 'ebic', 'aic', and 'cv'.
nfolds 
Number of folds used in cross-validation. The default is 10.
type.measure 
Loss to use for cross-validation. Currently five options are available
('deviance', 'class', 'auc', 'mse', and 'mae'), not all of them for all
models. The default is 'deviance'.
gamma.ebic 
Specifies the parameter in the Extended BIC criterion that
penalizes the size of the corresponding model space. The default is
gamma.ebic = 1.
nsis 
Number of predictors recruited by (I)SIS.
iter 
Specifies whether to perform iterative SIS. The default is iter = TRUE.
iter.max 
Maximum number of iterations for (I)SIS and its variants. 
varISIS 
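Specifies whether to perform either of the two ISIS variants
based on randomly splitting the sample into two groups. The variant
varISIS = 'aggr' is a more aggressive screening procedure, while
varISIS = 'cons' is a more conservative one. The default,
varISIS = 'vanilla', performs ISIS without sample splitting.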
perm 
Specifies whether to impose a data-driven threshold on the size
of the active sets computed during the ISIS procedures. The threshold is
calculated by first decoupling the predictors x_i and the response
y_i through a random permutation π of (1,...,n), which forms
a null model. For this permuted data, the marginal regression
coefficients of all predictors are recalculated. Since the marginal
regression coefficients of the important predictors in the original data
should be larger than most of the recalculated coefficients in the null
model, the data-driven threshold is given by the q-th quantile of the null
coefficients. This threshold allows only a proportion 1 - q of inactive
variables to enter the model when x_i and y_i are unrelated (the null model).
The default is perm = FALSE.
q 
Quantile for calculating the data-driven threshold in the
permutation-based ISIS. The default is q = 1.
greedy 
Specifies whether to run the greedy modification of the
permutation-based ISIS. The default is greedy = FALSE.
greedy.size 
Maximum size of the active sets in the greedy
modification of the permutation-based ISIS. The default is
greedy.size = 1.
seed 
Random seed used for sample splitting, random permutation, and cross-validation sampling of training and test sets.
standardize 
Logical flag for standardizing the x variables prior to
performing (iterative) variable screening. The resulting coefficients are
always returned on the original scale. The default is standardize = TRUE.
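For reference on the concavity.parameter argument, the standard definitions of the two concave penalties can be written as the R functions below. These are illustrative helpers with hypothetical names (`scad_penalty`, `mcp_penalty`); inside SIS the penalties are evaluated by ncvreg, not by this code. Both behave like the lasso near zero and become constant for large |t|, which limits the shrinkage of large coefficients; a larger concavity parameter delays that flattening.

```r
# SCAD penalty evaluated at t, with tuning parameter lambda and
# concavity parameter a (> 2; 3.7 is the SIS default)
scad_penalty <- function(t, lambda, a = 3.7) {
  t <- abs(t)
  ifelse(t <= lambda,
         lambda * t,                                          # lasso-like part
         ifelse(t <= a * lambda,
                (2 * a * lambda * t - t^2 - lambda^2) / (2 * (a - 1)),  # quadratic blend
                (a + 1) * lambda^2 / 2))                      # constant beyond a * lambda
}

# MCP penalty with concavity parameter gamma (> 1; 3 is the SIS default)
mcp_penalty <- function(t, lambda, gamma = 3) {
  t <- abs(t)
  ifelse(t <= gamma * lambda,
         lambda * t - t^2 / (2 * gamma),   # penalty rate decays linearly to 0
         gamma * lambda^2 / 2)             # constant beyond gamma * lambda
}

lambda <- 1
grid <- c(0, 0.5, 1, 2, 3, 3.7, 5)
rbind(SCAD = scad_penalty(grid, lambda), MCP = mcp_penalty(grid, lambda))
```

Both functions are continuous at their breakpoints (t = lambda and t = a * lambda for SCAD, t = gamma * lambda for MCP), which is easy to confirm from the printed grid.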
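Regarding the tune and gamma.ebic arguments: in the form given by Chen and Chen (2008), the extended BIC of a candidate model s with |s| selected predictors out of p adds a model-space term to the ordinary BIC, and setting gamma = 0 recovers BIC. This is the textbook definition, offered as a sketch; the exact expression computed inside SIS (for example, with the Cox partial likelihood) may differ in detail.

```latex
\mathrm{EBIC}_\gamma(s) = -2\log L\big(\hat\beta(s)\big) + |s|\log n + 2\gamma\log\binom{p}{|s|},
\qquad
\mathrm{BIC}(s) = \mathrm{EBIC}_0(s),
\qquad
\mathrm{AIC}(s) = -2\log L\big(\hat\beta(s)\big) + 2|s|.
```

Because the binomial term grows with p, a larger gamma.ebic favours sparser models, which matters when the number of candidate predictors is large relative to n.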
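The permutation threshold described under perm can be sketched in base R for a Gaussian response. This assumes that the "marginal regression coefficient" is the least-squares slope on a standardized predictor; for other families, marginal GLM fits would be needed instead. The helpers `marginal_coef` and `perm_threshold` are hypothetical names used only for illustration.

```r
# Hypothetical helper: absolute marginal least-squares slopes of y on each
# standardized column of x (Gaussian response assumed)
marginal_coef <- function(x, y) {
  xs <- scale(x)   # columns with mean 0 and SD 1, so sum(x_j^2) = n - 1
  abs(drop(crossprod(xs, y - mean(y)))) / (nrow(xs) - 1)
}

# Hypothetical helper: q-th quantile of the marginal coefficients computed
# on a null model obtained by randomly permuting the responses
perm_threshold <- function(x, y, q = 1, seed = 0) {
  set.seed(seed)
  y_null <- y[sample(length(y))]   # breaks any link between x_i and y_i
  unname(quantile(marginal_coef(x, y_null), probs = q))
}

set.seed(1)
n <- 200; p <- 500
x <- matrix(rnorm(n * p), n, p)
y <- 2 * x[, 1] + 1.5 * x[, 2] + rnorm(n)

thr <- perm_threshold(x, y, q = 1, seed = 0)
active <- which(marginal_coef(x, y) > thr)   # predictors passing the threshold
active
```

With q = 1 the threshold is the largest null coefficient; lowering q lowers the threshold and admits more variables, consistent with the 1 - q proportion described above.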
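Regarding standardize: returning coefficients on the original scale amounts to undoing the centering and scaling. The sketch below shows that mapping for an unpenalized least-squares fit, where it is exact; it illustrates the arithmetic only and is not taken from the SIS source. For a penalized fit, standardizing changes how strongly each coefficient is shrunk, so the back-transformed estimates generally differ from those of a penalized fit on the raw predictors.

```r
set.seed(2)
n <- 100
x <- cbind(rnorm(n, mean = 5, sd = 2), rnorm(n, mean = -1, sd = 0.5))
y <- drop(1 + x %*% c(0.8, -3)) + rnorm(n)

# Fit on standardized predictors
xs <- scale(x)
b_std <- coef(lm(y ~ xs))

# Map back to the original scale: divide each slope by its column SD and
# shift the intercept to account for the column means
ctr <- attr(xs, "scaled:center")
scl <- attr(xs, "scaled:scale")
slopes <- b_std[-1] / scl
intercept <- b_std[1] - sum(slopes * ctr)
back <- c(intercept, slopes)

# Identical (up to floating point) to fitting on the raw predictors
all.equal(unname(back), unname(coef(lm(y ~ x))))
```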
Value
Returns an object with
ix 
The vector of indices selected by (I)SIS. 
coef.est 
The vector of coefficients of the final model selected by (I)SIS. 
fit 
A fitted ncvreg or glmnet object (see Description) for the final model selected by (I)SIS.
path.index 
The index along the solution path of fit at which the criterion specified by tune is optimized.
Author(s)
Jianqing Fan, Yang Feng, Diego Franco Saldana, Richard Samworth, and Yichao Wu
References
Diego Franco Saldana and Yang Feng (2016) SIS: An R package for Sure Independence Screening in Ultrahigh Dimensional Statistical Models, Journal of Statistical Software, to appear.
Jianqing Fan and Jinchi Lv (2008) Sure Independence Screening for Ultrahigh Dimensional Feature Space (with discussion). Journal of the Royal Statistical Society, Series B, 70, 849-911.
Jianqing Fan and Rui Song (2010) Sure Independence Screening in Generalized Linear Models with NP-Dimensionality. The Annals of Statistics, 38, 3567-3604.
Jianqing Fan, Richard Samworth, and Yichao Wu (2009) Ultrahigh Dimensional Feature Selection: Beyond the Linear Model. Journal of Machine Learning Research, 10, 2013-2038.
Jianqing Fan, Yang Feng, and Yichao Wu (2010) High-dimensional Variable Selection for Cox Proportional Hazards Model. IMS Collections, 6, 70-86.
Jianqing Fan, Yang Feng, and Rui Song (2011) Nonparametric Independence Screening in Sparse Ultrahigh Dimensional Additive Models. Journal of the American Statistical Association, 106, 544-557.
Jiahua Chen and Zehua Chen (2008) Extended Bayesian Information Criteria for Model Selection with Large Model Spaces. Biometrika, 95, 759-771.
See Also
predict.SIS
Examples
set.seed(0)
n = 400; p = 50; rho = 0.5
corrmat = diag(rep(1-rho, p)) + matrix(rho, p, p)
corrmat[,4] = sqrt(rho)
corrmat[4, ] = sqrt(rho)
corrmat[4,4] = 1
corrmat[,5] = 0
corrmat[5, ] = 0
corrmat[5,5] = 1
cholmat = chol(corrmat)
x = matrix(rnorm(n*p, mean=0, sd=1), n, p)
x = x%*%cholmat
# gaussian response
set.seed(1)
b = c(4,4,4,-6*sqrt(2),4/3)
y=x[, 1:5]%*%b + rnorm(n)
model11=SIS(x, y, family='gaussian', tune='bic')
model12=SIS(x, y, family='gaussian', tune='bic', varISIS='aggr', seed=11)
model11$ix
model12$ix
# binary response
set.seed(2)
feta = x[, 1:5]%*%b; fprob = exp(feta)/(1+exp(feta))
y = rbinom(n, 1, fprob)
model21=SIS(x, y, family='binomial', tune='bic')
model22=SIS(x, y, family='binomial', tune='bic', varISIS='aggr', seed=21)
model21$ix
model22$ix
# poisson response
set.seed(3)
b = c(0.6,0.6,0.6,-0.9*sqrt(2))
myrates = exp(x[, 1:4]%*%b)
y = rpois(n, myrates)
model31=SIS(x, y, family='poisson', tune='bic', perm=TRUE, q=0.9,
greedy=TRUE, seed=31)
#model32=SIS(x, y, family='poisson', tune='bic', varISIS='aggr',
# perm=TRUE, q=0.9, seed=32)
model31$ix
#model32$ix
# Cox model
#set.seed(4)
#b = c(4,4,4,-6*sqrt(2),4/3)
#myrates = exp(x[, 1:5]%*%b)
#Sur = rexp(n,myrates); CT = rexp(n,0.1)
#Z = pmin(Sur,CT); ind = as.numeric(Sur<=CT)
#y = survival::Surv(Z,ind)
#model41=SIS(x, y, family='cox', penalty='lasso', tune='bic',
# varISIS='aggr', seed=41)
#model42=SIS(x, y, family='cox', penalty='lasso', tune='bic',
# varISIS='cons', seed=41)
#model41$ix
#model42$ix
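With the fourth coefficient equal to -6*sqrt(2), this design is a hard case for plain (non-iterative) SIS: because Cov(x_j, x_4) = sqrt(rho) for j = 1, 2, 3, the predictor x4 has zero marginal covariance with y even though its coefficient is the largest in magnitude, which is the situation the iterative version is meant to handle. The short base-R check below computes these population covariances from the same correlation matrix; it is an illustration added for explanation, not part of the package examples.

```r
# Population covariance matrix of x, built as in the examples
rho <- 0.5; p <- 50
corrmat <- diag(rep(1 - rho, p)) + matrix(rho, p, p)
corrmat[, 4] <- sqrt(rho); corrmat[4, ] <- sqrt(rho); corrmat[4, 4] <- 1
corrmat[, 5] <- 0; corrmat[5, ] <- 0; corrmat[5, 5] <- 1

b <- c(4, 4, 4, -6 * sqrt(2), 4 / 3)

# Cov(x_j, y) = sum_k Sigma[j, k] * b[k], since the noise is independent of x
marg_cov <- drop(corrmat[, 1:5] %*% b)
marg_cov[1:6]   # approximately 2, 2, 2, 0, 4/3, 0
```

Every noise variable x6, ..., x50 also has zero marginal covariance with y, so marginal screening alone cannot distinguish x4 from them.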
