Computes the screening criterion values for each group.
X: A matrix of grouped predictors.
y: A numeric response vector.
group: A vector of group indices, one for each predictor (column of X). Numeric, consecutive group indices are recommended.
criterion: The group screening criterion, one of "gSIS", "gHOLP", "gAR2" or "gDC". The default is "gSIS".
family: A description of the error distribution and link function to be used in the model, one of "gaussian", "binomial" or "poisson". The default is "gaussian".
scale: The type of scaling of the predictors. The default is "
norm: The type of norm used by "gSIS" and "gHOLP"; see norm_vec.
The group screening procedure first computes, for each group, a value that measures the strength of the relationship between the group's predictors, taken as a whole, and the response. These values are then used to select the important groups (equivalently, to remove the unimportant ones), which reduces the dimension of the data from high or ultrahigh to moderate or even small.
In more detail, let X = (x_{11},x_{12},...,x_{1p_1},...,x_{j1},x_{j2},...,x_{jp_j},...,x_{J1},x_{J2},...,x_{Jp_J}) be the grouped predictors, where x_{jk} is the k-th predictor (an n-vector) in the j-th group, J is the number of groups and p_j is the number of predictors in the j-th group.
When family = "gaussian", four criteria are available for computing these values.
The first criterion, "gSIS", is the grouped version of sure independence screening [SIS, Fan and Lv (2008)]. It starts from
\hat{w} = X^{T}y = (w_{11},w_{12},...,w_{1p_1},...,w_{j1},w_{j2},...,w_{jp_j},...,w_{J1},w_{J2},...,w_{Jp_J}).
For the j-th group, W_j is the norm of the block (w_{j1},w_{j2},...,w_{jp_j}) divided by the group size p_j, that is, W_j = ||(w_{j1},...,w_{jp_j})|| / p_j, which gives the criterion values for all groups:
\hat{W} = (W_1,...,W_J).
See norm_vec for the available norm types.
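A minimal base-R sketch of the "gSIS" computation follows. It is an illustration under stated assumptions, not the package code: `vec_norm()` and `gsis_sketch()` are hypothetical names, the norm types "L1", "L2" and "Linf" are assumed stand-ins for whatever `norm_vec` offers, and `X` is assumed to be scaled beforehand.

```r
# Hypothetical helper: vector norm of the requested type (assumed set of types)
vec_norm <- function(v, type = c("L1", "L2", "Linf")) {
  type <- match.arg(type)
  switch(type,
         L1   = sum(abs(v)),
         L2   = sqrt(sum(v^2)),
         Linf = max(abs(v)))
}

# Hypothetical sketch of gSIS: W_j = ||(w_j1, ..., w_jp_j)|| / p_j with w = X^T y.
# X is assumed to be scaled already; no scaling is done here.
gsis_sketch <- function(X, y, group, type = "L1") {
  w <- drop(crossprod(X, y))            # w = X^T y, one entry per predictor
  groups <- unique(group)
  W <- vapply(groups, function(g) {
    wj <- w[group == g]                 # the block of w belonging to group g
    vec_norm(wj, type) / length(wj)     # norm of the block divided by p_j
  }, numeric(1))
  cbind(group = groups, value = W)      # two columns: group index, criterion value
}

# Toy usage: with X equal to the identity, w is just y
X <- diag(4)
y <- c(1, -2, 3, 4)
gsis_sketch(X, y, group = c(1, 1, 2, 2))  # values: 1.5 for group 1, 3.5 for group 2
```

The toy values can be checked by hand: (|1| + |-2|)/2 = 1.5 and (3 + 4)/2 = 3.5.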
The second criterion, "gHOLP", is a grouped version of the High-dimensional Ordinary Least-squares Projector [HOLP, Wang and Leng (2015)], defined as
\hat{β} = X^{T}(XX^{T})^{-1}y = (β_{11},β_{12},...,β_{1p_1},..., β_{j1},β_{j2},...,β_{jp_j},...,β_{J1},β_{J2},...,β_{Jp_J}).
The group structure is then incorporated in the same way as for "gSIS", with \hat{β} in place of \hat{w}.
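The "gHOLP" values can be sketched in base R as well. The hypothetical `gholp_sketch()` below computes X^{T}(XX^{T})^{-1}y with `solve()` instead of forming the inverse explicitly, and it assumes, as the formula requires, that XX^{T} is invertible, which is only possible when n does not exceed the total number of predictors. The norm types "L1", "L2" and "Linf" are an assumption, and this is not the package's implementation.

```r
# Hypothetical sketch of gHOLP: beta = X^T (X X^T)^{-1} y, then per-group norm / p_j
gholp_sketch <- function(X, y, group, type = c("L1", "L2", "Linf")) {
  type <- match.arg(type)
  # X X^T is n x n and must be invertible, so this needs n <= total predictors
  beta <- drop(crossprod(X, solve(tcrossprod(X), y)))
  groups <- unique(group)
  W <- vapply(groups, function(g) {
    bj <- beta[group == g]              # the block of beta belonging to group g
    nrm <- switch(type,
                  L1   = sum(abs(bj)),
                  L2   = sqrt(sum(bj^2)),
                  Linf = max(abs(bj)))
    nrm / length(bj)                    # divide by the group size p_j
  }, numeric(1))
  cbind(group = groups, value = W)
}

# Toy usage: n = 2 observations, 3 predictors in 2 groups
X <- rbind(c(1, 0, 0),
           c(0, 1, 0))
y <- c(2, -4)
gholp_sketch(X, y, group = c(1, 1, 2))  # values: 3 for group 1, 0 for group 2
```

Here XX^{T} is the 2 x 2 identity, so \hat{β} = X^{T}y = (2, -4, 0), which gives (2 + 4)/2 = 3 for group 1 and 0 for group 2.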
The third criterion, "gAR2", is the groupwise adjusted R-squared. A linear model of the response on the predictors of each group is fitted separately, and its adjusted R-squared measures the strength of the relationship between that group and the response. Note that, for the adjusted R-squared to be computable, the maximum group size \max(p_j), j=1,...,J, should not be larger than the sample size n.
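A sketch of "gAR2" using base R's `lm()` and `summary()`: the response is regressed on each group's columns separately and the adjusted R-squared is read from the summary. The helper name `gar2_sketch()` is hypothetical, the per-group model is assumed to include an intercept, and the error check simply enforces the max(p_j) <= n condition stated above.

```r
# Hypothetical sketch of gAR2: adjusted R-squared of a separate linear fit per group
gar2_sketch <- function(X, y, group) {
  y <- as.numeric(y)                    # also accept an n x 1 matrix response
  # the condition above: no group may be larger than the sample size n
  if (max(table(group)) > nrow(X)) stop("a group is larger than the sample size n")
  groups <- unique(group)
  W <- vapply(groups, function(g) {
    Xj <- X[, group == g, drop = FALSE] # predictors of group g only
    summary(lm(y ~ Xj))$adj.r.squared   # linear fit with an intercept
  }, numeric(1))
  cbind(group = groups, value = W)
}

# Toy usage: only group 1 (columns 1 and 2) drives the response
set.seed(1)
n <- 50
X <- matrix(rnorm(n * 4), n, 4)
y <- 3 * X[, 1] + rnorm(n, sd = 0.5)
gar2_sketch(X, y, group = c(1, 1, 2, 2))
```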
The last criterion, "gDC", is the grouped distance correlation. Distance correlation [Szekely, Rizzo and Bakirov (2007)] measures the dependence between two random variables or two random vectors. Thus, similar to "gAR2", the distance correlation between each group and the response is computed. It is worth pointing out that distance correlation measures nonlinear as well as linear relationships. However, it can take longer to compute, because its computation involves several steps: forming the pairwise distance matrices of the group and of the response over all n observations, double-centering them, and combining them, so the cost grows quadratically with n. Distance correlation has been applied to screen individual variables by Li, Zhong and Zhu (2012).
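The following sketch computes the sample distance correlation directly from its definition (pairwise Euclidean distances, double-centering, averaged products) and applies it to each group. `dcor_sketch()` and `gdc_sketch()` are hypothetical names; the energy package provides a tested implementation (`energy::dcor()`), and this is not necessarily how the package computes "gDC".

```r
# Hypothetical sketch of the sample distance correlation (Szekely, Rizzo and Bakirov, 2007)
dcor_sketch <- function(x, y) {
  # pairwise Euclidean distances between observations (rows), then double-centering
  centre <- function(z) {
    d <- as.matrix(dist(as.matrix(z)))
    d - outer(rowMeans(d), colMeans(d), "+") + mean(d)
  }
  A <- centre(x)
  B <- centre(y)
  dcov2 <- mean(A * B)                      # squared distance covariance
  denom <- sqrt(mean(A * A) * mean(B * B))  # product of the distance standard deviations
  if (denom == 0) return(0)
  sqrt(max(dcov2, 0) / denom)               # guard against tiny negative rounding
}

# Hypothetical sketch of gDC: distance correlation between each whole group and y
gdc_sketch <- function(X, y, group) {
  groups <- unique(group)
  W <- vapply(groups,
              function(g) dcor_sketch(X[, group == g, drop = FALSE], y),
              numeric(1))
  cbind(group = groups, value = W)
}

# Toy usage: y depends on column 1 only through a squared (nonlinear) term
set.seed(2)
n <- 100
X <- matrix(rnorm(n * 3), n, 3)
y <- X[, 1]^2 + rnorm(n, sd = 0.1)
gdc_sketch(X, y, group = c(1, 2, 3))
cor(X[, 1], y)  # the population Pearson correlation is 0 here; the sample value varies
```

Because Cov(X, X^2) = 0 for a standard normal X, a linear measure has little to work with in this toy setting, while the distance correlation of group 1 should stand out from the other two groups.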
When family = "binomial" or family = "poisson", a different criterion is used to measure the strength of the relationship between the response and the predictors in each group: Akaike's Information Criterion (AIC), defined as
AIC = -2*LogLikelihood + 2*npar,
where LogLikelihood is the log-likelihood of the generalized linear model fitted to the group and npar is the number of parameters in that model. Here npar is the number of variables within the group, i.e., npar = p_j, j = 1,...,J. A smaller AIC indicates a better-fitting group model.
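For the "binomial" and "poisson" families, the AIC criterion above can be sketched with `glm()` and `logLik()`. The helper `gaic_sketch()` is hypothetical, and whether the package includes an intercept in each group's model is an assumption here. The sketch follows the stated definition npar = p_j, whereas `stats::AIC()` would also count the intercept (p_j + 1); that shifts every group's value by the same constant 2 and leaves their ordering unchanged.

```r
# Hypothetical sketch of the AIC-based group criterion: AIC = -2 * logLik + 2 * p_j
gaic_sketch <- function(X, y, group, family = c("binomial", "poisson")) {
  family <- match.arg(family)
  y <- as.numeric(y)                      # also accept an n x 1 matrix response
  groups <- unique(group)
  W <- vapply(groups, function(g) {
    Xj <- X[, group == g, drop = FALSE]   # predictors of group g only
    fit <- glm(y ~ Xj, family = family)   # one GLM per group, intercept assumed
    -2 * as.numeric(logLik(fit)) + 2 * ncol(Xj)  # npar = p_j as in the definition
  }, numeric(1))
  cbind(group = groups, value = W)
}

# Toy usage: Poisson counts driven by column 1 (group 1) only
set.seed(3)
n <- 200
X <- matrix(rnorm(n * 4), n, 4)
y <- rpois(n, lambda = exp(0.5 + X[, 1]))
gaic_sketch(X, y, group = c(1, 1, 2, 2), family = "poisson")
```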
Note that the individual-variable criteria SIS and HOLP are special cases of "gSIS" and "gHOLP", respectively, in which each group contains only one predictor.
A numeric matrix with two columns: the first column is the group index, and the second column is the grouped screening criterion values corresponding to the first column.
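Because the returned matrix pairs each group index with its criterion value, a screening step can be sketched as keeping the d groups with the strongest values. `top_groups()` and its `larger_is_better` argument are hypothetical, and the choice of d is left to the user. By the definitions in the Details, larger values indicate stronger groups for "gSIS", "gHOLP", "gAR2" and "gDC"; if the returned values are AIC values as defined there, smaller is better, so the direction has to be chosen per criterion.

```r
# Hypothetical helper: indices of the d groups with the strongest criterion values
top_groups <- function(crit, d, larger_is_better = TRUE) {
  ord <- order(crit[, 2], decreasing = larger_is_better)  # rank by the value column
  crit[ord[seq_len(min(d, nrow(crit)))], 1]               # return group indices
}

# Toy criterion matrix in the two-column layout described above
crit <- cbind(c(1, 2, 3, 4), c(0.20, 0.90, 0.50, 0.05))
top_groups(crit, d = 2)                            # 2 3
top_groups(crit, d = 2, larger_is_better = FALSE)  # 4 1 (e.g. for AIC values)
```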
Debin Qiu, Jeongyoun Ahn
Fan, J. and Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space (with discussion). Journal of the Royal Statistical Society: Series B, 70, 849-911.
Li, R., Zhong, W. and Zhu, L. (2012). Feature screening via distance correlation learning. Journal of the American Statistical Association, 107, 1129-1139.
Szekely, G.J., Rizzo, M.L. and Bakirov, N.K. (2007). Measuring and testing dependence by correlation of distances. Annals of Statistics, 35(6), 2769-2794.
Wang, X. and Leng, C. (2015). High-dimensional ordinary least-squares projector for screening variables. Journal of the Royal Statistical Society: Series B. To appear.
library(MASS)  # for mvrnorm()
library(grpss) # package providing grp.criValues()
set.seed(1)    # for reproducibility
n <- 30 # sample size
p <- 3 # number of predictors in each group
J <- 50 # number of groups
group <- rep(1:J, each = p) # group indices
Sigma <- diag(p*J) # covariance matrix
X <- mvrnorm(n, seq(0, 5, length.out = p*J), Sigma)
beta <- runif(12, -2, 5) # nonzero coefficients for the first 12 predictors (first 4 groups)
y <- X %*% matrix(c(beta, rep(0, p*J - 12)), ncol = 1) + rnorm(n)
grp.criValues(X, y, group) # gSIS (default criterion)
grp.criValues(X, y, group, "gHOLP") # gHOLP
grp.criValues(X, y, group, "gAR2") # gAR2
grp.criValues(X, y, group, "gDC") # gDC