LassoSIR: Sparsed Sliced Inverse Regression via Lasso

Description

This function estimates the sufficient dimension reduction (SDR) space using sparse sliced inverse regression via Lasso (Lasso-SIR).

The input is a continuous design matrix X and a response vector Y which can be either continuous or categorical. X is arranged such that each column corresponds to one variable and each row corresponds to one subject.

The function lets users choose (i) the dimension of the SDR space, (ii) whether to screen variables by diagonal thresholding, (iii) the number of slices (H), and further options such as the number of cross-validation folds and whether to plot the solution path.

Usage

LassoSIR(X, Y, H = 0, choosing.d = "automatic", solution.path = FALSE, categorical = FALSE, nfolds = 10, screening = TRUE, no.dim = 0)
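A minimal sketch of a call on simulated data follows. The single-index model, the sample sizes, and the object names (`beta`, `fit`) are assumptions made for illustration only. H is passed explicitly so that the function does not prompt for it, and the call is guarded so that the block still runs when the package is not installed.

```r
set.seed(2017)
n <- 200   # number of subjects (rows of X)
p <- 10    # number of variables (columns of X)
H <- 20    # number of slices, given explicitly to avoid the interactive prompt

# Hypothetical single-index model: only the first three variables matter.
beta <- c(rnorm(3), rep(0, p - 3))
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
Y <- (X %*% beta)^3 / 2 + rnorm(n)   # continuous response, stored as an n x 1 matrix

if (requireNamespace("LassoSIR", quietly = TRUE)) {
  fit <- LassoSIR::LassoSIR(X, Y, H = H, choosing.d = "automatic",
                            solution.path = FALSE, categorical = FALSE,
                            nfolds = 10, screening = FALSE)
  print(fit$no.dim)   # dimension chosen for the SDR space
  print(fit$beta)     # estimated coefficients
}
```

With choosing.d = "manual" or H = 0 the same call would wait for keyboard input, so those settings are better suited to interactive sessions than to scripts.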

Arguments

X: The continuous design matrix, arranged so that each column corresponds to one variable and each row corresponds to one subject.

Y: The response vector, which can be either continuous or categorical.

H: The number of slices. (i) If categorical is TRUE, H is set automatically to the number of categories. (ii) If the response is continuous (categorical is FALSE), the user needs to specify the number of slices; if H is 0, the function asks the user to enter the number of slices interactively. (iii) The default is 0.

choosing.d: The method for choosing the dimension of the SDR space. If no.dim is nonzero, choosing.d is set to "given". Otherwise, choosing.d can be "automatic" or "manual". With "manual", the function calculates and plots the eigenvalues of var(E[X|Y]) and then asks the user to enter the dimension interactively. With "automatic", the dimension is determined according to Algorithm 5 of the original paper. The default is "automatic".

solution.path: If TRUE, the solution path of the final model is plotted. The default is FALSE.

categorical: If TRUE, the response variable is treated as categorical; otherwise it is treated as continuous. The default is FALSE.

nfolds: The number of folds used in the cross-validation. The default is 10.

screening: If TRUE, a diagonal thresholding (DT-SIR) step is applied to reduce the dimension before applying Lasso-SIR. The default is TRUE.

no.dim: The dimension of the SDR space. The default is 0, in which case the dimension is chosen manually or automatically according to choosing.d.

Details

This function estimates the sufficient dimension reduction space using the sparse sliced inverse regression for high dimensional data via Lasso (LassoSIR).
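To make the slicing idea concrete, the following base R sketch estimates var(E[X|Y]) by sorting the observations by Y, cutting them into H equal-sized slices, and taking the covariance of the slice means. It is an illustration of plain sliced inverse regression under assumed simulated data, not the package's internal code, and it omits the Lasso and screening steps; all object names are hypothetical.

```r
set.seed(1)
n <- 200; p <- 5; H <- 10
X <- matrix(rnorm(n * p), n, p)
Y <- X[, 1]^3 + rnorm(n, sd = 0.5)   # assumed model: Y depends on X only through X[, 1]

# Assign each observation to one of H equal-sized slices according to the rank of Y.
slice_id <- integer(n)
slice_id[order(Y)] <- rep(seq_len(H), each = n / H)

# Slice means of the centred design matrix (one row per slice).
Xc <- scale(X, center = TRUE, scale = FALSE)
slice_means <- rowsum(Xc, slice_id) / as.vector(table(slice_id))

# Estimate of var(E[X|Y]); slices have equal weight because they have equal size.
Lambda_hat <- crossprod(slice_means) / H
eig <- eigen(Lambda_hat, symmetric = TRUE)

round(eig$values, 3)        # a clear gap after the first eigenvalue suggests dimension 1
round(eig$vectors[, 1], 3)  # leading direction, dominated by the first variable
```

A plot of `eig$values` is the kind of display one would inspect when choosing the dimension by hand.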

Value

When solution.path is TRUE, the function returns a glmnet object.

When solution.path is FALSE, the Lasso tuning parameter is chosen by cross-validation and the function returns the following values:

beta: the estimated coefficients of the SDR space.

eigen.value: the eigenvalues of the estimator of var(E[X|Y]).

no.dim: the dimension of the central space.

H: the number of slices.

categorical: a boolean variable indicating the type of the response.
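As a sketch of how the returned values might be post-processed, the block below uses a stand-in list with made-up numbers in place of a real fit. It assumes that beta is stored as a matrix with one row per variable and one column per direction; that layout is an assumption and should be checked against an actual result.

```r
# Stand-in for a LassoSIR() result: a hypothetical list carrying some of the
# documented fields, filled with made-up numbers purely for illustration.
fit <- list(
  beta        = matrix(c(0.8, -0.6, 0, 0, 0), ncol = 1),
  no.dim      = 1,
  H           = 10,
  categorical = FALSE
)

# Variables with a nonzero coefficient in at least one direction.
selected <- which(rowSums(fit$beta != 0) > 0)

# Rescale each direction to unit length; only the spanned space is of interest.
beta_unit <- sweep(fit$beta, 2, sqrt(colSums(fit$beta^2)), "/")

selected
beta_unit
```

Normalising the columns makes estimates from different runs easier to compare, since each direction is only determined up to scale.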


Author(s)

Zhigen Zhao, Qian Lin, Jun S. Liu

References

Lin, Q., Zhao, Z., and Liu, J. (2017). On consistency and sparsity for sliced inverse regression in high dimension. Annals of Statistics.

Lin, Q., Zhao, Z., and Liu, J. (2016). Sparse Sliced Inverse Regression for High Dimensional Data.