smle-package | R Documentation |
Feature screening is a powerful tool for processing ultrahigh-dimensional data. It attempts to screen out most irrelevant features in preparation for a more elaborate analysis. This package provides an efficient implementation of SMLE-screening for linear, logistic, and Poisson models, where the joint effects among features are naturally incorporated in the screening process. The package also provides a function for conducting accurate post-screening feature selection based on an iterative hard-thresholding procedure and a user-specified selection criterion.
Package: | smle |
Type: | Package |
Version: | 2.1-0 |
Date: | 2023-01-04 |
License: | GPL-3 |
Input an n × 1 response vector Y and an n × p predictor (feature) matrix X. The package outputs a set of k < n features that appear most relevant for joint regression. In addition, the package provides a data simulator that generates synthetic datasets from high-dimensional GLMs, accommodating both numerical and categorical features with commonly used correlation structures.
Key functions:
Gen_Data
SMLE
smle_select
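The idea of retaining k features through a joint, sparsity-constrained fit can be sketched in base R with iterative hard thresholding, the kind of procedure the description mentions for post-screening selection. This is a minimal illustration for the linear model only, not the package's implementation: the helper hard_threshold, the step-size choice u, the iteration cap, and the simulated data are all assumptions made for this sketch.

```r
# Illustrative sketch only: iterative hard thresholding (IHT) for a
# k-sparse least-squares fit. All names below are hypothetical and are
# not part of the package's API.
set.seed(1)
n <- 100; p <- 500; k <- 10

# Synthetic data (assumed setup): only the first three features carry signal.
X <- matrix(rnorm(n * p), n, p)
beta_true <- c(3, -2, 2, rep(0, p - 3))
Y <- as.vector(X %*% beta_true + rnorm(n))

# Keep the k largest entries of b in absolute value and zero out the rest.
hard_threshold <- function(b, k) {
  keep <- order(abs(b), decreasing = TRUE)[seq_len(k)]
  out <- numeric(length(b))
  out[keep] <- b[keep]
  out
}

# Step size 1/u, where u is the largest eigenvalue of X'X
# (the squared spectral norm of X).
u <- norm(X, type = "2")^2

beta <- numeric(p)
for (iter in 1:500) {                       # at most 500 updates
  # Ascent direction for -(1/2) * ||Y - X b||^2 at the current beta.
  grad <- as.vector(crossprod(X, Y - X %*% beta))
  beta_new <- hard_threshold(beta + grad / u, k)
  converged <- max(abs(beta_new - beta)) < 1e-6
  beta <- beta_new
  if (converged) break
}

# Indices of the features kept by the k-sparse fit.
retained <- which(beta != 0)

# True features missed by the sketch, if any.
setdiff(1:3, retained)
```

Because every update uses the full gradient, each feature is judged jointly with the others currently in the model rather than by its marginal association with Y alone.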
Qianxiang Zang, Chen Xu, Kelly Burkett
Maintainer: Qianxiang Zang <qzang023@uottawa.ca>
Xu, C. and Chen, J. (2014). The Sparse MLE for Ultrahigh-Dimensional Feature Screening. Journal of the American Statistical Association, 109(507), 1257–1269.
Friedman, J., Hastie, T. and Tibshirani, R. (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 1–22.
set.seed(1)

# Generate correlated data
Data <- Gen_Data(n = 200, p = 5000, correlation = "MA", family = "gaussian")
print(Data)

# Joint feature screening via SMLE
fit <- SMLE(Y = Data$Y, X = Data$X, k = 10, family = "gaussian")
print(fit)
summary(fit)
plot(fit)

# Are there any features missed after screening?
setdiff(Data$subset_true, fit$ID_retained)

# Elaborative selection after screening
fit_s <- smle_select(fit, gamma_ebic = 0.5, vote = FALSE)

# Are there any features missed after selection?
setdiff(Data$subset_true, fit_s$ID_selected)
print(fit_s)
summary(fit_s)
plot(fit_s)