Sparse Estimation of a GLM via Minimum approximated Information Criterion
Usage

glmMIC(formula, preselection = NULL, family = c("gaussian", "binomial",
  "poisson"), data, beta0 = NULL, preselect.intercept = FALSE,
  criterion = "BIC", lambda0 = 0, a0 = NULL, rounding.digits = 4,
  use.GenSA = FALSE, lower = NULL, upper = NULL, maxit.global = 100,
  maxit.local = 100, epsilon = 1e-06, se.gamma = TRUE, CI.gamma = TRUE,
  conf.level = 0.95, se.beta = TRUE, fit.ML = FALSE, details = FALSE)
Arguments

formula
    An object of class "formula": a symbolic description of the model to be fitted.

preselection
    An optional formula specifying terms to be pre-selected, i.e., forced into the final model.

family
    A description of the error distribution and link function to be used in the model. Preferably, for computational speed, this is a character string naming one of the following three families: "gaussian", "binomial", or "poisson".

data
    A data.frame in which to interpret the variables named in the formula and preselection arguments.

beta0
    User-supplied beta0 value, the starting point for optimization. If missing or NULL, a starting point is generated internally.

preselect.intercept
    A logical value indicating whether the intercept term is pre-selected. By default, it is FALSE.

criterion
    Specifies the model selection criterion used. The default is "BIC".

lambda0
    User-supplied penalty parameter for model complexity. If lambda0 = 0 (the default), the penalty parameter is determined by the specified criterion.

a0
    The scale (or sharpness) parameter used in the hyperbolic tangent penalty. By default, a0 = NULL and a value is set internally.

rounding.digits
    Number of digits after the decimal point for rounding up estimates. Default value is 4.

use.GenSA
    Logical value indicating if the generalized simulated annealing algorithm, as implemented in the GenSA package, should be used for the global optimization. Default is FALSE.

lower
    The lower bounds for the search space in GenSA; used only when use.GenSA = TRUE.

upper
    The upper bounds for the search space in GenSA; used only when use.GenSA = TRUE.

maxit.global
    Maximum number of iterations allowed for the global optimization algorithm. Default is 100.

maxit.local
    Maximum number of iterations allowed for the local optimization algorithm. Default is 100.

epsilon
    Tolerance level for convergence. Default is 1e-6.

se.gamma
    Logical indicator of whether the standard error for the gamma estimate is computed. Default is TRUE.

CI.gamma
    Logical indicator of whether the confidence interval for gamma is computed. Default is TRUE.

conf.level
    Specifies the confidence level for the confidence interval of gamma. Default is 0.95.

se.beta
    Logical indicator of whether the (post-selection) standard error for the beta estimate is computed. Default is TRUE.

fit.ML
    Logical indicator of whether we fit the best selected model with full iteration of maximum likelihood (ML). Default is FALSE.

details
    Logical value: if TRUE, detailed fitting results are printed. Default is FALSE.
Details

The main idea of MIC involves approximation of the L0 norm with a continuous or smooth unit dent function. This method bridges the best subset selection and regularization by borrowing strength from both: it mimics the best subset selection using a penalized likelihood approach, yet with no need for a tuning parameter.

The problem is further reformulated with a reparameterization step that relates beta to gamma. There are two benefits of doing so. First, it reduces the optimization to one unconstrained, nonconvex yet smooth programming problem, which can be solved efficiently, as in computing the maximum likelihood estimator (MLE). Furthermore, the reparameterization tactic yields an additional advantage in terms of circumventing post-selection inference: significance testing on beta can be done through gamma.
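The unit dent idea can be sketched in a few lines of R. The exact functional form used by glmMIC is given in the references; the penalty `w` and the reparameterization `beta = gamma * w(gamma)` below are illustrative assumptions, with the sharpness parameter playing the role of a0:

```r
# Sketch of a hyperbolic tangent "unit dent" penalty (illustrative; see the
# references for the exact form used by glmMIC).
w <- function(gamma, a = 20) tanh(a * gamma^2)

# The penalty is ~0 at gamma = 0 and ~1 once gamma moves away from 0,
# approximating the L0 indicator I(gamma != 0) while staying smooth:
round(w(c(0, 0.05, 0.5, 2)), 3)

# Assumed reparameterization for illustration: beta is shrunk heavily
# toward 0 for small gamma, while beta ~ gamma for large |gamma|.
beta <- function(gamma, a = 20) gamma * w(gamma, a)
```

Because the dent is smooth in gamma, standard gradient-based optimizers can be applied directly, which is what makes the MLE-like computation in the paragraph above possible.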
To solve the smooth yet nonconvex optimization problem, two options are available. By default, a simulated annealing global optimization algorithm (the method="SANN" option in optim) is applied first. The resultant estimator is then used as the starting point for a local optimization algorithm, the quasi-Newton BFGS method (method="BFGS" in optim) by default. Optionally, the generalized simulated annealing algorithm, implemented in GenSA, can be used instead. This latter approach tends to be slower; however, it does not need to be combined with another local optimization and often yields the same final solution across different runs. Thus, when use.GenSA=TRUE, the output includes opt.global only, without opt.local.
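The default two-stage scheme can be illustrated on a toy nonconvex function (this is not the MIC objective itself, just a sketch of the SANN-then-BFGS pattern described above):

```r
# A toy nonconvex objective with multiple local minima
f <- function(x) sum((x^2 - 1)^2) + 0.1 * sum(x)

set.seed(1)
# Stage 1: global search via simulated annealing
opt.global <- optim(par = c(2, -2), fn = f, method = "SANN",
                    control = list(maxit = 100))
# Stage 2: local refinement via quasi-Newton BFGS,
# started at the SANN solution
opt.local <- optim(par = opt.global$par, fn = f, method = "BFGS",
                   control = list(maxit = 100))
opt.local$par    # refined solution
```

The annealing stage is responsible for escaping poor local minima, while BFGS polishes the solution to high accuracy, mirroring the roles of maxit.global and maxit.local above.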
In its current version, some appropriate data preparation may be needed. Most importantly, the X variables in all scenarios need to be standardized or scaled. In the case of Gaussian linear regression, the response variable needs to be centered or even standardized. In addition, missing values would cause errors and hence need pre-handling as well.
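A minimal data-preparation sketch for the Gaussian case, using the built-in airquality data purely for illustration (the variable choices are hypothetical):

```r
# Drop rows with missing values first, since they would cause errors
dat <- na.omit(airquality)

# Standardize the predictors and center the response (Gaussian case)
X <- scale(dat[, c("Solar.R", "Wind", "Temp")])
y <- scale(dat$Ozone, center = TRUE, scale = FALSE)

dat.std <- data.frame(X, y = as.numeric(y))
head(dat.std)
```

After this step, `dat.std` can be passed as the `data` argument, as in the Examples below where `scale()` is applied in the same way.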
Value

An object of class glmMIC is returned, which may contain the following components, depending on the options:

- opt.global: results from the preliminary run of a global optimization procedure (SANN as default);
- opt.local: results from the second run of a local optimization procedure (BFGS as default);
- the value of the minimized objective function;
- the estimated gamma (reparameterized);
- the estimated beta;
- the estimated variance-covariance matrix for the (reparameterized) gamma estimate;
- standard errors for the gamma estimate;
- the estimated variance-covariance matrix for the beta estimate;
- standard errors for the beta estimate (post-selection);
- the BIC value for the selected model;
- a summary table of the fitting results;
- the glm fitting results for the selected model with full ML iterations (when fit.ML=TRUE);
- the matched call.
References

Su, X. (2015). Variable selection via subtle uprooting. Journal of Computational and Graphical Statistics, 24(4): 1092–1113. URL http://www.tandfonline.com/doi/pdf/10.1080/10618600.2014.955176

Su, X., Fan, J., Levine, R. A., Nunn, M. E., and Tsai, C.-L. (2016+). Sparse estimation of generalized linear models via approximated information criteria. Submitted, Statistica Sinica.
See Also

glm, print.glmMIC, plot.glmMIC
Examples

# Note that glmMIC works with standardized data only. See below for examples.
# GAUSSIAN LINEAR REGRESSION
library(lars); data(diabetes);
dat <- cbind(diabetes$x, y=diabetes$y)
dat <- as.data.frame(scale(dat))
fit.MIC <- glmMIC(formula=y~.-1, family="gaussian", data=dat)
names(fit.MIC)
print(fit.MIC)
plot(fit.MIC)
# WITH PRE-SELECTED VARIABLES AND A DIFFERENT a VALUE
fit.MIC <- glmMIC(formula=y~.-1, preselection=~age+sex, family="gaussian", a0=20, data=dat)
fit.MIC
# LOGISTIC REGRESSION
library(ncvreg); data(heart)
dat <- as.data.frame(cbind(scale(heart[, -10]), chd=heart$chd)); names(dat)
dat <- dat[, -c(4, 6)]; head(dat)
fit.MIC <- glmMIC(formula= chd~., data=dat, family = "binomial")
fit.MIC
# LOGLINEAR REGRESSION
fish <- read.csv("http://www.ats.ucla.edu/stat/data/fish.csv")
form <- count ~ . -1 + xb:zg
y <- fish[,names(fish)==as.character(form)[2]]
X <- model.matrix(as.formula(form),fish)
dat <- data.frame(scale(X), count=fish$count); head(dat)
fit.MIC <- glmMIC(formula= count~ ., family = "poisson", data=dat)
fit.MIC