glmMIC: Sparse Estimation of a GLM via Minimum approximated...

Description Usage Arguments Details Value References See Also Examples

View source: R/glmMIC.R

Description

Sparse Estimation of a GLM via Minimum approximated Information Criterion

Usage

glmMIC(formula, preselection = NULL, family = c("gaussian", "binomial",
  "poisson"), data, beta0 = NULL, preselect.intercept = FALSE,
  criterion = "BIC", lambda0 = 0, a0 = NULL, rounding.digits = 4,
  use.GenSA = FALSE, lower = NULL, upper = NULL, maxit.global = 100,
  maxit.local = 100, epsilon = 1e-06, se.gamma = TRUE, CI.gamma = TRUE,
  conf.level = 0.95, se.beta = TRUE, fit.ML = FALSE, details = FALSE)

Arguments

formula

An object of class formula, with the response on the left of a ~ operator, and the terms on the right.

preselection

A formula of the form, e.g., ~ x1 + x2, giving the pre-selected variables. No penalty is applied to their slope parameters. These variables must also appear in the formula argument; otherwise, an error is raised.

family

A description of the error distribution and link function to be used in the model. For computational speed, this is preferably a character string naming one of the following three family functions: "gaussian", "binomial", or "poisson". Otherwise, it must be a family function, or the result of a call to a family function, that can be handled by glm.fit. See family for details of family functions.

data

A data.frame in which to interpret the variables named in the formula argument.

beta0

User-supplied beta0 value, the starting point for optimization. If missing or NULL (by default), the maximum likelihood estimator (MLE) will be used.

preselect.intercept

A logical value indicating whether the intercept term is pre-selected. By default, it is FALSE.

criterion

Specifies the model selection criterion used. If "AIC", the complexity penalty parameter (lambda) equals 2; if "BIC", lambda equals ln(n), where n is the sample size. You may specify the penalty parameter of your choice by setting lambda0.

lambda0

User-supplied penalty parameter for model complexity. If criterion="AIC" or criterion="BIC", the value of lambda0 is ignored.

a0

The scale (or sharpness) parameter used in the hyperbolic tangent penalty. By default, a0=min(n, 100) is used.

rounding.digits

Number of digits after the decimal point for rounding estimates. Default value is 4.

use.GenSA

Logical value indicating whether generalized simulated annealing, as implemented in GenSA, is used. The default is FALSE.

lower

The lower bounds for the search space in GenSA. The default is a p-by-1 vector with all entries -10.

upper

The upper bounds for the search space in GenSA. The default is a p-by-1 vector with all entries +10.

maxit.global

Maximum number of iterations allowed for the global optimization algorithm SANN. Default value is 100.

maxit.local

Maximum number of iterations allowed for the local optimization algorithm BFGS. Default value is 100.

epsilon

Tolerance level for convergence. Default is 1e-6.

se.gamma

Logical indicator of whether the standard error for gamma is computed. Default is TRUE.

CI.gamma

Logical indicator of whether the confidence interval for gamma is output. Default is TRUE.

conf.level

Specifies the confidence level for CI.gamma. Default is 0.95.

se.beta

Logical indicator of whether the (post-selection) standard error for beta is computed. Default is TRUE.

fit.ML

Logical indicator of whether the selected model is refit with fully iterated maximum likelihood (ML). Default is FALSE.

details

Logical value: if TRUE, detailed results are printed out when running glmMIC.

Details

The main idea of MIC is to approximate the l0 norm with a continuous or smooth unit dent function. The method bridges best subset selection and regularization, borrowing strength from both: it mimics best subset selection via a penalized likelihood approach, yet requires no tuning parameter.
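As an illustrative sketch (not part of the glmMIC API), the hyperbolic tangent unit dent function w(b) = tanh(a*b^2), with sharpness parameter a as in the a0 argument, approximates the l0 indicator I(b != 0):

```r
# l0 indicator and its smooth hyperbolic tangent approximation.
# The function names here are illustrative, not from the package.
l0 <- function(b) as.numeric(b != 0)
w  <- function(b, a = 100) tanh(a * b^2)

b <- c(-1, -0.5, 0, 0.5, 1)
round(w(b), 4)   # near 1 away from zero, exactly 0 at b = 0
l0(b)
```

Larger a makes the dent sharper, so the penalty behaves more like a hard l0 count.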

The problem is further reformulated through a reparameterization step that relates beta to gamma. This has two benefits: first, it reduces the optimization to a single unconstrained, nonconvex yet smooth programming problem, which can be solved as efficiently as computing the maximum likelihood estimator (MLE); second, the reparameterization circumvents the difficulties of post-selection inference, since significance testing on beta can be carried out through gamma.
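A minimal sketch of this reparameterization, assuming the tanh form beta = gamma * tanh(a*gamma^2) used in the MIC literature (the function name below is illustrative):

```r
# Smooth reparameterization: beta equals gamma away from zero,
# and shrinks to (essentially) zero as gamma approaches zero.
beta.from.gamma <- function(gamma, a = 100) gamma * tanh(a * gamma^2)

beta.from.gamma(c(0, 0.001, 0.5, 1))
```

Because the map is smooth in gamma, standard unconstrained optimizers and Wald-type tests on gamma apply directly.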

To solve the smooth yet nonconvex optimization, two options are available. In the first, a simulated annealing global optimization algorithm (method="SANN" in optim) is applied; the resulting estimate is then used as the starting point for a local optimization algorithm, by default the quasi-Newton BFGS method (method="BFGS" in optim). Alternatively, generalized simulated annealing, as implemented in GenSA, can be used instead. This latter approach tends to be slower; however, it does not need to be combined with a local optimizer and often yields the same final solution across different runs. Thus, when use.GenSA=TRUE, the output includes opt.global only, without opt.local.
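The default two-stage strategy can be sketched with base R's optim on a toy nonconvex but smooth objective (this stands in for glmMIC's actual objective, which it is not):

```r
# Toy smooth, nonconvex objective with multiple local minima.
f <- function(x) sum((x^2 - 1)^2) + 0.1 * sum(x)

set.seed(1)
# Stage 1: global search via simulated annealing.
opt.global <- optim(par = c(2, -2), fn = f, method = "SANN",
                    control = list(maxit = 100))
# Stage 2: local refinement via quasi-Newton BFGS, started at the SANN solution.
opt.local <- optim(par = opt.global$par, fn = f, method = "BFGS",
                   control = list(maxit = 100))
opt.local$par
```

This mirrors the maxit.global and maxit.local arguments, which cap the iterations of the two stages respectively.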

In its current version, some data preparation is required. Most importantly, the X variables must be standardized or scaled in all scenarios. In the case of Gaussian linear regression, the response variable also needs to be centered or even standardized. In addition, missing values will cause errors and must be handled beforehand.
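The preparation steps above can be sketched on a toy data frame (column names are illustrative):

```r
set.seed(1)
dat <- data.frame(x1 = rnorm(50), x2 = rnorm(50), y = rnorm(50))
dat$x1[3] <- NA                       # an NA that would cause an error

dat <- na.omit(dat)                   # remove missing values first
dat[, c("x1", "x2")] <- scale(dat[, c("x1", "x2")])   # standardize X
dat$y <- as.numeric(scale(dat$y, scale = FALSE))      # center y (Gaussian case)
```

After this, dat can be passed to glmMIC via its data argument.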

Value

An object of class glmMIC is returned, which may contain the following components depending on the options.

opt.global

Results from the preliminary run of a global optimization procedure (SANN as default).

opt.local

Results from the second run of a local optimization procedure (BFGS as default).

min.Q

Value of the minimized objective function.

gamma

Estimated gamma (reparameterized);

beta

Estimated beta;

VCOV.gamma

The estimated variance-covariance matrix for the (reparameterized) gamma estimate;

se.gamma

Standard errors for the gamma estimate;

VCOV.beta

The estimated variance-covariance matrix for the beta estimate;

se.beta

Standard errors for the beta estimate (post-selection);

BIC

The BIC value for the selected model;

result

A summary table of the fitting results;

fit.ML

The glm fitting results with the selected model with full ML iterations;

call

the matched call.

References

See Also

glm, print.glmMIC, plot.glmMIC

Examples

  # Note that glmMIC works with standardized data only. See below for examples.
  # GAUSSIAN LINEAR REGRESSION
  library(lars); data(diabetes);
  dat <- cbind(diabetes$x, y=diabetes$y)
  dat <- as.data.frame(scale(dat))
  fit.MIC <- glmMIC(formula=y~.-1, family="gaussian", data=dat)
  names(fit.MIC)
  print(fit.MIC)
  plot(fit.MIC)
  # WITH PRE-SELECTED VARIABLES AND A DIFFERENT a VALUE
  fit.MIC <- glmMIC(formula=y~.-1, preselection=~age+sex, family="gaussian", a0=20, data=dat)
  fit.MIC

  # LOGISTIC REGRESSION
  library(ncvreg); data(heart)
  dat <- as.data.frame(cbind(scale(heart[, -10]), chd=heart$chd)); names(dat)
  dat <- dat[, -c(4, 6)]; head(dat)
  fit.MIC <- glmMIC(formula= chd~., data=dat, family = "binomial")
  fit.MIC

  # LOGLINEAR REGRESSION
  fish <- read.csv("http://www.ats.ucla.edu/stat/data/fish.csv")
  form <- count ~ . -1 + xb:zg
  y <- fish[, names(fish) == as.character(form)[2]]
  X <- model.matrix(as.formula(form), fish)
  dat <- data.frame(scale(X), count=fish$count); head(dat)
  fit.MIC <- glmMIC(formula= count~ ., family = "poisson", data=dat)
  fit.MIC

xgsu/glmMIC documentation built on May 4, 2019, 1:06 p.m.