Sparse Estimation of a GLM via Minimum approximated Information Criterion
Usage

glmMIC(formula, preselection = NULL, family = c("gaussian", "binomial",
  "poisson"), data, beta0 = NULL, preselect.intercept = FALSE,
  criterion = "BIC", lambda0 = 0, a0 = NULL, rounding.digits = 4,
  use.GenSA = FALSE, lower = NULL, upper = NULL, maxit.global = 100,
  maxit.local = 100, epsilon = 1e-06, se.gamma = TRUE, CI.gamma = TRUE,
  conf.level = 0.95, se.beta = TRUE, fit.ML = FALSE, details = FALSE)
Arguments

formula
    An object of class "formula": a symbolic description of the model to be fitted.

preselection
    An optional formula specifying terms to be pre-selected, i.e., forced into the final model.

family
    A description of the error distribution and link function to be used in the model. Preferably, for computational speed, this is a character string naming one of the following three families: "gaussian", "binomial", or "poisson".

data
    A data.frame in which to interpret the variables named in the formula and preselection arguments.

beta0
    User-supplied beta0 value, the starting point for optimization. If missing or NULL, a starting point is generated internally.

preselect.intercept
    A logical value indicating whether the intercept term is pre-selected. By default, it is FALSE.

criterion
    Specifies the model selection criterion used. The default is "BIC".

lambda0
    User-supplied penalty parameter for model complexity. If lambda0 = 0 (the default), the penalty parameter is determined by the specified criterion.

a0
    The scale (or sharpness) parameter used in the hyperbolic tangent penalty. By default, a0 = NULL and a value is set internally.

rounding.digits
    Number of digits after the decimal point for rounding up estimates. Default value is 4.

use.GenSA
    Logical value indicating if the generalized simulated annealing algorithm, as implemented in the GenSA package, should be used for the global optimization. Default is FALSE.

lower
    The lower bounds for the search space in GenSA; used only when use.GenSA = TRUE.

upper
    The upper bounds for the search space in GenSA; used only when use.GenSA = TRUE.

maxit.global
    Maximum number of iterations allowed for the global optimization algorithm. Default is 100.

maxit.local
    Maximum number of iterations allowed for the local optimization algorithm. Default is 100.

epsilon
    Tolerance level for convergence. Default is 1e-6.

se.gamma
    Logical indicator of whether the standard error for the gamma estimate is computed. Default is TRUE.

CI.gamma
    Logical indicator of whether the confidence interval for gamma is computed. Default is TRUE.

conf.level
    Specifies the confidence level for the confidence interval of gamma. Default is 0.95.

se.beta
    Logical indicator of whether the (post-selection) standard error for the beta estimate is computed. Default is TRUE.

fit.ML
    Logical indicator of whether we fit the best selected model with full iteration of maximum likelihood (ML). Default is FALSE.

details
    Logical value: if TRUE, detailed fitting results are printed. Default is FALSE.
Details

The main idea of MIC involves approximation of the L0 norm with a continuous or smooth unit dent function. This method bridges the best subset selection and regularization by borrowing strength from both: it mimics the best subset selection using a penalized likelihood approach, yet with no need for a tuning parameter.

The problem is further reformulated with a reparameterization step that relates beta to gamma. There are two benefits of doing so. First, it reduces the optimization to one unconstrained, nonconvex yet smooth programming problem, which can be solved efficiently, as in computing the maximum likelihood estimator (MLE). Furthermore, the reparameterization tactic yields an additional advantage in terms of circumventing post-selection inference: significance testing on beta can be done through gamma.
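The unit dent idea can be sketched in a few lines of R. The exact functional form used by glmMIC is given in the references; the penalty `w` and the reparameterization `beta = gamma * w(gamma)` below are illustrative assumptions, with the sharpness parameter playing the role of a0:

```r
# Sketch of a hyperbolic tangent "unit dent" penalty (illustrative; see the
# references for the exact form used by glmMIC).
w <- function(gamma, a = 20) tanh(a * gamma^2)

# The penalty is ~0 at gamma = 0 and ~1 once gamma moves away from 0,
# approximating the L0 indicator I(gamma != 0) while staying smooth:
round(w(c(0, 0.05, 0.5, 2)), 3)

# Assumed reparameterization for illustration: beta is shrunk heavily
# toward 0 for small gamma, while beta ~ gamma for large |gamma|.
beta <- function(gamma, a = 20) gamma * w(gamma, a)
```

Because the dent is smooth in gamma, standard gradient-based optimizers can be applied directly, which is what makes the MLE-like computation in the paragraph above possible.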
To solve the smooth yet nonconvex optimization problem, two options are available. By default, a simulated annealing global optimization algorithm (the method="SANN" option in optim) is applied first. The resultant estimator is then used as the starting point for a local optimization algorithm, the quasi-Newton BFGS method (method="BFGS" in optim) by default. Optionally, the generalized simulated annealing algorithm, implemented in GenSA, can be used instead. This latter approach tends to be slower; however, it does not need to be combined with another local optimization and often yields the same final solution across different runs. Thus, when use.GenSA=TRUE, the output includes opt.global only, without opt.local.
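The default two-stage scheme can be illustrated on a toy nonconvex function (this is not the MIC objective itself, just a sketch of the SANN-then-BFGS pattern described above):

```r
# A toy nonconvex objective with multiple local minima
f <- function(x) sum((x^2 - 1)^2) + 0.1 * sum(x)

set.seed(1)
# Stage 1: global search via simulated annealing
opt.global <- optim(par = c(2, -2), fn = f, method = "SANN",
                    control = list(maxit = 100))
# Stage 2: local refinement via quasi-Newton BFGS,
# started at the SANN solution
opt.local <- optim(par = opt.global$par, fn = f, method = "BFGS",
                   control = list(maxit = 100))
opt.local$par    # refined solution
```

The annealing stage is responsible for escaping poor local minima, while BFGS polishes the solution to high accuracy, mirroring the roles of maxit.global and maxit.local above.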
In its current version, some appropriate data preparation may be needed. Most importantly, the X variables in all scenarios need to be standardized or scaled. In the case of Gaussian linear regression, the response variable needs to be centered or even standardized. In addition, missing values would cause errors and hence need pre-handling as well.
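A minimal data-preparation sketch for the Gaussian case, using the built-in airquality data purely for illustration (the variable choices are hypothetical):

```r
# Drop rows with missing values first, since they would cause errors
dat <- na.omit(airquality)

# Standardize the predictors and center the response (Gaussian case)
X <- scale(dat[, c("Solar.R", "Wind", "Temp")])
y <- scale(dat$Ozone, center = TRUE, scale = FALSE)

dat.std <- data.frame(X, y = as.numeric(y))
head(dat.std)
```

After this step, `dat.std` can be passed as the `data` argument, as in the Examples below where `scale()` is applied in the same way.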
Value

An object of class glmMIC is returned, which may contain the following components, depending on the options:

- opt.global: results from the preliminary run of a global optimization procedure (SANN as default);
- opt.local: results from the second run of a local optimization procedure (BFGS as default);
- the value of the minimized objective function;
- the estimated gamma (reparameterized);
- the estimated beta;
- the estimated variance-covariance matrix for the (reparameterized) gamma estimate;
- standard errors for the gamma estimate;
- the estimated variance-covariance matrix for the beta estimate;
- standard errors for the beta estimate (post-selection);
- the BIC value for the selected model;
- a summary table of the fitting results;
- the glm fitting results for the selected model with full ML iterations (when fit.ML=TRUE);
- the matched call.
References

Su, X. (2015). Variable selection via subtle uprooting. Journal of Computational and Graphical Statistics, 24(4): 1092–1113. URL http://www.tandfonline.com/doi/pdf/10.1080/10618600.2014.955176

Su, X., Fan, J., Levine, R. A., Nunn, M. E., and Tsai, C.-L. (2016+). Sparse estimation of generalized linear models via approximated information criteria. Submitted, Statistica Sinica.
See Also

glm, print.glmMIC, plot.glmMIC
Examples

# Note that glmMIC works with standardized data only. See below for examples.
# GAUSSIAN LINEAR REGRESSION
library(lars); data(diabetes);
dat <- cbind(diabetes$x, y=diabetes$y)
dat <- as.data.frame(scale(dat))
fit.MIC <- glmMIC(formula=y~.-1, family="gaussian", data=dat)
names(fit.MIC)
print(fit.MIC)
plot(fit.MIC)
# WITH PRE-SELECTED VARIABLES AND A DIFFERENT a VALUE
fit.MIC <- glmMIC(formula=y~.-1, preselection=~age+sex, family="gaussian", a0=20, data=dat)
fit.MIC
# LOGISTIC REGRESSION
library(ncvreg); data(heart)
dat <- as.data.frame(cbind(scale(heart[, -10]), chd=heart$chd)); names(dat)
dat <- dat[, -c(4, 6)]; head(dat)
fit.MIC <- glmMIC(formula= chd~., data=dat, family = "binomial")
fit.MIC
# LOGLINEAR REGRESSION
fish <- read.csv("http://www.ats.ucla.edu/stat/data/fish.csv")
form <- count ~ . -1 + xb:zg
y <- fish[,names(fish)==as.character(form)[2]]
X <- model.matrix(as.formula(form),fish)
dat <- data.frame(scale(X), count=fish$count); head(dat)
fit.MIC <- glmMIC(formula= count~ ., family = "poisson", data=dat)
fit.MIC