```r
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```
A paper that describes the variable importance measures in more detail should be available soon.
L0-penalization-based modified variable importance is defined in the following way:

$$mVI(i|X, y, \lambda) = \min_{\beta:\beta_i = 0} Q(\beta|X, y, \lambda) - \min_{\beta:\beta_i \neq 0} Q(\beta|X, y, \lambda) + \lambda |S_i|$$

where $Q(\beta|X, y, \lambda) = -2l(\beta|X, y) + \lambda ||\beta||_0$, $||\beta||_0$ is the number of nonzero elements in $\beta$, and $|S_i|$ is the number of beta parameters associated with the $i$th set of variables. The number of parameters in the $i$th set of variables is 1 for a continuous variable and is the number of levels minus 1 for a categorical variable. The penalty $\lambda$ is determined by the chosen metric: AIC gives $\lambda = 2$, BIC gives $\lambda = \log{(n)}$, and HQIC gives $\lambda = 2\log{(\log{(n)})}$.
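To make the definition concrete, here is a minimal sketch of computing $mVI$ for a single continuous variable by hand with the BIC metric, using two best subset searches: one forcing the variable into every model via the `keep` argument and one excluding it from the formula entirely. The `$bestmetrics` element used to extract the best metric value is an assumption about the structure of the fitted object; check `names()` of the result for your version of BranchGLM.

```r
# A sketch of computing mVI for Petal.Width by hand with the BIC metric;
# $bestmetrics is an assumption about where the best metric value is stored
library(BranchGLM)

n <- nrow(iris)
lambda <- log(n)  # BIC corresponds to lambda = log(n)

# Best Q among models that must include Petal.Width (forced in via keep)
WithVar <- VariableSelection(Sepal.Length ~ ., data = iris,
                             family = "gaussian", link = "identity",
                             metric = "BIC", keep = "Petal.Width",
                             showprogress = FALSE)

# Best Q among models that exclude Petal.Width entirely
WithoutVar <- VariableSelection(Sepal.Length ~ . - Petal.Width, data = iris,
                                family = "gaussian", link = "identity",
                                metric = "BIC", showprogress = FALSE)

# mVI = min Q(excluded) - min Q(included) + lambda * |S_i|,
# with |S_i| = 1 for a continuous variable
min(WithoutVar$bestmetrics) - min(WithVar$bestmetrics) + lambda * 1
```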
These variable importance values are equivalent to the traditional likelihood ratio test for beta parameters when $\lambda = 0$. However, when $\lambda > 0$, the null distribution of the variable importance values may not be chi-squared distributed. P-values for the variable importance values may be obtained from the `VariableImportance.boot()` function, which uses a parametric bootstrap approach to approximate the null distribution. This process entails performing best subset selection many times over, so it is quite slow.
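To clarify the idea behind the parametric bootstrap, here is a schematic sketch, not the package's actual implementation. For brevity it uses $\lambda = 0$ (so the statistic reduces to the likelihood ratio test) and fixed linear models; `VariableImportance.boot()` instead redoes best subset selection on every bootstrap sample, which is what makes it slow.

```r
# Schematic of a parametric bootstrap null distribution for one variable
# (illustrative only; BranchGLM's internal implementation differs)
set.seed(100)

# Fit the null model that excludes the variable of interest (Petal.Width)
nullFit <- lm(Sepal.Length ~ . - Petal.Width, data = iris)

bootVI <- replicate(100, {
  # Simulate new responses from the fitted null model
  simData <- iris
  simData$Sepal.Length <- simulate(nullFit)[[1]]

  # Recompute the importance statistic on the simulated data; with
  # lambda = 0 this is just the likelihood ratio test statistic
  full <- lm(Sepal.Length ~ ., data = simData)
  reduced <- lm(Sepal.Length ~ . - Petal.Width, data = simData)
  as.numeric(2 * (logLik(full) - logLik(reduced)))
})

# Approximate p-value: the proportion of simulated statistics that
# exceed the observed statistic
obs <- as.numeric(2 * (logLik(lm(Sepal.Length ~ ., data = iris)) -
                       logLik(nullFit)))
mean(bootVI >= obs)
```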
L0-penalization based variable importance values may be calculated with the `VariableImportance()` function. The `VariableImportance()` function requires an object returned from calling the `VariableSelection()` function. The exact variable importance values are returned if a branch and bound algorithm is used with the `VariableSelection()` function. If a heuristic method is used with the `VariableSelection()` function, then approximate variable importance values based on the specified heuristic method are returned.
```r
# Loading BranchGLM package
library(BranchGLM)

# Using iris dataset to demonstrate usage of VI
Data <- iris
Fit <- BranchGLM(Sepal.Length ~ ., data = Data, family = "gaussian",
                 link = "identity")

# Doing branch and bound selection
VS <- VariableSelection(Fit, type = "branch and bound", metric = "BIC",
                        showprogress = FALSE)

# Getting variable importance
VI <- VariableImportance(VS, showprogress = FALSE)
VI
```
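For comparison, we could also run a heuristic selection and pass its result to `VariableImportance()` to get approximate values. A minimal sketch, assuming forward selection (`type = "forward"`) is an acceptable heuristic here:

```r
# Doing forward selection, a heuristic method
VSfwd <- VariableSelection(Fit, type = "forward", metric = "BIC",
                           showprogress = FALSE)

# Getting approximate variable importance based on forward selection
VIfwd <- VariableImportance(VSfwd, showprogress = FALSE)
VIfwd
```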
We can visualize the variable importance values with the `barplot()` function.
```r
# Plotting variable importance
oldmar <- par("mar")
par(mar = c(4, 6, 3, 1) + 0.1)
barplot(VI)
par(mar = oldmar)
```
We can get approximate p-values based on the L0-penalization based variable importance values from the `VariableImportance.boot()` function. This function uses a parametric bootstrap approach to create an approximate null distribution for the variable importance values. This approach is very slow, so it is not feasible to get these p-values when there are many sets of variables.
```r
# Getting approximate null distributions
set.seed(59903)
myBoot <- VariableImportance.boot(VI, nboot = 1000, showprogress = FALSE)
myBoot
```
We can visualize the results from `VariableImportance.boot()` with the `hist()` function or the `boxplot()` function. The `boxplot()` approach is convenient because we can look at all of the results in one plot, while the `hist()` approach only shows the results for one set of variables in each plot.
```r
# Plotting histogram of results for second set of variables
hist(myBoot)

# Plotting boxplots of results
oldmar <- par("mar")
par(mar = c(4, 6, 3, 1) + 0.1)
boxplot(myBoot, las = 1)
par(mar = oldmar)
```