aglm-package: aglm: Accurate Generalized Linear Model

Description

Provides functions to fit Accurate Generalized Linear Model (AGLM) models, visualize them, and make predictions for new data. AGLM is defined as a regularized GLM that applies feature transformations based on the discretization of numerical features and on specific coding methodologies for dummy variables. For more information on AGLM, see Suguru Fujita, Toyoto Tanaka, Kenji Kondo and Hirokazu Iwasawa (2020).

Details

The functions provided by the aglm package have almost the same structure as those of the well-known glmnet package, so users familiar with glmnet should find aglm easy to use. This design is also natural from an implementation standpoint: the aglm package applies appropriate transformations to the given data and then passes the result to glmnet, which serves as the backend.

Fitting functions

The aglm package provides three different fitting functions, depending on how users want to handle the hyper-parameters of AGLM models.

Because AGLM is based on regularized GLM, the regularization term of the loss function can be expressed as follows:

$$R(\{\beta_{jk}\}; \lambda, \alpha) = \lambda \left\{ (1 - \alpha) \sum_{j=1}^{p} \sum_{k=1}^{m_j} |\beta_{jk}|^2 + \alpha \sum_{j=1}^{p} \sum_{k=1}^{m_j} |\beta_{jk}| \right\},$$

where β_{jk} is the k-th coefficient of the auxiliary variables for the j-th column in the data, p is the number of columns, m_j is the number of auxiliary variables for the j-th column, α is a weight that controls how the L1 and L2 regularization terms are mixed, and λ determines the strength of the regularization.
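As a sketch of how this penalty is evaluated, the Python function below transcribes the formula literally. The function name `regularization_term` and the nested-list layout of the coefficients (one inner list per original column j) are assumptions for illustration, not part of the aglm API. Note that glmnet, the backend, documents its elastic-net penalty with the squared term scaled by (1 − α)/2, so the backend's internal value may differ from this literal transcription by that factor.

```python
from typing import Sequence


def regularization_term(
    beta: Sequence[Sequence[float]], lam: float, alpha: float
) -> float:
    """Literal transcription of R({beta_jk}; lambda, alpha) from the text.

    Hypothetical helper: beta[j][k] holds beta_{jk}, the k-th auxiliary-variable
    coefficient of the j-th original column, so len(beta) == p and
    len(beta[j]) == m_j.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Squared (L2-type) part, summed over every column j and auxiliary variable k.
    l2 = sum(b * b for group in beta for b in group)
    # Absolute-value (L1) part over the same coefficients.
    l1 = sum(abs(b) for group in beta for b in group)
    return lam * ((1.0 - alpha) * l2 + alpha * l1)


# Two original columns: the first expanded into 2 auxiliary variables, the second into 1.
beta = [[1.0, -2.0], [0.5]]
print(regularization_term(beta, lam=2.0, alpha=0.5))  # 8.75
print(regularization_term(beta, lam=2.0, alpha=1.0))  # 7.0 (pure LASSO penalty)
print(regularization_term(beta, lam=2.0, alpha=0.0))  # 10.5 (pure squared penalty)
```

Setting α = 1 leaves only the absolute-value term, which is why the default of cv.aglm() corresponds to LASSO.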

Searching over the hyper-parameters α and λ often yields better results, but it is usually time-consuming. For this reason, the aglm package provides three fitting functions with different strategies for specifying hyper-parameters:

aglm: A basic fitting function with a given α and given value(s) of λ.
cv.aglm: A fitting function with a given α and cross-validation for λ.
cva.aglm: A fitting function with cross-validation for both α and λ.

Generally speaking, setting an appropriate λ is often important for obtaining meaningful results, and using cv.aglm() with the default α = 1 (LASSO) is usually enough. Since cva.aglm() is much more time-consuming than cv.aglm(), it is better to use it only when particularly good results are needed.
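A rough way to see the cost difference is to count regularization-path fits. The sketch below assumes, without confirmation from this page, that cv.aglm() follows the cv.glmnet() scheme (one fit on the full data plus one fit per fold) and that cva.aglm() repeats that procedure for every candidate α. The helper `count_path_fits` and the example numbers of folds and α values are hypothetical.

```python
def count_path_fits(n_alpha: int, n_folds: int) -> int:
    """Number of regularization-path fits under the stated assumptions.

    Hypothetical helper: each candidate alpha costs one full-data fit plus
    one fit per cross-validation fold (a cv.glmnet-style scheme).
    """
    if n_alpha < 1 or n_folds < 2:
        raise ValueError("need at least one alpha and at least two folds")
    return n_alpha * (n_folds + 1)


# aglm(): a single path fit with no cross-validation (not counted by this helper).
# cv.aglm(): one fixed alpha (e.g. alpha = 1, LASSO) with 10 folds.
print(count_path_fits(n_alpha=1, n_folds=10))   # 11
# cva.aglm(): the same 10-fold procedure repeated over, say, 11 candidate alphas.
print(count_path_fits(n_alpha=11, n_folds=10))  # 121
```

Under these assumptions, the cost of cva.aglm() grows linearly with the number of α candidates, which is the main reason to prefer cv.aglm() unless the extra search is clearly worthwhile.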

Using the fitted model

The following S4 classes are defined to store the results of the fitting functions.

AccurateGLM-class: A class for results of aglm() and cv.aglm()
CVA_AccurateGLM-class: A class for results of cva.aglm()

Users can work with models obtained from the fitting functions in various ways by passing them to the following functions:

predict: Make predictions for new data
plot: Plot contribution of each variable and residuals
print: Display textual information of the model
coef: Get coefficients
deviance: Get deviance
residuals: Get residuals of various types

We emphasize that plot() is particularly useful for understanding the fitted model, because it visually presents how each variable in the original data is used by the model.

Other functions

The following functions are mainly intended for internal use, but are exported as utility functions for convenience.

Functions for creating feature vectors
- getUDummyMatForOneVec
- getODummyMatForOneVec
- getLVarMatForOneVec
Functions for binning
- createEqualWidthBins
- createEqualFreqBins
- executeBinning
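The following Python sketch illustrates, under stated assumptions, what binning and dummy coding of a single vector can look like. The function names are hypothetical and only loosely correspond to createEqualWidthBins, createEqualFreqBins, executeBinning, getUDummyMatForOneVec and getODummyMatForOneVec; they do not reproduce those functions' signatures or exact behavior. In particular, the O-dummy here is assumed to be cumulative ("thermometer") coding, and L-variables are not sketched.

```python
import bisect
from typing import List, Sequence


def equal_width_breaks(x: Sequence[float], nbin: int) -> List[float]:
    """Return nbin + 1 evenly spaced break points from min(x) to max(x)."""
    lo, hi = min(x), max(x)
    step = (hi - lo) / nbin
    # The last break is set to max(x) exactly to avoid floating-point drift.
    return [lo + i * step for i in range(nbin)] + [hi]


def equal_freq_breaks(x: Sequence[float], nbin: int) -> List[float]:
    """Return break points at simple lower empirical quantiles of x.

    Bins then hold roughly equal numbers of observations; duplicate
    break points (caused by tied values) are removed.
    """
    xs = sorted(x)
    n = len(xs)
    raw = [xs[(i * n) // nbin] for i in range(nbin)] + [xs[-1]]
    return sorted(set(raw))


def execute_binning(x: Sequence[float], breaks: Sequence[float]) -> List[int]:
    """Map each value to a 0-based bin index.

    A value equal to an inner break goes to the upper bin; values outside
    [breaks[0], breaks[-1]] are clamped to the first or last bin.
    """
    last = len(breaks) - 2
    return [min(max(bisect.bisect_right(breaks, v) - 1, 0), last) for v in x]


def u_dummy(values: Sequence[str]) -> List[List[int]]:
    """Unordered dummies: one 0/1 column per distinct level (levels sorted)."""
    levels = sorted(set(values))
    return [[int(v == lev) for lev in levels] for v in values]


def o_dummy(bin_index: Sequence[int], n_bins: int) -> List[List[int]]:
    """Assumed ordered ("thermometer") dummies: column k - 1 is 1 when bin index >= k."""
    return [[int(i >= k) for k in range(1, n_bins)] for i in bin_index]


x = [1.0, 2.0, 3.0, 4.0, 5.0, 9.0]
ew = equal_width_breaks(x, 2)   # [1.0, 5.0, 9.0]
ef = equal_freq_breaks(x, 2)    # [1.0, 4.0, 9.0]
print(execute_binning(x, ew))   # [0, 0, 0, 0, 1, 1]
print(execute_binning(x, ef))   # [0, 0, 0, 1, 1, 1]
print(o_dummy([0, 1, 2], 3))    # [[0, 0], [1, 0], [1, 1]]
print(u_dummy(["a", "b", "a"])) # [[1, 0], [0, 1], [1, 0]]
```

With this cumulative coding, the fitted effect of bin i is the sum of the first i coefficients, so each coefficient is the jump between adjacent bins; an L1 penalty then tends to produce piecewise-constant effects with few jumps. The exact definitions used by aglm should be checked in its reference manual.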

Author(s)

Kenji Kondo
Kazuhisa Takahashi and Hikari Banno (worked on L-Variable related features)

References

Suguru Fujita, Toyoto Tanaka, Kenji Kondo and Hirokazu Iwasawa (2020). AGLM: A Hybrid Modeling Method of GLM and Data Science Techniques. Actuarial Colloquium Paris 2020.
https://www.institutdesactuaires.com/global/gene/link.php?doc_id=16273&fg=1

aglm documentation built on June 9, 2021, 5:08 p.m.