glmsmurf: Fit a Multi-Type Regularized GLM Using the SMuRF Algorithm
In BavoDC/ssmurf: Sparse Multi-Type Regularized Feature Modeling (package for testing purposes)

Description Usage Arguments Details Value References See Also Examples

SMuRF algorithm to fit a generalized linear model (GLM) with multiple types of predictors via regularized maximum likelihood. glmsmurf.fit contains the fitting function for a given design matrix.

glmsmurf(
  formula,
  family,
  data,
  weights,
  start,
  offset,
  lambda,
  lambda1 = 0,
  lambda2 = 0,
  pen.weights,
  adj.matrix,
  standardize = TRUE,
  control = list(),
  x.return = FALSE,
  y.return = TRUE,
  pen.weights.return = FALSE
)

glmsmurf.fit(
  X,
  y,
  weights,
  start,
  offset,
  family,
  pen.cov,
  n.par.cov,
  group.cov,
  refcat.cov,
  lambda,
  lambda1 = 0,
  lambda2 = 0,
  pen.weights,
  adj.matrix,
  standardize = TRUE,
  control = list(),
  formula = NULL,
  data = NULL,
  x.return = FALSE,
  y.return = FALSE,
  pen.weights.return = FALSE
)

`formula`	A `formula` object describing the model to be fitted. Penalties are specified using the `p` function. For `glmsmurf.fit` this is an optional argument which is only used when penalty weights are computed using a generalized additive model (GAM).
`family`	A `family` object specifying the error distribution and link function for the model.
`data`	A data frame containing the model response and predictors for `n` observations.
`weights`	An optional vector of prior weights to use in the likelihood. It should be a numeric vector of length `n` (the number of observations), or `NULL`. When `NULL` or nothing is given, equal prior weights (all ones) will be used.
`start`	A vector containing the starting values for the coefficients. It should either be a numeric vector of length `p+1` (with `p` the number of parameters excluding the intercept) or `NULL`. In the latter case, the link function applied to the weighted average of the response vector is used as starting value for the intercept and zero for the other coefficients.
`offset`	A vector containing the offset for the model. It should be a vector of size `n` or NULL (no offset). Offset(s) specified using the `formula` object will be ignored!
`lambda`	Either the penalty parameter, a positive number; or a string describing the method and measure used to select the penalty parameter: `"is.aic"` (in-sample; Akaike Information Criterion (AIC)), `"is.bic"` (in-sample; Bayesian Information Criterion (BIC)), `"is.gcv"` (in-sample; Generalized Cross-Validation (GCV) score), `"oos.dev"` (out-of-sample; deviance), `"oos.mse"` (out-of-sample; Mean Squared Error (MSE)), `"oos.dss"` (out-of-sample; Dawid-Sebastiani Score (DSS)), `"cv.dev"` (cross-validation (CV); deviance), `"cv.mse"` (CV; MSE), `"cv.dss"` (CV; DSS), `"cv1se.dev"` (CV with one standard error (SE) rule; deviance), `"cv1se.mse"` (CV with one SE rule; MSE), `"cv1se.dss"` (CV with one SE rule; DSS). E.g. `"is.aic"` indicates in-sample selection of lambda with the AIC as measure. When `lambda` is missing or `NULL`, it will be selected using cross-validation with the one standard error rule and the deviance as measure (`"cv1se.dev"`).
`lambda1`	The penalty parameter for the L_1-penalty in Sparse (Generalized) Fused Lasso or Sparse Graph-Guided Fused Lasso is λ \times λ_1. A positive numeric with default 0 (no extra L_1-penalty).
`lambda2`	The penalty parameter for the L_2-penalty in Group (Generalized) Fused Lasso or Group Graph-Guided Fused Lasso is λ \times λ_2. A positive numeric with default 0 (no extra L_2-penalty).
`pen.weights`	Either a string describing the method to compute the penalty weights: `"eq"` (default; equal penalty weights), `"stand"` (standardization penalty weights), `"glm"` (adaptive GLM penalty weights), `"glm.stand"` (stand. ad. GLM penalty weights), `"gam"` (ad. GAM penalty weights), `"gam.stand"` (stand. ad. GAM penalty weights); or a list with the penalty weight vector per predictor. This list should have length equal to the number of predictors and predictor names as element names.
`adj.matrix`	A named list containing the adjacency matrices (a.k.a. neighbor matrices) for each of the predictors with a Graph-Guided Fused Lasso penalty. The list elements should have the names of the corresponding predictors. If only one predictor has a Graph-Guided Fused Lasso penalty, it is also possible to only give the adjacency matrix itself (not in a list).
`standardize`	Logical indicating if predictors with a Lasso or Group Lasso penalty are standardized, default is `TRUE`. The returned coefficients are always on the original (i.e. non-standardized) scale.
`control`	A list of parameters used in the fitting process. This is passed to `glmsmurf.control`.
`x.return`	Logical indicating if the used model matrix should be returned in the output object, default is `FALSE`.
`y.return`	Logical indicating if the used response vector should be returned in the output object, default is `TRUE`.
`pen.weights.return`	Logical indicating if the list of the used penalty weight vector per predictor should be returned in the output object, default is `FALSE`.
`X`	Only for `glmsmurf.fit`: the design matrix including ones for the intercept. A `n` by `(p+1)` matrix which can be of numeric matrix class (`matrix-class`) or of class Matrix (`Matrix-class`) including sparse matrix class (`dgCMatrix-class`).
`y`	Only for `glmsmurf.fit`: the response vector, a numeric vector of size `n`.
`pen.cov`	Only for `glmsmurf.fit`: a list with the penalty type per predictor (covariate). A named list of strings with predictor names as element names. Possible types: `"none"` (no penalty, e.g. for intercept), `"lasso"` (Lasso), `"grouplasso"` (Group Lasso), `"flasso"` (Fused Lasso), `"gflasso"` (Generalized Fused Lasso), `"2dflasso"` (2D Fused Lasso) or `"ggflasso"` (Graph-Guided Fused Lasso).
`n.par.cov`	Only for `glmsmurf.fit`: a list with the number of parameters to estimate per predictor (covariate). A named list of strictly positive integers with predictor names as element names.
`group.cov`	Only for `glmsmurf.fit`: a list with the group of each predictor (covariate) which is only used for the Group Lasso penalty. A named list of positive integers with predictor names as element names where 0 means no group.
`refcat.cov`	Only for `glmsmurf.fit`: a list with the number of the reference category in the original order of the levels of each predictor (covariate). When the predictor is not a factor or no reference category is present, it is equal to 0. This number will only be taken into account for a Fused Lasso, Generalized Fused Lasso or Graph-Guided Fused Lasso penalty when a reference category is present.

See the package vignette for more details and a complete description of a use case.

As a user, it is important to take the following into acocunt:

The estimated coefficients are rounded to 7 digits.
The cross-validation folds are not deterministic. The validation sample for selecting lambda out-of-sample is determined at random when no indices are provided in 'validation.index' in the control object argument. In these cases, the selected value of lambda is hence not deterministic. When selecting lambda in-sample, or out-of-sample when indices are provided in 'validation.index' in the control object argument, the selected value of lambda is deterministic.
The glmsmurf function can handle many use cases and is preferred for general use. The glmsmurf.fit function requires a more thorough understanding of the package internals and should hence be used with care!

An object of class 'glmsmurf' is returned. See glmsmurf-class for more details about this class and its generic functions.

Devriendt, S., Antonio, K., Reynkens, T. and Verbelen, R. (2021). "Sparse Regression with Multi-type Regularized Feature Modeling", Insurance: Mathematics and Economics, 96, 248–261. <doi:10.1016/j.insmatheco.2020.11.010>.

Hastie, T., Tibshirani, R., and Wainwright, M. (2015). Statistical Learning with Sparsity: The Lasso and Generalizations. CRC Press.

glmsmurf-class, glmsmurf.control, p, glm

# Munich rent data from catdata package
data("rent", package = "catdata")

# The considered predictors are the same as in 
# Gertheiss and Tutz (Ann. Appl. Stat., 2010).
# Response is monthly rent per square meter in Euro

# Urban district in Munich
rent$area <- as.factor(rent$area)

# Create formula with 'rentm' as response variable,
# 'area' with a Generalized Fused Lasso penalty,
formu <- rentm ~ p(area, pen = "gflasso")


# Quick comparison with previous package in terms of speed
rbenchmark::benchmark(smurf::glmsmurf(formula = formu, family = gaussian, data = rent, 
                                      pen.weights = "glm.stand", lambda = 5e-3,
                                      control = glmsmurf.control(epsilon = 1e-8, print = F , ncores = 1, po.ncores = 1)),
                      ssmurf::glmsmurf(formula = formu, family = gaussian, data = rent, 
                                       pen.weights = "glm.stand", lambda = 5e-3,
                                       control = glmsmurf.control(epsilon = 1e-8, print = F , ncores = 1, po.ncores = 1)), 
                      replications = 20)


# The following comparison takes a bit longer
l = c(exp(seq(log(64), log(1e-4), length.out = 50)))

# Speed comparison to the previous package
rbenchmark::benchmark(smurf::glmsmurf(formula = formu, family = gaussian, data = rent, 
                                      pen.weights = "glm.stand", lambda = "cv1se.dev",
                                      control = glmsmurf.control(epsilon = 1e-8, print = F , k = 10, ncores = 1, po.ncores = 1,
                                                                 lambda.vector = l)),
                      ssmurf::glmsmurf(formula = formu, family = gaussian, data = rent, 
                                       pen.weights = "glm.stand", lambda = "cv1se.dev",
                                       control = glmsmurf.control(epsilon = 1e-8, print = F, k = 10, ncores = 1, po.ncores = 1,
                                                                  lambda.vector = l)), replications = 10)

BavoDC/ssmurf documentation built on Dec. 17, 2021, 10:46 a.m.

BavoDC/ssmurf index

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

BavoDC/ssmurf
Sparse Multi-Type Regularized Feature Modeling (package for testing purposes)

glmsmurf: Fit a Multi-Type Regularized GLM Using the SMuRF Algorithm
In BavoDC/ssmurf: Sparse Multi-Type Regularized Feature Modeling (package for testing purposes)

Description

Usage

Arguments

Details

Value

References

See Also

Examples

Related to glmsmurf in BavoDC/ssmurf...

R Package Documentation

Browse R Packages

We want your feedback!

BavoDC/ssmurf Sparse Multi-Type Regularized Feature Modeling (package for testing purposes)

glmsmurf: Fit a Multi-Type Regularized GLM Using the SMuRF Algorithm In BavoDC/ssmurf: Sparse Multi-Type Regularized Feature Modeling (package for testing purposes)

Description

Usage

Arguments

Details

Value

References

See Also

Examples

Related to glmsmurf in BavoDC/ssmurf...

R Package Documentation

Browse R Packages

We want your feedback!

BavoDC/ssmurf
Sparse Multi-Type Regularized Feature Modeling (package for testing purposes)

glmsmurf: Fit a Multi-Type Regularized GLM Using the SMuRF Algorithm
In BavoDC/ssmurf: Sparse Multi-Type Regularized Feature Modeling (package for testing purposes)