MSTweedie: Regularization path for the Multi-source sparse Tweedie model
In fontaine618/MSTweedie: Multi-source Sparse Tweedie Modelling

This function fits the sparse Tweedie model on multi-source datasets along a sequence of regularization parameters lambda. The optimization is done by a Fortran95 routine.

MSTweedie(x, y, w, source, rho = 1.5,
      nlambda = 100, lambda.min, lambda, x.normalize = T,
      eps, sr = T, kktstop = F, reg = c("L2", "Linf"),
      alpha = 0, dfmax = nvars + 1, pmax = min(dfmax * 1.2, nvars),
      pf = rep(1, nvars), maxit = 10000)

`x`	Either (1) a data frame containing the predictors, the responses (identifying the sources either by different columns in the simultaneous case or via an additional index column) and, optionally, the observation weights, or (2) a list of matrices containing only the predictors (mostly used internally for cross-validation).
`y`	Either (1) a single integer identifying the column of `x` containing the response (requires `source` to be specified), (2) a vector of integers identifying which columns of `x` are the responses (simultaneous case) or (3) a list of response vectors (mostly used internally for cross-validation).
`w`	(Optional) Either (1) a single integer identifying the column of `x` containing the observation weights or (2) a list of weight vectors (mostly used internally for cross-validation). If this argument is missing, equal weights are assumed.
`source`	When `y` is a single integer, this argument identifies the column of `x` that indexes the different sources. Disregarded if `y` is a vector or a list of vectors.
`rho`	Power used for the mean-variance relation of the Tweedie distribution. Possible range is [1,2], default is 1.5.
`nlambda`	The length of the regularization path. Disregarded if `lambda` is specified; default is 100.
`lambda.min`	The ratio of the last regularization parameter to the first one, the first being computed as the smallest value for which no predictors are included. Disregarded if `lambda` is specified; possible range is (0,1), default is 1e-3.
`lambda`	(Optional) User-specified sequence of positive regularization parameters. When omitted, the sequence is computed starting from the smallest value excluding all predictors from the model and decreasing to a fraction `lambda.min` of that starting value by logarithmic decrements.
`x.normalize`	Logical flag for standardization of the predictors prior to fitting the model. If `TRUE`, each predictor in each source is centered to zero and scaled to variance 1. After the model is fitted, the coefficients are returned on the original scale. Default is `FALSE`.
`eps`	Convergence threshold. Default is 1e-3.
`sr`	Logical flag for using the strong rule in the fit. Default is `TRUE`.
`kktstop`	Logical flag for using the KKT conditions to stop the fit before the end of the regularization parameter sequence. Default is `FALSE`.
`reg`	Either `"Linf"` for using L_∞-regularization in the fit or `"L2"` for the L_2-regularization. Default is `"Linf"`.
`alpha`	Parameter controlling the balance between across-feature and within-feature sparsity in the penalty term (1-α)||β||_q + α||β||_1, where q is 2 or ∞ according to `reg`. Possible range is [0,1], default is 0.
`dfmax`	Maximum number of variables included in the model at a single time. Default is `nvars+1`.
`pmax`	Limits the number of features ever to be nonzero. The difference from `dfmax` is that a variable that eventually exits the model is still counted here. Default is `min(dfmax*1.2,nvars)`.
`pf`	Penalty weights in the penalty term by feature. Mostly used internally when the Adaptive Lasso is used in cross-validation. Expects a vector of length `nvars`, default is 1.
`maxit`	Maximum number of inner-loop iterations. Default is 10,000.
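
The default path described for `lambda`, `nlambda` and `lambda.min` can be sketched in base R as follows. This is only an illustration and not the package's code: `lambda_max` is a made-up value standing in for the smallest parameter that excludes all predictors, which the function computes from the data.

```r
# Hypothetical starting value; MSTweedie derives the real one from the data.
lambda_max <- 2.5
nlambda    <- 100
lambda_min <- 1e-3   # plays the role of the `lambda.min` argument

# Equally spaced on the log scale, from lambda_max down to lambda_min * lambda_max
lambda_path <- exp(seq(log(lambda_max), log(lambda_min * lambda_max),
                       length.out = nlambda))

range(lambda_path)
```

A vector built this way (or any other sequence of positive values) could be supplied through `lambda`, in which case `nlambda` and `lambda.min` are disregarded.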

The sequence of regularization parameters implies a sequence of models fitted by the IRLS-BSUM algorithm described in the reference. For each value of the parameter, this function yields a model optimizing the penalized Tweedie log-likelihood of multi-source data. The type of sparsity can be controlled by the arguments `reg` and `alpha`.
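
To make the roles of `reg` and `alpha` concrete, here is a minimal base R sketch of the penalty term (1-α)||β||_q + α||β||_1. The helper `ms_penalty` is hypothetical and not part of the package. It assumes a coefficient matrix with one row per feature and one column per source, takes the q-norm across sources for each feature, and assumes the `pf` weights multiply each feature's term before summing.

```r
# Hypothetical helper, for illustration only (not an MSTweedie function).
# beta: matrix of coefficients, rows = features, columns = sources (assumed layout).
ms_penalty <- function(beta, alpha = 0, reg = c("Linf", "L2"),
                       pf = rep(1, nrow(beta))) {
  reg <- match.arg(reg)
  # Norm of each feature's coefficients across sources
  group_norm <- switch(reg,
    Linf = apply(abs(beta), 1, max),
    L2   = sqrt(rowSums(beta^2))
  )
  # L1 norm of each feature's coefficients across sources
  l1_norm <- rowSums(abs(beta))
  sum(pf * ((1 - alpha) * group_norm + alpha * l1_norm))
}

beta <- rbind(c(3, -4),   # feature active in both sources
              c(0,  0))   # feature excluded from every source

ms_penalty(beta, alpha = 0, reg = "L2")    # 5
ms_penalty(beta, alpha = 0, reg = "Linf")  # 4
ms_penalty(beta, alpha = 1)                # 7
```

With `alpha = 0` only the norm across sources is penalized, so a feature tends to enter or leave all sources together. Larger values of `alpha` add an L1 term that can also zero out individual coefficients within a feature.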

The computation time is influenced by the arguments `eps`, `nlambda`, `lambda.min` (or `lambda`) and `maxit`. Consider adjusting these parameters to speed up computation. Small values of the regularization parameter often take the longest to fit; the `kktstop` argument can stop the algorithm before the end of the sequence if convergence is judged sufficient in terms of the KKT conditions.

To pass a source that lacks features present in other sources, simply add a column of zeros for each missing feature.
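
The zero-column padding can be done in base R before the data are handed to the function. The sketch below uses two toy sources with made-up names and values; it only prepares the predictors and does not call the package.

```r
# Two toy sources; the second one lacks the feature `x2`.
source_a <- data.frame(x1 = c(0.5, 1.2, 0.3), x2 = c(1, 0, 1))
source_b <- data.frame(x1 = c(0.9, 0.1))

# Add every feature missing from source_b as a column of zeros
missing_cols <- setdiff(names(source_a), names(source_b))
source_b[missing_cols] <- 0

# Put the columns in the same order as in source_a
source_b <- source_b[names(source_a)]

# Predictors as a list of matrices, one per source
x_list <- list(as.matrix(source_a), as.matrix(source_b))
```

After padding, both matrices have the same columns in the same order, which is the shape the list-of-matrices form of `x` describes.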

An object with S3 class `MSTweedie`:

`beta0`	A `ntasks*nlambda` matrix of parameter estimates for the intercept.
`beta`	A list of length `nlambda` containing `nvars*ntasks` matrices of parameter estimates for the features.
`df`	The number of included variables along the regularization path.
`lambda`	The sequence of regularization parameters.
`npasses`	The number of inner-loop iterations.
`idvars`	The index of the variables in order of inclusion in the model.
`dim`	The dimensions of the model (`nvars,ntasks`).
`call`	The original call that produced this object.
`pf`	The penalty factors for the features.
`eps`	The convergence threshold used in the algorithm.
`kkt`	A `nvars*ntasks*nlambda` array containing the values of the KKT conditions.
`norm`	A `nvars*nlambda` matrix containing the norm of the features along the regularization path.
`reg`	The type of regularization used in the algorithm.
`alpha`	The value of the argument `alpha` used.
`y`	A list of length `ntasks` containing the vectors of the responses for each source.
`x`	A list of length `ntasks` containing matrices of the features for each source.
`w`	A list of length `ntasks` containing the vectors of the observation weights for each source.
`rho`	The power of the mean-variance relation used in the algorithm.
`M`	A `nvars*ntasks*nlambda` array containing flags for the KKT conditions.
`time`	Computing time.

Simon Fontaine, Yi Yang, Bo Fan, Wei Qian and Yuwen Gu.

Maintainer: Simon Fontaine fontaines@dms.umontreal.ca

Fontaine, S., Yang, Y., Fan, B., Qian, W. and Gu, Y. (2018). "A Unified Approach to Sparse Tweedie Model with Big Data Applications to Multi-Source Insurance Claim Data Analysis," to be submitted.

MSTweedie, coef.MSTweedie, print.MSTweedie, plot.MSTweedie, kkt.check, predict.MSTweedie

# import package
library(MSTweedie)

# load data
data(AutoClaim)

# fit the MSTweedie model with L1/Linf regularization
# y=1 sets CLM_AMT5 as the response
# source=4 sets REVOLKED as the source index
fit <- MSTweedie(x = AutoClaim, y=1, source=4, reg='Linf')