cv.MSTweedie: Cross-Validation on the Multi-source sparse Tweedie model
In fontaine618/MSTweedie: Multi-source Sparse Tweedie Modelling


This function performs k-fold cross-validation (CV) of the multi-source sparse Tweedie model (MSTweedie), partly based on `glmnet::cv.glmnet`. The Tweedie model deviance is the statistical criterion for model selection.

cv.MSTweedie(x, y, w, source, rho, nlambda = 100, lambda,
      lambda.min = 0.001, nfolds = 10, kktstop = F, foldid,
      adaptive = 0, x.normalize = TRUE, reg = c("L2", "Linf"),
      eps = 5e-04, sr = TRUE, maxit = 10000,
      pf = rep(1, nvars), alpha = 0, ...)

`x`	A data frame containing the predictors, the responses (with the sources identified either by different columns in the simultaneous case or via an additional index column) and, optionally, the observation weights.
`y`	Either (1) a single integer identifying the column of `x` containing the response (requires `source` to be specified), or (2) a vector of integers identifying which columns of `x` are the responses (simultaneous case).
`w`	A single integer identifying the column of `x` containing the observation weights. If this argument is missing, equal weight is assumed.
`source`	When `y` is a single integer, this argument identifies the column of `x` which indexes the different sources. Disregarded if `y` is a vector or list of vectors.
`rho`	Power used for the mean-variance relation of the Tweedie distribution. Possible range is [1,2], default is 1.5.
`nlambda`	The length of the regularization path. Disregarded if `lambda` is specified; default is 100.
`lambda.min`	The fraction of the first regularization parameter (which is computed to be the smallest such that no predictors are included) defining the last regularization parameter. Disregarded if `lambda` is specified; possible range is (0,1), default is 1e-3.
`lambda`	(Optional) User-specified sequence of regularization parameters with positive values. When omitted, the sequence is computed starting from the smallest value excluding all predictors from the model and decreasing to a fraction `lambda.min` of that starting value by logarithmic decrements.
`nfolds`	Number of folds in the k-fold CV. Default is 10.
`kktstop`	Logical flag for using the KKT conditions to stop the fit before the end of the regularization parameter sequence. Default is `FALSE`. Only applies to the preliminary fit.
`foldid`	(Optional) A list or vector of values between 1 and `nfolds` identifying which fold each observation is in. If supplied, `nfolds` can be missing. If missing, the folds are constructed randomly.
`adaptive`	Exponent of the adaptive penalty weights; suggested value is 1 (see the references for details). When the argument is 0, no adaptation is performed.
`x.normalize`	Logical flag for standardization of the predictors prior to fitting the model. If `TRUE`, each predictor in each source is centered to zero and scaled to variance 1. After the fit of the model, the coefficients are returned on the original scale. Default is `TRUE`, as shown in the usage above.
`reg`	Either `"Linf"` for using L_∞-regularization in the fit or `"L2"` for the L_2-regularization. Default is `"Linf"`.
`eps`	Convergence threshold. Default is 5e-4.
`sr`	Logical flag for using the strong rule in the fit. Default is `TRUE`.
`maxit`	Maximum number of inner-loop iterations. Default is 10,000.
`pf`	Penalty weights in the penalty term by feature. Mostly used internally when the Adaptive Lasso is used in cross-validation. Expects a vector of length `nvars`, default is 1.
`alpha`	Parameter controlling the balance between across-feature and within-feature sparsity in the penalty term (1-α)||β||_q + α||β||_1. Possible range is [0,1], default is 0.
`...`	Further arguments to be passed to `MSTweedie`.
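
As a rough illustration of how `nlambda`, `lambda.min` and `foldid` relate, the sketch below builds a log-spaced sequence and random fold labels in base R. The starting value `lambda_max` and the sample size `n` are hypothetical placeholders; inside the package the starting value is computed from the data, and the actual construction may differ.

```r
# Hypothetical inputs (not taken from the package)
lambda_max <- 2.5     # assumed smallest lambda that excludes all predictors
nlambda    <- 100     # length of the path
lambda_min <- 0.001   # plays the role of the `lambda.min` argument
n          <- 250     # assumed number of observations
nfolds     <- 10

# Decreasing sequence, evenly spaced on the log scale,
# from lambda_max down to lambda_min * lambda_max
lambda_seq <- exp(seq(log(lambda_max), log(lambda_max * lambda_min),
                      length.out = nlambda))

# Random, balanced fold labels taking values in 1..nfolds
set.seed(1)
foldid <- sample(rep(seq_len(nfolds), length.out = n))
```

A sequence like `lambda_seq` is the kind of object the `lambda` argument describes, and `foldid` shows one way to fix the folds so that repeated runs are comparable.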

The function first runs the MSTweedie function on the whole dataset to get the sequence of regularization parameters and the coefficient estimates. Then, it performs CV along the solution path and computes the out-of-sample Tweedie deviance based on the predicted responses. For each value of the regularization parameter, the error is averaged over the folds and its standard error is computed.
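
The quantities described above can be sketched in base R. This is only an illustration under stated assumptions: `tweedie_deviance` is a hypothetical helper using the standard unit deviance for a power between 1 and 2 (the package may scale it differently, drop constant terms, or apply observation weights), and the matrix `fold_err` holds made-up per-fold errors.

```r
# Standard unit Tweedie deviance for 1 < rho < 2 (hypothetical helper;
# not the package's internal function)
tweedie_deviance <- function(y, mu, rho = 1.5) {
  2 * (y^(2 - rho) / ((1 - rho) * (2 - rho)) -
       y * mu^(1 - rho) / (1 - rho) +
       mu^(2 - rho) / (2 - rho))
}

# Made-up out-of-sample errors: rows are folds, columns are lambda values
set.seed(1)
fold_err <- matrix(rexp(10 * 5), nrow = 10, ncol = 5)

# Mean over folds and its standard error, one value per lambda
cvm  <- colMeans(fold_err)
cvsd <- apply(fold_err, 2, sd) / sqrt(nrow(fold_err))
```

The deviance is zero when the prediction equals the observation and remains finite for zero responses, which is why it suits claim data with many exact zeros.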

`lambda`	A vector containing the sequence of regularization parameters.
`cvm`	A vector containing the Tweedie deviance mean across the folds along the solution path.
`cvsd`	A vector containing the Tweedie deviance standard error across the folds along the solution path.
`cvupper`	`cvm+cvsd`.
`cvlo`	`cvm-cvsd`.
`reg`	The type of regularization used in the algorithm.
`alpha`	The value of the argument `alpha` used.
`name`	`"Tweedie Deviance"`, the loss function used.
`MSTweedie.fit`	The `MSTweedie` object (fitted on the whole dataset).
`time`	Computing time.
`lambda.min`	Regularization parameter at minimum CV error.
`lambda.1se`	Largest regularization parameter within one standard error of the minimum.
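
The two selected values can be recovered from `lambda`, `cvm` and `cvsd` as in the sketch below. The numbers are invented for illustration, and the one-standard-error rule shown follows the usual glmnet convention, which is assumed rather than confirmed to match the package's internal code.

```r
# Invented CV summary along a short path (illustration only)
lambda <- c(1.00, 0.50, 0.25, 0.10, 0.05)
cvm    <- c(3.20, 2.60, 2.35, 2.30, 2.45)
cvsd   <- c(0.15, 0.12, 0.10, 0.10, 0.11)

# Regularization parameter at the minimum mean CV error
i_min      <- which.min(cvm)
lambda_min <- lambda[i_min]

# Largest lambda whose mean error is within one standard error of the minimum
lambda_1se <- max(lambda[cvm <= cvm[i_min] + cvsd[i_min]])
```

Here `lambda_min` is 0.10 and `lambda_1se` is 0.25; the latter gives a sparser model whose error is statistically comparable to the minimum.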

Simon Fontaine, Yi Yang, Bo Fan, Wei Qian and Yuwen Gu.

Maintainer: Simon Fontaine fontaines@dms.umontreal.ca

Fontaine, S., Yang, Y., Fan, B., Qian, W. and Gu, Y. (2018). "A Unified Approach to Sparse Tweedie Model with Big Data Applications to Multi-Source Insurance Claim Data Analysis," to be submitted.

Friedman, J., Hastie, T., Simon, N., Qian, J. and Tibshirani, R. (2017). "glmnet: Lasso and Elastic-Net Regularized Generalized Linear Models." A vignette for R package glmnet. Available from https://cran.r-project.org/web/packages/glmnet.

glmnet package.

MSTweedie, coef.cv.MSTweedie, plot.cv.MSTweedie, predict.cv.MSTweedie

#import package
library(MSTweedie)

#load data
data(AutoClaim)

# performs 10-fold CV with L1/Linf regularization
cv <- cv.MSTweedie(x = AutoClaim, y = 1, source = 4, reg = "Linf")