cv.MSTweedie: Cross-Validation on the Multi-source sparse Tweedie model

Description

This function performs k-fold cross-validation (CV) for the multi-source sparse Tweedie model (MSTweedie). It is partly based on the glmnet::cv.glmnet function. The Tweedie deviance is the statistical criterion used for model selection.

Usage

cv.MSTweedie(x, y, w, source, rho, nlambda = 100, lambda,
      lambda.min = 0.001, nfolds = 10, kktstop = F, foldid,
      adaptive = 0, x.normalize = TRUE, reg = c("L2", "Linf"),
      eps = 5e-04, sr = TRUE, maxit = 10000,
      pf = rep(1, nvars), alpha = 0, ...)

Arguments

x

A data frame containing the predictors, the responses (identifying the sources either by different columns in the simultaneous case or via an additional index column) and, optionally, the observation weights.

y

Either (1) a single integer identifying the column of x containing the response (requires source to be specified), or (2) a vector of integers identifying which columns of x are the responses (simultaneous case).

w

A single integer identifying the column of x containing the observation weights. If this argument is missing, equal weight is assumed.

source

When y is a single integer, this argument identifies the column of x which indexes the different sources. Disregarded if y is a vector or list of vectors.

rho

Power used for the mean-variance relation of the Tweedie distribution. Possible range is [1,2], default is 1.5.

nlambda

The length of the regularization path. Disregarded if lambda is specified; default is 100.

lambda.min

The ratio of the last regularization parameter to the first one, where the first is computed as the smallest value for which no predictors are included. Disregarded if lambda is specified; possible range is (0,1), default is 1e-3.

lambda

(Optional) User-specified sequence of regularization parameters with positive values. When omitted, the sequence is computed starting from the smallest value excluding all predictors from the model and decreasing to a fraction lambda.min of that starting value by logarithmic decrements.

nfolds

Number of folds in the k-fold CV. Default is 10.

kktstop

Logical flag for using the KKT conditions to stop the fit before the end of the regularization parameter sequence. Default is FALSE. Only applies to the preliminary fit.

foldid

(Optional) A list of vectors of values between 1 and nfolds identifying which fold each observation is in. If supplied, nfolds can be omitted. If missing, the folds are constructed randomly.

adaptive

Exponent of the adaptive penalty weights; suggested value is 1 (see the reference for details). When the argument is 0, no adaptation is performed.

x.normalize

Logical flag for standardization of the predictors prior to fitting the model. If TRUE, each predictor in each source is centered to zero and scaled to variance 1. After the model is fitted, the coefficients are returned on the original scale. Default is TRUE.

reg

Either "Linf" for using L_∞-regularization in the fit or "L2" for the L_2-regularization. Default is "L2" (the first value listed in the Usage signature).

eps

Convergence threshold. Default is 5e-4.

sr

Logical flag for using the strong rule in the fit. Default is TRUE.

maxit

Maximum number of inner-loop iterations. Default is 10,000.

pf

Penalty weights in the penalty term by feature. Mostly used internally when the Adaptive Lasso is used in cross-validation. Expects a vector of length nvars, default is 1.

alpha

Parameter controlling the balance between across-feature and within-feature sparsity in the penalty term

(1-α)||β||_q +α||β||_1.

Possible range is [0,1], default is 0.

...

Further arguments to be passed to MSTweedie.
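
The default lambda sequence and the alpha penalty described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the package's internal code: lambda_max is a hypothetical starting value (the package computes it from the data), feature_penalty is a hypothetical helper, and β in the penalty formula is assumed to be the vector of one feature's coefficients across the sources, with q = 2 for reg = "L2" and q = ∞ for reg = "Linf". The pf weights and the multiplication by the regularization parameter are left out.

```r
# Hypothetical starting value; the package computes the smallest value
# that excludes all predictors from the data.
lambda_max <- 2.5
nlambda    <- 100
lambda.min <- 0.001

# Log-spaced sequence decreasing from lambda_max to lambda.min * lambda_max
lambda <- lambda_max * exp(seq(0, log(lambda.min), length.out = nlambda))

# Hypothetical helper: penalty (1 - alpha) * ||beta||_q + alpha * ||beta||_1
# for one feature, where beta is ASSUMED to hold that feature's
# coefficients across the sources.
feature_penalty <- function(beta, alpha = 0, reg = c("L2", "Linf")) {
  reg <- match.arg(reg)
  group_norm <- if (reg == "L2") sqrt(sum(beta^2)) else max(abs(beta))
  (1 - alpha) * group_norm + alpha * sum(abs(beta))
}

beta_j <- c(3, -4, 0)
feature_penalty(beta_j, alpha = 0,   reg = "L2")    # 5
feature_penalty(beta_j, alpha = 0.5, reg = "Linf")  # 0.5 * 4 + 0.5 * 7 = 5.5
```

With alpha = 0 only the group norm is active, so a feature is kept or dropped across all sources at once; increasing alpha adds an L1 term that can zero out individual sources within a feature.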

Details

The function first runs the MSTweedie function on the whole dataset to get the sequence of regularization parameters and the coefficient estimates. Then, it performs CV along the solution path and computes the out-of-sample Tweedie deviance based on the predicted responses. For each value of the regularization parameter, the error is averaged over the folds and its standard error is computed.
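
The procedure above can be sketched as follows. This is a minimal illustration, not the package's implementation: tweedie_deviance is a hypothetical helper implementing the standard Tweedie unit deviance for 1 < rho < 2 (the package may scale or weight it differently), the data are simulated, and a shrunken training mean is used as a stand-in for the MSTweedie fit at each regularization parameter. The standard error is assumed to be the across-fold standard deviation divided by sqrt(nfolds).

```r
# Standard Tweedie unit deviance for 1 < rho < 2 (hypothetical helper):
# 2 * [ y^(2-rho)/((1-rho)(2-rho)) - y*mu^(1-rho)/(1-rho) + mu^(2-rho)/(2-rho) ]
tweedie_deviance <- function(y, mu, rho = 1.5) {
  stopifnot(rho > 1, rho < 2, all(mu > 0), all(y >= 0))
  2 * (y^(2 - rho) / ((1 - rho) * (2 - rho)) -
         y * mu^(1 - rho) / (1 - rho) +
         mu^(2 - rho) / (2 - rho))
}

set.seed(1)
n      <- 200
nfolds <- 10
# Simulated non-negative responses with exact zeros (illustrative only)
y <- rgamma(n, shape = 2) * rbinom(n, 1, 0.6)

# Random fold assignment, as when foldid is missing
foldid <- sample(rep(seq_len(nfolds), length.out = n))

lambda   <- c(1, 0.1, 0.01)
fold_dev <- matrix(NA_real_, nrow = nfolds, ncol = length(lambda))

for (k in seq_len(nfolds)) {
  train <- y[foldid != k]
  test  <- y[foldid == k]
  for (j in seq_along(lambda)) {
    # Stand-in for a model fitted on the training folds at lambda[j]
    mu_hat <- mean(train) / (1 + lambda[j])
    fold_dev[k, j] <- mean(tweedie_deviance(test, mu_hat))
  }
}

cvm  <- colMeans(fold_dev)                      # mean deviance per lambda
cvsd <- apply(fold_dev, 2, sd) / sqrt(nfolds)   # standard error per lambda
```

Each column of fold_dev corresponds to one regularization parameter, so cvm and cvsd have the same length as lambda.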

Value

lambda

A vector containing the sequence of regularization parameters.

cvm

A vector containing the Tweedie deviance mean across the folds along the solution path.

cvsd

A vector containing the Tweedie deviance standard error across the folds along the solution path.

cvupper

cvm+cvsd.

cvlo

cvm-cvsd.

reg

The type of regularization used in the algorithm.

alpha

The value of the argument alpha used.

name

"Tweedie Deviance", the loss function used.

MSTweedie.fit

The MSTweedie object (fitted on the whole dataset).

time

Computing time.

lambda.min

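Regularization parameter at minimum CV error. This returned value is distinct from the lambda.min argument, which is a fraction defining the end of the regularization path.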

lambda.1se

Largest regularization parameter within one standard error of the minimum.
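
The last two components can be recovered from lambda, cvm and cvsd. The sketch below uses made-up values and follows the usual one-standard-error convention (as in glmnet); it is assumed, not confirmed, that the package computes them in exactly this way.

```r
# Made-up CV results along a short regularization path
lambda <- c(1, 0.5, 0.25, 0.125, 0.0625)
cvm    <- c(3.0, 2.2, 2.0, 2.05, 2.4)
cvsd   <- c(0.3, 0.25, 0.25, 0.25, 0.3)

# Regularization parameter at minimum CV error
i_min      <- which.min(cvm)
lambda.min <- lambda[i_min]                    # 0.25

# Largest lambda whose CV error is within one standard error of the minimum
threshold  <- cvm[i_min] + cvsd[i_min]         # 2.25
lambda.1se <- max(lambda[cvm <= threshold])    # 0.5
```

Choosing lambda.1se instead of lambda.min gives a sparser model whose CV error is statistically indistinguishable from the minimum.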

Author(s)

Simon Fontaine, Yi Yang, Bo Fan, Wei Qian and Yuwen Gu.

Maintainer: Simon Fontaine fontaines@dms.umontreal.ca

References

Fontaine, S., Yang, Y., Fan, B., Qian, W. and Gu, Y. (2018). "A Unified Approach to Sparse Tweedie Model with Big Data Applications to Multi-Source Insurance Claim Data Analysis," to be submitted.

Friedman, J., Hastie, T., Simon, N., Qian, J. and Tibshirani, R. (2017). "glmnet: Lasso and Elastic-Net Regularized Generalized Linear Models." A vignette for R package glmnet. Available from https://cran.r-project.org/web/packages/glmnet.

glmnet package.

See Also

MSTweedie, coef.cv.MSTweedie, plot.cv.MSTweedie, predict.cv.MSTweedie

Examples

# Load the package
library(MSTweedie)

# Load the example data
data(AutoClaim)

# Perform 10-fold CV with L1/Linf regularization;
# column 1 of AutoClaim holds the response and column 4 indexes the sources
cv <- cv.MSTweedie(x = AutoClaim, y = 1, source = 4, reg = "Linf")

fontaine618/MSTweedie documentation built on May 25, 2019, 5:22 p.m.