# flamCV: Fit the Fused Lasso Additive Model and Do Tuning Parameter Selection via K-Fold Cross-Validation

In package `flam`: Fits Piecewise Constant Models with Data-Adaptive Knots

## Description

Fit an additive model where each component is estimated to be piecewise constant with a small number of adaptively-chosen knots. Tuning parameter selection is done using K-fold cross-validation. In particular, this function implements the "fused lasso additive model", as proposed in Petersen, A., Witten, D., and Simon, N. (2014). Fused Lasso Additive Model. arXiv preprint arXiv:1409.5391.

## Usage

```r
flamCV(x, y, lambda.min.ratio = 0.01, n.lambda = 50, lambda.seq = NULL,
       alpha = 1, family = "gaussian", method = "BCD", fold = NULL,
       n.fold = NULL, seed = NULL, within1SE = T, tolerance = 10e-6)
```

## Arguments

- `x`: n x p covariate matrix. May have p > n.
- `y`: n-vector containing the outcomes for the n observations in `x`.
- `lambda.min.ratio`: smallest value for `lambda.seq`, as a fraction of the maximum lambda value, which is the data-derived smallest value for which all estimated functions are zero. The default is 0.01.
- `n.lambda`: the number of lambda values to consider; the default is 50.
- `lambda.seq`: a user-supplied sequence of positive lambda values to consider. The typical usage is to calculate `lambda.seq` using `lambda.min.ratio` and `n.lambda`, but providing `lambda.seq` overrides this. If provided, `lambda.seq` should be a decreasing sequence of values, since `flamCV` relies on warm starts for speed; fitting the model for a whole sequence of lambda values is thus often faster than fitting for a single lambda value.
- `alpha`: the value of the tuning parameter alpha to consider; the default is 1. The value must be in [0,1], with values near 0 prioritizing sparsity of functions and values near 1 prioritizing limiting the number of knots. Empirical evidence suggests using alpha of 1 when p < n and alpha of 0.75 when p > n.
- `family`: specifies the loss function to use. Currently supports squared error loss (default; `family="gaussian"`) and logistic loss (`family="binomial"`).
- `method`: specifies the optimization algorithm to use. Options are block coordinate descent (default; `method="BCD"`), generalized gradient descent (`method="GGD"`), or generalized gradient descent with backtracking (`method="GGD.backtrack"`). This argument is ignored if `family="binomial"`.
- `fold`: user-supplied fold numbers for cross-validation. If supplied, `fold` should be an n-vector with entries in 1,...,K when doing K-fold cross-validation. The default is to choose `fold` using `n.fold`.
- `n.fold`: the number of folds, K, to use for the K-fold cross-validation selection of tuning parameters. The default is 10; specification of `fold` overrides use of `n.fold`.
- `seed`: an optional number used with `set.seed()` at the beginning of the function. This is only relevant if `fold` is not specified by the user.
- `within1SE`: logical (`TRUE` or `FALSE`) indicating how the cross-validated tuning parameter should be chosen. If `within1SE=TRUE`, lambda is chosen to be the value corresponding to the sparsest model with cross-validation error within one standard error of the minimum cross-validation error. If `within1SE=FALSE`, lambda is chosen to be the value corresponding to the minimum cross-validation error.
- `tolerance`: specifies the convergence criterion for the objective (default is 10e-6).
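For intuition on how `lambda.min.ratio` and `n.lambda` interact, a default-style sequence of this kind is typically log-spaced and decreasing. The sketch below is illustrative only: `lambda.max` is a placeholder value, not the data-derived maximum that `flamCV` computes internally.

```r
# Illustrative sketch (not flamCV's actual computation): build a decreasing,
# log-spaced lambda sequence from a placeholder maximum down to
# lambda.min.ratio * lambda.max.
lambda.max <- 2          # placeholder; flamCV derives this from the data
lambda.min.ratio <- 0.01
n.lambda <- 50
lambda.seq <- exp(seq(log(lambda.max), log(lambda.max * lambda.min.ratio),
                      length.out = n.lambda))
head(lambda.seq)  # decreasing, as flamCV expects for warm starts
```

A decreasing sequence matters because each fit warm-starts from the previous (more heavily penalized) solution.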

## Details

Note that `flamCV` does not cross-validate over `alpha`; only a single value should be provided. If the user would like to cross-validate over `alpha`, then `flamCV` should be called multiple times with different values of `alpha` and the same `seed`. This ensures that the cross-validation folds (`fold`) remain the same across the different values of `alpha`. See the example below for details.
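As an alternative to fixing `seed`, the folds can be held fixed across calls by supplying the `fold` argument directly. A minimal base-R sketch of constructing such a fold vector (the sizes here, n = 50 and K = 5, are illustrative assumptions; `flamCV` itself is not called):

```r
# Build an n-vector of fold labels in 1,...,K, randomly assigned with
# (nearly) equal fold sizes, suitable for the 'fold' argument of flamCV.
set.seed(100)
n <- 50  # number of observations (matches the simulated data in the example)
K <- 5   # number of folds
fold <- sample(rep(seq_len(K), length.out = n))
table(fold)  # each fold gets n/K = 10 observations
```

Passing this same `fold` vector to each `flamCV` call guarantees identical folds across the different values of `alpha`.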

## Value

An object with S3 class "flamCV".

- `mean.cv.error`: m-vector containing the cross-validation error, where m is the length of `lambda.seq`. Note that `mean.cv.error[i]` contains the cross-validation error for the tuning parameters `alpha` and `flam.out$all.lambda[i]`.
- `se.cv.error`: m-vector containing the cross-validation standard error, where m is the length of `lambda.seq`. Note that `se.cv.error[i]` contains the standard error of the cross-validation error for the tuning parameters `alpha` and `flam.out$all.lambda[i]`.
- `lambda.cv`: optimal lambda value chosen by cross-validation.
- `alpha`: as specified by user (or default).
- `index.cv`: index of the model corresponding to `lambda.cv`.
- `flam.out`: object of class `flam` returned by `flam`.
- `fold`: as specified by user (or default).
- `n.folds`: as specified by user (or default).
- `within1SE`: as specified by user (or default).
- `tolerance`: as specified by user (or default).
- `call`: matched call.

## Author(s)

Ashley Petersen

## References

Petersen, A., Witten, D., and Simon, N. (2014). Fused Lasso Additive Model. arXiv preprint arXiv:1409.5391.

## See Also

`flam`, `plot.flamCV`, `summary.flamCV`
## Examples

```r
#See ?'flam-package' for a full example of how to use this package

#generate data
set.seed(1)
data <- sim.data(n = 50, scenario = 1, zerof = 10, noise = 1)

#fit model for a range of lambda chosen by default
#pick lambda using 2-fold cross-validation
#note: use larger 'n.fold' (e.g., 10) in practice
flamCV.out <- flamCV(x = data$x, y = data$y, alpha = 0.75, n.fold = 2)

## Not run:
#note that cross-validation is only done to choose lambda for specified alpha
#to cross-validate over alpha also, call 'flamCV' for several alpha and set seed
#note: use larger 'n.fold' (e.g., 10) in practice
flamCV.out1 <- flamCV(x = data$x, y = data$y, alpha = 0.65,
                      seed = 100, within1SE = FALSE, n.fold = 2)
flamCV.out2 <- flamCV(x = data$x, y = data$y, alpha = 0.75,
                      seed = 100, within1SE = FALSE, n.fold = 2)
flamCV.out3 <- flamCV(x = data$x, y = data$y, alpha = 0.85,
                      seed = 100, within1SE = FALSE, n.fold = 2)

#this ensures that the folds used are the same
flamCV.out1$fold; flamCV.out2$fold; flamCV.out3$fold

#compare the CV error for the optimum lambda of each alpha to choose alpha
CVerrors <- c(flamCV.out1$mean.cv.error[flamCV.out1$index.cv],
              flamCV.out2$mean.cv.error[flamCV.out2$index.cv],
              flamCV.out3$mean.cv.error[flamCV.out3$index.cv])
best.alpha <- c(flamCV.out1$alpha, flamCV.out2$alpha,
                flamCV.out3$alpha)[which(CVerrors == min(CVerrors))]

#also can generate data for logistic FLAM model
data2 <- sim.data(n = 50, scenario = 1, zerof = 10, family = "binomial")

#fit the FLAM model with cross-validation using logistic loss
#note: use larger 'n.fold' (e.g., 10) in practice
flamCV.logistic.out <- flamCV(x = data2$x, y = data2$y,
                              family = "binomial", n.fold = 2)
## End(Not run)

#'flamCV' returns an object of the class 'flamCV' that includes an object
#of class 'flam' (flam.out); see ?'flam-package' for an example using S3
#methods for the classes of 'flam' and 'flamCV'
```