flamCV: Fit the Fused Lasso Additive Model and Do Tuning Parameter Selection via K-Fold Cross-Validation


Description

Fits an additive model in which each component is estimated to be piecewise constant with a small number of adaptively chosen knots. Tuning parameter selection is done using K-fold cross-validation. In particular, this function implements the "fused lasso additive model", as proposed in Petersen, A., Witten, D., and Simon, N. (2014). Fused Lasso Additive Model. arXiv preprint arXiv:1409.5391.

Usage

flamCV(x, y, lambda.min.ratio = 0.01, n.lambda = 50, lambda.seq = NULL,
alpha = 1, family = "gaussian", method = "BCD", fold = NULL,
n.fold = NULL, seed = NULL, within1SE = T, tolerance = 10e-6)

Arguments

x

n x p covariate matrix. May have p > n.

y

n-vector containing the outcomes for the n observations in x.

lambda.min.ratio

smallest value for lambda.seq, as a fraction of the maximum lambda value, which is the data-derived smallest value for which all estimated functions are zero. The default is 0.01.

n.lambda

the number of lambda values to consider; the default is 50.

lambda.seq

a user-supplied sequence of positive lambda values to consider. The typical usage is to calculate lambda.seq using lambda.min.ratio and n.lambda, but providing lambda.seq overrides this. If provided, lambda.seq should be a decreasing sequence of values, since flamCV relies on warm starts for speed. Thus fitting the model for a whole sequence of lambda values is often faster than fitting for a single lambda value.
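For instance, a user-supplied sequence might be spaced evenly on the log scale. The snippet below is a hypothetical sketch (the particular endpoints, and the covariate matrix x and outcome y, are assumed, not taken from this page); note that the sequence is sorted in decreasing order, as required:

	# hypothetical decreasing sequence of 20 positive lambda values on a log scale
	lambda.seq <- exp(seq(log(5), log(0.05), length.out = 20))
	# warm starts make fitting the whole sequence efficient
	flamCV.out <- flamCV(x = x, y = y, lambda.seq = lambda.seq, n.fold = 5)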

alpha

the value of the tuning parameter alpha to consider; the default is 1. The value must be in [0,1], with values near 0 prioritizing sparsity of functions and values near 1 prioritizing limiting the number of knots. Empirical evidence suggests using alpha of 1 when p < n and alpha of 0.75 when p > n.

family

specifies the loss function to use. Currently supports squared error loss (default; family="gaussian") and logistic loss (family="binomial").

method

specifies the optimization algorithm to use. Options are block-coordinate descent (default; method="BCD"), generalized gradient descent (method="GGD"), or generalized gradient descent with backtracking (method="GGD.backtrack"). This argument is ignored if family="binomial".

fold

user-supplied fold numbers for cross-validation. If supplied, fold should be an n-vector with entries in 1,...,K when doing K-fold cross-validation. The default is to choose fold using n.fold.
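As an illustration, a balanced fold assignment for n = 50 observations and K = 5 folds could be constructed as follows (a sketch; the values of n and K are assumed for illustration):

	# hypothetical: assign each of n = 50 observations to one of K = 5 folds
	fold <- sample(rep(1:5, length.out = 50))
	flamCV.out <- flamCV(x = x, y = y, fold = fold)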

n.fold

the number of folds, K, to use for the K-fold cross-validation selection of tuning parameters. The default is 10; specification of fold overrides use of n.fold.

seed

an optional number used with set.seed() at the beginning of the function. This is only relevant if fold is not specified by the user.

within1SE

logical (TRUE or FALSE) for how cross-validated tuning parameters should be chosen. If within1SE=TRUE, lambda is chosen to be the value corresponding to the most sparse model with cross-validation error within one standard error of the minimum cross-validation error. If within1SE=FALSE, lambda is chosen to be the value corresponding to the minimum cross-validation error.
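The one-standard-error rule can be sketched using the components returned by flamCV. This is an illustrative reconstruction of the selection logic, not necessarily the package's exact internal code; it assumes flamCV.out is an object returned by flamCV:

	# sketch of the one-SE rule: pick the largest lambda (sparsest model)
	# whose CV error is within one SE of the minimum CV error
	cv  <- flamCV.out$mean.cv.error
	se  <- flamCV.out$se.cv.error
	lam <- flamCV.out$flam.out$all.lambda
	i.min <- which.min(cv)
	# lambda.seq is decreasing, so the smallest qualifying index
	# corresponds to the largest (sparsest) lambda
	i.1se <- min(which(cv <= cv[i.min] + se[i.min]))
	lam[i.1se]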

tolerance

specifies the convergence criterion for the objective (default is 10e-6).

Details

Note that flamCV does not cross-validate over alpha - just a single value should be provided. However, if the user would like to cross-validate over alpha, then flamCV should be called multiple times for different values of alpha and the same seed. This ensures that the cross-validation folds (fold) remain the same for the different values of alpha. See the example below for details.

Value

An object with S3 class "flamCV".

mean.cv.error

m-vector containing cross-validation error where m is the length of lambda.seq. Note that mean.cv.error[i] contains the cross-validation error for tuning parameters alpha and flam.out$all.lambda[i].

se.cv.error

m-vector containing cross-validation standard error where m is the length of lambda.seq. Note that se.cv.error[i] contains the standard error of the cross-validation error for tuning parameters alpha and flam.out$all.lambda[i].

lambda.cv

optimal lambda value chosen by cross-validation.

alpha

as specified by user (or default).

index.cv

index of the model corresponding to 'lambda.cv'.

flam.out

object of class 'flam' returned by flam.

fold

as specified by user (or default).

n.folds

as specified by user (or default).

within1SE

as specified by user (or default).

tolerance

as specified by user (or default).

call

matched call.
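A typical use of the returned object is to form predictions at the CV-selected tuning parameters via the nested 'flam' fit. The sketch below assumes predict accepts the flam object together with new.x, lambda, and alpha (as in predict.flam), and that a new covariate matrix new.x exists:

	# hedged sketch: predictions at the CV-selected lambda
	yhat <- predict(flamCV.out$flam.out, new.x = new.x,
	                lambda = flamCV.out$lambda.cv, alpha = flamCV.out$alpha)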

Author(s)

Ashley Petersen

References

Petersen, A., Witten, D., and Simon, N. (2014). Fused Lasso Additive Model. arXiv preprint arXiv:1409.5391.

See Also

flam, plot.flamCV, summary.flamCV

Examples

#See ?'flam-package' for a full example of how to use this package

#generate data
set.seed(1)
data <- sim.data(n = 50, scenario = 1, zerof = 10, noise = 1)

#fit model for a range of lambda chosen by default
#pick lambda using 2-fold cross-validation
#note: use larger 'n.fold' (e.g., 10) in practice
flamCV.out <- flamCV(x = data$x, y = data$y, alpha = 0.75, n.fold = 2)

## Not run: 
#note that cross-validation is only done to choose lambda for specified alpha
#to cross-validate over alpha also, call 'flamCV' for several alpha and set seed
#note: use larger 'n.fold' (e.g., 10) in practice
flamCV.out1 <- flamCV(x = data$x, y = data$y, alpha = 0.65, seed = 100, 
	within1SE = FALSE, n.fold = 2)
flamCV.out2 <- flamCV(x = data$x, y = data$y, alpha = 0.75, seed = 100, 
	within1SE = FALSE, n.fold = 2)
flamCV.out3 <- flamCV(x = data$x, y = data$y, alpha = 0.85, seed = 100, 
	within1SE = FALSE, n.fold = 2)
#this ensures that the folds used are the same
flamCV.out1$fold; flamCV.out2$fold; flamCV.out3$fold
#compare the CV error for the optimum lambda of each alpha to choose alpha
CVerrors <- c(flamCV.out1$mean.cv.error[flamCV.out1$index.cv], 
	flamCV.out2$mean.cv.error[flamCV.out2$index.cv], 
	flamCV.out3$mean.cv.error[flamCV.out3$index.cv])
best.alpha <- c(flamCV.out1$alpha, flamCV.out2$alpha, 
	flamCV.out3$alpha)[which(CVerrors==min(CVerrors))]

#also can generate data for logistic FLAM model
data2 <- sim.data(n = 50, scenario = 1, zerof = 10, family = "binomial")
#fit the FLAM model with cross-validation using logistic loss
#note: use larger 'n.fold' (e.g., 10) in practice
flamCV.logistic.out <- flamCV(x = data2$x, y = data2$y, family = "binomial",
	n.fold = 2)

## End(Not run)

#'flamCV' returns an object of the class 'flamCV' that includes an object
#of class 'flam' (flam.out); see ?'flam-package' for an example using S3
#methods for the classes of 'flam' and 'flamCV'

Example output

[1] "See example in '?sim.data' for code to plot generating functions."
[1] "fold: 1"
[1] "fold: 2"
[1] "fold: 1"
[1] "fold: 2"
[1] "fold: 1"
[1] "fold: 2"
[1] "fold: 1"
[1] "fold: 2"
 [1] 2 2 2 1 2 2 1 2 2 2 2 2 1 1 2 2 2 2 1 2 1 1 1 2 2 1 1 2 1 1 1 2 1 1 1 1 1 1
[39] 2 1 1 2 2 1 1 1 2 2 1 2
 [1] 2 2 2 1 2 2 1 2 2 2 2 2 1 1 2 2 2 2 1 2 1 1 1 2 2 1 1 2 1 1 1 2 1 1 1 1 1 1
[39] 2 1 1 2 2 1 1 1 2 2 1 2
 [1] 2 2 2 1 2 2 1 2 2 2 2 2 1 1 2 2 2 2 1 2 1 1 1 2 2 1 1 2 1 1 1 2 1 1 1 1 1 1
[39] 2 1 1 2 2 1 1 1 2 2 1 2
[1] "See example in '?sim.data' for code to plot generating functions."
[1] "fold: 1"
[1] "fold: 2"

flam documentation built on May 2, 2019, 8:27 a.m.