tscv.sglfit: Time series cross-validation fit for sg-LASSO

View source: R/tscv.sglfit.R

Description

Performs K-fold time series cross-validation for the sg-LASSO regression model.

The function runs sglfit K+1 times: the first run computes the solution path over the λ sequence, and the remaining K runs compute the fit with each fold omitted. The average error and its standard deviation over the folds are computed, and the optimal regression coefficients are returned for lam.min and lam.1se. Solutions are computed for a fixed γ.
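The description does not spell out how lam.min and lam.1se are picked from the fold errors. The sketch below assumes the convention popularized by glmnet's cv.glmnet: lam.min minimizes the mean cross-validation error, and lam.1se is the largest λ whose mean error is within one standard error of that minimum. The helper select_lambdas and its inputs cvm and cvsd are hypothetical names, and the toy numbers are illustrative only.

```r
# Hypothetical helper (not part of midasml): pick lam.min and lam.1se from
# cross-validation output, assuming the glmnet-style one-standard-error rule.
#   lambda: decreasing lambda path
#   cvm:    mean CV error for each lambda
#   cvsd:   estimated standard error of cvm for each lambda
select_lambdas <- function(lambda, cvm, cvsd) {
  stopifnot(length(lambda) == length(cvm), length(cvm) == length(cvsd))
  i_min <- which.min(cvm)
  lam_min <- lambda[i_min]
  # Largest lambda (sparsest model) whose mean error is within one
  # standard error of the minimum mean error
  threshold <- cvm[i_min] + cvsd[i_min]
  lam_1se <- max(lambda[cvm <= threshold])
  list(lam.min = lam_min, lam.1se = lam_1se)
}

# Toy cross-validation curve (illustrative numbers only)
lambda <- c(1.0, 0.5, 0.25, 0.125, 0.0625)
cvm    <- c(2.0, 1.3, 1.08, 1.0, 1.05)
cvsd   <- c(0.2, 0.15, 0.12, 0.1, 0.1)
res <- select_lambdas(lambda, cvm, cvsd)
print(res)
```

Here the minimum error occurs at λ = 0.125, while λ = 0.25 is the largest value within one standard error of it, so the 1se choice trades a slightly higher error for a sparser model.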

Usage

tscv.sglfit(x, y, lambda = NULL, gamma = 1.0, gindex = 1:p, 
  K = 20, l = 5, parallel = FALSE, seed = NULL, ...)

Arguments

x

T by p data matrix, where T and p respectively denote the sample size and the number of regressors.

y

T by 1 response variable.

lambda

a user-supplied lambda sequence. By leaving this option unspecified (recommended), users can have the program compute its own λ sequence based on nlambda and lambda.factor. If a sequence must be supplied, a decreasing sequence of lambda values is better than a single (small) value, because the optimization algorithm uses warm starts. The program sorts a user-supplied lambda sequence in decreasing order before fitting the model.
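How sglfit builds its default sequence is not documented on this page. As an assumption, the sketch below follows the common glmnet-style recipe: nlambda values, log-spaced, from a starting value lambda_max down to lambda_max * lambda.factor. The helper make_lambda_grid, the input lambda_max (taken as given rather than computed from the data) and the default argument values are all hypothetical.

```r
# Hypothetical helper (not part of midasml): a decreasing, log-spaced lambda
# grid of length nlambda from lambda_max down to lambda_max * lambda.factor.
make_lambda_grid <- function(lambda_max, nlambda = 100, lambda.factor = 1e-4) {
  stopifnot(lambda_max > 0, nlambda >= 2, lambda.factor > 0, lambda.factor < 1)
  exp(seq(log(lambda_max), log(lambda_max * lambda.factor), length.out = nlambda))
}

grid <- make_lambda_grid(lambda_max = 2, nlambda = 5, lambda.factor = 0.01)
print(grid)

# A user-supplied sequence is put into decreasing order before fitting,
# so that each fit can warm-start from the previous (larger-lambda) solution
user_lambda <- sort(c(0.1, 1, 0.01), decreasing = TRUE)
print(user_lambda)
```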

gamma

sg-LASSO mixing parameter. γ = 1 gives the LASSO solution and γ = 0 gives the group LASSO solution.

gindex

p by 1 vector indicating group membership of each covariate.

K

number of folds in the cross-validation loop. Defaults to 20.

l

the gap used to drop observations around the test set data. For each test observation (K in total), the l observations on each side of it are dropped (2l in total), so that the test observation at time t is separated from the training data by l observations. Defaults to 5.
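The gap rule can be illustrated for a single test observation. This is a minimal sketch of the index bookkeeping only, not midasml's internal code: the helper tscv_split is hypothetical, and because the page does not say how the K test points are placed, the test time t_test is simply passed in. The sample size is called n_obs to avoid R's built-in T (an alias for TRUE).

```r
# Hypothetical helper (not part of midasml): training indices for a test
# observation at time t_test, dropping the l observations on each side of it
# (2l in total, fewer at the sample edges).
tscv_split <- function(n_obs, t_test, l) {
  stopifnot(t_test >= 1, t_test <= n_obs, l >= 0)
  # Window of the test point plus l neighbours on each side
  dropped <- max(1, t_test - l):min(n_obs, t_test + l)
  list(test = t_test, train = setdiff(seq_len(n_obs), dropped))
}

# 20 observations, test point at t = 10, gap l = 2:
# observations 8, 9, 11 and 12 are dropped from the training set
fold <- tscv_split(n_obs = 20, t_test = 10, l = 2)
print(fold$train)

# Near the start of the sample only the right-hand neighbours exist
edge <- tscv_split(n_obs = 20, t_test = 1, l = 2)
print(edge$train)
```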

parallel

if TRUE, use parallel foreach to fit each fold. A parallel backend, such as doMC, must be registered beforehand; see the example below.

seed

a seed value for replicating results: set.seed(seed) is called, and seed is stored in the output list. If left as NULL, it defaults to as.numeric(Sys.Date()).
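The documented seed default can be mimicked in base R to see what it implies. This is a hypothetical sketch (resolve_seed is not a midasml function): a NULL seed falls back to as.numeric(Sys.Date()), the number of days since 1970-01-01, so runs on the same day reproduce each other while runs on different days do not, unless an explicit seed is given.

```r
# Hypothetical sketch (not midasml's code) of the documented seed behaviour
resolve_seed <- function(seed = NULL) {
  if (is.null(seed)) seed <- as.numeric(Sys.Date())  # days since 1970-01-01
  set.seed(seed)
  seed
}

# The same explicit seed gives the same random draws
s1 <- resolve_seed(123); draw1 <- runif(3)
s2 <- resolve_seed(123); draw2 <- runif(3)
print(identical(draw1, draw2))

# With seed = NULL, the seed depends on the calendar date
print(resolve_seed(NULL))
```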

...

Other arguments that can be passed to sglfit.

Details

Cross-validation is run for the sg-LASSO linear model. The sequence of linear regression models implied by the λ vector is fit by block coordinate descent. The objective function is

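$$\|y - \iota\alpha - x\beta\|_T^2 + 2\lambda\,\Omega_\gamma(\beta),$$

where $\iota \in \mathbb{R}^T$ is a vector of ones and $\|u\|_T^2 = \langle u, u \rangle / T$ is the squared empirical norm induced by the empirical inner product. The penalty function $\Omega_\gamma(\cdot)$ is applied to the coefficients $\beta$ and is

$$\Omega_\gamma(\beta) = \gamma|\beta|_1 + (1-\gamma)|\beta|_{2,1},$$

a convex combination of the LASSO and group LASSO penalty functions, where $|\beta|_{2,1} = \sum_{G} |\beta_G|_2$ sums the Euclidean norms of the coefficient groups $G$ defined by gindex.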
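The objective and penalty above translate directly into a few lines of R, which can help when checking a fitted model by hand. This is a minimal sketch under stated assumptions: the group norm is taken as the unweighted sum of group Euclidean norms (no group-size weights), and all function names are hypothetical, not midasml's API. The last helper is the standard sparse-group LASSO proximal map (coordinatewise soft-thresholding followed by groupwise shrinkage), the kind of per-group update a block coordinate descent solver can use; whether sglfit uses exactly this update is not stated on this page.

```r
# Group LASSO norm |beta|_{2,1}: sum of Euclidean norms of the groups in gindex
group_norm <- function(beta, gindex) {
  sum(tapply(beta, gindex, function(b) sqrt(sum(b^2))))
}

# sg-LASSO penalty: gamma * |beta|_1 + (1 - gamma) * |beta|_{2,1}
sg_penalty <- function(beta, gindex, gamma) {
  gamma * sum(abs(beta)) + (1 - gamma) * group_norm(beta, gindex)
}

# Objective ||y - iota*alpha - x beta||_T^2 + 2 * lambda * Omega_gamma(beta),
# with ||u||_T^2 = sum(u^2) / T, i.e. the mean of squared residuals
sg_objective <- function(y, x, alpha, beta, lambda, gamma, gindex) {
  u <- as.vector(y - alpha - x %*% beta)  # scalar alpha recycles as iota * alpha
  mean(u^2) + 2 * lambda * sg_penalty(beta, gindex, gamma)
}

# Proximal map of b -> lam1 * |b|_1 + lam2 * |b|_2 for one group b:
# soft-threshold each coordinate, then shrink the whole group toward zero
sg_prox_group <- function(b, lam1, lam2) {
  z <- sign(b) * pmax(abs(b) - lam1, 0)
  nz <- sqrt(sum(z^2))
  if (nz == 0) return(z)
  max(0, 1 - lam2 / nz) * z
}

set.seed(1)
x <- matrix(rnorm(50 * 6), 50, 6)
beta <- c(1, -1, 0, 0, 0.5, 0)
gindex <- c(1, 1, 2, 2, 3, 3)
y <- x %*% beta + rnorm(50)

print(sg_penalty(beta, gindex, gamma = 1))  # LASSO limit: |beta|_1
print(sg_penalty(beta, gindex, gamma = 0))  # group LASSO limit: |beta|_{2,1}
print(sg_objective(y, x, alpha = 0, beta = beta, lambda = 0.1,
                   gamma = 0.5, gindex = gindex))
print(sg_prox_group(c(3, -0.5), lam1 = 1, lam2 = 1))
```

The two penalty calls reproduce the limiting cases stated for gamma: γ = 1 gives the plain ℓ1 norm and γ = 0 gives the sum of group norms.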

Value

A tscv.sglfit object.

Author(s)

Jonas Striaukas

Examples

 
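library(midasml)
set.seed(1)
# Simulated data: T = 100 observations and p = 20 regressors,
# of which only the first five have nonzero coefficients
x = matrix(rnorm(100 * 20), 100, 20)
beta = c(5, 4, 3, 2, 1, rep(0, times = 15))
y = x %*% beta + rnorm(100)
# Four groups of five consecutive regressors
gindex = sort(rep(1:4, times = 5))
tscv.sglfit(x = x, y = y, gindex = gindex, gamma = 0.5, 
  standardize = FALSE, intercept = FALSE)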
## Not run:  
# Parallel: register a foreach backend (here doMC with 2 cores) first
library(doMC)
registerDoMC(cores = 2)
# Larger simulated sample with the same coefficients and groups
x = matrix(rnorm(1000 * 20), 1000, 20)
beta = c(5, 4, 3, 2, 1, rep(0, times = 15))
y = x %*% beta + rnorm(1000)
gindex = sort(rep(1:4, times = 5))
# Compare run times of the sequential and the parallel fold fits
system.time(tscv.sglfit(x = x, y = y, gindex = gindex, gamma = 0.5, 
  standardize = FALSE, intercept = FALSE))
system.time(tscv.sglfit(x = x, y = y, gindex = gindex, gamma = 0.5, 
  standardize = FALSE, intercept = FALSE, parallel = TRUE))

## End(Not run)


jstriaukas/midasml documentation built on Oct. 5, 2022, 12:18 a.m.