cv.panel.sglfit: Cross-validation fit for panel sg-LASSO

cv.panel.sglfit    R Documentation

Cross-validation fit for panel sg-LASSO

Description

Performs k-fold cross-validation for the panel data sg-LASSO regression model.

The function runs sglfit nfolds+1 times: the first run computes the solution path over the λ sequence, and the remaining runs compute the fit with each fold omitted in turn. The average error and its standard deviation over the folds are computed, and the optimal regression coefficients are returned for lam.min and lam.1se. Solutions are computed for a fixed γ.
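The selection step described above can be sketched in base R. This is only an illustration under stated assumptions: lam.min is taken to minimize the mean cross-validated error, and lam.1se is taken to be the largest λ whose mean error lies within one standard error of that minimum (the convention used by comparable cross-validation routines). The helper `select_lambda` and the toy error matrix are hypothetical and not part of midasml, whose internal computation may differ.

```r
# Hypothetical helper (not part of midasml): pick lam.min and lam.1se
# from a matrix of per-fold errors (rows = folds, columns = lambda values).
select_lambda <- function(lambda, fold_err) {
  stopifnot(ncol(fold_err) == length(lambda))
  nfolds <- nrow(fold_err)
  cvm  <- colMeans(fold_err)                     # mean error per lambda
  cvse <- apply(fold_err, 2, sd) / sqrt(nfolds)  # standard error per lambda
  i_min <- which.min(cvm)
  # All lambdas whose mean error is within one standard error of the minimum
  within <- cvm <= cvm[i_min] + cvse[i_min]
  list(lam.min = lambda[i_min],
       lam.1se = max(lambda[within]),            # largest (sparsest) such lambda
       cvm = cvm, cvse = cvse)
}

# Toy data (made up): 5 folds, 4 lambda values in decreasing order
lambda <- c(1, 0.5, 0.25, 0.125)
fold_err <- rbind(c(3.0, 1.0, 1.0, 1.1),
                  c(3.2, 1.1, 0.8, 1.2),
                  c(2.9, 1.0, 1.2, 1.0),
                  c(3.1, 1.0, 1.0, 1.3),
                  c(3.0, 1.0, 1.0, 1.1))
sel <- select_lambda(lambda, fold_err)
sel$lam.min  # 0.25
sel$lam.1se  # 0.5
```

Choosing the larger λ under the one-standard-error rule trades a slightly higher estimated error for a sparser model.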

Usage

cv.panel.sglfit(x, y, lambda = NULL, gamma = 1.0, gindex = 1:p, nfolds = 10, 
  foldid, method = c("pooled", "fe"), nf = NULL, parallel = FALSE, ...)

Arguments

x

NT by p data matrix, where NT and p respectively denote the sample size of pooled data and the number of regressors.

y

NT by 1 response variable.

lambda

a user-supplied lambda sequence. By leaving this option unspecified (recommended), users can have the program compute its own λ sequence based on nlambda, γ, and lambda.factor. If a sequence must be supplied, it is better to supply a decreasing sequence of lambda values than a single (small) value, as warm starts are used in the optimization algorithm. The program ensures that the user-supplied lambda sequence is sorted in decreasing order before fitting the model.

gamma

sg-LASSO mixing parameter. γ = 1 gives the LASSO solution and γ = 0 gives the group LASSO solution.

gindex

p by 1 vector indicating group membership of each covariate.

nfolds

number of folds in the cross-validation loop. Default is 10.

foldid

the fold assignments used.

method

choose between 'pooled' and 'fe'; 'pooled' forces the intercept to be fitted in sglfit, while 'fe' computes the fixed effects. The user must supply the number of fixed effects nf for method = 'fe', and doing so is also recommended for method = 'pooled'. The program uses the supplied nf to construct foldid. Default is method = 'pooled'.

nf

number of fixed effects. Used only if method = 'fe'.

parallel

if TRUE, use parallel foreach to fit each fold. A parallel backend, such as doMC, must be registered beforehand. See the example below.

...

Other arguments that can be passed to sglfit.
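As a rough illustration of the lambda and foldid arguments, the base-R sketch below builds a decreasing, log-spaced λ grid and a fold assignment that keeps all observations of a panel unit in the same fold. Both constructions are assumptions made for illustration only: `lambda_max`, the grid formula, the unit-by-unit row ordering, and the unit-level fold scheme are not taken from midasml, which may build its default λ sequence and its folds differently.

```r
set.seed(1)

# --- lambda: a decreasing, log-spaced grid (assumed construction) ---
lambda_max    <- 2      # hypothetical largest penalty value
nlambda       <- 10     # number of grid points
lambda_factor <- 1e-2   # ratio of the smallest to the largest lambda
lambda <- exp(seq(log(lambda_max), log(lambda_max * lambda_factor),
                  length.out = nlambda))

# --- foldid: assign whole panel units to folds (assumed scheme) ---
N      <- 10            # number of units (nf when method = "fe")
T_len  <- 10            # time periods per unit
nfolds <- 5
unit <- rep(seq_len(N), each = T_len)   # rows assumed stacked unit by unit
unit_fold <- sample(rep(seq_len(nfolds), length.out = N))
foldid <- unit_fold[unit]               # length NT, values in 1..nfolds

table(foldid)                           # 20 observations (2 units) per fold
```

Vectors built this way could be passed as `lambda = lambda` and `foldid = foldid`; whether a unit-level split is appropriate depends on the dependence structure of the panel.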

Details

The cross-validation is run for the sg-LASSO linear model. The sequence of linear regression models implied by the λ vector is fit by block coordinate descent. The objective function is (case method='pooled')

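$$\|y - \iota\alpha - x\beta\|^2_{NT} + 2\lambda \Omega_\gamma(\beta),$$

where $\iota \in \mathbb{R}^{NT}$ and $\alpha$ is the intercept common to all $N$ items, or (case method='fe')

$$\|y - B\alpha - x\beta\|^2_{NT} + 2\lambda \Omega_\gamma(\beta),$$

where $B = I_N \otimes \iota$ and $\|u\|^2_{NT} = \langle u, u \rangle / NT$ is the empirical inner product. The penalty function $\Omega_\gamma(\cdot)$ is applied to the $\beta$ coefficients and is

$$\Omega_\gamma(\beta) = \gamma |\beta|_1 + (1-\gamma)|\beta|_{2,1},$$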

a convex combination of LASSO and group LASSO penalty functions.
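The formulas above can be evaluated directly in base R. The sketch below is illustrative only: `sgl_penalty` and `sgl_objective` are hypothetical helpers, not functions exported by midasml. It assumes that the group LASSO term is the unweighted sum of group-wise Euclidean norms, that rows are stacked unit by unit, and that ι is a vector of ones (of length T inside the Kronecker product, so that B is NT by N).

```r
# Hypothetical helpers (not part of midasml) that evaluate the formulas above.

# sg-LASSO penalty: gamma * |beta|_1 + (1 - gamma) * sum of group-wise L2 norms
sgl_penalty <- function(beta, gindex, gamma) {
  l1  <- sum(abs(beta))
  l21 <- sum(tapply(beta, gindex, function(b) sqrt(sum(b^2))))
  gamma * l1 + (1 - gamma) * l21
}

# Objective: ||y - Z alpha - x beta||^2_{NT} + 2 * lambda * penalty, where Z is
# a column of ones ("pooled") or the unit indicator matrix B ("fe").
sgl_objective <- function(y, x, Z, alpha, beta, gindex, lambda, gamma) {
  res <- y - Z %*% alpha - x %*% beta
  sum(res^2) / length(y) + 2 * lambda * sgl_penalty(beta, gindex, gamma)
}

N <- 3; T_len <- 4                             # toy panel dimensions
iota_NT <- matrix(1, N * T_len, 1)             # pooled case: common intercept
B <- kronecker(diag(N), matrix(1, T_len, 1))   # fe case: NT x N indicator matrix

beta   <- c(3, -4, 0, 0)
gindex <- c(1, 1, 2, 2)
sgl_penalty(beta, gindex, gamma = 1)    # LASSO penalty: 7
sgl_penalty(beta, gindex, gamma = 0)    # group LASSO penalty: 5
sgl_penalty(beta, gindex, gamma = 0.5)  # convex combination: 6

set.seed(1)
x <- matrix(rnorm(N * T_len * 4), N * T_len, 4)
alpha_fe <- c(1, 2, 3)
y <- B %*% alpha_fe + x %*% beta        # noiseless data, so the fit term is ~0
# Approximately 2 * 0.1 * 6 = 1.2, since only the penalty term remains
obj_fe <- sgl_objective(y, x, B, alpha_fe, beta, gindex, lambda = 0.1, gamma = 0.5)
```

Passing `iota_NT` and a scalar `alpha` as `Z` and `alpha` evaluates the pooled objective with the same function.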

Value

cv.panel.sglfit object.

Author(s)

Jonas Striaukas

Examples

 
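set.seed(1)
# Simulated pooled panel: NT = 100 observations, p = 20 regressors
x = matrix(rnorm(100 * 20), 100, 20)
# Only the first five coefficients are non-zero
beta = c(5,4,3,2,1,rep(0, times = 15))
y = x%*%beta + rnorm(100)
# Four groups of five consecutive regressors
gindex = sort(rep(1:4,times=5))
# Fixed-effects fit with nf = 10 fixed effects
cv.panel.sglfit(x = x, y = y, gindex = gindex, gamma = 0.5, method = "fe", nf = 10, 
  standardize = FALSE, intercept = FALSE)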
## Not run:  
# Parallel
require(doMC)
registerDoMC(cores = 2)
x = matrix(rnorm(1000 * 20), 1000, 20)
beta = c(5,4,3,2,1,rep(0, times = 15))
y = x%*%beta + rnorm(1000)
gindex = sort(rep(1:4,times=5))
system.time(cv.panel.sglfit(x = x, y = y, gindex = gindex, gamma = 0.5, method = "fe", nf = 10, 
  standardize = FALSE, intercept = FALSE))
system.time(cv.panel.sglfit(x = x, y = y, gindex = gindex, gamma = 0.5, method = "fe", nf = 10, 
  standardize = FALSE, intercept = FALSE, parallel = TRUE))

## End(Not run)  


jstriaukas/midasml documentation built on Oct. 5, 2022, 12:18 a.m.