Functional least angle regression.

Description

This is the main function for the functional least angle regression algorithm. In the simplest case, the function needs only two arguments: x and y. The function performs both variable selection and parameter estimation.

Usage

flars(x,y,method=c('basis','gq','raw'),max_selection,cv=c('gcv'),
      normalize=c('trace','rank','norm','raw'),lasso=TRUE,check=1,
      select=TRUE,VarThreshold=0.1,SignThreshold=0.8,
      control=list())

Arguments

x

The mixed scalar and functional variables. Note that each of the functional variables is expected to be stored in a matrix. Each row of the matrix should represent a sample or a curve. If there is only one functional variable, x can be a matrix. If there is only scalar variables, x can be a vector or a matrix. If there are more than one functional variables, or there are mixed functional and scalar variables, x should be a list. If x is a list, each item of the list should correspond to one variable.
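For instance, a design with one functional and one scalar variable could be assembled as a list (a base-R sketch; the object names and dimensions are illustrative, not part of the package):

```r
## one functional variable: 120 curves observed at 50 time points,
## stored as a matrix with one row per sample/curve
xf <- matrix(rnorm(120 * 50), nrow = 120, ncol = 50)

## one scalar variable: 120 observations
xs <- rnorm(120)

## mixed input: each list item corresponds to one variable
x <- list(xf, xs)
```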

y

The scalar variable. It can be a matrix or a vector.

method

The representation method for the functional coefficients. The method can be one of 'basis', 'gq' and 'raw', for basis-function expansion, Gaussian quadrature and representative data points, respectively.

max_selection

Maximum number of selections before the algorithm stops. Setting a reasonable value for this argument speeds up the computation.

cv

Choice of cross-validation. At the moment, the only option is generalized cross-validation, i.e., cv='gcv'.

lasso

Whether to use the lasso modification. In other words, whether variables selected in earlier iterations can be removed in later iterations.

check

Type of check method for the lasso modification: 1 means the variance check and 2 means the sign check. check=1 performs much better than the other.

select

If TRUE, the aim is to do selection rather than parameter estimation, and the stopping rule can be used when lasso=TRUE. If FALSE, the stopping rule may not work when lasso=TRUE.

VarThreshold

Threshold for removing variables based on the variation explained. Specifically, one condition for removing a variable is that the variation it explains is less than VarThreshold*Var(y). The other condition is that the variation it explains is less than the largest variation it explained in any previous iteration.
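The two removal conditions can be sketched in base R as follows (the numeric values and the name varExplained are hypothetical, for illustration only):

```r
set.seed(1)
y <- rnorm(100)
VarThreshold <- 0.1

## hypothetical variation explained by a candidate variable now,
## and the largest variation it explained in previous iterations
varExplained <- 0.05 * var(y)
prevMax      <- 0.20 * var(y)

## the variable is removed only when both conditions hold
removeVar <- (varExplained < VarThreshold * var(y)) &&
             (varExplained < prevMax)
```

With these made-up values both conditions hold, so removeVar is TRUE.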

SignThreshold

An argument similar to VarThreshold. If less than a proportion SignThreshold of the functional coefficient keeps the same sign as in the previous iteration, the variable is removed.
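The sign check can be sketched as the proportion of evaluation points at which the current coefficient estimate keeps the sign of the previous one (a hedged base-R illustration; beta_prev and beta_curr are made-up curves, not package output):

```r
set.seed(1)
tGrid <- seq(0, pi, length.out = 50)
beta_prev <- sin(tGrid)                       # previous-iteration coefficient
beta_curr <- beta_prev + rnorm(50, sd = 0.1)  # current-iteration coefficient

## proportion of points where the sign is unchanged
sameSign <- mean(sign(beta_curr) == sign(beta_prev))

SignThreshold <- 0.8
removeVar <- sameSign < SignThreshold  # remove when too many signs flip
```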

normalize

Choice of normalization method. Normalization removes any effects caused by the different dimensions of functional and scalar variables. The current options are 'trace', 'rank', 'norm' and 'raw'; 'norm' and 'raw' are recommended.

control

list of control elements for the functional coefficients. See fccaGen for details.

Value

Mu

Estimated intercept from each of the iterations

Beta

Estimated functional coefficients from each of the iterations

alpha

Distance along the directions from each of the iterations

p2_norm

Normalization constant applied to each of the iterations

AllIndex

All the indices of the candidate variables. If a variable is removed, its index becomes negative.

index

All the indices at the end of the selection.

CD

Stopping rule designed for this algorithm. The algorithm should stop when this value becomes very small; normally an obvious, sharp drop in the value can be observed.

resid

Residuals from each of the iterations.

RowMeans

Point-wise mean of the functional variables and mean of the scalar variables.

RowSds

Point-wise sd of the functional variables and sd of the scalar variables.

yMean

Mean of the response variable.

ySD

SD of the response variable.

p0

The projections obtained from each iteration without normalization.

cor1

The maximum correlation obtained from the first iteration.

lasso

Whether the lasso step was used or not.

df

The degrees of freedom calculated at the end of each iteration.

Sigma2Bar

Estimated sigma^2.

StopStat

Conventional stopping criteria.

varSplit

The variation explained by each of the candidate variables at each iteration.

SignCheckF

The proportion of sign changes for each of the candidate variables at each iteration.

Examples

library(flars)
library(fda)
#### Ex1 ####
## Generate some data.
dataL=data_generation(seed = 1,uncorr = TRUE,nVar = 8,nsamples = 120,
    var_type = 'm',cor_type = 3)

## Do the variable selection
out=flars(dataL$x,dataL$y,method='basis',max_selection=9,
    normalize='norm',lasso=FALSE)

## Check the stopping point with CD
plot(2:length(out$alpha),out$CD) # plot the CD with the iteration number

## In simple problems we can try
(iter=which.max(diff(out$CD))+2)


#### Ex2 ####
## Generate some data.
# dataL=data_generation(seed = 1,uncorr = FALSE,nVar = 8,nsamples = 120,
#      var_type = 'm',cor_type = 3)
## add more variables to the candidate
# for(i in 2:4){
# dataL0=data_generation(seed = i,uncorr = FALSE,nVar = 8,nsamples = 120,
#      var_type = 'm',cor_type = 3) 
# dataL$x=c(dataL$x,dataL0$x)
# }
# names(dataL$x)=paste0('v_',seq(length(dataL$x)))

## Do the variable selection
# out=flars(dataL$x,dataL$y,method='basis',max_selection=9,
#     normalize='norm',lasso=FALSE)

#### Ex3 (small subset of a real data set) ####
data(RealDa, package = 'flars')
out=flars(RealDa$x,RealDa$y,method='basis',max_selection=9,
    normalize='norm',lasso=FALSE)
# out=flars(RealDa$x,RealDa$y,method='basis',max_selection=9,
#     normalize='norm',lasso=TRUE)

## Check the stopping point with CD
plot(2:length(out$alpha),out$CD) # plot the CD with the iteration number
## The value drops sharply, relative to the others, at iteration six and
###  stays low after that, so the algorithm may stop there.