Functional least angle regression.

Description

This is the main function for the functional least angle regression algorithm. In the simplest case, the function needs only two arguments: x and y. The function performs both variable selection and parameter estimation.

Usage

flars(x,y,method=c('basis','gq','raw'),max_selection,cv=c('gcv'),
      normalize=c('trace','rank','norm','raw'),lasso=TRUE,check=1,
      select=TRUE,VarThreshold=0.1,SignThreshold=0.8,
      control=list())

Arguments

x

The mixed scalar and functional variables. Note that each of the functional variables is expected to be stored in a matrix. Each row of the matrix should represent a sample or a curve. If there is only one functional variable, x can be a matrix. If there is only scalar variables, x can be a vector or a matrix. If there are more than one functional variables, or there are mixed functional and scalar variables, x should be a list. If x is a list, each item of the list should correspond to one variable.
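For instance, a design with one functional and one scalar variable could be assembled as a list (a base-R sketch; the object names and dimensions are illustrative, not part of the package):

```r
## one functional variable: 120 curves observed at 50 time points,
## stored as a matrix with one row per sample/curve
xf <- matrix(rnorm(120 * 50), nrow = 120, ncol = 50)

## one scalar variable: 120 observations
xs <- rnorm(120)

## mixed input: each list item corresponds to one variable
x <- list(xf, xs)
```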

y

The scalar variable. It can be a matrix or a vector.

method

The representation method for the functional coefficients. The method can be one of 'basis', 'gq' and 'raw', for basis-function expansion, Gaussian quadrature and representative data points, respectively.

max_selection

Maximum number of selections before the algorithm stops. Setting a reasonable value for this argument speeds up the computation.

cv

Choice of cross-validation. At the moment, the only option is generalized cross-validation, i.e., cv='gcv'.

lasso

Whether to use the lasso modification. In other words, whether variables selected in earlier iterations can be removed in later iterations.

check

Type of check method for the lasso modification: 1 means the variance check and 2 means the sign check. check=1 performs much better than the other.

select

If TRUE, the aim is to do selection rather than parameter estimation, and the stopping rule can be used when lasso=TRUE. If FALSE, the stopping rule may not work when lasso=TRUE.

VarThreshold

Threshold for removing variables based on the variation explained. Specifically, one condition for removing a variable is that the variation it explains is less than VarThreshold*Var(y). The other condition is that the variation it explains is less than the largest variation it explained in any previous iteration.
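The two removal conditions can be sketched in base R as follows (the numeric values and the name varExplained are hypothetical, for illustration only):

```r
set.seed(1)
y <- rnorm(100)
VarThreshold <- 0.1

## hypothetical variation explained by a candidate variable now,
## and the largest variation it explained in previous iterations
varExplained <- 0.05 * var(y)
prevMax      <- 0.20 * var(y)

## the variable is removed only when both conditions hold
removeVar <- (varExplained < VarThreshold * var(y)) &&
             (varExplained < prevMax)
```

With these made-up values both conditions hold, so removeVar is TRUE.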

SignThreshold

An argument similar to VarThreshold. If less than a proportion SignThreshold of the functional coefficient keeps the same sign as in the previous iteration, the variable is removed.
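The sign check can be sketched as the proportion of evaluation points at which the current coefficient estimate keeps the sign of the previous one (a hedged base-R illustration; beta_prev and beta_curr are made-up curves, not package output):

```r
set.seed(1)
tGrid <- seq(0, pi, length.out = 50)
beta_prev <- sin(tGrid)                       # previous-iteration coefficient
beta_curr <- beta_prev + rnorm(50, sd = 0.1)  # current-iteration coefficient

## proportion of points where the sign is unchanged
sameSign <- mean(sign(beta_curr) == sign(beta_prev))

SignThreshold <- 0.8
removeVar <- sameSign < SignThreshold  # remove when too many signs flip
```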

normalize

Choice of normalization method. Normalization removes any effects caused by the different dimensions of functional and scalar variables. The current options are 'trace', 'rank', 'norm' and 'raw'; 'norm' and 'raw' are recommended.

control

list of control elements for the functional coefficients. See fccaGen for details.

Value

Mu

Estimated intercept from each of the iterations

Beta

Estimated functional coefficients from each of the iterations

alpha

Distance along the directions from each of the iterations

p2_norm

Normalization constant applied to each of the iterations

AllIndex

All the indices of the candidate variables. If a variable is removed, its index becomes negative.

index

All the indices at the end of the selection.

CD

Stopping rule designed for this algorithm. The algorithm should stop when this value becomes very small; normally an obvious, sharp drop in the value can be observed.

resid

Residuals from each of the iterations.

RowMeans

Point-wise mean of the functional variables and mean of the scalar variables.

RowSds

Point-wise sd of the functional variables and sd of the scalar variables.

yMean

Mean of the response variable.

ySD

SD of the response variable.

p0

The projections obtained from each iteration without normalization.

cor1

The maximum correlation obtained from the first iteration.

lasso

Whether the lasso step was used or not.

df

The degrees of freedom calculated at the end of each iteration.

Sigma2Bar

Estimated sigma^2.

StopStat

Conventional stopping criteria.

varSplit

The variation explained by each of the candidate variables at each iteration.

SignCheckF

The proportion of sign changes for each of the candidate variables at each iteration.

Examples

library(flars)
library(fda)
#### Ex1 ####
## Generate some data.
dataL=data_generation(seed = 1,uncorr = TRUE,nVar = 8,nsamples = 120,
    var_type = 'm',cor_type = 3)

## Do the variable selection
out=flars(dataL$x,dataL$y,method='basis',max_selection=9,
    normalize='norm',lasso=FALSE)

## Check the stopping point with CD
plot(2:length(out$alpha),out$CD) # plot the CD with the iteration number

## In simple problems we can try
(iter=which.max(diff(out$CD))+2)


#### Ex2 ####
## Generate some data.
# dataL=data_generation(seed = 1,uncorr = FALSE,nVar = 8,nsamples = 120,
#      var_type = 'm',cor_type = 3)
## add more variables to the candidate
# for(i in 2:4){
# dataL0=data_generation(seed = i,uncorr = FALSE,nVar = 8,nsamples = 120,
#      var_type = 'm',cor_type = 3) 
# dataL$x=c(dataL$x,dataL0$x)
# }
# names(dataL$x)=paste0('v_',seq(length(dataL$x)))

## Do the variable selection
# out=flars(dataL$x,dataL$y,method='basis',max_selection=9,
#     normalize='norm',lasso=FALSE)

#### Ex3 (small subset of a real data set) ####
data(RealDa, package = 'flars')
out=flars(RealDa$x,RealDa$y,method='basis',max_selection=9,
    normalize='norm',lasso=FALSE)
# out=flars(RealDa$x,RealDa$y,method='basis',max_selection=9,
#     normalize='norm',lasso=TRUE)

## Check the stopping point with CD
plot(2:length(out$alpha),out$CD) # plot the CD with the iteration number
## The value drops sharply, relative to the others, at iteration six and
###  stays low after that, so the algorithm may stop there.