flars: Functional least angle regression

In package flars: Functional LARS

Description

This is the main function for the functional least angle regression algorithm. Under certain conditions, only two arguments are required: `x` and `y`. The function performs both variable selection and parameter estimation.

Usage

```r
flars(x, y, method = c('basis', 'gq', 'raw'), max_selection, cv = c('gcv'),
      normalize = c('trace', 'rank', 'norm', 'raw'), lasso = TRUE, check = 1,
      select = TRUE, VarThreshold = 0.1, SignThreshold = 0.8,
      control = list())
```

Arguments

`x` The mixed scalar and functional variables. Each functional variable is expected to be stored in a matrix, with each row representing one sample (one curve). If there is only one functional variable, `x` can be a matrix. If there are only scalar variables, `x` can be a vector or a matrix. If there is more than one functional variable, or a mix of functional and scalar variables, `x` should be a list in which each item corresponds to one variable.

`y` The scalar response variable. It can be a matrix or a vector.

`method` The representation method for the functional coefficients. It can be one of `'basis'`, `'gq'` and `'raw'`, for basis function expression, Gaussian quadrature and representative data points, respectively.

`max_selection` The maximum number of selections before the algorithm stops. Set a reasonable number for this argument to speed up the calculation.

`cv` Choice of cross validation. At the moment, the only choice is generalized cross validation, i.e., `cv='gcv'`.

`lasso` Whether to use the lasso modification. In other words, whether variables selected in earlier iterations can be removed in later iterations.

`check` Type of check used for the lasso modification. 1 means variance check, 2 means sign check. `check=1` performs much better than `check=2`.

`select` If `TRUE`, the aim is selection rather than parameter estimation, and the stopping rule can be used when `lasso=TRUE`. If `FALSE`, the stopping rule may not work when `lasso=TRUE`.

`VarThreshold` Threshold for removing variables based on the variation explained. A variable is removed when two conditions hold: the variation it explains is less than `VarThreshold*Var(y)`, and the variation it explains is less than the largest variation it explained in the previous iterations.

`SignThreshold` An argument similar to `VarThreshold`, used for the sign check. If the proportion of a functional coefficient that has the same sign as in the previous iteration is less than `SignThreshold`, the variable is removed.

`normalize` Choice of normalization method. Normalization removes effects caused by the different dimensions of functional and scalar variables. The current options are `trace`, `rank`, `norm` and `raw`; `norm` and `raw` are recommended.

`control` List of control elements for the functional coefficients. See `fccaGen` for details.
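The removal conditions described for `VarThreshold` and `SignThreshold` can be made concrete with a small base R sketch. This is only an illustration of the rules as worded in this page, not the package's internal code: the helper names `should_remove_var` and `should_remove_sign`, their arguments, and the choice to never remove a variable that has no earlier iteration to compare against are all assumptions.

```r
# Hypothetical helper (not part of flars): variance check.
# var_explained: variation explained by one variable at each iteration so far,
#                with the current iteration as the last element.
# var_y:         variance of the response.
should_remove_var <- function(var_explained, var_y, VarThreshold = 0.1) {
  current  <- var_explained[length(var_explained)]
  previous <- var_explained[-length(var_explained)]
  # Assumption: with no earlier iteration there is nothing to compare against.
  if (length(previous) == 0) return(FALSE)
  below_threshold <- current < VarThreshold * var_y  # first condition
  below_past_peak <- current < max(previous)         # second condition
  below_threshold && below_past_peak
}

# Hypothetical helper (not part of flars): sign check.
# beta_prev, beta_curr: one functional coefficient evaluated on the same grid
#                       at the previous and the current iteration.
should_remove_sign <- function(beta_prev, beta_curr, SignThreshold = 0.8) {
  same_sign <- mean(sign(beta_prev) == sign(beta_curr))  # proportion unchanged
  same_sign < SignThreshold
}

# Made-up numbers, for illustration only.
remove_a <- should_remove_var(c(0.50, 0.30, 0.05), var_y = 1)
remove_b <- should_remove_var(c(0.05, 0.08), var_y = 1)
remove_c <- should_remove_sign(c(1, 2, -1, 3, 1), c(1, 1, 1, 2, -1))
```

In this sketch `remove_a` is `TRUE` because both variance conditions hold, `remove_b` is `FALSE` because the current value exceeds every earlier one, and `remove_c` is `TRUE` because only three of five points keep their sign.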

Value

`Mu` Estimated intercept from each iteration.

`Beta` Estimated functional coefficients from each iteration.

`alpha` Distance along the direction from each iteration.

`p2_norm` Normalization constant applied at each iteration.

`AllIndex` All the indices. If a variable is removed, its index becomes negative.

`index` The indices at the end of the selection.

`CD` Stopping rule designed for this algorithm. The algorithm should stop when this value is very small. Normally an obvious and severe drop of the value can be observed.

`resid` Residuals from each iteration.

`RowMeans` Point-wise mean of the functional variables and mean of the scalar variables.

`RowSds` Point-wise standard deviation of the functional variables and standard deviation of the scalar variables.

`yMean` Mean of the response variable.

`ySD` Standard deviation of the response variable.

`p0` The projections obtained from each iteration without normalization.

`cor1` The maximum correlation obtained from the first iteration.

`lasso` Whether the lasso step is used.

`df` The degrees of freedom calculated at the end of each iteration.

`Sigma2Bar` Estimated $\sigma^2$.

`StopStat` Conventional stopping criteria.

`varSplit` The variation explained by each of the candidate variables at each iteration.

`SignCheckF` The proportion of sign changes for each of the candidate variables at each iteration.
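As a rough illustration of reading the `CD` values, the sketch below locates the largest single drop in a sequence and reports the iteration at which the value first becomes small. The `cd` vector is synthetic and not output from `flars`. The mapping from position to iteration number (position `i` corresponds to iteration `i + 1`) is taken from the plotting call `plot(2:length(out$alpha), out$CD)` in the Examples. Using the largest drop is one possible heuristic and an assumption here, not the package's own rule.

```r
# Synthetic CD values for illustration only (not produced by flars).
cd <- c(0.90, 0.75, 0.80, 0.06, 0.05, 0.04)

# Position i in cd is treated as iteration i + 1, following the
# plotting convention used in the Examples section.
iters <- seq_along(cd) + 1

# diff() gives the change between consecutive values; the most negative
# change marks the largest drop, between cd[j] and cd[j + 1].
drops <- diff(cd)
j <- which.min(drops)

# Iteration at which the value has dropped.
stop_iter <- iters[j + 1]
stop_iter
```

A plot of the values remains the more reliable check, since a single large drop may not be present in harder problems.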

Examples

```r
library(flars)
library(fda)
#### Ex1 ####
## Generate some data.
dataL=data_generation(seed = 1,uncorr = TRUE,nVar = 8,nsamples = 120,
      var_type = 'm',cor_type = 3)

## Do the variable selection
out=flars(dataL$x,dataL$y,method='basis',max_selection=9,
    normalize='norm',lasso=FALSE)

## Check the stopping point with CD
plot(2:length(out$alpha),out$CD) # plot the CD against the iteration number

## In simple problems we can try
(iter=which.max(diff(out$CD))+2)

#### Ex2 ####
## Generate some data.
# dataL=data_generation(seed = 1,uncorr = FALSE,nVar = 8,nsamples = 120,
#       var_type = 'm',cor_type = 3)

## Add more variables to the candidate set
# for(i in 2:4){
# dataL0=data_generation(seed = i,uncorr = FALSE,nVar = 8,nsamples = 120,
#       var_type = 'm',cor_type = 3)
# dataL$x=c(dataL$x,dataL0$x)
# }
# names(dataL$x)=paste0('v_',seq(length(dataL$x)))

## Do the variable selection
# out=flars(dataL$x,dataL$y,method='basis',max_selection=9,
#     normalize='norm',lasso=FALSE)

#### Ex3 (small subset of a real data set) ####
data(RealDa, package = 'flars')
out=flars(RealDa$x,RealDa$y,method='basis',max_selection=9,
    normalize='norm',lasso=FALSE)
# out=flars(RealDa$x,RealDa$y,method='basis',max_selection=9,
#     normalize='norm',lasso=TRUE)

## Check the stopping point with CD
plot(2:length(out$alpha),out$CD) # plot the CD against the iteration number
## The value drops to a very small level compared with the others at
## iteration six and stays low after that, so the algorithm may stop there.
```
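The examples above rely on `data_generation` and `RealDa`. For readers assembling their own inputs, the following sketch builds `x` and `y` in the layout described under Arguments: a list with one item per variable, and one row per sample in each functional matrix. The variable names, grid sizes and random data are made up for illustration, and storing the scalar variable as a one-column matrix inside the list is an assumption, since this page does not say how scalar items in a list should be stored. The `flars` call is commented out so the block runs without the package installed.

```r
set.seed(1)
n <- 50  # number of samples

# Two functional variables observed on grids of different length:
# each row is one sample (one curve).
f1 <- matrix(rnorm(n * 100), nrow = n)
f2 <- matrix(rnorm(n * 60), nrow = n)

# One scalar variable; a one-column matrix is an assumption made here.
s1 <- matrix(rnorm(n), ncol = 1)

# Mixed functional and scalar variables go into a list, one item per variable.
x <- list(f1 = f1, f2 = f2, s1 = s1)
y <- rnorm(n)

# Basic consistency check: every variable and the response share n samples.
rows_ok <- all(vapply(x, nrow, integer(1)) == n) && length(y) == n

# With the flars package installed, a call using the documented arguments
# might then look like this:
# out <- flars(x, y, method = 'basis', max_selection = 3,
#              normalize = 'norm', lasso = FALSE)
```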

flars documentation built on May 29, 2017, 9:10 p.m.