SpiceFP-package

R Documentation

A Sparse and Structured Procedure to Identify Combined Effects of Functional Predictors

Description

A set of functions implementing the iterative 'SpiceFP' approach. The functional predictors are transformed into several candidate explanatory matrices (based on contingency tables), each associated with an edge matrix that encodes contiguity constraints. Generalized Fused Lasso regressions are then performed to identify the best candidate matrix, the best class intervals and the related coefficients at each iteration. The approach stops when the maximal number of iterations is reached or when all retained coefficients are zero. Supplementary functions return the coefficients of any candidate matrix, or the mean of the coefficients over several candidates. The methods in this package are described in Girault Gnanguenon Guesse, Patrice Loisel, Bénedicte Fontez, Thierry Simonneau, Nadine Hilgert (2021) "An exploratory penalized regression to identify combined effects of functional variables - Application to agri-environmental issues" https://hal.archives-ouvertes.fr/hal-03298977.

Details

The main function of the package is the spicefp function. It directly performs the three main steps of the SpiceFP approach, by using intermediate functions of the package.
1) In the first step, contingency tables are constructed by defining joint modalities using class intervals (bins). Several candidate partitions are then defined. For each statistical individual i and each candidate partition (denoted u here), the 2 (resp. 3) functional predictors are transformed into bivariate (resp. trivariate) frequency histograms (contingency tables), stored as row vectors. Stacking these row vectors over all individuals gives the candidate explanatory matrix indexed by u (denoted X^u here). The function candidates builds these candidate matrices.
2) At the second step, for each candidate explanatory matrix, an edge matrix is defined to represent the contiguity constraints between modalities of the contingency table.
3) Finally, in the last step, the best class intervals and the related regression coefficients are determined by: i) performing a Generalized Fused Lasso regression on each candidate explanatory matrix. The SpiceFP model is the following:

y_i = X_i^u \beta^u + \varepsilon_i,

where \beta^u is the vector of coefficients to be estimated on the 2D (resp. 3D) intervals. The estimator of \beta^u is obtained as follows:

\hat{\beta}^{u,\gamma}(\lambda) = \arg\min_{\beta} \frac{1}{2} \|y - X^u \beta\|_2^2 + \lambda \|D^{u,\gamma} \beta\|_1,

where \lambda is a penalty parameter that controls the smoothness of the coefficients, D^{u,\gamma} is the penalty matrix derived from the edge matrix of step 2, and \gamma is the ratio between the regularization parameters of parsimony and fusion.
ii) choosing the best candidate matrix and selecting its variables using an information criterion, then checking the stopping conditions. SpiceFP may be used iteratively: it can identify up to K best candidate matrices and their related coefficients, one per iteration.
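
The three steps can be illustrated with a minimal base-R sketch on toy data. It is not the package's implementation: the helper names hist2d_row, grid_edge_matrix and gfl_objective are hypothetical, and so are the toy curves and breaks. The code also assumes that D^{u,\gamma} stacks the edge matrix on \gamma times the identity, which is one reading of \gamma as a ratio between the parsimony and fusion penalties. The sketch only builds the objects and evaluates the penalized criterion for a given \beta; it does not fit the model.

```r
# Illustrative base-R sketch; none of these helpers belong to the SpiceFP API.

# Step 1: turn two curves observed at the same time points into a flattened
# bivariate histogram of counts (one row of a candidate matrix X^u).
hist2d_row <- function(f1, f2, breaks1, breaks2) {
  c1 <- cut(f1, breaks = breaks1, include.lowest = TRUE)
  c2 <- cut(f2, breaks = breaks2, include.lowest = TRUE)
  as.vector(table(c1, c2))  # column-major order; empty cells are kept as 0
}

# Step 2: edge matrix of an n1 x n2 grid of cells. Each row holds -1 and +1
# for one pair of contiguous (vertically or horizontally adjacent) cells.
grid_edge_matrix <- function(n1, n2) {
  idx <- matrix(seq_len(n1 * n2), nrow = n1, ncol = n2)
  vert  <- cbind(as.vector(idx[-n1, , drop = FALSE]),
                 as.vector(idx[-1, , drop = FALSE]))
  horiz <- cbind(as.vector(idx[, -n2, drop = FALSE]),
                 as.vector(idx[, -1, drop = FALSE]))
  edges <- rbind(vert, horiz)
  D <- matrix(0, nrow = nrow(edges), ncol = n1 * n2)
  D[cbind(seq_len(nrow(edges)), edges[, 1])] <- -1
  D[cbind(seq_len(nrow(edges)), edges[, 2])] <- 1
  D
}

# Step 3 (criterion only): value of the penalized least-squares objective.
# Assumption: D^{u,gamma} = rbind(D, gamma * I), so the penalty equals
# lambda * (||D beta||_1 + gamma * ||beta||_1).
gfl_objective <- function(y, X, beta, D, lambda, gamma) {
  D_full <- rbind(D, gamma * diag(ncol(X)))
  0.5 * sum((y - X %*% beta)^2) + lambda * sum(abs(D_full %*% beta))
}

# Toy data: two individuals, two curves each, sampled at 6 time points.
temp  <- rbind(c(11, 14, 18, 22, 19, 13), c(15, 17, 24, 27, 21, 16))
light <- rbind(c(0, 120, 480, 650, 300, 20), c(10, 200, 700, 900, 450, 60))
b_temp  <- c(10, 20, 30)         # 2 classes for the first predictor
b_light <- c(0, 250, 500, 1000)  # 3 classes for the second predictor

# Candidate matrix for this partition: one row per individual, 2 * 3 columns.
X <- t(sapply(1:2, function(i) {
  hist2d_row(temp[i, ], light[i, ], b_temp, b_light)
}))
D <- grid_edge_matrix(2, 3)

# A constant beta has no fusion penalty; only the sparsity term remains.
y <- c(6, 6)
beta <- rep(1, 6)
obj <- gfl_objective(y, X, beta, D, lambda = 2, gamma = 0.5)
```

Each row of X sums to the number of time points, since every observation falls into exactly one joint class. Changing b_temp or b_light yields another candidate partition u, and therefore another pair of X^u and edge matrix.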