ff: Construct a function-on-function regression term In refund: Regression with Functional Data

Description

Defines a term \int_{s_{lo,i}}^{s_{hi,i}} X_i(s) β(t,s) ds for inclusion in an mgcv::gam formula (or bam, gamm, or gamm4::gamm4) as constructed by pffr.
By default, β(t,s) is represented by a cubic tensor product B-spline with marginal first-order difference penalties, and the integral is evaluated numerically over the entire range [s_{lo,i}, s_{hi,i}] = [\min(s_i), \max(s_i)] using Simpson weights. Missing values in X(s) are not supported, and X_i(s) of unequal lengths are not (yet) possible. Unequal integration ranges for different X_i(s) should work. X_i(s) is assumed to be numeric.
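The numerical integration described above can be illustrated in base R. This is a minimal sketch, not refund's internal code: the helper `simpson_weights` and the toy choices X_i(s) = s^2 and β(t,s) = t·s are assumptions made for illustration, and the textbook composite Simpson rule shown here may differ in detail from the weights that ff computes.

```r
# Textbook composite Simpson weights on an equidistant grid with an odd
# number of points (hypothetical helper, not part of refund).
simpson_weights <- function(s) {
  S <- length(s)
  stopifnot(S >= 3, S %% 2 == 1)
  h <- (s[S] - s[1]) / (S - 1)
  h / 3 * c(1, rep(c(4, 2), length.out = S - 2), 1)
}

s <- seq(0, 1, length.out = 11)   # evaluation points of X_i(s)
t_grid <- c(0, 0.5, 1)            # evaluation points of the response

# n = 2 toy curves: X_1(s) = s^2 and X_2(s) = 2 s^2
X <- rbind(s^2, 2 * s^2)

# n x S weight matrix, same weights for every curve
L <- matrix(simpson_weights(s), nrow = nrow(X), ncol = length(s), byrow = TRUE)

# Assumed coefficient surface beta(t, s) = t * s, stored as a T x S matrix
beta <- outer(t_grid, s)

# Quadrature approximation of \int X_i(s) beta(t, s) ds, an n x T matrix
ff_term <- (X * L) %*% t(beta)
```

For these toy curves the exact integral is t/4 for the first curve and t/2 for the second, and Simpson's rule reproduces it because the integrand is cubic in s.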

Usage

  ff(
    X,
    yind = NULL,
    xind = seq(0, 1, l = ncol(X)),
    basistype = c("te", "t2", "ti", "s", "tes"),
    integration = c("simpson", "trapezoidal", "riemann"),
    L = NULL,
    limits = NULL,
    splinepars = if (basistype != "s") {
      list(bs = "ps", m = list(c(2, 1), c(2, 1)), k = c(5, 5))
    } else {
      list(bs = "tp", m = NA)
    },
    check.ident = TRUE
  )

Arguments

 X  an n by length(xind) matrix of function evaluations X_i(s_{i1}),…, X_i(s_{iS}); i=1,…,n.

 yind  DEPRECATED. Was used to supply a matrix (or vector) of indices of evaluations of Y_i(t); no longer used.

 xind  vector of indices of evaluations of X_i(s), i.e., (s_{1},…,s_{S}).

 basistype  defaults to "te", i.e., a tensor product spline to represent β(t,s). Alternatively, use "s" for bivariate basis functions (see mgcv's s) or "t2" for an alternative parameterization of tensor product splines (see mgcv's t2).

 integration  method used for numerical integration. Defaults to "simpson"'s rule for calculating entries in L. Alternatively, and for non-equidistant grids, "trapezoidal" or "riemann". "riemann" integration is always used if limits is specified.

 L  optional: an n by length(xind) matrix giving the weights for the numerical integration over s.

 limits  defaults to NULL for integration across the entire range of X(s), otherwise specifies the integration limits s_{hi}(t), s_{lo}(t): either one of "s
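For non-equidistant grids, a user-supplied weight matrix of the kind described for the L argument could be built along the following lines. This is a sketch under stated assumptions: `trapezoid_weights` is a hypothetical helper, and the weights refund computes for integration = "trapezoidal" may differ in detail.

```r
# Trapezoidal quadrature weights for a sorted, possibly non-equidistant grid
# (hypothetical helper, not part of refund).
trapezoid_weights <- function(s) {
  stopifnot(length(s) >= 2, !is.unsorted(s))
  d <- diff(s)
  # each interior point gets half of the interval on either side
  (c(d, 0) + c(0, d)) / 2
}

s <- c(0, 0.1, 0.15, 0.4, 0.7, 1)   # non-equidistant evaluation points
n <- 3

# n x length(s) weight matrix, one row of weights per curve
L <- matrix(trapezoid_weights(s), nrow = n, ncol = length(s), byrow = TRUE)

# Toy curves X_1(s) = s, X_2(s) = 2s, X_3(s) = 1
X <- rbind(s, 2 * s, rep(1, length(s)))

# Row-wise quadrature: approximates \int_0^1 X_i(s) ds
integrals <- rowSums(X * L)
```

The trapezoidal rule is exact for piecewise linear curves, so the three integrals come out as 0.5, 1 and 1 up to floating-point error.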

Details

If check.ident==TRUE and basistype!="s" (the default), the routine checks conditions for non-identifiability of the effect. Non-identifiability occurs if a) the marginal basis for the functional covariate is rank-deficient (typically because the functional covariate has lower rank than the spline basis along its index) and simultaneously b) the kernel of Cov(X(s)) is not disjoint from the kernel of the marginal penalty over s. In practice, a) occurs quite frequently, and b) usually occurs because curve-wise mean centering has removed all constant components from the functional covariate.
If there is kernel overlap, β(t,s) is constrained to be orthogonal to functions in that overlap space (e.g., if the overlap contains constant functions, the constraints "\int β(t,s) ds = 0 for all t" are enforced). See Scheipl and Greven (2016) for details.
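Condition b) can be made concrete on the evaluation grid. The sketch below is an illustration only and does not reproduce the check that ff performs on the spline basis: it assumes simulated curves, works with the raw grid covariance rather than the marginal basis, and uses a first-order difference penalty built by hand.

```r
set.seed(1)
n <- 20
S <- 15
X <- matrix(rnorm(n * S), n, S)      # simulated curves on a grid of S points

# Curve-wise mean centering: subtract each curve's own mean over s
Xc <- X - rowMeans(X)

# First-order difference penalty matrix over s
D <- diff(diag(S))
P <- crossprod(D)

# Normalized constant function on the grid
const <- rep(1, S) / sqrt(S)

# For positive semi-definite matrices, a zero quadratic form means the
# vector lies in the kernel.
cov_quad_centered <- drop(t(const) %*% cov(Xc) %*% const)
cov_quad_raw      <- drop(t(const) %*% cov(X)  %*% const)
pen_quad          <- drop(t(const) %*% P %*% const)

# Constants lie in both kernels only after curve-wise centering
overlap_centered <- abs(cov_quad_centered) < 1e-10 && abs(pen_quad) < 1e-10
overlap_raw      <- abs(cov_quad_raw) < 1e-10 && abs(pen_quad) < 1e-10
```

Here `overlap_centered` is TRUE and `overlap_raw` is FALSE: the penalty never penalizes constants, and curve-wise centering removes them from the covariate as well, which is the situation in which the sum-to-zero constraint over s is needed.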
A warning is always given if the effective rank of Cov(X(s)) (defined as the number of leading eigenvalues that together account for at least 0.995 of the total variance in X_i(s)) is lower than 4. If X_i(s) is of very low rank, an ffpc term may be preferable.
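The effective-rank criterion can be sketched in base R as follows. The function `effective_rank` is a hypothetical helper written for illustration and is not refund's implementation; the simulated curves are assumed to be built from two components only.

```r
# Smallest number of leading eigenvalues of Cov(X(s)) whose cumulative share
# of the total variance reaches the threshold (hypothetical helper).
effective_rank <- function(X, threshold = 0.995) {
  ev <- eigen(cov(X), symmetric = TRUE, only.values = TRUE)$values
  ev <- pmax(ev, 0)   # clip tiny negative values caused by rounding
  which(cumsum(ev) / sum(ev) >= threshold)[1]
}

set.seed(1)
s <- seq(0, 1, length.out = 30)

# 50 curves spanned by only two components, so Cov(X(s)) has rank 2
scores <- matrix(rnorm(50 * 2), 50, 2)
X <- scores %*% rbind(sin(2 * pi * s), cos(2 * pi * s))

r <- effective_rank(X)
if (r < 4) {
  message("effective rank of Cov(X(s)) is ", r, ", which is lower than 4")
}
```

Covariates like these, driven by very few components, are the case where the Details suggest considering an ffpc term instead.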

Value

A list containing

 call  a "call" to te (or s or t2) using the appropriately constructed covariate and weight matrices

 data  a list containing the necessary covariate and weight matrices

Author(s)

Fabian Scheipl, Sonja Greven

References

For background on check.ident:
Scheipl, F., Greven, S. (2016). Identifiability in penalized function-on-function regression models. Electronic Journal of Statistics, 10(1), 495–526. https://projecteuclid.org/journals/electronic-journal-of-statistics/volume-10/issue-1/Identifiability-in-penalized-function-on-function-regression-models/10.1214/16-EJS1123.full

See also mgcv's linear.functional.terms.
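The linear functional terms mechanism in mgcv can be sketched for the simpler case of a scalar response. This example is an illustration under stated assumptions, not the call that ff constructs: the data are simulated, the true coefficient function sin(2πs) is an arbitrary choice, and all variable names are hypothetical. When both the covariate and the by variable of a smooth are matrices, mgcv sums the smooth over the columns, which yields a quadrature approximation of \int X_i(s) β(s) ds.

```r
library(mgcv)

set.seed(1)
n <- 100
S <- 21
s <- seq(0, 1, length.out = S)

# Simulated functional covariate, one curve per row
X <- matrix(rnorm(n * S), n, S)

# Composite Simpson weights on the equidistant grid (S is odd)
w <- (s[S] - s[1]) / (S - 1) / 3 * c(1, rep(c(4, 2), length.out = S - 2), 1)

# Weighted covariate X_i(s_j) * L_ij and the matching matrix of grid points
XL <- X * matrix(w, n, S, byrow = TRUE)
smat <- matrix(s, n, S, byrow = TRUE)

# Assumed true coefficient function and simulated scalar response
beta_true <- sin(2 * pi * s)
y <- drop(XL %*% beta_true) + rnorm(n, sd = 0.02)

# Linear functional term: sum_j XL[i, j] * f(smat[i, j]) with f estimated
fit <- gam(y ~ s(smat, by = XL, k = 8))
```

In the function-on-function setting, ff builds the analogous covariate and weight matrices for a bivariate smooth in t and s, as described under Value.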