cv.fof.spike: Cross-validation for linear function-on-function regression...

Description

This function performs cross-validation and builds the final model for highly densely observed spiky data. It uses the signal compression approach for the following linear function-on-function regression model:

Y(t) = μ(t) + ∫ X_1(s)β_1(s,t)ds + ... + ∫ X_p(s)β_p(s,t)ds + ε(t),

where μ(t) is the intercept function. The {X_i(s),1≤ i≤ p} are p functional predictors and {β_i(s,t),1≤ i≤ p} are their corresponding coefficient functions, where p is a positive integer. The ε(t) is the noise function.

We require that all the sample curves of each functional predictor are observed on a common dense grid of time points, but the grid can differ between predictors. All the sample curves of the functional response must also be observed on a common dense grid.

Usage

cv.fof.spike(X, Y, t.x, t.y, K.cv = 5, upper.comp = 10, thresh = 0.0001)

Arguments

X

a list of length p, the number of functional predictors. Its i-th element is the n*m_i data matrix for the i-th functional predictor X_i(s), where n is the sample size and m_i is the number of observation time points for X_i(s).

Y

the n*m data matrix for the functional response Y(t), where n is the sample size and m is the number of the observation time points for Y(t).

t.x

a list of length p. Its i-th element is the vector of observation time points of the i-th functional predictor X_i(s), 1≤ i≤ p.

t.y

the vector of observation time points of the functional response Y(t).

K.cv

the number of CV folds. Default is 5.

upper.comp

the upper bound for the maximum number of components to be calculated. Default is 10.

thresh

a number between 0 and 1 used to determine the maximum number of components to calculate. This maximum lies between 1 and upper.comp above. The optimal number of components is chosen between 1 and this maximum, together with the other tuning parameters, by cross-validation. A smaller thresh value leads to a larger maximum number of components and a longer running time. A larger thresh value needs less running time, but may miss some important components and lead to a larger prediction error. Default is 0.0001.
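The layout expected for X, Y, t.x and t.y can be illustrated with a small sketch. It uses simulated data only; the sample size, grid lengths and grid ranges are illustrative assumptions, not values taken from the package. It shows two predictors observed on different grids and checks that the matrix dimensions agree with the grids.

```r
# Simulated data only; sizes and grids below are illustrative assumptions.
set.seed(1)
n <- 20                                      # sample size
t.x <- list(seq(0, 1, length.out = 50),      # grid for X_1(s)
            seq(0, 2, length.out = 80))      # grid for X_2(s), a different grid
t.y <- seq(0, 1, length.out = 60)            # common grid for Y(t)

# One n x m_i matrix per predictor: row = sample curve, column = time point.
X <- lapply(t.x, function(s) matrix(rnorm(n * length(s)), nrow = n))
Y <- matrix(rnorm(n * length(t.y)), nrow = n)

# Sanity checks on the layout described in the Arguments section.
stopifnot(length(X) == length(t.x))
stopifnot(all(sapply(X, nrow) == n), nrow(Y) == n)
stopifnot(all(sapply(X, ncol) == sapply(t.x, length)))
stopifnot(ncol(Y) == length(t.y))
```

Running checks of this kind before calling cv.fof.spike can catch a transposed matrix or a grid of the wrong length early.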

Details

We use the decomposition of the coefficient function:

β_i(s,t)=ψ_{i1}(s)w_1(t)+...+ψ_{iK}(s)w_K(t), 1≤ i≤ p,

and estimate {ψ_{ik}(s), 1≤ i≤ p} for each k>0 by solving a generalized functional eigenvalue problem as given in cv.sigcom, but with the penalty replaced by

λ ∑_{i=1}^p ∑_{j=0}^{J_1} { e^{τ(j-J_1)} 2^{2αj} ||b_{ik,j}||^2 + e^{-τJ_1} ||a_{ik,0}||^2 },

where a_{ik,0} and b_{ik,j} (0≤ j ≤ J_1) are vectors of wavelet coefficients for ψ_{ik}. Then we estimate {w_{k}(t), k>0} by regressing Y(t) on {\hat{z}_{1}, ..., \hat{z}_{K}} using the penalized least squares method, where \hat{z}_{k} = ∑_{i=1}^p ∫ {X_i(s)-\bar{X}_i(s)} \hat{ψ}_{ik}(s)ds.
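The score \hat{z}_{k} defined above is an integral of centered predictor curves against the estimated ψ functions, so it can be approximated numerically. The following is a minimal sketch, not the package's internal code. It assumes trapezoidal quadrature, and the helper trapz_weights and the functions in psi_hat are hypothetical stand-ins for the estimated \hat{ψ}_{ik}.

```r
# Trapezoidal quadrature weights for a (possibly non-uniform) grid s.
# Hypothetical helper, not part of FRegSigCom.
trapz_weights <- function(s) {
  d <- diff(s)
  c(d[1], d[-1] + d[-length(d)], d[length(d)]) / 2
}

set.seed(2)
n <- 10
t.x <- list(seq(0, 1, length.out = 101), seq(0, 1, length.out = 51))
X <- lapply(t.x, function(s) matrix(rnorm(n * length(s)), nrow = n))

# Hypothetical stand-ins for psi_hat_{ik}(s), one per predictor, for a single k.
psi_hat <- list(sin(2 * pi * t.x[[1]]), cos(2 * pi * t.x[[2]]))

# z_hat_k = sum_i integral {X_i(s) - Xbar_i(s)} psi_hat_{ik}(s) ds
z_hat <- Reduce(`+`, Map(function(Xi, s, psi) {
  Xc <- sweep(Xi, 2, colMeans(Xi))           # subtract the sample mean curve
  as.vector(Xc %*% (trapz_weights(s) * psi)) # quadrature over s, one value per curve
}, X, t.x, psi_hat))
```

Here z_hat holds one score per sample curve. Because each predictor is centered at its sample mean curve, the scores average to zero across the sample.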

Value

An object of the “cv.fof.spike” class, which is used in the function pred.fof.spike for prediction.

mu

a vector containing the estimated values of the intercept function at t.y.

Beta

a list of p matrices, where the i-th matrix contains the estimated values of the slope coefficient function β_i(s,t) on the grid formed by the i-th element of t.x and t.y.

c_fit_cv

a list for internal use.
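To show how mu and Beta relate to the model in the Description, the sketch below approximates μ(t) + ∑_i ∫ X_i(s)β_i(s,t)ds by trapezoidal quadrature. It is only an illustration under stated assumptions: mu and Beta here are synthetic stand-ins rather than the output of a fit, each Beta matrix is assumed to have rows indexed by s and columns by t, and pred.fof.spike remains the intended way to obtain predictions.

```r
set.seed(3)
n <- 5
s <- seq(0, 1, length.out = 41)              # grid of the single predictor
t <- seq(0, 1, length.out = 31)              # grid of the response
X.new <- list(matrix(rnorm(n * length(s)), nrow = n))

# Synthetic stand-ins shaped like the documented components (not a real fit).
mu <- sin(pi * t)
Beta <- list(outer(s, t))                    # assumed layout: rows = s, columns = t

# Trapezoidal quadrature weights on s.
d <- diff(s)
w <- c(d[1], d[-1] + d[-length(d)], d[length(d)]) / 2

# Y.hat(t) = mu(t) + sum_i integral X_i(s) beta_i(s,t) ds
Y.hat <- matrix(mu, nrow = n, ncol = length(t), byrow = TRUE)
for (i in seq_along(Beta)) {
  Y.hat <- Y.hat + X.new[[i]] %*% (w * Beta[[i]])  # w scales row j of Beta by w[j]
}
```

The result Y.hat is an n by length(t) matrix, one predicted curve per row.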

Author(s)

Xin Qi and Ruiyan Luo

References

Xin Qi and Ruiyan Luo, (manuscript) Functional regression for highly densely observed functional data with novel regularity.

Examples

###########################################################################
# Example: spiky function-on-function regression
###########################################################################
ptm <- proc.time()
library(FRegSigCom)
library(refund)
data(DTI)
I=which(is.na(apply(DTI$cca,1,mean))) # rows of cca with missing values
X=DTI$cca[-I,] # functional predictor
Y=DTI$rcst[-I,-(1:12)] # functional response
t.x <- list(seq(0,1,length=dim(X)[2]))
t.y <- seq(0,1,length=dim(Y)[2])
# randomly split all the observations into a training set with 30 observations
# and a test set with the remaining observations.
train.id=sample(1:nrow(Y), 30)
X.train <- list(X[train.id,])
Y.train <- Y[train.id, ]
X.test <- list(X[-(train.id),])
Y.test <- Y[-(train.id), ]
fit.cv=cv.fof.spike(X.train, Y.train, t.x, t.y)
Y.pred=pred.fof.spike(fit.cv, X.test)
error<- mean((Y.pred-Y.test)^2)
print(c("prediction error=", error))

print(proc.time()-ptm)

FRegSigCom documentation built on May 1, 2019, 9:45 p.m.