cv.hd: Cross-validation for sparse linear function-on-function regression


Description

Conduct cross-validation and build the final model for the following function-on-function regression model:

Y(t) = \mu(t) + \int X_1(s)\,\beta_1(s,t)\,ds + \cdots + \int X_p(s)\,\beta_p(s,t)\,ds + \varepsilon(t),

where μ(t) is the intercept function, {X_i(s), 1 ≤ i ≤ p} are the p functional predictors, {β_i(s,t), 1 ≤ i ≤ p} are the corresponding coefficient functions, and ε(t) is the noise function. The number of predictors p can be much larger than the sample size.

We require that all the sample curves of each functional predictor be observed on a common dense set of time points, but this set may differ across functional predictors. All the sample curves of the response must be observed on a common dense set of time points.
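For concreteness, a minimal sketch of the expected data layout, with toy dimensions and random values chosen purely for illustration:

n <- 30                                 # sample size
t.x1 <- seq(0, 1, length.out = 40)      # observation points for X_1(s)
t.x2 <- seq(0, 1, length.out = 60)      # a different grid for X_2(s)
t.y <- seq(0, 1, length.out = 50)       # observation points for Y(t)
X <- list(matrix(rnorm(n * 40), n, 40), # row l holds the curve x_{l,1}(s)
          matrix(rnorm(n * 60), n, 60)) # row l holds the curve x_{l,2}(s)
Y <- matrix(rnorm(n * 50), n, 50)       # row l holds the response curve y_l(t)
t.x.list <- list(t.x1, t.x2)
# fit <- cv.hd(X, Y, t.x.list, t.y)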

Usage

cv.hd(X, Y, t.x.list, t.y, K.cv = 5, s.n.basis = 25, t.n.basis = 50,
       thresh = 0.01)

Arguments

X

a list of length p. Its i-th element is the n*m_i data matrix for the i-th functional predictor X_i(s), where n is the sample size and m_i is the number of observation points for X_i(s).

Y

the n*m data matrix for the functional response Y(t), where n is the sample size and m is the number of observation points for Y(t).

t.x.list

a list of length p. Its i-th element is the vector of observation points of the i-th functional predictor X_i(s), 1≤ i≤ p.

t.y

the vector of observation points of the functional response Y(t).

K.cv

the number of CV folds. Default is 5.

s.n.basis

the number of B-spline basis functions used to estimate the functions ψ_{ik}(s). Default is 25.

t.n.basis

the number of B-spline basis functions used to estimate the functions w_k(t). Default is 50.

thresh

a number between 0 and 1 specifying the minimum proportion of variation that each selected component must explain relative to all selected components. This determines the upper bound on the number of components considered in CV; the optimal number of components is then chosen by CV from 1, 2, ..., up to this upper bound. A smaller thresh value leads to more candidate components and a longer running time; a larger thresh value needs less running time but may miss important components and yield a larger prediction error, as illustrated below. Default is 0.01.
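For instance, assuming data objects X, Y, t.x.list, and t.y as described above, the trade-off can be explored with calls like the following (illustrative only, not run):

# smaller thresh: larger upper bound on the number of components, slower CV
fit.fine <- cv.hd(X, Y, t.x.list, t.y, thresh = 0.005)
# larger thresh: fewer candidate components, faster CV, possibly larger error
fit.coarse <- cv.hd(X, Y, t.x.list, t.y, thresh = 0.05)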

Details

This method estimates a special decomposition of the coefficient functions induced by the KL expansion of the signal function:

\beta_i(s,t) = \psi_{i1}(s)\,w_1(t) + \psi_{i2}(s)\,w_2(t) + \cdots, \quad 1 \le i \le p.

We first estimate \boldsymbol{\psi}_k = (\psi_{1k}(s), \ldots, \psi_{pk}(s)), 1 ≤ k ≤ K, by solving the generalized functional eigenvalue problem

\max_{\boldsymbol{\psi}} \frac{\int\int \boldsymbol{\psi}(s)^T \hat{B}(s,s')\, \boldsymbol{\psi}(s')\, ds\, ds'}{\int\int \boldsymbol{\psi}(s)^T \hat{\Sigma}(s,s')\, \boldsymbol{\psi}(s')\, ds\, ds' + P(\boldsymbol{\psi})}

\text{s.t.} \quad \int\int \boldsymbol{\psi}(s)^T \hat{\Sigma}(s,s')\, \boldsymbol{\psi}(s')\, ds\, ds' = 1

\text{and} \quad \int\int \boldsymbol{\psi}(s)^T \hat{\Sigma}(s,s')\, \boldsymbol{\psi}_{k'}(s')\, ds\, ds' = 0 \quad \text{for} \quad k' < k

with the simultaneous sparse and smooth penalty

P(\psi_1(s), \ldots, \psi_p(s)) = \tau \left[ (1-\lambda)\left( \|\psi_1\|_\eta^2 + \cdots + \|\psi_p\|_\eta^2 \right) + \lambda \left( \|\psi_1\|_\eta + \cdots + \|\psi_p\|_\eta \right)^2 \right],

where ||ψ_i||_η^2 = ||ψ_i||^2 + η ||ψ_i''||^2. Here \hat{B} and \hat{Σ} are p*p matrices of bivariate functions whose (i,j) entries are

\hat{B}_{ij}(s,s') = \frac{1}{n^2} \sum_{\ell=1}^{n} \sum_{\ell'=1}^{n} \{x_{\ell,i}(s)-\bar{x}_i(s)\} \left[ \int \{y_{\ell}(t)-\bar{y}(t)\}\{y_{\ell'}(t)-\bar{y}(t)\}\,dt \right] \{x_{\ell',j}(s')-\bar{x}_j(s')\}

and

\hat{\Sigma}_{ij}(s,s') = \frac{1}{n} \sum_{\ell=1}^{n} \{x_{\ell,i}(s)-\bar{x}_i(s)\}\{x_{\ell,j}(s')-\bar{x}_j(s')\},

where (x_{\ell,1}, \ldots, x_{\ell,p}, y_{\ell}) denotes the \ell-th sample. Then we estimate {w_k(t), 1 ≤ k ≤ K} by regressing {y_\ell(t)} on the estimated scores {\hat{z}_{\ell,1}, \ldots, \hat{z}_{\ell,K}}, where

\hat{z}_{\ell,k} = \sum_{i=1}^{p} \int \{x_{\ell,i}(s)-\bar{x}_i(s)\}\,\hat{\psi}_{ik}(s)\,ds,

subject to the roughness penalty κ ∑_{k=0}^K ||w_k''||^2 tuned by the smoothness parameter κ. The number of selected components K is chosen by cross-validation.
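For intuition, the discretized versions of \hat{Σ} and \hat{B} are plain matrix computations. Below is a minimal sketch for a single predictor (p = 1) on uniform grids, using made-up data; it illustrates the definitions above and is not the package's internal implementation:

set.seed(1)
n <- 50
t.x <- seq(0, 1, length.out = 40)            # grid for X(s)
t.y <- seq(0, 1, length.out = 30)            # grid for Y(t)
x <- matrix(rnorm(n * length(t.x)), n)       # rows: sample curves x_l(s)
y <- matrix(rnorm(n * length(t.y)), n)       # rows: sample curves y_l(t)
xc <- scale(x, scale = FALSE)                # x_l(s) - xbar(s)
yc <- scale(y, scale = FALSE)                # y_l(t) - ybar(t)
dt <- t.y[2] - t.y[1]                        # Riemann weight for the t-integral
Sigma.hat <- crossprod(xc) / n               # Sigma-hat(s, s') on the grid
G <- tcrossprod(yc) * dt                     # G[l, l'] ~ integral of yc_l * yc_l'
B.hat <- t(xc) %*% G %*% xc / n^2            # B-hat(s, s') on the grid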

Value

An object of class “cv.hd”, which is used by the function pred.hd for prediction and by getcoef.hd for extracting the estimated coefficient functions.

errors

a list of CV errors.

min.error

minimum CV error.

opt.index

the index of the optimal tuning parameters.

params.set

the set of candidate values for tuning parameters used in CV. It's a matrix with 4 columns corresponding to the tuning parameters τ, λ, η, κ, respectively.

opt.K

the optimal number of components selected by CV.

opt.tau

optimal value for τ, which tunes the simultaneous sparse-smooth penalty on {ψ_{ik}(s), 1 ≤ i ≤ p}.

opt.lambda

optimal value for λ, the sparsity parameter for {ψ_{ik}(s), 1 ≤ i ≤ p}.

opt.eta

optimal value for η, the smoothness parameter for ψ_{ik}(s).

opt.kappa

optimal value for κ, the smoothness parameter for w_k(t).

maxK.ret

a list for internal use.

t.x.list

the input t.x.list.

t.y

the input t.y.

y.params

a list for internal use.
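Assuming the returned object behaves as a plain R list (standard list access; fit.cv as produced in the Examples below), the selections can be inspected directly. The last line assumes opt.index indexes rows of params.set:

fit.cv$opt.K                                 # selected number of components
c(fit.cv$opt.tau, fit.cv$opt.lambda,
  fit.cv$opt.eta, fit.cv$opt.kappa)          # selected tuning parameters
fit.cv$min.error                             # minimum CV error
fit.cv$params.set[fit.cv$opt.index, ]        # corresponding row of the grid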

Author(s)

Xin Qi and Ruiyan Luo

References

Xin Qi and Ruiyan Luo (2018). Function-on-function regression with thousands of predictive curves. Journal of Multivariate Analysis 163: 51-66. https://doi.org/10.1016/j.jmva.2017.10.002

Examples

#########################################
#toy example using the air quality data with p=1
#########################################
data(air)
t.x=seq(0,1,length=24)
t.y=seq(0,1,length=24)
air.cv=cv.hd(X=list(air[[2]][1:20,]), Y=air[[1]][1:20,], list(t.x), t.y,
             K.cv=2, s.n.basis = 8, t.n.basis = 8)
air.pred=pred.hd(air.cv, list(air[[2]][1:2,]))
predict.error=mean((air.pred-air[[1]][1:2,])^2)
print(c("predict error=", predict.error))



#########################################
#example with simulated data and p=200
#########################################

rm(list=ls())
ptm <- proc.time() # record the start time; total running time is printed at the end
library(MASS)

n.curves=200 # total number of predictor curves.

t.x=seq(0,1,length.out=80) # observation points for X_i(s).
t.y=seq(0,1,length.out=100) # observation points for Y(t).
ntrain <- 100 # number of observations in training data
nnew <- 50 # number of new observations
ntot <- ntrain+nnew
########################################
##generate the predictor curves using cos() basis functions.
############################################
W <- list()
for(j in 1:(n.curves+5))
{
  W[[j]] <- 0
  for(k in 1:30)
  W[[j]] <- W[[j]]+rnorm(ntot) %*% t(cos(k*pi*t.x))/k
}
X=lapply(1:n.curves,
function(k){(W[[k]]+W[[k+1]]+W[[k+2]]+W[[k+3]]+W[[k+4]]
             +W[[k+5]])/k})
########################################
##generate the coefficient functions. Only the first five are nonzero.
############################################
mu=sin(2*pi*t.y)
B.coef=list()
B.coef[[1]]=sapply(1:length(t.y), function(k){4*t.x^2*t.y[k]})
B.coef[[2]]=sapply(1:length(t.y), function(k){4*sin(pi*t.x)*cos(pi*t.y[k])})
B.coef[[3]]=sapply(1:length(t.y), function(k){4*exp(-((t.x-0.5)^2+(t.y[k]-0.5)^2)/2)})
B.coef[[4]]=sapply(1:length(t.y), function(k){4*t.x/(1+t.y[k]^2)})
B.coef[[5]]=sapply(1:length(t.y), function(k){4*log(1+t.x)*sqrt(t.y[k])})
########################################
##generate the response curves
############################################
Y=0
for(j in 1:5){
Y<- Y+ (X[[j]] %*% B.coef[[j]])/length(t.x)
}
E=sqrt(0.01)*matrix(rnorm(ntot*length(t.y)), ntot, length(t.y))
Y=rep(1,ntot)%*%t(mu)+Y+E
X.train=lapply(1:n.curves, function(k){X[[k]][(1:ntrain),]})
X.new=lapply(1:n.curves, function(k){X[[k]][-(1:ntrain),]})
Y.train <- Y[1:ntrain,]
Y.new <- Y[-(1:ntrain),]
E.new=E[-(1:ntrain),]

########################################
##fit the model using the sparse function-on-function method
############################################
t.x.list=lapply(1:n.curves, function(k){t.x})
fit.cv <- cv.hd(X.train, Y.train, t.x.list, t.y, K.cv=5, s.n.basis=25, t.n.basis=40)
Y.pred=pred.hd(fit.cv, X.new)
predict.error=mean((Y.pred-Y.new)^2)
est.error=mean((Y.pred-Y.new+E.new)^2)
print(c("predict error=", predict.error))
print(c("estimation error=", est.error))
est.coefficient=getcoef.hd(fit.cv)
mu.est=est.coefficient[[1]]
beta.est=est.coefficient[[2]]
print(proc.time()-ptm) # total running time
