MFPCA: Multivariate functional principal component analysis for...

View source: R/MFPCA_calculation.R

MFPCAR Documentation

Multivariate functional principal component analysis for functions on different (dimensional) domains

Description

This function calculates a multivariate functional principal component analysis (MFPCA) based on i.i.d. observations x_1, …, x_N of a multivariate functional data-generating process X = X^(1), …, X^(p) with elements X^(j) in L^2(calT_j) defined on a domain calT_j of IR^{d_j}. In particular, the elements can be defined on different (dimensional) domains. The results contain the mean function, the estimated multivariate functional principal components \hat ψ_1, …, \hat ψ_M (having the same structure as x_i), the associated eigenvalues \hat ν_1 ≥q … ≥q \hat ν_M > 0 and the individual scores \hat ρ_{im} = \hat{<x_i, ψ_m>}. Moreover, estimated trajectories for each observation based on the truncated Karhunen-Loeve representation

\hat x_i = ∑_{m = 1}^M \hat ρ_{im} \hat ψ_m

are given if desired (fit = TRUE). The implementation of the observations x_i = (x_i^(1), … , x_i^(p)), i = 1 , …, N, the mean function and multivariate functional principal components \hat ψ_1, …, \hat ψ_M uses the multiFunData class, which is defined in the package funData.

Usage

MFPCA(
  mFData,
  M,
  uniExpansions,
  weights = rep(1, length(mFData)),
  fit = FALSE,
  approx.eigen = FALSE,
  bootstrap = FALSE,
  nBootstrap = NULL,
  bootstrapAlpha = 0.05,
  bootstrapStrat = NULL,
  verbose = options()$verbose
)

Arguments

mFData

A multiFunData object containing the N observations.

M

The number of multivariate functional principal components to calculate.

uniExpansions

A list characterizing the (univariate) expansion that is calculated for each element. See Details.

weights

An optional vector of weights, defaults to 1 for each element. See Details.

fit

Logical. If TRUE, a truncated multivariate Karhunen-Loeve representation for the data is calculated based on the estimated scores and eigenfunctions.

approx.eigen

Logical. If TRUE, the eigenanalysis problem for the estimated covariance matrix is solved approximately using the irlba package, which is much faster. If the number M of eigenvalues to calculate is high with respect to the number of observations in mFData or the number of estimated univariate eigenfunctions, the approximation may be inappropriate. In this case, approx.eigen is set to FALSE and the function throws a warning. Defaults to FALSE.

bootstrap

Logical. If TRUE, pointwise bootstrap confidence bands are calculated for the multivariate functional principal components. Defaults to FALSE. See Details.

nBootstrap

The number of bootstrap iterations to use. Defaults to NULL, which leads to an error, if bootstrap = TRUE.

bootstrapAlpha

A vector of numerics (or a single number) giving the significance level for bootstrap intervals. Defaults to 0.05.

bootstrapStrat

A stratification variable for bootstrap. Must be a factor of length nObs(mFData) or NULL (default). If NULL, no stratification is made in the bootstrap resampling, i.e. the curves are sampled with replacement. If bootstrapStrat is not NULL, the curves are resampled with replacement within the groups defined by bootstrapStrat, hence keeping the group proportions fixed.

verbose

Logical. If TRUE, the function reports extra-information about the progress (incl. timestamps). Defaults to options()$verbose.

Details

Weighted MFPCA

If the elements vary considerably in domain, range or variation, a weight vector w_1 , …, w_p can be supplied and the MFPCA is based on the weighted scalar product

<<f,g>>_w = ∑_{j = 1}^p w_j \int_\calT_j f^(j)(t) g^(j)(t) d t

and the corresponding weighted covariance operator Γ_w.

Bootstrap

If bootstrap = TRUE, pointwise bootstrap confidence bands are generated for the multivariate eigenvalues \hat ν_1, …, \hat ν_M as well as for multivariate functional principal components \hat ψ_1, …, \hat ψ_M. The parameter nBootstrap gives the number of bootstrap iterations. In each iteration, the observations are resampled on the level of (multivariate) functions and the whole MFPCA is recalculated. In particular, if the univariate basis depends on the data (FPCA approaches), basis functions and scores are both re-estimated. If the basis functions are fixed (e.g. splines), the scores from the original estimate are used to speed up the calculations. The confidence bands for the eigenfunctions are calculated separately for each element as pointwise percentile bootstrap confidence intervals. Analogously, the confidence bands for the eigenvalues are also percentile bootstrap confidence bands. The significance level(s) can be defined by the bootstrapAlpha parameter, which defaults to 5%. As a result, the MFPCA function returns a list CI of the same length as bootstrapAlpha, containing the lower and upper bounds of the confidence bands for the principal components as multiFunData objects of the same structure as mFData. The confidence bands for the eigenvalues are returned in a list CIvalues, containing the upper and lower bounds for each significance level.

Univariate Expansions

The multivariate functional principal component analysis relies on a univariate basis expansion for each element X^(j). The univariate basis representation is calculated using the univDecomp function, that passes the univariate functional observations and optional parameters to the specific function. The univariate decompositions are specified via the uniExpansions argument in the MFPCA function. It is a list of the same length as the mFData object, i.e. having one entry for each element of the multivariate functional data. For each element, uniExpansion must specify at least the type of basis functions to use. Additionally, one may add further parameters. The following basis representations are supported:

  • Given basis functions. Then uniExpansions[[j]] = list(type = "given", functions, scores, ortho), where functions is a funData object on the same domain as mFData, containing the given basis functions. The parameters scores and ortho are optional. scores is an N x K matrix containing the scores (or coefficients) of the observed functions for the given basis functions, where N is the number of observed functions and K is the number of basis functions. Note that the scores need to be demeaned to give meaningful results. If scores are not supplied, they are calculated using the given basis functions. The parameter ortho specifies whether the given basis functions are orthonormal orhto = TRUE or not ortho = FALSE. If ortho is not supplied, the functions are treated as non-orthogonal. scores and ortho are not checked for plausibility, use them at your own risk!

  • Univariate functional principal component analysis. Then uniExpansions[[j]] = list(type = "uFPCA", nbasis, pve, npc, makePD), where nbasis,pve,npc,makePD are parameters passed to the PACE function for calculating the univariate functional principal component analysis.

  • Basis functions expansions from the package fda. Then uniExpansions[[j]] = list(type = "fda", ...), where ... are passed to funData2fd, which heavily builds on eval.fd. If fda is not available, a warning is thrown.

  • Spline basis functions (not penalized). Then uniExpansions[[j]] = list(type = "splines1D", bs, m, k), where bs,m,k are passed to the functions univDecomp and univExpansion. For two-dimensional tensor product splines, use type = "splines2D".

  • Spline basis functions (with smoothness penalty). Then uniExpansions[[j]] = list(type = "splines1Dpen", bs, m, k), where bs,m,k are passed to the functions univDecomp and univExpansion. Analogously to the unpenalized case, use type = "splines2Dpen" for 2D penalized tensor product splines.

  • Cosine basis functions. Use uniExpansions[[j]] = list(type = "DCT2D", qThresh, parallel) for functions one two-dimensional domains (images) and type = "DCT3D" for 3D images. The calculation is based on the discrete cosine transform (DCT) implemented in the C-library fftw3. If this library is not available, the function will throw a warning. qThresh gives the quantile for hard thresholding the basis coefficients based on their absolute value. If parallel = TRUE, the coefficients for different images are calculated in parallel.

See univDecomp and univExpansion for details.

Value

An object of class MFPCAfit containing the following components:

values

A vector of estimated eigenvalues \hat ν_1 , … , \hat ν_M.

functions

A multiFunData object containing the estimated multivariate functional principal components \hat ψ_1, …, \hat ψ_M.

scores

A matrix of dimension N x M containing the estimated scores \hat ρ_{im}.

vectors

A matrix representing the eigenvectors associated with the combined univariate score vectors. This might be helpful for calculating predictions.

normFactors

The normalizing factors used for calculating the multivariate eigenfunctions and scores. This might be helpful when calculation predictions.

meanFunction

A multivariate functional data object, corresponding to the mean function. The MFPCA is applied to the de-meaned functions in mFData.

fit

A multiFunData object containing estimated trajectories for each observation based on the truncated Karhunen-Loeve representation and the estimated scores and eigenfunctions.

CI

A list of the same length as bootstrapAlpha, containing the pointwise lower and upper bootstrap confidence bands for each eigenfunction and each significance level in form of multiFunData objects (only if bootstrap = TRUE).

CIvalues

A list of the same length as bootstrapAlpha, containing the lower and upper bootstrap confidence bands for each eigenvalue and each significance level (only if bootstrap = TRUE).

References

C. Happ, S. Greven (2018): Multivariate Functional Principal Component Analysis for Data Observed on Different (Dimensional) Domains. Journal of the American Statistical Association, 113(522): 649-659. DOI: doi: 10.1080/01621459.2016.1273115

C. Happ-Kurz (2020): Object-Oriented Software for Functional Data. Journal of Statistical Software, 93(5): 1-38. DOI: doi: 10.18637/jss.v093.i05

See Also

See Happ-Kurz (2020. doi: 10.18637/jss.v093.i05) for a general introduction to the funData package and it's interplay with MFPCA. This file also includes a case study on how to use MFPCA. Useful functions: multiFunData, PACE, univDecomp, univExpansion, summary, plot, scoreplot

Examples

oldPar <- par(no.readonly = TRUE)

set.seed(1)

### simulate data (one-dimensional domains)
sim <-  simMultiFunData(type = "split", argvals = list(seq(0,1,0.01), seq(-0.5,0.5,0.02)),
                        M = 5, eFunType = "Poly", eValType = "linear", N = 100)

# MFPCA based on univariate FPCA
uFPCA <- MFPCA(sim$simData, M = 5, uniExpansions = list(list(type = "uFPCA"),
                                                                  list(type = "uFPCA")))
summary(uFPCA)
plot(uFPCA) # plot the eigenfunctions as perturbations of the mean
scoreplot(uFPCA) # plot the scores

# MFPCA based on univariate spline expansions
splines <- MFPCA(sim$simData, M = 5, uniExpansions = list(list(type = "splines1D", k = 10),
                                                          list(type = "splines1D", k = 10)),
                 fit = TRUE) # calculate reconstruction, too
summary(splines)
plot(splines) # plot the eigenfunctions as perturbations of the mean
scoreplot(splines) # plot the scores

### Compare estimates to true eigenfunctions
# flip to make results more clear
uFPCA$functions <- flipFuns(sim$trueFuns, uFPCA$functions)
splines$functions <- flipFuns(sim$trueFuns, splines$functions)

par(mfrow = c(1,2))
plot(sim$trueFuns[[1]], main = "Eigenfunctions\n1st Element", lwd = 2)
plot(uFPCA$functions[[1]], lty = 2, add = TRUE)
plot(splines$functions[[1]], lty = 3, add = TRUE)

plot(sim$trueFuns[[2]], main = "Eigenfunctions\n2nd Element", lwd = 2)
plot(uFPCA$functions[[2]], lty = 2, add = TRUE)
plot(splines$functions[[2]], lty = 3, add = TRUE)
legend("bottomleft", c("True", "uFPCA", "splines"), lty = 1:3, lwd = c(2,1,1))

# Test reconstruction for the first 10 observations
plot(sim$simData[[1]], obs = 1:10, main = "Reconstruction\n1st Element", lwd = 2)
plot(splines$fit[[1]], obs = 1:10, lty = 2, col = 1, add = TRUE)

plot(sim$simData[[2]], obs = 1:10, main = "Reconstruction\n2nd Element", lwd = 2)
plot(splines$fit[[2]], obs = 1:10, lty = 2, col = 1, add = TRUE)
legend("bottomleft", c("True", "Reconstruction"), lty = c(1,2), lwd = c(2,1))

# MFPCA with Bootstrap-CI for the first 2 eigenfunctions
### ATTENTION: Takes long

splinesBoot <- MFPCA(sim$simData, M = 2, uniExpansions = list(list(type = "splines1D", k = 10),
                                                          list(type = "splines1D", k = 10)),
                 bootstrap = TRUE, nBootstrap = 100, bootstrapAlpha = c(0.05, 0.1), verbose = TRUE)
summary(splinesBoot)
                                 
plot(splinesBoot$functions[[1]], ylim = c(-2,1.5))
plot(splinesBoot$CI$alpha_0.05$lower[[1]], lty = 2, add = TRUE)
plot(splinesBoot$CI$alpha_0.05$upper[[1]], lty = 2, add = TRUE)
plot(splinesBoot$CI$alpha_0.1$lower[[1]], lty = 3, add = TRUE)
plot(splinesBoot$CI$alpha_0.1$upper[[1]], lty = 3, add = TRUE)
abline(h = 0, col = "gray")
 
plot(splinesBoot$functions[[2]], ylim = c(-1,2.5))
plot(splinesBoot$CI$alpha_0.05$lower[[2]], lty = 2, add = TRUE)
plot(splinesBoot$CI$alpha_0.05$upper[[2]], lty = 2, add = TRUE)
plot(splinesBoot$CI$alpha_0.1$lower[[2]], lty = 3, add = TRUE)
plot(splinesBoot$CI$alpha_0.1$upper[[2]], lty = 3, add = TRUE)
abline(h = 0, col = "gray")
legend("topleft", c("Estimate", "95% CI", "90% CI"), lty = 1:3, lwd = c(2,1,1))

# Plot 95% confidence bands for eigenvalues
plot(1:2, splinesBoot$values, pch = 20, ylim = c(0, 1.5), 
     main = "Estimated eigenvalues with 95% CI",
     xlab = "Eigenvalue no.", ylab = "")
arrows(1:2, splinesBoot$CIvalues$alpha_0.05$lower,
       1:2, splinesBoot$CIvalues$alpha_0.05$upper,
       length = 0.05, angle = 90, code = 3)
points(1:2, sim$trueVals[1:2], pch = 20, col = 4)
legend("topright", c("Estimate", "True value"), pch = 20, col = c(1,4))


### simulate data (two- and one-dimensional domains)
### ATTENTION: Takes long

set.seed(2)
sim <-  simMultiFunData(type = "weighted",
                 argvals = list(list(seq(0,1,0.01), seq(-1,1,0.02)), list(seq(-0.5,0.5,0.01))),
                 M = list(c(4,5), 20), eFunType = list(c("Fourier", "Fourier"), "Poly"),
                 eValType = "exponential", N = 150)

# MFPCA based on univariate spline expansions (for images) and univariate FPCA (for functions)
pca <- MFPCA(sim$simData, M = 10,
             uniExpansions = list(list(type = "splines2D", k = c(10,12)),
                             list(type = "uFPCA")))
summary(pca)
plot(pca) # plot the eigenfunctions as perturbations of the mean
scoreplot(pca) # plot the scores

### Compare to true eigenfunctions
# flip to make results more clear
pca$functions <- flipFuns(sim$trueFuns[1:10], pca$functions)

par(mfrow = c(5,2), mar = rep(2,4))
for(m in 2:6) # for m = 1, image.plot (used in plot(funData)) produces an error...
{
  plot(sim$trueFuns[[1]], main = paste("True, m = ", m), obs = m)
  plot(pca$functions[[1]], main = paste("Estimate, m = ", m), obs = m)
}

par(mfrow = c(1,1))
plot(sim$trueFuns[[2]], main = "Eigenfunctions (2nd element)", lwd = 2, obs=  1:5)
plot(pca$functions[[2]], lty = 2, add = TRUE, obs=  1:5)
legend("bottomleft", c("True", "MFPCA"), lty = 1:2, lwd = c(2,1))

par(oldPar)

MFPCA documentation built on Sept. 15, 2022, 9:07 a.m.