FPCA: Functional Principal Component Analysis


FPCA    R Documentation

Functional Principal Component Analysis

Description

FPCA for dense or sparse functional data.

Usage

FPCA(Ly, Lt, optns = list())

Arguments

Ly

A list of n vectors containing the observed values for each individual. Missing values, specified as NA, are supported in the dense case (dataType='Dense').

Lt

A list of n vectors containing the observation time points for each individual, corresponding to the values in Ly. Each vector should be sorted in ascending order.

optns

A list of optional control parameters, specified as list(name=value). See ‘Details’.
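As a minimal sketch of the input format described above, the base R code below turns a long-format table (one row per measurement) into the Ly and Lt lists. The data frame long and its columns id, t and y are hypothetical names chosen for illustration.

```r
# Hypothetical long-format data: one row per measurement.
long <- data.frame(
  id = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  t  = c(0.5, 0.1, 0.9, 0.2, 0.7, 0.8, 0.0, 0.3, 0.6),
  y  = c(1.2, 0.4, 2.0, 0.3, 1.1, 1.7, 0.1, 0.6, 1.3)
)

# Sort by subject and then by time so that every Lt vector is ascending.
long <- long[order(long$id, long$t), ]

# One list element per subject; unname() drops the id labels.
Lt <- unname(split(long$t, long$id))
Ly <- unname(split(long$y, long$id))

# Ly[[i]][j] is the value observed for subject i at time Lt[[i]][j].
str(Lt)
str(Ly)
```

The two lists can then be passed as FPCA(Ly, Lt, optns).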

Details

If the input is sparse data, check that the design plot is dense and that the 2D domain is well covered by support points, using plot or CreateDesignPlot. Some study designs, such as snippet data (where each subject is observed only on a sub-interval of the study period), have a poorly covered design plot, in which case the nonparametric covariance estimate will be unreliable.

WARNING! Computation may be slow if the dataType argument is incorrect. If FPCA is taking a long time, double check that a dense design is not mistakenly coded as 'Sparse'. Applying FPCA to a mixture of very dense and sparse curves may cause computational issues.
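To make the coverage requirement concrete, here is a rough base R sketch that measures how much of the 2D domain is hit by within-subject pairs of time points, and compares a sparse design with a snippet design. The helper coverage(), the 10 x 10 grid and both simulated designs are assumptions made for illustration; they are not part of fdapace, and the design plot remains the recommended check.

```r
set.seed(1)

# Hypothetical sparse design: 30 subjects, 4 time points each on [0, 1].
Lt <- replicate(30, sort(runif(4)), simplify = FALSE)

# Hypothetical snippet design: the same subjects, each observed only
# on a randomly placed window of width 0.2.
snippets <- lapply(Lt, function(tt) {
  s <- runif(1, 0, 0.8)
  s + 0.2 * tt
})

# Fraction of cells in an nBins x nBins grid over lim^2 that contain at
# least one within-subject pair (t_j, t_k).
coverage <- function(Lt, nBins = 10, lim = c(0, 1)) {
  breaks <- seq(lim[1], lim[2], length.out = nBins + 1)
  hit <- matrix(FALSE, nBins, nBins)
  for (tt in Lt) {
    b <- findInterval(tt, breaks, rightmost.closed = TRUE)
    hit[as.matrix(expand.grid(b, b))] <- TRUE
  }
  mean(hit)
}

coverage(Lt)        # most of the square is covered
coverage(snippets)  # only a band around the diagonal is covered
```

With snippets, pairs of time points that are far apart never occur within one subject, so the covariance away from the diagonal cannot be estimated from the raw data.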

Available control options are:

userBwCov

The bandwidth value for the smoothed covariance function; positive numeric - default: determine automatically based on 'methodBwCov'

methodBwCov

The bandwidth choice method for the smoothed covariance function; 'GMeanAndGCV' (the geometric mean of the GCV bandwidth and the minimum bandwidth), 'CV', 'GCV' - default: 10% of the support

userBwMu

The bandwidth value for the smoothed mean function (using 'CV' or 'GCV'); positive numeric - default: determine automatically based on 'methodBwMu'

methodBwMu

The bandwidth choice method for the mean function; 'GMeanAndGCV' (the geometric mean of the GCV bandwidth and the minimum bandwidth), 'CV', 'GCV' - default: 5% of the support

dataType

The type of design we have (usually distinguishing between sparse or dense functional data); 'Sparse', 'Dense', 'DenseWithMV', 'p>>n' - default: determine automatically based on 'IsRegular'

diagnosticsPlot

Deprecated. Same as the option 'plot'

plot

Plot FPCA results (design plot, mean, scree plot and first K (<=3) eigenfunctions); logical - default: FALSE

error

Assume measurement error in the dataset; logical - default: TRUE

fitEigenValues

Whether to also obtain a regression fit of the eigenvalues; logical - default: FALSE

FVEthreshold

Fraction-of-Variance-Explained threshold used during the SVD of the fitted covariance function; numeric (0,1] - default: 0.99

FVEfittedCov

Fraction-of-Variance explained by the components that are used to construct fittedCov; numeric (0,1] - default: NULL (all components available will be used)

kernel

Smoothing kernel choice, common for mu and covariance; "rect", "gauss", "epan", "gausvar", "quar" - default: "gauss"; dense data are assumed noise-less so no smoothing is performed.

kFoldMuCov

The number of folds to be used for mean and covariance smoothing. Default: 10

lean

If TRUE the 'inputData' field in the output list is empty. Default: FALSE

maxK

The maximum number of principal components to consider - default: min(20, N-2, nRegGrid-2), where N is the number of curves and nRegGrid is the number of support points in each direction of the covariance surface

methodXi

The method to estimate the PC scores; 'CE' (Conditional Expectation), 'IN' (Numerical Integration) - default: 'CE' for sparse data and dense data with missing values, 'IN' for dense data. If the time points are irregular but the spacing is small enough, the 'IN' method is used by default.

methodMuCovEst

The method to estimate the mean and covariance in the case of dense functional data; 'cross-sectional', 'smooth' - default: 'cross-sectional'

nRegGrid

The number of support points in each direction of the covariance surface; numeric - default: 51

numBins

The number of bins to bin the data into; positive integer > 10 - default: NULL

methodSelectK

The method of choosing the number of principal components K; 'FVE', 'AIC', 'BIC', or a positive integer giving the number of components - default: 'FVE'

shrink

Whether to use the shrinkage method to estimate the scores in the dense case (see Yao et al. 2003); logical - default: FALSE

outPercent

A 2-element vector in [0,1] indicating the percentages of the time range to be considered as left and right boundary regions of the time window of observation - default: (0,1), which corresponds to no boundary

methodRho

The method of regularization (added to the diagonal of the covariance surface) in estimating principal component scores; 'trunc': rho is a truncation of sigma2, 'ridge': rho is a ridge parameter, 'vanilla': vanilla approach - default: 'vanilla'

rotationCut

A 2-element vector in [0,1] indicating the percent of data truncated during sigma^2 estimation - default: (0.25, 0.75)

useBinnedData

Should the data be binned? 'FORCE' (Enforce the # of bins), 'AUTO' (Select the # of bins automatically), 'OFF' (Do not bin) - default: 'AUTO'

useBinnedCov

Whether to use the binned raw covariance for smoothing; logical - default: TRUE

usergrid

Whether to use the observation grid for fitting; if FALSE, an equidistant grid is used; logical - default: FALSE

userCov

The user-defined smoothed covariance function; list of two elements: numerical vector 't' and matrix 'cov', 't' must cover the support defined by 'Ly' - default: NULL

userMu

The user-defined smoothed mean function; list of two numerical vectors 't' and 'mu' of equal size, 't' must cover the support defined by 'Ly' - default: NULL

userSigma2

The user-defined measurement error variance. A positive scalar. If specified, the vanilla approach is used (methodRho is set to 'vanilla' unless specified otherwise) - default: NULL

userRho

The user-defined measurement truncation threshold used for the calculation of functional principal component scores. A positive scalar - default: NULL

useBW1SE

Pick the largest bandwidth such that the CV error is within one standard error of the minimum CV error; relevant only if methodBwMu = 'CV' and/or methodBwCov = 'CV'; logical - default: FALSE

imputeScores

Whether to impute the FPC scores or not; logical - default: TRUE

verbose

Display diagnostic messages; logical - default: FALSE
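As an illustration of how FVEthreshold and methodSelectK = 'FVE' interact, the sketch below applies the usual fraction-of-variance-explained rule (the smallest K whose cumulative FVE reaches the threshold) to made-up eigenvalues in base R. The eigenvalues and the threshold of 0.95 are hypothetical, and the exact rule used inside FPCA is assumed here rather than taken from the package source.

```r
# Hypothetical eigenvalues of a fitted covariance, in decreasing order.
lambda <- c(4.0, 2.0, 1.0, 0.5, 0.3, 0.2)

# Cumulative fraction of variance explained by the first k components.
cumFVE <- cumsum(lambda) / sum(lambda)

# Smallest number of components whose cumulative FVE reaches the threshold.
FVEthreshold <- 0.95
K <- which(cumFVE >= FVEthreshold)[1]
K

# The corresponding options list, using option names from the list above.
optns <- list(dataType = 'Sparse', methodSelectK = 'FVE', FVEthreshold = 0.95)
# res <- FPCA(Ly, Lt, optns)   # requires fdapace and input lists Ly, Lt
```

Passing a positive integer as methodSelectK fixes K directly instead of using a criterion.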

Value

A list containing the following fields:

sigma2

Variance for measurement error.

lambda

A vector of length K containing eigenvalues.

phi

An nWorkGrid by K matrix containing eigenfunctions, supported on workGrid.

xiEst

An n by K matrix containing the FPC score estimates.

xiVar

A list of length n, each entry containing the variance estimates for the FPC estimates.

obsGrid

The (sorted) grid points where all observation points are pooled.

mu

A vector of length nWorkGrid containing the mean function estimate.

workGrid

A vector of length nWorkGrid. The internal regular grid on which the eigen analysis is carried out.

smoothedCov

An nWorkGrid by nWorkGrid matrix of the smoothed covariance surface.

fittedCov

An nWorkGrid by nWorkGrid matrix of the fitted covariance surface, which is guaranteed to be non-negative definite.

fittedCorr

An nWorkGrid by nWorkGrid matrix of the fitted correlation surface computed from fittedCov.

optns

A list of the options actually used.

timings

A vector with execution times for the basic parts of the FPCA call.

bwMu

The selected (or user specified) bandwidth for smoothing the mean function.

bwCov

The selected (or user specified) bandwidth for smoothing the covariance function.

rho

A regularizing scalar for the measurement error variance estimate.

cumFVE

A vector with the fraction of the cumulative total variance explained with each additional FPC.

FVE

A fraction indicating the total variance explained by the chosen FPCs for the given 'FVEthreshold'.

selectK

Number K of selected components.

criterionValue

A scalar giving the criterion value attained by the selected number of components under the chosen methodSelectK: the FVE, AIC or BIC value, or NULL for a fixed K.

inputData

A list containing the original 'Ly' and 'Lt' lists used as inputs to FPCA. NULL if 'lean' was specified to be TRUE.
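The fields above are related through the truncated Karhunen-Loève expansion. The base R sketch below uses stand-in values (not FPCA output) to show how a fitted covariance, a fitted correlation and fitted curves can be assembled from quantities shaped like mu, phi, lambda and xiEst. It assumes eigenfunctions of unit L2 norm on workGrid; the numbers are hypothetical, and the package may differ in details.

```r
# Stand-in quantities with the same shapes as the corresponding fields.
workGrid <- seq(0, 1, length.out = 51)
phi <- cbind(rep(1, 51), sqrt(2) * cos(pi * workGrid))  # nWorkGrid by K, K = 2
lambda <- c(2, 0.5)                                     # eigenvalues
mu <- workGrid                                          # mean function
xiEst <- rbind(c(1.0, -0.5), c(-2.0, 0.3))              # n by K scores, n = 2

# Fitted covariance: sum over k of lambda_k * phi_k(s) * phi_k(t).
fittedCov <- phi %*% diag(lambda, nrow = length(lambda)) %*% t(phi)

# Fitted correlation, obtained by rescaling the fitted covariance.
fittedCorr <- cov2cor(fittedCov)

# Fitted curves on workGrid: mu(t) + sum over k of xi_ik * phi_k(t).
fittedCurves <- matrix(mu, nrow(xiEst), length(mu), byrow = TRUE) +
  xiEst %*% t(phi)

dim(fittedCov)
dim(fittedCurves)
```

Because fittedCov is built from non-negative eigenvalues, it is non-negative definite by construction, which matches the guarantee stated for that field.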

References

Yao, F., Müller, H.G., Clifford, A.J., Dueker, S.R., Follett, J., Lin, Y., Buchholz, B., Vogel, J.S. (2003). "Shrinkage estimation for functional principal component scores, with application to the population kinetics of plasma folate." Biometrics 59, 676-685. (Shrinkage estimates for dense data)

Yao, Fang, Müller, Hans-Georg and Wang, Jane-Ling (2005). "Functional data analysis for sparse longitudinal data." Journal of the American Statistical Association 100, no. 470, 577-590. (Sparse data FPCA)

Liu, Bitao and Müller, Hans-Georg (2009). "Estimating derivatives for samples of sparsely observed functions, with application to online auction dynamics." Journal of the American Statistical Association 104, no. 486, 704-717. (Sparse data FPCA)

Castro, P. E., Lawton, W.H. and Sylvestre, E.A. (1986). "Principal modes of variation for processes with continuous sample curves." Technometrics 28, no. 4, 329-337. (modes of variation for dense data FPCA)

Examples

library(fdapace)

set.seed(1)
n <- 20
pts <- seq(0, 1, by=0.05)
# Simulate n Wiener process trajectories on pts, then sparsify them
sampWiener <- Wiener(n, pts)
sampWiener <- Sparsify(sampWiener, pts, 10)
# Sparse FPCA without measurement error, using the Epanechnikov kernel
res <- FPCA(sampWiener$Ly, sampWiener$Lt,
            list(dataType='Sparse', error=FALSE, kernel='epan', verbose=TRUE))
plot(res) # The design plot covers [0, 1] * [0, 1] well.
CreateCovPlot(res, 'Fitted')
CreateCovPlot(res, corr = TRUE)
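A short follow-up sketch, assuming fdapace is installed, that refits the same simulated data and inspects a few of the fields listed under Value. It uses only functions and field names that appear on this page; the object name fit is chosen for illustration.

```r
library(fdapace)

set.seed(1)
pts <- seq(0, 1, by = 0.05)
samp <- Sparsify(Wiener(20, pts), pts, 10)
fit <- FPCA(samp$Ly, samp$Lt,
            list(dataType = 'Sparse', error = FALSE, kernel = 'epan'))

fit$selectK     # number of selected components K
fit$cumFVE      # cumulative fraction of variance explained
dim(fit$phi)    # nWorkGrid by K eigenfunction matrix
dim(fit$xiEst)  # n by K matrix of FPC score estimates
fit$bwMu        # bandwidth used for the mean function
fit$bwCov       # bandwidth used for the covariance function
```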

fdapace documentation built on July 3, 2024, 5:08 p.m.