In gozdesert/fpcaCor: FPCA by Smoothing the Correlation Matrix

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(fpcaCor)

Overview

This is a brief introduction to the package fpcaCor. For a general overview of functional data analysis (FDA), see Ramsay and Silverman (2005). Here we focus on functional principal component analysis (FPCA) for dimension reduction of functional data. For more details, see @Yao:2003hd, @Goldsmith:2012ie, and @Xiao:2015jw.

Let $X_i(t)$ be the measurement for subject $i$ at time $t$, where $i = 1, 2, \dots, n$ and $t = 1, 2, \dots, T$. FPCA is based on a truncated Karhunen-Loève (KL) decomposition:

\begin{equation} X_i(t) = \mu(t) + \sum_{j= 1}^r \xi_{ij}\phi_j(t) + \varepsilon_i(t), \end{equation}

where $\mu(t)$ is the overall mean, $\xi_{ij}$ are the subject-specific scores, $\phi_j(t)$ are the eigenfunctions of the underlying covariance operator $K(\cdot, \cdot)$, $r$ is the truncation level, and $\varepsilon_i(t)$ are iid error terms.
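To make the decomposition above concrete, here is a minimal base R sketch of a truncated KL expansion on a regular grid. It is only an illustration on simulated curves: the variable names and the simulation settings are assumptions made for this example, and it uses the plain sample covariance rather than anything specific to fpcaCor.

```r
# Minimal sketch of a truncated KL (FPCA) expansion on a regular grid.
# All names and settings below are illustrative, not part of fpcaCor.
set.seed(1)
n <- 200                       # number of subjects
T_len <- 50                    # number of time points
tt <- seq(0, 1, length.out = T_len)

# Simulate X_i(t) = mu(t) + xi_i1 * phi_1(t) + xi_i2 * phi_2(t) + noise
phi1 <- sqrt(2) * sin(2 * pi * tt)
phi2 <- sqrt(2) * cos(2 * pi * tt)
xi <- cbind(rnorm(n, sd = 2), rnorm(n, sd = 1))
mu <- 1 + tt
X <- matrix(mu, n, T_len, byrow = TRUE) +
  xi %*% rbind(phi1, phi2) +
  matrix(rnorm(n * T_len, sd = 0.1), n, T_len)

# Estimate the mean and covariance, then eigendecompose the covariance
mu_hat <- colMeans(X)
Xc <- sweep(X, 2, mu_hat)                  # centered curves
K_hat <- crossprod(Xc) / (n - 1)           # T_len x T_len sample covariance
eig <- eigen(K_hat, symmetric = TRUE)

# Truncate at r components and rebuild the curves
r <- 2
phi_hat <- eig$vectors[, 1:r]              # discretized eigenfunctions
xi_hat <- Xc %*% phi_hat                   # subject-specific scores
X_hat <- matrix(mu_hat, n, T_len, byrow = TRUE) + xi_hat %*% t(phi_hat)

# Share of total variance captured by the first r components
pve_r <- sum(eig$values[1:r]) / sum(eig$values)
```

With two true components and small noise, `pve_r` should be close to 1 and `X_hat` should differ from `X` by roughly the noise level.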

Copula FPCA

Instead of working with the observed $X_i(t)$, we focus on a corresponding latent Gaussian process $Z_i(t)$ and apply the KL decomposition to $Z_i(t)$ rather than to the *observed* $X_i(t)$. This gives

\begin{equation} Z_i(t) = \mu(t) + \sum_{j= 1}^r \xi_{ij}\phi_j(t) + \varepsilon_i(t). \end{equation}

To link $X_i(t)$ and $Z_i(t)$ for continuous $X_i(t)$, we use a strictly monotone transformation $f_t$ such that $f_t(X_i(t)) = Z_i(t)$; since $f_t$ is invertible, $X_i(t)$ is recovered as $f_t^{-1}(Z_i(t))$.

For the noncontinuous case, we use the corresponding latent models from @Yoon:2020hc; see that paper for details.
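For intuition about the monotone transformation in the continuous case, one simple estimate of $f_t$ at a fixed time point is the rank-based normal-scores transform. The sketch below assumes that choice purely for illustration; the helper `normal_scores` is hypothetical and is not claimed to be what fpcaCor or latentcor computes internally.

```r
# Hypothetical helper: rank-based normal scores at a fixed time point t.
# This is one possible estimate of a strictly monotone f_t, for illustration.
normal_scores <- function(x) {
  n <- length(x)
  qnorm(rank(x, ties.method = "average") / (n + 1))
}

set.seed(2)
x <- rexp(100)            # skewed, continuous observations at one time point
z <- normal_scores(x)     # approximately Gaussian latent values

# The map is strictly increasing, so the ordering of the subjects is preserved
same_order <- identical(order(x), order(z))
```

Because only the ordering of the observations matters here, rank-based dependence measures such as Kendall's tau are the same for `x` and `z`.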

R implementation

Getting Started

First, we generate a toy example in which all variables have the same type (continuous), using sample size $n = 20$. To generate the data we use the function gen_data from the R package latentcor. Here the number of variables is 50.

library(latentcor)
set.seed(326435)
Mydata.list = gen_data(n = 20, types = rep("con", 50))

The function gen_data returns two outputs (see ?gen_data). We only need the generated data matrix X.

X = Mydata.list$X 
head(X, n = 6)

We obtain a matrix X with 20 rows (the sample size) and 50 columns (the number of variables). All the variables have the same type (continuous).

FPCA in R using fpcaCor

Now we can use the function fpcaCor to get eigen-functions for the generated data (matrix X). The function fpcaCor extracts eigen-functions in several steps. First, it estimates the latent correlation matrix $\hat K$, which can also be obtained with the function latentcor. Then it smooths $\hat K$ using B-spline basis functions. From the smoothed correlation matrix $\tilde K$, it extracts the eigen-functions, either for a given proportion of variance explained (pve) or for a given number of principal components (npc) if one is supplied.
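The three steps can be sketched in base R as follows. This is a rough illustration under stated assumptions, not the package's implementation: the latent correlation is estimated with the Kendall's tau bridge $\sin(\pi \tau / 2)$ that applies to continuous variables, the smoothing is a plain least-squares projection onto a B-spline basis in both arguments (fpcaCor may smooth differently, for example with a penalty), and the toy data and all variable names are made up for this example.

```r
library(splines)  # B-spline basis; ships with base R

set.seed(3)
n <- 40
p <- 30
tt <- seq(0, 1, length.out = p)

# Toy smooth curves observed on a grid (illustrative data only)
X <- outer(rnorm(n), sin(2 * pi * tt)) +
  outer(rnorm(n), cos(2 * pi * tt)) +
  matrix(rnorm(n * p, sd = 0.3), n, p)

# Step 1: rank-based latent correlation for continuous variables
tau <- cor(X, method = "kendall")
K_hat <- sin(pi / 2 * tau)

# Step 2: smooth K_hat by projecting onto a B-spline basis in both arguments
B <- bs(tt, df = 8, intercept = TRUE)      # p x 8 basis matrix
H <- B %*% solve(crossprod(B), t(B))       # least-squares projection matrix
K_tilde <- H %*% K_hat %*% H
K_tilde <- (K_tilde + t(K_tilde)) / 2      # remove rounding asymmetry

# Step 3: eigendecompose and keep the fewest components reaching the pve
eig <- eigen(K_tilde, symmetric = TRUE)
lambda <- pmax(eig$values, 0)              # clip small negative eigenvalues
pve <- 0.99
npc <- which(cumsum(lambda) / sum(lambda) >= pve)[1]
eigenfuncs <- eig$vectors[, seq_len(npc), drop = FALSE]
```

Since the projection has rank 8 in this sketch, `K_tilde` has at most 8 nonzero eigenvalues, so `npc` can never exceed the number of basis functions.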

output1 = fpcaCor(X, pve = 0.99)

Here we ask for 99% of the variance to be explained. Since we did not supply npc, fpcaCor determines it from pve.

NPC = output1$npc
print(NPC)

Now we can see our eigen-functions.

head(output1$eigenfuncs)

If we change the value of pve to get 99.99% of the variance, we obtain a new NPC value:

output2 = fpcaCor(X, pve = 0.9999)  
NPCnew = output2$npc
print(NPCnew)

This also changes our eigen-functions:

head(output2$eigenfuncs)
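The change in npc between the two calls is what one would expect if the pve threshold is applied to the cumulative share of the eigenvalues, which is the usual rule; whether fpcaCor uses exactly this rule is an assumption here. The helper `npc_from_pve` and the eigenvalues below are hypothetical and only illustrate the behavior.

```r
# Hypothetical helper (not part of fpcaCor): smallest number of components
# whose cumulative share of the eigenvalue sum reaches the target pve.
npc_from_pve <- function(lambda, pve) {
  lambda <- sort(pmax(lambda, 0), decreasing = TRUE)
  which(cumsum(lambda) / sum(lambda) >= pve)[1]
}

# Made-up eigenvalues, chosen as powers of two so the arithmetic is exact
lambda <- c(8, 4, 2, 1, 0.5, 0.25, 0.125, 0.125)

npc_99 <- npc_from_pve(lambda, 0.99)
npc_9999 <- npc_from_pve(lambda, 0.9999)

# A stricter target never needs fewer components
npc_9999 >= npc_99
```

The cumulative share is nondecreasing in the number of components, so raising pve can keep npc the same or increase it, but never decrease it.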

Alternatively, we can supply npc directly; note that npc cannot be larger than the number of columns of X. For more details, see ?fpcaCor.

p = ncol(X)
print(p)
output3 = fpcaCor(X, npc = 10)

Since we picked npc = 10, we obtain 10 eigen-functions.

eigenfuncs = output3$eigenfuncs
head(eigenfuncs)

Example with iris dataset

We use the built-in dataset iris:

head(iris)

Since the last column (Species) is a categorical label rather than a numeric measurement, we remove it from consideration.

iris_mat <- as.matrix(iris[, -5])

Since the original data has a small number of columns, we create additional features by taking linear combinations of the given columns.

iris_mat = cbind(iris_mat, iris_mat[, 1] + 3 * iris_mat[, 2]) 
iris_mat = cbind(iris_mat, iris_mat[, 5] - iris_mat[, 3])
iris_mat = cbind(iris_mat, iris_mat[, 2] + iris_mat[, 3] + 20 * iris_mat[, 4])
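One point worth keeping in mind is that the three new columns are exact linear combinations of the original four, so they add columns but no new linear information. The base R check below rebuilds the same matrix under a separate name (`m`, chosen here only to avoid overwriting iris_mat) and inspects its size and numerical rank.

```r
# Rebuild the augmented matrix under a separate, illustrative name
m <- as.matrix(iris[, -5])
m <- cbind(m, m[, 1] + 3 * m[, 2])
m <- cbind(m, m[, 5] - m[, 3])
m <- cbind(m, m[, 2] + m[, 3] + 20 * m[, 4])

n_col <- ncol(m)       # number of columns after augmentation
m_rank <- qr(m)$rank   # numerical rank of the augmented matrix
```

The rank cannot exceed the four original measurements, so the extra columns mainly serve to give the example a longer grid to smooth over.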

Now we can use fpcaCor to extract eigen-functions:

iris_result = fpcaCor(iris_mat, nbasis = 3, pve = 0.99)  
print(iris_result)

References

gozdesert/fpcaCor documentation built on Feb. 13, 2022, 6:31 p.m.
