knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(fpcaCor)
This is a brief introduction to the package fpcaCor. For a general overview on functional data analysis (FDA), see (Ramsey and Silverman, 2005). Specifically, we work on functional principal component analysis (FPCA) for dimension reduction of functional data. For more details, readers can look at (@Yao:2003hd; @Goldsmith:2012ie; and @Xiao:2015jw).
Let $X_i(t)$ be the measurement for subject $i,$ where $i = 1, 2, \dots, n$ at time $t, \ t = 1,2 \dots, T$. By using the basic idea of FPCA based on the truncation of Karhunen-Loe've (KL) decomposition as
\begin{equation} X_i(t) = \mu(t) + \sum_{j= 1}^r \xi_{ij}\phi_j(t) + \varepsilon_i(t), \end{equation}
where $\mu(t)$ is the overall mean, $\xi_{ij}$ are subject-specific scores, $\phi_j(t)$ are eigenfunctions of underlying covariance operator $K(\cdot, \cdot)$, $r$ is the truncation level and $\varepsilon_i(t)$ are iid error terms.
Instead of working with observed $X_i(t)$, we want to focus on a corresponding latent Gaussian $Z_i(t)$. We will apply KL decomposition to latent Gaussian $Z_i(t)$ rather than the *observed * $X_i(t)$. Then we get
\begin{equation} Z_i(t) = \mu(t) + \sum_{j= 1}^r \xi_{ij}\phi_j(t) + \varepsilon_i(t). \end{equation}
To get $X_i(t)$ back from $Z_i(t)$, we use a strictly monotone transformation $f_t$ so that $f_t(X_i(t)) = Z_i(t)$ for continuous $X_i(t)$.
For the noncontinuous case, we will use corresponding latent models from @Yoon:2020hc. See the paper (@Yoon:2020hc) for more details.
First, we will generate variables with the same types, which all are continuous, using sample size $n = 20$ which will be our toy examples. To generate our toy example we will use the function gen_data
from the R package latentcor
. Here the number of types is 50.
library(latentcor) set.seed(326435) Mydata.list = gen_data(n = 20, types = rep("con", 50))
The function gen_data
gives two outputs (See ?gen_data
). We only need the generated data matrix X.
X = Mydata.list$X head(X, n = 6)
We obtained a matrix X with 20 many rows (sample size) and 50 many columns (number of variables). All the variables have the sample type (continuous).
Now we can use the function fpcaCor
to get eigen-functions for obtained data (matrix X). The function fpcaCor
follows several steps to extract eigen-functions. First, we get the estimated latent correlation matrix $\hat K$ which can be also obtained by using the function latentcor
. Then we use the "B-spline basis functions" to smooth the correlation matrix $\hat K$. After we obtain the smoothed correlation matrix $\tilde K$, we can extract the eigen-function for given the proportion of variance explained (pve) or for given number of principal components (npc) if it is supplied.
output1 = fpcaCor(X, pve = 0.99)
Here we only want 99% of the variance to be explained. Since we did not supply npc
, it is obtained by using pve
in the function fpcaCor
.
NPC = output1$npc print(NPC)
Now we can see our eigen-functions.
head(output1$eigenfuncs)
If we change the value of pve
to get 99.99% of the variance, we obtain a new NPC value:
output2 = fpcaCor(X, pve = 0.9999) NPCnew = output2$npc print(NPCnew)
This also changes our eigen-functions:
head(output2$eigenfuncs)
On the other hand, we can supply pnc
but notice that pnc
cannot larger than the number of columns of X. For more details see fpcaCor
.
p = ncol(X) print(p) output3 = fpcaCor(X, npc = 10)
Since we picked npc = 10, we will have 10 many eigen-functions.
eigenfuncs= output3$eigenfuncs head(eigenfuncs)
We use the built-in dataset iris
:
head(iris)
Since the last column is a string, we remove it from consideration. Since the original data has a small number of columns, we create additional features by taking linear combinations of the given columns.
iris_mat <- as.matrix(iris[, -5])
Since the original data has a small number of columns, we create additional features by taking linear combinations of the given columns.
iris_mat = cbind(iris_mat, iris_mat[, 1] + 3 * iris_mat[, 2]) iris_mat = cbind(iris_mat, iris_mat[, 5] - iris_mat[, 3]) iris_mat = cbind(iris_mat, iris_mat[, 2] + iris_mat[, 3] + 20 * iris_mat[, 4])
Now we can use fpcaCor
to extract eigen-functions:
iris_result = fpcaCor(iris_mat, nbasis = 3, pve = 0.99) print(iris_result)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.