# fpcad: Functional PCA of probability densities

Package dad: Three-Way / Multigroup Data Analysis Through Densities

## Description

Performs functional principal component analysis (PCA) of probability densities in order to describe a data folder consisting of T groups of individuals, on which p variables are observed. It returns an object of class `fpcad`.

## Usage

```
fpcad(xf, group.name = "group", gaussiand = TRUE, windowh = NULL,
      normed = TRUE, centered = TRUE, data.centered = FALSE,
      data.scaled = FALSE, common.variance = FALSE,
      nb.factors = 3, nb.values = 10, sub.title = "",
      plot.eigen = TRUE, plot.score = FALSE, nscore = 1:3,
      filename = NULL)
```

## Arguments

- `xf`: object of class `"folder"` or a data frame. If it is an object of class `"folder"`, its elements are data frames with p numeric columns; non-numeric columns raise an error. The t-th element (t = 1, …, T) corresponds to the t-th group. If it is a data frame, the column whose name is given by the `group.name` argument is a factor giving the groups, and all other columns must be numeric; otherwise, an error is raised.
- `group.name`: string. If `xf` is an object of class `"folder"`, the name of the grouping variable in the returned results (default `group.name = "group"`). If `xf` is a data frame, the name of the column of `xf` containing the groups.
- `gaussiand`: logical. If `TRUE` (default), the probability densities are assumed to be Gaussian. If `FALSE`, the densities are estimated by the Gaussian kernel method.
- `windowh`: either a list of T bandwidths (one per group density) or a strictly positive number. If `windowh = NULL` (default), the bandwidths are computed automatically. See Details.
- `normed`: logical. If `TRUE` (default), the densities are normed before computing the distances.
- `centered`: logical. If `TRUE` (default), the densities are centered.
- `data.centered`: logical. If `TRUE` (default is `FALSE`), the data of each group are centered.
- `data.scaled`: logical. If `TRUE` (default is `FALSE`), the data of each group are centered (even if `data.centered = FALSE`) and scaled.
- `common.variance`: logical. If `TRUE` (default is `FALSE`), a common covariance matrix (or correlation matrix if `data.scaled = TRUE`), computed on the whole data, is used. If `FALSE` (default), one covariance (or correlation) matrix per group is used.
- `nb.factors`: numeric. Number of returned principal scores (default `nb.factors = 3`). Warning: the `plot.fpcad` and `interpret.fpcad` functions cannot take into account more than `nb.factors` principal factors.
- `nb.values`: numeric. Number of returned eigenvalues (default `nb.values = 10`).
- `sub.title`: string. If provided, the subtitle of the graphs.
- `plot.eigen`: logical. If `TRUE` (default), the barplot of the eigenvalues is plotted.
- `plot.score`: logical. If `TRUE` (default is `FALSE`), the graphs of the principal scores are plotted. A new graphics device is opened for each pair of principal scores defined by the `nscore` argument.
- `nscore`: numeric vector. If `plot.score = TRUE`, the numbers of the principal scores to plot (default `nscore = 1:3`). Its components cannot be greater than `nb.factors`.
- `filename`: string. Name of the file in which the results are saved. By default (`filename = NULL`), the results are not saved.
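The effect of `data.centered` and `data.scaled` on the input can be pictured with a small base R helper. The sketch below is hypothetical (`standardize_by_group` is not a dad function) and only mirrors the description above: the numeric columns are centered, and optionally divided by their standard deviation, separately within each group. It uses `base::scale`, which divides by the n - 1 standard deviation; whether `fpcad` uses the same convention is an assumption.

```r
# Hypothetical helper (not part of dad): centre, and optionally scale,
# the numeric columns of a data frame separately within each group.
standardize_by_group <- function(df, group.name, scaled = FALSE) {
  num_cols <- setdiff(names(df), group.name)
  # row indices of each group
  rows_by_group <- split(seq_len(nrow(df)), df[[group.name]])
  for (rows in rows_by_group) {
    block <- as.matrix(df[rows, num_cols, drop = FALSE])
    # subtract the group means; also divide by the group standard deviations if scaled
    df[rows, num_cols] <- scale(block, center = TRUE, scale = scaled)
  }
  df
}

toy <- data.frame(a = c(1, 2, 3, 10, 20, 30),
                  b = c(2, 4, 6, 1, 1, 4),
                  g = c("x", "x", "x", "y", "y", "y"))
standardize_by_group(toy, "g")                 # analogous to data.centered = TRUE
standardize_by_group(toy, "g", scaled = TRUE)  # analogous to data.scaled = TRUE
```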

## Details

The T probability densities f_t corresponding to the T groups of individuals are either estimated parametrically, assuming they are Gaussian (`gaussiand = TRUE`), or estimated by the Gaussian kernel method (`gaussiand = FALSE`). In the latter case, the `windowh` argument provides the bandwidths to use. Note that in the multivariate case (p > 1), the bandwidths are positive-definite matrices.
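For one variable (p = 1), the two estimation options can be contrasted in a few lines of base R. This is a minimal sketch, not the package's code: `kernel_density` and `gaussian_density` are hypothetical names, and the bandwidth `bw` is used directly as the standard deviation of each Gaussian kernel.

```r
# Gaussian kernel estimate of a univariate density (p = 1):
#   f_hat(x) = (1 / n) * sum_i dnorm(x, mean = x_i, sd = bw)
kernel_density <- function(x, obs, bw) {
  vapply(x, function(x0) mean(dnorm(x0, mean = obs, sd = bw)), numeric(1))
}

# Parametric alternative: a normal density with the sample mean and
# sample standard deviation plugged in
gaussian_density <- function(x, obs) {
  dnorm(x, mean = mean(obs), sd = sd(obs))
}

set.seed(42)
obs <- rnorm(50, mean = 3, sd = 2)
grid <- seq(-3, 9, by = 3)
cbind(x = grid,
      kernel = kernel_density(grid, obs, bw = 0.8),
      gaussian = gaussian_density(grid, obs))
```

The kernel estimate is a mixture of n Gaussian bumps, so it can capture multimodality that the single fitted Gaussian cannot, at the price of choosing a bandwidth.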

If `windowh` is a strictly positive number h, the bandwidth matrix is of the form h S, where S is the square root of the covariance matrix when p > 1, or the standard deviation of the estimated density when p = 1.
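The bandwidth h S can be computed from a group's sample as sketched below. The helper `bandwidth_matrix` is hypothetical, and taking S as the symmetric square root of the sample covariance matrix (rather than, say, a Cholesky factor) is an assumption; the text above only says "square root".

```r
# Hypothetical helper (not the dad implementation): bandwidth h * S, where S is
# the standard deviation (p = 1) or the symmetric square root of the covariance
# matrix (p > 1). Using the symmetric square root is an assumption.
bandwidth_matrix <- function(x, h) {
  stopifnot(h > 0)
  x <- as.matrix(x)
  if (ncol(x) == 1) return(h * sd(x[, 1]))
  e <- eigen(cov(x), symmetric = TRUE)
  # S = V diag(sqrt(lambda)) V', so that S %*% S equals cov(x)
  S <- e$vectors %*% diag(sqrt(pmax(e$values, 0))) %*% t(e$vectors)
  h * S
}

set.seed(1)
x <- matrix(rnorm(200), ncol = 2)
H <- bandwidth_matrix(x, h = 0.5)
H
```

With this choice, H %*% H equals h^2 times the sample covariance matrix, so the kernel's spread follows the shape of the group's data.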

If `windowh = NULL` (default), h in the above formula is computed using the `bandwidth.parameter` function.

## Value

Returns an object of class `fpcad`, that is a list including:

- `inertia`: data frame of the eigenvalues and percentages of inertia.
- `contributions`: data frame of the contributions to the first `nb.factors` principal components.
- `qualities`: data frame of the qualities on the first `nb.factors` principal factors.
- `scores`: data frame of the first `nb.factors` principal scores.
- `norm`: vector of the L^2 norms of the densities.
- `means`: list of the means.
- `variances`: list of the covariance matrices.
- `correlations`: list of the correlation matrices.
- `skewness`: list of the skewness coefficients.
- `kurtosis`: list of the kurtosis coefficients.
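For Gaussian densities (`gaussiand = TRUE`), the L^2 inner product of two densities has a closed form, so the `norm`, `inertia` and `scores` components can be illustrated through the Gram matrix of the densities. The sketch below is not the package's code: `gauss_inner` and `fpca_gaussian_sketch` are hypothetical names, equal weights 1/T for the groups are an assumption, and `normed` and `centered` are read as "divide each density by its L^2 norm" and "subtract the mean density".

```r
# L^2 inner product of two Gaussian densities N(mu1, Sigma1) and N(mu2, Sigma2):
# integral of f1 * f2 = density of N(mu2, Sigma1 + Sigma2) evaluated at mu1
gauss_inner <- function(mu1, Sigma1, mu2, Sigma2) {
  S <- as.matrix(Sigma1 + Sigma2)
  d <- mu1 - mu2
  q <- sum(d * solve(S, d))                 # quadratic form d' S^{-1} d
  exp(-q / 2) / sqrt((2 * pi)^length(d) * det(S))
}

# Hypothetical sketch of a PCA of Gaussian densities through their Gram matrix,
# with equal weights 1 / T for the T groups (assumption)
fpca_gaussian_sketch <- function(means, vars, normed = TRUE, centered = TRUE) {
  n_groups <- length(means)
  W <- matrix(0, n_groups, n_groups)
  for (i in seq_len(n_groups)) for (j in seq_len(n_groups))
    W[i, j] <- gauss_inner(means[[i]], vars[[i]], means[[j]], vars[[j]])
  norms <- sqrt(diag(W))                    # L^2 norms of the densities
  if (normed) W <- W / outer(norms, norms)  # densities divided by their norms
  if (centered) {                           # subtract the mean density
    J <- diag(n_groups) - matrix(1 / n_groups, n_groups, n_groups)
    W <- J %*% W %*% J
  }
  e <- eigen(W / n_groups, symmetric = TRUE)
  keep <- e$values > 1e-12                  # drop null eigenvalues
  lambda <- e$values[keep]
  # principal scores: sqrt(T * lambda_k) * u_k, u_k being the unit eigenvectors
  scores <- sweep(e$vectors[, keep, drop = FALSE], 2, sqrt(n_groups * lambda), "*")
  list(inertia = data.frame(eigenvalue = lambda,
                            percent = 100 * lambda / sum(lambda)),
       scores = scores, norm = norms)
}

means <- list(c(0, 0), c(1, 0), c(0, 2))
vars <- list(diag(2), diag(c(2, 1)), diag(2))
res <- fpca_gaussian_sketch(means, vars)
res$inertia
res$norm
```

Working with the T x T Gram matrix avoids discretising the densities on a grid: the eigenvalues of the weighted Gram matrix are the nonzero eigenvalues of the covariance operator of the densities.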

## Author(s)

Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard

## References

Boumaza, R. (1998). Analyse en composantes principales de distributions gaussiennes multidimensionnelles. Revue de Statistique Appliquée, XLVI (2), 5-20.

Boumaza, R., Yousfi, S., Demotes-Mainard, S. (2015). Interpreting the principal component analysis of multivariate density functions. Communications in Statistics - Theory and Methods, 44 (16), 3321-3339.

Delicado, P. (2011). Dimensionality reduction when data are density functions. Computational Statistics & Data Analysis, 55, 401-420.

Yousfi, S., Boumaza, R., Aissani, D., Adjabi, S. (2014). Optimal bandwith matrices in functional principal component analysis of density functions. Journal of Statistical Computation and Simulation, 85 (11), 2315-2330.

## Examples

```
library(dad)
data(roses)

# Case of a normed, centred PCA of Gaussian densities (default settings)
# on 3 architectural characteristics of roses: shape (Sha),
# foliage density (Den) and symmetry (Sym)
rosesf <- as.folder(roses[, c("Sha", "Den", "Sym", "rose")])
result3 <- fpcad(rosesf, group.name = "rose")
print(result3)
plot(result3)

# Applied to a data frame:
result3df <- fpcad(roses[, c("Sha", "Den", "Sym", "rose")], group.name = "rose")
print(result3df)
plot(result3df)

# Flower colors of the roses
scores <- result3$scores
scores <- data.frame(scores, color = scores$rose, stringsAsFactors = TRUE)
colours <- factor(c(A = "yellow", B = "yellow", C = "pink", D = "yellow",
                    E = "red", F = "yellow", G = "pink", H = "pink",
                    I = "yellow", J = "yellow"))
levels(scores$color) <- c(A = "yellow", B = "yellow", C = "pink", D = "yellow",
                          E = "red", F = "yellow", G = "pink", H = "pink",
                          I = "yellow", J = "yellow")

# Scores according to the first two principal components, per color
plot(result3, nscore = 1:2, color = colours)
```