fpcad: Functional PCA of probability densities



Description

Performs functional principal component analysis of probability densities in order to describe a data folder consisting of T groups of individuals on which p variables are observed. It returns an object of class fpcad.


Usage

fpcad(xf, group.name = "group", gaussiand = TRUE, windowh = NULL, normed = TRUE,
    centered = TRUE, data.centered = FALSE, data.scaled = FALSE,
    common.variance = FALSE, nb.factors = 3, nb.values = 10, sub.title = "",
    plot.eigen = TRUE, plot.score = FALSE, nscore = 1:3,
    filename = NULL)



Arguments

xf

object of class "folder" or data frame.

  • If it is an object of class "folder", its elements are data frames with p numeric columns; non-numeric columns cause an error. The t-th element (t = 1, …, T) corresponds to the t-th group.

  • If it is a data frame, the column whose name is given by the group.name argument is a factor giving the groups. All other columns must be numeric; otherwise, there is an error.



group.name

  • If xf is an object of class "folder", name of the grouping variable in the returned results. The default is group.name = "group".

  • If xf is a data frame, group.name is the name of the column of xf containing the groups.


gaussiand

logical. If TRUE (default), the probability densities are assumed to be Gaussian. If FALSE, densities are estimated using the Gaussian kernel method.


windowh

either a list of T bandwidths (one per density associated with a group), or a strictly positive number. If windowh = NULL (default), the bandwidths are automatically computed. See Details.


normed

logical. If TRUE (default), the densities are normed before computing the distances.


centered

logical. If TRUE (default), the densities are centered.


data.centered

logical. If TRUE (default is FALSE), the data of each group are centered.


data.scaled

logical. If TRUE (default is FALSE), the data of each group are centered (even if data.centered = FALSE) and scaled.


common.variance

logical. If TRUE, a common covariance matrix (or correlation matrix if data.scaled = TRUE), computed on the whole data, is used. If FALSE (default), a covariance (or correlation) matrix per group is used.


nb.factors

numeric. Number of returned principal scores (default nb.factors = 3).

Warning: The plot.fpcad and interpret.fpcad functions cannot take into account more than nb.factors principal factors.


nb.values

numeric. Number of returned eigenvalues (default nb.values = 10).


sub.title

string. If provided, the subtitle for the graphs.


plot.eigen

logical. If TRUE (default), the barplot of the eigenvalues is plotted.


plot.score

logical. If TRUE (default is FALSE), the graphs of principal scores are plotted. A new graphic device is opened for each pair of principal scores defined by the nscore argument.


nscore

numeric vector. If plot.score = TRUE, the indices of the principal scores to plot. The default is nscore = 1:3. Its components cannot be greater than nb.factors.


filename

string. Name of the file in which the results are saved. By default (filename = NULL) the results are not saved.
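The two accepted input shapes can be illustrated with base R alone. The sketch below uses made-up data and a plain list built with split() as a stand-in for a "folder" object; it does not call the package, and the column names x1, x2 and group are assumptions chosen for the illustration.

```r
# Hypothetical data: p = 2 numeric variables observed on T = 3 groups
set.seed(1)
df <- data.frame(
  x1 = rnorm(30),
  x2 = rnorm(30),
  group = factor(rep(c("A", "B", "C"), each = 10))
)

group.name <- "group"                      # name of the grouping column
vars <- setdiff(names(df), group.name)     # all remaining columns

# Data frame form: the grouping column is a factor, every other column is numeric
stopifnot(is.factor(df[[group.name]]),
          all(vapply(df[vars], is.numeric, logical(1))))

# Folder-like form: one data frame with p numeric columns per group
# (a plain list, used here only to show the shape of the data)
groups <- split(df[vars], df[[group.name]])
sapply(groups, nrow)
```

With the package loaded, a data frame of this shape is what the group.name argument refers to when xf is a data frame.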


Details

The T probability densities f_t corresponding to the T groups of individuals are either parametrically estimated (gaussiand = TRUE) or estimated using the Gaussian kernel method (gaussiand = FALSE). In the latter case, the windowh argument provides the list of the bandwidths to use. Note that in the multivariate case (p > 1) the bandwidths are positive-definite matrices.
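As an illustration of the two estimation modes, the following base R sketch estimates the density of a single univariate group both ways. The sample, the evaluation grid and the bandwidth h = 0.4 are assumptions made for the example; the code does not reproduce the package internals.

```r
set.seed(42)
x <- rnorm(200, mean = 2, sd = 1.5)   # hypothetical sample for one group (p = 1)

# Evaluation grid wide enough to cover the tails of both estimates
grid <- seq(min(x) - 3, max(x) + 3, length.out = 512)

# Parametric estimate: Gaussian density with the sample mean and standard deviation
f_gauss <- dnorm(grid, mean = mean(x), sd = sd(x))

# Gaussian kernel estimate: average of Gaussian bumps centred on the observations
h <- 0.4                              # hypothetical bandwidth
f_kern <- vapply(grid, function(g) mean(dnorm(g, mean = x, sd = h)), numeric(1))

# Both estimates should integrate to approximately 1 (Riemann sum on the grid)
step <- grid[2] - grid[1]
mass <- c(gaussian = sum(f_gauss) * step, kernel = sum(f_kern) * step)
mass
```

The parametric estimate depends only on two summary statistics, whereas the kernel estimate depends on every observation and on the chosen bandwidth.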

If windowh is a numerical value h, the bandwidth matrix is of the form h S, where S is either the square root of the covariance matrix (p > 1) or the standard deviation (p = 1) of the estimated density.

If windowh = NULL (default), h in the above formula is computed using the bandwidth.parameter function.
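The form h S can be sketched for one bivariate group as follows. The data and the scalar h = 0.5 are hypothetical, and the symmetric square root is computed here by eigendecomposition, which is one common choice and not necessarily the one used by the package.

```r
set.seed(7)
# Hypothetical group with p = 2 variables
X <- cbind(x1 = rnorm(50), x2 = rnorm(50, sd = 2))

# Sample covariance matrix of the group
Sigma <- cov(X)

# Symmetric square root S of Sigma, so that S %*% S equals Sigma
e <- eigen(Sigma, symmetric = TRUE)
S <- e$vectors %*% diag(sqrt(e$values)) %*% t(e$vectors)

# Bandwidth matrix of the form h S, with a hypothetical scalar h
h <- 0.5
H <- h * S
H
```

Because Sigma is positive definite for non-degenerate data, S and therefore H are positive definite as well, which matches the requirement stated for the multivariate case.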


Value

Returns an object of class fpcad, that is a list including:


inertia

data frame of the eigenvalues and percentages of inertia.


contributions

data frame of the contributions to the first nb.factors principal components.


qualities

data frame of the qualities on the first nb.factors principal factors.


scores

data frame of the first nb.factors principal scores.


norm

vector of the L^2 norms of the densities.


means

list of the means.


variances

list of the covariance matrices.


correlations

list of the correlation matrices.


skewness

list of the skewness coefficients.


kurtosis

list of the kurtosis coefficients.
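To make the eigenvalues, percentages of inertia, principal scores and L^2 norms listed above more concrete, here is a minimal base R sketch for three univariate Gaussian densities. The group means and standard deviations are made up, and the sketch relies on the closed-form L^2 inner product of two Gaussian densities; it is an illustration of the general idea, and the package may use different weights or scaling conventions.

```r
# Hypothetical parameters of T = 3 univariate Gaussian densities
mu    <- c(0, 1, 3)
sigma <- c(1, 1.5, 0.8)
labs  <- c("A", "B", "C")

# Gram matrix of L^2 inner products:
# <f_s, f_t> = dnorm(mu_s - mu_t, mean = 0, sd = sqrt(sigma_s^2 + sigma_t^2))
W <- outer(seq_along(mu), seq_along(mu), function(s, t)
  dnorm(mu[s] - mu[t], mean = 0, sd = sqrt(sigma[s]^2 + sigma[t]^2)))
dimnames(W) <- list(labs, labs)

# L^2 norms of the densities
norms <- sqrt(diag(W))

# Norming: divide each inner product by the product of the two norms
Wn <- W / outer(norms, norms)

# Centering: double-centre the Gram matrix
Tn <- nrow(Wn)
H  <- diag(Tn) - matrix(1 / Tn, Tn, Tn)
Wc <- H %*% Wn %*% H

# Eigenvalues, percentages of inertia and principal scores
e <- eigen(Wc, symmetric = TRUE)
keep <- e$values > 1e-12                       # drop numerically null eigenvalues
inertia <- 100 * e$values[keep] / sum(e$values[keep])
scores <- e$vectors[, keep, drop = FALSE] %*%
  diag(sqrt(e$values[keep]), nrow = sum(keep))
rownames(scores) <- labs
```

After centering, three densities span at most two dimensions, so only two eigenvalues are retained here.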


Author(s)

Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard


References

Boumaza, R. (1998). Analyse en composantes principales de distributions gaussiennes multidimensionnelles. Revue de Statistique Appliquée, XLVI (2), 5-20.

Boumaza, R., Yousfi, S., Demotes-Mainard, S. (2015). Interpreting the principal component analysis of multivariate density functions. Communications in Statistics - Theory and Methods, 44 (16), 3321-3339.

Delicado, P. (2011). Dimensionality reduction when data are density functions. Computational Statistics & Data Analysis, 55, 401-420.

Yousfi, S., Boumaza, R., Aissani, D., Adjabi, S. (2014). Optimal bandwidth matrices in functional principal component analysis of density functions. Journal of Statistical Computation and Simulation, 85 (11), 2315-2330.

See Also

print.fpcad, plot.fpcad, interpret.fpcad, bandwidth.parameter


Examples

library(dad)
data(roses)

# Case of a normed non-centred PCA of Gaussian densities (on 3 architectural
# characteristics of roses: shape (Sha), foliage density (Den) and symmetry (Sym))
rosesf <- as.folder(roses[, c("Sha", "Den", "Sym", "rose")])
result3 <- fpcad(rosesf, group.name = "rose")

# Applied to a data frame:
result3df <- fpcad(roses[, c("Sha", "Den", "Sym", "rose")], group.name = "rose")

# Flower colors of the roses: one color per rose (A to J)
rose.colors <- c(A = "yellow", B = "yellow", C = "pink", D = "yellow", E = "red",
                 F = "yellow", G = "pink", H = "pink", I = "yellow", J = "yellow")
colours <- factor(rose.colors)

# Add the color to the scores by relabelling the levels of the rose factor
scores <- result3$scores
scores <- data.frame(scores, color = scores$rose, stringsAsFactors = TRUE)
levels(scores$color) <- rose.colors

# Scores according to the first two principal components, per color
plot(result3, nscore = 1:2, color = colours)

dad documentation built on Aug. 30, 2023, 5:06 p.m.