# WH.1d.PCA: Principal components analysis of histogram variable based on... In HistDAWass: Histogram-Valued Data Analysis

## Description

The function performs a principal components analysis (PCA) of a histogram variable based on the Wasserstein distance. It runs a centered (not standardized) PCA on a set of quantiles of the variable. Because a distribution is a multivalued description, the analysis provides both a dimensional reduction and a visualization of the distributions. It is called 1d (one-dimensional) because it considers just one histogram variable.
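As background for why quantiles are the natural input here: for two one-dimensional distributions with quantile functions $F^{-1}$ and $G^{-1}$, the squared L2 Wasserstein distance can be written directly on the quantile functions. With $m$ equally spaced levels $t_1, \dots, t_m$ (the symbols $m$ and $t_j$ are notation introduced for this illustration), it is approximated by an average of squared quantile differences. The exact discretization and weighting used by the package are not stated on this page, so the second expression is only a sketch.

$$
d_W^2(F, G) = \int_0^1 \left( F^{-1}(t) - G^{-1}(t) \right)^2 \, dt \;\approx\; \frac{1}{m} \sum_{j=1}^{m} \left( F^{-1}(t_j) - G^{-1}(t_j) \right)^2
$$

Under this approximation, Euclidean geometry on vectors of quantiles mirrors Wasserstein geometry on the distributions, which is what makes a PCA on quantiles meaningful.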

## Usage

```
WH.1d.PCA(data, var, quantiles = 10, plots = TRUE, listaxes = c(1:4),
  axisequal = FALSE, qcut = 1, outl = 0)
```

## Arguments

| Argument | Description |
| --- | --- |
| `data` | A MatH object (a matrix of distributionH). |
| `var` | An integer, the variable number. |
| `quantiles` | An integer, the number of quantiles used in the analysis. |
| `plots` | A logical value. If TRUE (default), plots are drawn. |
| `listaxes` | A vector of integers listing the axes for the 2d factorial representations. |
| `axisequal` | A logical value. If TRUE, the plots use the same scale for the x and y axes. The usage signature sets the default to FALSE. |
| `qcut` | A number between 0.5 and 1, used for the plot of densities to avoid very peaked densities. Default is 1: all the densities are considered. |
| `outl` | A number between 0 (default) and 0.5. For each distribution, it is the amount of mass removed from each tail. For example, if 0.1, a left tail and a right tail, each containing 0.1 of the mass, are cut away from each distribution. |
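To make the effect of `outl` concrete, here is a minimal base R sketch of tail trimming on a single sample. It does not call the package: the variable names (`n_q`, `probs_full`, `probs_trim`) are hypothetical, and respacing the levels evenly over the retained range is an assumption of this illustration, not a statement about how `WH.1d.PCA` works internally.

```r
set.seed(42)

# Hypothetical right-skewed sample standing in for one distribution
x <- rexp(1000)

outl <- 0.1   # mass removed from EACH tail, as described for `outl`
n_q  <- 10    # number of quantile intervals (illustrative)

# Quantile levels without trimming: 0, 0.1, ..., 1
probs_full <- seq(0, 1, length.out = n_q + 1)

# Quantile levels after trimming: restricted to [outl, 1 - outl]
# (assumption: levels are respaced evenly over the retained range)
probs_trim <- seq(outl, 1 - outl, length.out = n_q + 1)

q_full <- quantile(x, probs_full, names = FALSE)
q_trim <- quantile(x, probs_trim, names = FALSE)

# The trimmed description ignores the extreme tails,
# so it spans a narrower range of values
range(q_full)
range(q_trim)
```

With a skewed sample like this one, trimming mostly shortens the long right tail, which is the situation where a nonzero `outl` is likely to be most useful.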

## Details

In the framework of symbolic data analysis (SDA), distribution-valued data are defined as multivalued data, where each unit is described by a distribution (e.g., a histogram, a density, or a quantile function) of a quantitative variable. SDA provides different methods for analyzing multivalued data. Among them, one of the most relevant techniques proposed for the dimensional reduction of multivalued quantitative variables is principal component analysis (PCA). The paper cited in the References contributes to this line of analysis: starting from new association measures for distributional variables based on a specific metric for distributions, the squared Wasserstein distance, it proposes a PCA approach for distribution-valued data represented by quantile variables.
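The idea of a centered, non-standardized PCA on quantile variables can be sketched in base R as follows. This is not the package's implementation: the data are simulated, the names (`samples`, `Q`, `probs`) are hypothetical, and `prcomp` is swapped in for whatever routine `WH.1d.PCA` uses internally.

```r
set.seed(1)

# Hypothetical data: 20 units, each described by a sample of 200 values
n_units <- 20
samples <- lapply(seq_len(n_units), function(i) {
  rnorm(200, mean = runif(1, 150, 250), sd = runif(1, 10, 40))
})

# Describe each unit by its quantiles at 11 equally spaced levels
# (assumption: 10 quantile intervals give the levels 0, 0.1, ..., 1)
probs <- seq(0, 1, length.out = 11)
Q <- t(sapply(samples, quantile, probs = probs, names = FALSE))

# Centered but NOT standardized PCA on the quantile variables
pca <- prcomp(Q, center = TRUE, scale. = FALSE)

# Coordinates of the units on the first factorial plane
scores <- pca$x[, 1:2]

# Share of variance explained by each component
explained <- pca$sdev^2 / sum(pca$sdev^2)
round(explained[1:2], 3)
```

Leaving the quantile variables unscaled keeps their original units, so distances between rows of `Q` stay proportional to a discretized Wasserstein distance between the underlying distributions.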

## Value

A list with the results of the PCA, in the format returned by the `MFA` function of the FactoMineR package.

## References

Verde, R.; Irpino, A.; Balzanella, A., "Dimension Reduction Techniques for Distributional Symbolic Data," IEEE Transactions on Cybernetics, vol. PP, no. 99, pp. 1-1, doi: 10.1109/TCYB.2015.2389653, http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7024099&isnumber=6352949

## Examples

```
library(HistDAWass)  # provides WH.1d.PCA and the BLOOD dataset
results <- WH.1d.PCA(data = BLOOD, var = 1, listaxes = c(1:2))
```

### Example output

```
We do a PCA on variable --->  Cholesterol
dev.new(): using pdf(file="Rplots1.pdf")
dev.new(): using pdf(file="Rplots2.pdf")
dev.new(): using pdf(file="Rplots3.pdf")
dev.new(): using pdf(file="Rplots4.pdf")
```

HistDAWass documentation built on March 20, 2018, 5:04 p.m.