pcaSphere: Spherical Principal Components Analysis
In abnormally-distributed/cvreg: Cross Validation and Robust Estimation Utilities


This function implements the spherical principal components analysis method of Locantore et al. (1999), which conducts PCA while downweighting large outliers. It is one of the earliest robust PCA methods and, despite the development of more sophisticated alternatives, it still performs quite well (Maronna, 2005). The method makes use of the multivariate spatial median, also known as the multivariate L1-median or simply the multivariate median. The spatial median minimizes the sum of Euclidean distances to the data points, and hence gives the coordinates of the center of the hypersphere (the origin of the name spherical principal components analysis):

M_hat = arg min_M Σ_i ||x_i - M||

It differs from the vector of univariate medians, called the marginal multivariate median or componentwise median. The componentwise median is not equivariant under orthogonal transformations of the data such as rotations, whereas the spatial median of a data set corresponds to the back-transformed spatial median of the rotated data set. The spatial median is unique when the data are not perfectly collinear; when the data are collinear, multiple (equally valid) solutions may exist (Milasevic & Ducharme, 1987). The spatial median has a bounded influence function and a breakdown point of 50%.
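To make the definition concrete, the following is a minimal base-R sketch of the spatial median computed with Weiszfeld's fixed-point iteration. The helper name `spatial_median` and its `tol` and `max_iter` arguments are hypothetical and are not part of cvreg; the package may well compute the median differently.

```r
# Hypothetical helper (not part of cvreg): spatial (L1) median via
# Weiszfeld's fixed-point iteration.
spatial_median <- function(x, tol = 1e-8, max_iter = 500) {
  x <- as.matrix(x)
  m <- apply(x, 2, median)                  # start at the componentwise median
  for (iter in seq_len(max_iter)) {
    d <- sqrt(rowSums(sweep(x, 2, m)^2))    # distance of each row to m
    d <- pmax(d, .Machine$double.eps)       # guard against division by zero
    w <- 1 / d
    m_new <- colSums(x * w) / sum(w)        # inverse-distance weighted mean
    converged <- sqrt(sum((m_new - m)^2)) < tol
    m <- m_new
    if (converged) break
  }
  m
}

set.seed(1)
x <- cbind(rnorm(50), rnorm(50))
x[1:5, ] <- x[1:5, ] + 100                  # five gross outliers

spatial_median(x)                           # stays close to the bulk of the data
colMeans(x)                                 # pulled strongly toward the outliers

# Rotation check: the median of the rotated data should match the rotated median.
a <- pi / 6
rot <- matrix(c(cos(a), sin(a), -sin(a), cos(a)), 2, 2)
x_rot <- x %*% t(rot)                       # rotate every row of x
rbind(rotated_then_median = spatial_median(x_rot),
      median_then_rotated = as.vector(rot %*% spatial_median(x)))
```

The same rotation check applied to `apply(x, 2, median)` will generally not agree, which is the equivariance difference described above.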

With the nature of the spatial median established, the motivation behind spherical PCA becomes clear. Centering a data point at the spatial median and dividing it by its Euclidean distance from the median gives its spatial sign, a unit vector that keeps the point's direction but discards its distance. The spatial signs are thus a transformation/standardization of the data into unit-hypersphere coordinates. Due to the robustness of the spatial median, multivariate outliers are easily identifiable and do not negatively influence the estimate of the principal components for the "clean" data. While the eigenvectors enjoy this benefit and are consistent estimates of the population eigenvectors, the eigenvalues are not consistent estimators. This is easily remedied by using a robust scale estimator to calculate the standard deviations of the PCA scores, then squaring these to obtain the variances (which are the eigenvalues). The method is also very fast, even for larger data sets.
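The steps just described can be sketched in base R as follows. This is an illustration under stated assumptions, not the cvreg implementation: `spherical_pca_sketch` is a hypothetical helper, the spatial median is approximated with a fixed number of Weiszfeld iterations, and `mad()` stands in for the tau, bisquare, and percentage bend scale estimators that `pcaSphere` offers.

```r
# Hypothetical helper (not the cvreg implementation): spherical PCA sketch.
spherical_pca_sketch <- function(x, ncomp = 2) {
  x <- as.matrix(x)

  # 1. Spatial median, approximated by 200 Weiszfeld iterations.
  m <- apply(x, 2, median)
  for (iter in 1:200) {
    d <- pmax(sqrt(rowSums(sweep(x, 2, m)^2)), .Machine$double.eps)
    m <- colSums(x / d) / sum(1 / d)
  }

  # 2. Spatial signs: center at the median, project rows onto the unit sphere.
  xc <- sweep(x, 2, m)
  d <- pmax(sqrt(rowSums(xc^2)), .Machine$double.eps)
  signs <- xc / d

  # 3. Eigenvectors of the covariance matrix of the spatial signs.
  e <- eigen(crossprod(signs) / nrow(signs), symmetric = TRUE)
  vec <- e$vectors[, seq_len(ncomp), drop = FALSE]

  # 4. Scores of the centered (unprojected) data on those directions.
  scores <- xc %*% vec

  # 5. Eigenvalues as squared robust scales of the scores
  #    (assumption: mad() is used here in place of tau/bisquare/pb).
  ev <- apply(scores, 2, mad)^2
  ord <- order(ev, decreasing = TRUE)

  list(center = m,
       vectors = vec[, ord, drop = FALSE],
       values = ev[ord],
       scores = scores[, ord, drop = FALSE])
}

set.seed(42)
n <- 200
z <- cbind(rnorm(n, sd = 3), rnorm(n, sd = 1))   # main axis is the first coordinate
z[1:10, ] <- cbind(rnorm(10, sd = 0.5), rnorm(10, mean = 40))  # outliers along the second

fit <- spherical_pca_sketch(z)
fit$vectors[, 1]            # robust first direction
prcomp(z)$rotation[, 1]     # classical first direction, for comparison
fit$values                  # robust eigenvalue estimates
```

In this simulated example the ten outliers inflate the variance along the second coordinate, so the classical and spherical first directions can be compared directly against the known main axis of the clean data.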

pcaSphere(
  x,
  ncomp = min(nrow(x) - 1, ncol(x)),
  scale = TRUE,
  evest = c("tau", "bisq", "pb")
)

`x`	a matrix or data frame containing only numeric variables.
`ncomp`	the number of components to retain.
`scale`	should the variables be scaled prior to analysis? Defaults to TRUE.
`evest`	the scale estimator used to provide consistent estimates of the population eigenvalues. One of "tau" (the default), "bisq" (bisquare), or "pb" (percentage bend).

A pcaSphere object containing the eigenvalues, eigenvectors, component loadings, component scores, and spatial median.
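A call might look like the sketch below, which uses only the arguments listed in the usage section. It assumes that cvreg is installed (for instance from the GitHub repository named in the header) and uses the built-in `iris` measurements purely as example data. Because the element names of the returned object are not listed here, the sketch inspects it with `str()` rather than assuming any particular names.

```r
# Assumption: cvreg is installed, e.g. via
# remotes::install_github("abnormally-distributed/cvreg")
x <- as.matrix(iris[, 1:4])   # built-in numeric example data

if (requireNamespace("cvreg", quietly = TRUE)) {
  fit <- cvreg::pcaSphere(x, ncomp = 2, scale = TRUE, evest = "tau")
  str(fit, max.level = 1)     # inspect the components of the returned object
} else {
  message("cvreg is not installed; skipping the pcaSphere() call")
}
```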

Locantore, N., Marron, J., Simpson, D., Tripoli, N., Zhang, J., Cohen, K. (1999) Robust principal component analysis for functional data. Sociedad de Estadistica e Investigacion Operativa Test. 8: 1. https://doi.org/10.1007/BF02595862

Maronna, R. A. (2005) Principal components and orthogonal regression based on robust scales. Technometrics. 47: 264–273.

Milasevic, P. & Ducharme, G. R. (1987) Uniqueness of the spatial median. Ann. Statist. 15(3): 1332-1333. https://doi.org/10.1214/aos/1176350511