spca: Sparse Principal Components Analysis

Description

Performs a sparse principal component analysis for variable selection using singular value decomposition and lasso penalisation on the loading vectors.

Usage

spca(
  X,
  ncomp = 2,
  center = TRUE,
  scale = TRUE,
  keepX = rep(ncol(X), ncomp),
  max.iter = 500,
  tol = 1e-06,
  logratio = c("none", "CLR"),
  multilevel = NULL
)

Arguments

X

a numeric matrix (or data frame) which provides the data for the sparse principal components analysis. It should not contain missing values.

ncomp

Integer. If the data are complete, ncomp sets the number of components and associated eigenvalues to display from the pcasvd algorithm; if the data have missing values, ncomp gives the number of components to keep when reconstituting the data with the NIPALS algorithm. If NULL, the function sets ncomp = min(nrow(X), ncol(X)).

center

(Default=TRUE) Logical, whether the variables should be shifted to be zero centered. Only set to FALSE if the data have already been centered. Alternatively, a vector of length equal to the number of columns of X can be supplied. The value is passed to scale. If the data contain missing values, columns should be centered for reliable results.

scale

(Default=TRUE) Logical indicating whether the variables should be scaled to have unit variance before the analysis takes place.

keepX

numeric vector of length ncomp, the number of variables to keep in loading vectors. By default all variables are kept in the model. See details.

max.iter

Integer, the maximum number of iterations in the NIPALS algorithm.

tol

Positive real, the tolerance used in the NIPALS algorithm.

logratio

one of ('none','CLR'). Specifies the log ratio transformation used to deal with compositional values that may arise from specific normalisation in sequencing data. Defaults to 'none'.

multilevel

sample information for multilevel decomposition for repeated measurements.

Details

scale = TRUE is highly recommended as it helps to obtain orthogonal sparse loading vectors.
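
As a minimal base R illustration of what centering and scaling do to the columns (the center argument is described above as being passed to scale), the sketch below standardises a small simulated matrix. The matrix `X.demo` is made up for the example and is not part of mixOmics.

```r
# Simulated data, for illustration only (not a mixOmics dataset)
set.seed(1)
X.demo <- matrix(rnorm(40, mean = 5, sd = 3), nrow = 10, ncol = 4)

# Centre each column to mean 0 and scale it to unit variance
X.std <- scale(X.demo, center = TRUE, scale = TRUE)

col.means <- colMeans(X.std)       # numerically zero for every column
col.sds   <- apply(X.std, 2, sd)   # one for every column
```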

keepX is the number of variables to select in each loading vector, i.e. the number of variables with non zero coefficient in each loading vector.
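
One way to picture how keepX drives sparsity is the soft-thresholded rank-one approximation described by Shen and Huang (2008). The sketch below is not the mixOmics implementation: `sparse_rank1` is a hypothetical helper written for this example, it extracts a single component only, and it assumes keepX is at least 1 and that there are no ties among the absolute coefficients, so that exactly keepX entries stay non-zero.

```r
# Illustrative sketch only; `sparse_rank1` is a hypothetical helper,
# not a mixOmics function.
sparse_rank1 <- function(X, keepX, max.iter = 500, tol = 1e-06) {
  sv <- svd(X, nu = 1, nv = 1)
  u <- sv$u[, 1]               # unit-norm left singular vector
  v <- sv$d[1] * sv$v[, 1]     # loading vector carrying the scale
  for (iter in seq_len(max.iter)) {
    z <- drop(crossprod(X, u)) # t(X) %*% u
    # Threshold at the (keepX + 1)-th largest |z| so keepX entries survive
    a <- sort(abs(z), decreasing = TRUE)
    lambda <- if (keepX < length(z)) a[keepX + 1] else 0
    v.new <- sign(z) * pmax(abs(z) - lambda, 0)   # soft-thresholding
    xv <- drop(X %*% v.new)
    u.new <- xv / sqrt(sum(xv^2))
    converged <- sqrt(sum((v.new - v)^2)) < tol
    u <- u.new
    v <- v.new
    if (converged) break
  }
  # Return the loading vector rescaled to unit norm
  list(loading = v / sqrt(sum(v^2)), iter = iter)
}

# Simulated data, for illustration only
set.seed(1)
X.demo <- scale(matrix(rnorm(200), nrow = 20, ncol = 10))
res <- sparse_rank1(X.demo, keepX = 3)
n.selected <- sum(res$loading != 0)   # number of non-zero coefficients
```

With keepX equal to the number of columns the threshold is zero and nothing is shrunk, which matches the documented default of keeping all variables.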

Note that data can contain missing values only when logratio = 'none' is used. In this case, center=TRUE should be used to center the data in order to effectively ignore the missing values. This is the default behaviour in spca.

According to Filzmoser et al., an ILR log ratio transformation is more appropriate for PCA with compositional data. Both CLR and ILR are valid.

Logratio transform and multilevel analysis are performed sequentially as internal pre-processing step, through logratio.transfo and withinVariation respectively.
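
As a rough picture of the multilevel step, the within-subject variation for a single repeated-measures factor can be obtained by subtracting each subject's mean profile from that subject's samples. The base R sketch below assumes this one-factor case only; `within_sketch` is a hypothetical helper and is not claimed to reproduce withinVariation exactly.

```r
# Illustrative sketch only; `within_sketch` is a hypothetical helper.
within_sketch <- function(X, subject) {
  subject <- factor(subject)
  # Per-subject column means: column sums divided by samples per subject
  subj.means <- rowsum(X, subject) / as.vector(table(subject))
  # Subtract each sample's subject mean
  X - subj.means[as.character(subject), , drop = FALSE]
}

# Simulated repeated measurements: 3 subjects, 2 samples each
set.seed(2)
X.demo  <- matrix(rnorm(18), nrow = 6, ncol = 3)
subject <- c("s1", "s1", "s2", "s2", "s3", "s3")
X.within <- within_sketch(X.demo, subject)

# Within each subject, the deviations sum to zero in every column
subject.sums <- rowsum(X.within, subject)
```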

Logratio can only be applied if the data do not contain any 0 value (for count data, we thus advise normalising the raw data with an offset of 1). For the ILR transformation, an additional offset might be needed.
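
To make the zero-value restriction concrete, here is a minimal base R sketch of a centered log ratio (CLR) transform applied after adding an offset of 1 to count data. `clr_sketch` is a hypothetical helper written for this example; it is not the package's logratio.transfo.

```r
# Illustrative sketch only; `clr_sketch` is a hypothetical helper.
clr_sketch <- function(counts, offset = 1) {
  logx <- log(counts + offset)             # the offset avoids log(0)
  sweep(logx, 1, rowMeans(logx), "-")      # subtract each sample's mean log
}

# Toy count table with zeros: 2 samples (rows) by 3 features (columns)
counts <- matrix(c(0, 3, 7,
                   10, 0, 5), nrow = 2, byrow = TRUE)
clr.counts <- clr_sketch(counts)

# Without the offset, the zeros would produce -Inf values
has.inf <- any(is.infinite(log(counts)))
```

Each row of `clr.counts` sums to zero, which is the defining property of the CLR transform.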

The principal components are not guaranteed to be orthogonal in sPCA. We adopt the approach of Shen and Huang 2008 (Section 2.3) to estimate the explained variance in the case where the sparse loading vectors (and principal components) are not orthogonal. The data are projected onto the space spanned by the first loading vectors and the variance explained is then adjusted for potential correlation between PCs. Note that in practice, the loading vectors tend to be orthogonal if the data are centered and scaled in sPCA.
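
The projection idea can be sketched in base R as follows. This is an interpretation of the adjusted variance of Shen and Huang (2008), not the package's internal code: `adj_cum_var` is a hypothetical helper that projects the data onto the span of the first k loading vectors and reports the share of total variance retained.

```r
# Illustrative sketch only; `adj_cum_var` is a hypothetical helper.
# X: centered data matrix; V: matrix with one loading vector per column.
adj_cum_var <- function(X, V) {
  total <- sum(X^2)
  sapply(seq_len(ncol(V)), function(k) {
    Vk <- V[, seq_len(k), drop = FALSE]
    # Projection of X onto the space spanned by the first k loadings
    Xk <- X %*% Vk %*% solve(crossprod(Vk), t(Vk))
    sum(Xk^2) / total
  })
}

# Simulated data, for illustration only
set.seed(42)
X.demo <- scale(matrix(rnorm(60), nrow = 12, ncol = 5))
sv <- svd(X.demo)

# With orthonormal (non-sparse) PCA loadings, the adjusted values
# reduce to the usual cumulative proportions of explained variance
cum.adj  <- adj_cum_var(X.demo, sv$v[, 1:2])
expected <- cumsum(sv$d^2)[1:2] / sum(sv$d^2)

# Per-component adjusted proportions are the successive differences
prop.adj <- diff(c(0, cum.adj))
```

Because each value is computed from a projection onto a growing subspace, the cumulative proportions never decrease even when the loading vectors are correlated.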

Value

spca returns a list with class "spca" containing the following components:

ncomp

the number of components to keep in the calculation.

prop_expl_var

the adjusted percentage of variance explained for each component.

cum.var

the adjusted cumulative percentage of variances explained.

keepX

the number of variables kept in each loading vector.

iter

the number of iterations needed to reach convergence for each component.

rotation

the matrix containing the sparse loading vectors.

x

the matrix containing the principal components.

Author(s)

Kim-Anh Lê Cao, Fangzhou Yao, Leigh Coonan, Ignacio Gonzalez, Al J Abadi

References

Shen, H. and Huang, J. Z. (2008). Sparse principal component analysis via regularized low rank matrix approximation. Journal of Multivariate Analysis 99, 1015-1034.

See Also

pca and http://www.mixOmics.org for more details.

Examples

data(liver.toxicity)
spca.rat <- spca(liver.toxicity$gene, ncomp = 3, keepX = rep(50, 3))
spca.rat

## variable representation
plotVar(spca.rat, cex = 1)
## Not run: 
plotVar(spca.rat, style = "3d")

## End(Not run)

## samples representation
plotIndiv(spca.rat, ind.names = liver.toxicity$treatment[, 3],
          group = as.numeric(liver.toxicity$treatment[, 3]))

## Not run: 
plotIndiv(spca.rat, cex = 0.01,
          col = as.numeric(liver.toxicity$treatment[, 3]), style = "3d")

## End(Not run)

## example with multilevel decomposition and CLR log ratio transformation
data("diverse.16S")
spca.res <- spca(X = diverse.16S$data.TSS, ncomp = 5,
                 logratio = 'CLR', multilevel = diverse.16S$sample)
plot(spca.res)
plotIndiv(spca.res, ind.names = FALSE, group = diverse.16S$bodysite,
          title = '16S diverse data', legend = TRUE)

Example output

Loading required package: MASS
Loading required package: lattice
Loading required package: ggplot2

Loaded mixOmics 6.14.0
Thank you for using mixOmics!
Tutorials: http://mixomics.org
Bookdown vignette: https://mixomicsteam.github.io/Bookdown
Questions, issues: Follow the prompts at http://mixomics.org/contact-us
Cite us:  citation('mixOmics')

sparse PCA with 3 principal components. 
  Input data X of dimensions: 64 3116 
  Number of selected variables on each prinicipal components:
PC1   PC2   PC3   
 50    50    50   
  Proportion of adjusted explained variance for the first 3 principal components, see object$explained_variance: 
        PC1              PC2              PC3      
0.010659171      0.009289747      0.007724736      
  
  Cumulative proportion of adjusted explained variance for the first 3 principal components, see object$cum.var: 
       PC1             PC2             PC3      
0.01065917      0.01994892      0.02767366      
  
  Other available components: 
 -------------------- 
  loading vectors: see object$rotation 
  Other functions: 
 -------------------- 
  tune.spca, plotIndiv, plot, plotVar, selectVar, biplot
Loading required namespace: rgl
Warning messages:
1: In rgl.init(initValue, onlyNULL) : RGL: unable to open X11 display
2: 'rgl.init' failed, running with 'rgl.useNULL = TRUE'. 
Splitting the variation for 1 level factor.

mixOmics documentation built on April 15, 2021, 6:01 p.m.