nsprcomp: Non-Negative and Sparse PCA
In nsprcomp: Non-Negative and Sparse PCA

Description Usage Arguments Details Value Note References See Also Examples

Performs a constrained principal component analysis, where non-negativity and/or sparsity constraints are enforced on the principal axes. The result is returned as an object of class nsprcomp, which inherits from prcomp.

nsprcomp(x, ...)

## Default S3 method:
nsprcomp(x, retx = TRUE, ncomp = min(dim(x)),
  omega = rep(1, nrow(x)), k = ncol(x), nneg = FALSE, center = TRUE,
  scale. = FALSE, tol = NULL, nrestart = 5, em_tol = 0.001,
  em_maxiter = 100, partial_model = NULL, verbosity = 0, ...)

## S3 method for class 'formula'
nsprcomp(formula, data = NULL, subset, na.action, ...)

`x`	a numeric matrix or data frame which provides the data for the principal component analysis.
`...`	arguments passed to or from other methods.
`retx`	a logical value indicating whether the principal components, i.e. `x` projected into the principal subspace, should be returned.
`ncomp`	the number of principal components (PCs) to be computed. With the default setting, PCs are computed until `x` is fully deflated. `ncomp` can be specified implicitly if `k` is given as a vector.
`omega`	a vector with as many entries as there are data samples, to perform weighted PCA (analogous to weighted least-squares regression). The default is an equal weighting of all samples.
`k`	either a scalar or a vector of length `ncomp`, specifying the upper bounds on the cardinalities of the principal axes (PAs).
`nneg`	a logical value indicating whether the loadings should be non-negative, i.e. the PAs should be constrained to the non-negative orthant.
`center`	a logical value indicating whether the empirical mean of (the columns) of `x` should be subtracted. Alternatively, a vector of length equal to the number of columns of `x` can be supplied. The value is passed to `scale`.
`scale.`	a logical value indicating whether the columns of `x` should be scaled to have unit variance before the analysis takes place. The default is `FALSE` for consistency with `prcomp`. Alternatively, a vector of length equal to the number of columns of `x` can be supplied. The value is passed to `scale`.
`tol`	a threshold indicating the magnitude below which components should be omitted. Components are omitted if their standard deviations are less than or equal to `tol` times the standard deviation of the first component. With the default `NULL` setting, no components are omitted. With `tol = 0` or `tol = sqrt(.Machine$double.eps)`, essentially constant components are omitted.
`nrestart`	the number of random restarts for computing the principal component via expectation-maximization (EM) iterations. The solution achieving maximum standard deviation over all random restarts is kept. A value greater than one can help to avoid poor local maxima.
`em_tol`	If the relative change of the objective is less than `em_tol` between iterations, the EM procedure is asssumed to have converged to a local optimum.
`em_maxiter`	the maximum number of EM iterations to be performed. The EM procedure is terminated if either the `em_tol` or the `em_maxiter` criterion is satisfied.
`partial_model`	`NULL` or an object of class `nsprcomp`. The computation can be continued from a partial model by providing a `nsprcomp` object (either from a previous run of this function or from `asdev`) and setting `ncomp` to a value greater than the number of components contained in the partial model. See the examples for an illustration.
`verbosity`	an integer specifying the verbosity level. Greater values result in more output, the default is to be quiet.
`formula`	a formula with no response variable, referring only to numeric variables.
`data`	an optional data frame (or similar: see `model.frame`) containing the variables in the `formula`. By default the variables are taken from `environment(formula)`.
`subset`	an optional vector used to select rows (observations) of the data matrix `x`.
`na.action`	a function which indicates what should happen when the data contain `NA`s. The default is set by the `na.action` setting of `options`, and is `na.fail` if that is unset. The ‘factory-fresh’ default is `na.omit`.

nsprcomp computes a principal component (PC) using expectation-maximization iterations, where non-negativity of the loadings is achieved by projecting the principal axis (PA) into the non-negative orthant, and sparsity of the loadings is achieved by soft thresholding (Sigg and Buhmann, 2008).

Because constrained principal axes no longer correspond to true eigenvectors of the covariance matrix and are usually not pairwise orthogonal, special attention needs to be paid when computing more than a single PC. The algorithm implements the generalized deflation method proposed by Mackey (2009) to maximize the additional variance of each component. Given a basis of the space spanned by the previous PAs, the variance of the PC is maximized after projecting the current PA to the ortho-complement space of the basis. This procedure maximizes the additional variance not explained by previous components, and is identical to standard PCA if no sparsity or non-negativity constraints are enforced on the PAs.

See the references for further details.

nsprcomp returns a list with class (nsprcomp, prcomp) containing the following elements:

`sdev`	the additional standard deviation explained by each component, see `asdev`.
`rotation`	the matrix of non-negative and/or sparse loadings, containing the principal axes as columns.
`x`	the scores matrix XW* containing the principal components as columns (after centering and scaling if requested). For the formula method, `napredict` is applied to handle the treatment of values omitted by the `na.action`.
`center, scale`	the centering and scaling used, or `FALSE`
`xp`	the deflated data matrix corresponding to `x`
`q`	an orthonormal basis for the principal subspace

The PCA terminology is not consistent across the literature. Given a zero mean data matrix X (with observations as rows) and a basis W of the principal subspace, we define the scores matrix as Z=X*W which contains the principal components as its columns. The columns of the pseudo-rotation matrix W are called the principal axes, and the elements of W are called the loadings.

Deflating the data matrix accumulates numerical errors over successive PCs.

Sigg, C. D. and Buhmann, J. M. (2008) Expectation-Maximization for Sparse and Non-Negative PCA. In Proceedings of the 25th International Conference on Machine Learning (pp. 960–967).

Mackey, L. (2009) Deflation Methods for Sparse PCA. In Advances in Neural Information Processing Systems (pp. 1017–1024).

asdev, peav, prcomp, scale

if (requireNamespace("MASS", quietly = TRUE)) withAutoprint({
  set.seed(1)

  # Regular PCA, with the tolerance set to return five PCs
  prcomp(MASS::Boston, tol = 0.36, scale. = TRUE)

  # Sparse PCA with different cardinalities per component. The number of components
  # is derived from the length of vector k.
  nsprcomp(MASS::Boston, k = c(13, 7, 5, 5, 5), scale. = TRUE)

  # Non-negative sparse PCA with four components. Note that the principal axes
  # naturally have a high degree of orthogonality, because each component
  # maximizes the additional variance not already explained.
  set.seed(1)
  nsprcomp(MASS::Boston, k = c(7, 5, 2, 2), nneg = TRUE, scale. = TRUE)

  # The optimization can get stuck in local optima. Increase the number of
  # random restarts or the number of power iterations to likely obtain decreasing
  # standard deviations.
  set.seed(1)
  (nspc <- nsprcomp(MASS::Boston, k = c(7, 5, 2, 2), nneg = TRUE, scale. = TRUE,
                    nrestart = 10, em_tol = 1e-4, verbosity = 1))

  # continue the computation of components from a partial model
  nsprcomp(MASS::Boston, k = 3, ncomp = 5, nneg = TRUE, scale. = TRUE, partial_model = nspc)

  # The reconstruction error for each sample can be influenced using the
  # weighting vector omega. To reconstruct the data, the generalized
  # inverse of the pseudo-rotation matrix has to be used, because the constrained
  # principal axes are in general not pairwise orthogonal.
  set.seed(1)
  X <- matrix(runif(5*10), 5)
  nspc <- nsprcomp(X, omega = c(5, 1, 1, 1, 5), ncomp = 2, nneg = TRUE)
  X_hat <- predict(nspc)%*%MASS::ginv(nspc$rotation) + matrix(1,5,1)%*%nspc$center
  rowSums((X - X_hat)^2)
})

> set.seed(1)
> prcomp(MASS::Boston, tol = 0.36, scale. = TRUE)
Standard deviations (1, .., p=14):
 [1] 2.5585132 1.2843410 1.1614241 0.9415625 0.9224421 0.8124105 0.7317177
 [8] 0.6348831 0.5265582 0.5022524 0.4612919 0.4277704 0.3660733 0.2456149

Rotation (n x k) = (14 x 5):
                 PC1          PC2          PC3          PC4          PC5
crim     0.242284451 -0.065873108  0.395077419 -0.100366211  0.004957659
zn      -0.245435005 -0.148002653  0.394545713 -0.342958421  0.114495002
indus    0.331859746  0.127075668 -0.066081913  0.009626936 -0.022583692
chas    -0.005027133  0.410668763 -0.125305293 -0.700406497 -0.535197817
nox      0.325193880  0.254276363 -0.046475549 -0.053707583  0.194605570
rm      -0.202816554  0.434005810  0.353406095  0.293357309 -0.008320481
age      0.296976574  0.260303205 -0.200823078  0.078426326  0.149750092
dis     -0.298169809 -0.359149977  0.157068710 -0.184747787 -0.106219480
rad      0.303412754  0.031149596  0.418510334  0.051374381 -0.230352185
tax      0.324033052  0.008851406  0.343232194  0.026810695 -0.163425820
ptratio  0.207679535 -0.314623061  0.000399092  0.342036328 -0.615707380
black   -0.196638358  0.026481032 -0.361375914  0.201741185 -0.367460674
lstat    0.311397955 -0.201245177 -0.161060336 -0.242621217  0.178358870
medv    -0.266636396  0.444924411  0.163188735  0.180297553 -0.050659893
> nsprcomp(MASS::Boston, k = c(13, 7, 5, 5, 5), scale. = TRUE)
Standard deviations (1, .., p=5):
[1] 2.5584808 1.2472326 1.0405531 0.9642164 0.8752789

Rotation (n x k) = (14 x 5):
               PC1        PC2       PC3        PC4        PC5
crim    -0.2429863  0.0000000 0.4854165  0.0000000  0.0000000
zn       0.2446861  0.0000000 0.5967689  0.0000000  0.0000000
indus   -0.3318673  0.0000000 0.0000000  0.0000000  0.0000000
chas     0.0000000 -0.4315023 0.0000000  0.7490440 -0.4310263
nox     -0.3253668 -0.2702262 0.0000000  0.0000000  0.0000000
rm       0.2017390 -0.4401132 0.0000000 -0.4474849  0.0000000
age     -0.2967945 -0.2763328 0.0000000  0.0000000  0.2574766
dis      0.2981333  0.3520975 0.2488706  0.0000000 -0.1555882
rad     -0.3042870  0.0000000 0.4437986  0.0000000  0.0000000
tax     -0.3247282  0.0000000 0.3864504  0.0000000  0.0000000
ptratio -0.2073060  0.3886047 0.0000000  0.0000000  0.0000000
black    0.1975268  0.0000000 0.0000000  0.4110236  0.8419257
lstat   -0.3109319  0.0000000 0.0000000  0.1935561  0.0000000
medv     0.2659092 -0.4424227 0.0000000 -0.1796834  0.1219658
> set.seed(1)
> nsprcomp(MASS::Boston, k = c(7, 5, 2, 2), nneg = TRUE, scale. = TRUE)
Standard deviations (1, .., p=4):
[1] 2.1308851 1.5497667 0.7822017 1.0082622

Rotation (n x k) = (14 x 4):
              PC1       PC2       PC3       PC4
crim    0.3126418 0.0000000 0.0000000 0.0000000
zn      0.0000000 0.4900940 0.0000000 0.0000000
indus   0.3990108 0.0000000 0.0000000 0.0000000
chas    0.0000000 0.0000000 0.0000000 0.9669007
nox     0.4027495 0.0000000 0.0000000 0.0000000
rm      0.0000000 0.4480493 0.7706175 0.0000000
age     0.3579659 0.0000000 0.0000000 0.0000000
dis     0.0000000 0.4586857 0.0000000 0.0000000
rad     0.3932454 0.0000000 0.0000000 0.0000000
tax     0.4151053 0.0000000 0.0000000 0.0000000
ptratio 0.0000000 0.0000000 0.0000000 0.0000000
black   0.0000000 0.3145526 0.0000000 0.2551531
lstat   0.3546046 0.0000000 0.0000000 0.0000000
medv    0.0000000 0.4997236 0.6372979 0.0000000
> set.seed(1)
> (nspc <- nsprcomp(MASS::Boston, k = c(7, 5, 2, 2), nneg = TRUE, scale. = TRUE, 
+     nrestart = 10, em_tol = 1e-04, verbosity = 1))
[1] "component 1: maximum objective is 2293 at random restart 0"
[1] "component 1: maximum objective is 2293 at random restart 1"
[1] "component 1: maximum objective is 2293 at random restart 2"
[1] "component 1: maximum objective is 2293 at random restart 3"
[1] "component 1: maximum objective is 2293 at random restart 4"
[1] "component 1: maximum objective is 2293 at random restart 5"
[1] "component 1: maximum objective is 2293 at random restart 6"
[1] "component 1: maximum objective is 2293 at random restart 7"
[1] "component 1: maximum objective is 2293 at random restart 8"
[1] "component 1: maximum objective is 2293 at random restart 9"
[1] "component 2: maximum objective is 1213 at random restart 0"
[1] "component 2: maximum objective is 1213 at random restart 1"
[1] "component 2: maximum objective is 1213 at random restart 2"
[1] "component 2: maximum objective is 1213 at random restart 3"
[1] "component 2: maximum objective is 1213 at random restart 4"
[1] "component 2: maximum objective is 1213 at random restart 5"
[1] "component 2: maximum objective is 1213 at random restart 6"
[1] "component 2: maximum objective is 1213 at random restart 7"
[1] "component 2: maximum objective is 1213 at random restart 8"
[1] "component 2: maximum objective is 1213 at random restart 9"
[1] "component 3: maximum objective is 404 at random restart 0"
[1] "component 3: maximum objective is 523.2 at random restart 1"
[1] "component 3: maximum objective is 525.1 at random restart 2"
[1] "component 3: maximum objective is 525.1 at random restart 3"
[1] "component 3: maximum objective is 525.1 at random restart 4"
[1] "component 3: maximum objective is 525.1 at random restart 5"
[1] "component 3: maximum objective is 525.1 at random restart 6"
[1] "component 3: maximum objective is 525.1 at random restart 7"
[1] "component 3: maximum objective is 525.1 at random restart 8"
[1] "component 3: maximum objective is 525.1 at random restart 9"
[1] "component 4: maximum objective is 520.6 at random restart 0"
[1] "component 4: maximum objective is 520.6 at random restart 1"
[1] "component 4: maximum objective is 520.6 at random restart 2"
[1] "component 4: maximum objective is 520.6 at random restart 3"
[1] "component 4: maximum objective is 520.7 at random restart 4"
[1] "component 4: maximum objective is 520.7 at random restart 5"
[1] "component 4: maximum objective is 520.7 at random restart 6"
[1] "component 4: maximum objective is 520.7 at random restart 7"
[1] "component 4: maximum objective is 520.7 at random restart 8"
[1] "component 4: maximum objective is 520.7 at random restart 9"
Standard deviations (1, .., p=4):
[1] 2.130886 1.549822 1.014159 1.011090

Rotation (n x k) = (14 x 4):
              PC1       PC2       PC3       PC4
crim    0.3127134 0.0000000 0.0000000 0.0000000
zn      0.0000000 0.4850296 0.0000000 0.0000000
indus   0.3989959 0.0000000 0.0000000 0.0000000
chas    0.0000000 0.0000000 0.9783593 0.0000000
nox     0.4025259 0.0000000 0.0000000 0.0000000
rm      0.0000000 0.4519551 0.0000000 0.0000000
age     0.3572843 0.0000000 0.0000000 0.0000000
dis     0.0000000 0.4529891 0.0000000 0.2019432
rad     0.3936984 0.0000000 0.0000000 0.0000000
tax     0.4155276 0.0000000 0.0000000 0.0000000
ptratio 0.0000000 0.0000000 0.0000000 0.9793972
black   0.0000000 0.3177198 0.0000000 0.0000000
lstat   0.3545021 0.0000000 0.0000000 0.0000000
medv    0.0000000 0.5043192 0.2069133 0.0000000
> nsprcomp(MASS::Boston, k = 3, ncomp = 5, nneg = TRUE, scale. = TRUE, partial_model = nspc)
Standard deviations (1, .., p=5):
[1] 2.1308856 1.5498222 1.0141589 1.0110895 0.9012304

Rotation (n x k) = (14 x 5):
              PC1       PC2       PC3       PC4       PC5
crim    0.3127134 0.0000000 0.0000000 0.0000000 0.0000000
zn      0.0000000 0.4850296 0.0000000 0.0000000 0.0000000
indus   0.3989959 0.0000000 0.0000000 0.0000000 0.0000000
chas    0.0000000 0.0000000 0.9783593 0.0000000 0.0000000
nox     0.4025259 0.0000000 0.0000000 0.0000000 0.0000000
rm      0.0000000 0.4519551 0.0000000 0.0000000 0.0000000
age     0.3572843 0.0000000 0.0000000 0.0000000 0.3385724
dis     0.0000000 0.4529891 0.0000000 0.2019432 0.0000000
rad     0.3936984 0.0000000 0.0000000 0.0000000 0.0000000
tax     0.4155276 0.0000000 0.0000000 0.0000000 0.0000000
ptratio 0.0000000 0.0000000 0.0000000 0.9793972 0.0000000
black   0.0000000 0.3177198 0.0000000 0.0000000 0.9025057
lstat   0.3545021 0.0000000 0.0000000 0.0000000 0.2661807
medv    0.0000000 0.5043192 0.2069133 0.0000000 0.0000000
> set.seed(1)
> X <- matrix(runif(5 * 10), 5)
> nspc <- nsprcomp(X, omega = c(5, 1, 1, 1, 5), ncomp = 2, nneg = TRUE)
> X_hat <- predict(nspc) %*% MASS::ginv(nspc$rotation) + matrix(1, 5, 1) %*% 
+     nspc$center
> rowSums((X - X_hat)^2)
[1] 0.22883035 0.57480275 0.32494491 0.65648697 0.07084008