# estimate.eig.vpcs: Estimate the variance percentages for uncalculated... (bigpca: PCA, Transpose and Multicore Functionality for 'big.matrix' Objects)

## Description

If you use a function like irlba() to calculate PCA, you can choose (for speed) to calculate only a subset of the eigenvalues. In that case there is no exact percentage of variance explained by the PCA, or by each component, such as you would get from routines that compute every eigenvalue. This function uses a linear model, or a b*1/x model, to estimate the area under the curve (AUC) for the unknown eigenvalues. This gives a reasonable estimate of the variance accounted for by each unknown eigenvalue, and of the predicted sum of the unknown eigenvalues.
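The idea can be illustrated with a minimal base-R sketch. It is not bigpca's implementation: the function name `sketch_tail_estimate`, the choice to fit only the eigenvalues after `elbow`, and the approximation of the AUC by summing predicted eigenvalues at the positions `length(eigenv) + 1` to `min.dim` are all assumptions made for illustration.

```r
# Hypothetical sketch of tail-eigenvalue extrapolation; NOT bigpca's internal code.
sketch_tail_estimate <- function(eigenv, min.dim, elbow = 1, linear = TRUE) {
  n <- length(eigenv)
  if (n >= min.dim) {
    # all eigenvalues present: no extrapolation needed
    return(list(variance.pcs = eigenv / sum(eigenv), tail.sum = 0))
  }
  idx <- (elbow + 1):n             # assumed 'noise' eigenvalues used for the fit
  y <- eigenv[idx]
  new.idx <- (n + 1):min.dim       # positions of the uncalculated eigenvalues
  if (linear) {
    fit <- lm(y ~ idx)             # straight line with intercept
    pred <- predict(fit, newdata = data.frame(idx = new.idx))
  } else {
    x <- 1 / idx
    fit <- lm(y ~ 0 + x)           # b * 1/x, no intercept
    pred <- predict(fit, newdata = data.frame(x = 1 / new.idx))
  }
  pred <- pmax(pred, 0)            # eigenvalues cannot be negative
  tail.sum <- sum(pred)            # summed predictions approximate the tail AUC
  list(variance.pcs = eigenv / (sum(eigenv) + tail.sum), tail.sum = tail.sum)
}

res <- sketch_tail_estimate(c(50, 20, 10, 9, 8, 7), min.dim = 10, elbow = 2)
print(res$tail.sum)
```

Here the noise eigenvalues after an elbow of 2 fall exactly linearly (10, 9, 8, 7), so the linear fit extrapolates 6, 5, 4 and 3 for the four missing positions and `tail.sum` is 18.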

## Usage

```r
estimate.eig.vpcs(eigenv = NULL, min.dim = length(eigenv), M = NULL,
  elbow = NA, linear = TRUE, estimated = FALSE, print.est = TRUE,
  print.coef = FALSE, add.fit.line = FALSE, col = "blue",
  ignore.warn = FALSE)
```

## Arguments

- `eigenv`: the vector of eigenvalues actually calculated.
- `min.dim`: the size of the smaller dimension of the matrix submitted to singular value decomposition (e.g., the number of samples), i.e., the maximum number of possible eigenvalues. Alternatively, use `M`.
- `M`: optionally, the original dataset 'M'; used only to derive the dimensions. Alternatively, use `min.dim`.
- `elbow`: the number of components you think explain the important portion of the variance in the dataset; further components are assumed to reflect noise or very subtle effects. The number of components is often chosen from the 'elbow' in a scree plot (see `pca.scree.plot`).
- `linear`: whether to use a linear model for the 'noise' eigenvalues; the alternative is a 1/x model with no intercept.
- `estimated`: logical; whether to return the estimated variance percentages for the unobserved eigenvalues along with the real data. This also generates a factor describing which values in the returned vector are observed and which are estimated.
- `print.est`: whether to print the estimate to the console.
- `print.coef`: whether to print the estimated regression coefficients to the console.
- `add.fit.line`: logical; if a scree plot already exists, adds the fit line from this estimate to it (`pca.scree.plot` offers this through its parameter of the same name).
- `col`: colour of the fit line.
- `ignore.warn`: ignore warnings when an estimate is not required (i.e., all eigenvalues are present).
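To see how `min.dim` and `M` relate, the following base-R sketch derives the maximum number of eigenvalues from the example's simulated 300 x 100 matrix, as `M` is described to do, and counts how many eigenvalues a 25-component subset leaves uncalculated. The variable `n.uncalculated` is a hypothetical name used only for illustration.

```r
# Deriving 'min.dim' from a data matrix, mirroring what passing 'M' is described to do.
nsamp <- 100; nvar <- 300; subset.size <- 25
mat <- matrix(rnorm(nsamp * nvar), ncol = nsamp)  # 300 rows x 100 columns
min.dim <- min(dim(mat))                           # maximum number of eigenvalues
n.uncalculated <- min.dim - subset.size            # eigenvalues left to estimate
cat("min.dim =", min.dim, "; uncalculated =", n.uncalculated, "\n")
```

The count of 75 agrees with the "75 uncalculated eigenvalues" reported in the example output. Since `min.dim` defaults to `length(eigenv)`, leaving both `min.dim` and `M` unset would presumably treat every eigenvalue as present.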

## Value

By default, returns a list whose first element, 'variance.pcs', holds the known variance percentages for each eigenvalue based on the estimated divisor, and whose second element, 'tail.auc', is the area under the curve for the estimated eigenvalues. If `estimated = TRUE`, a third element is returned with separate variance percentages for each of the estimated eigenvalues.
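As a reading aid, the relationship between the first two elements can be written as follows, assuming (this page does not confirm it) that the 'estimated divisor' is the sum of the $k$ calculated eigenvalues $\lambda_1,\dots,\lambda_k$ plus the estimated tail $T$ reported as 'tail.auc':

$$
\mathrm{vpc}_i = \frac{\lambda_i}{\sum_{j=1}^{k}\lambda_j + T}, \qquad i = 1,\dots,k,
$$

$$
\sum_{i=1}^{k}\mathrm{vpc}_i + \frac{T}{\sum_{j=1}^{k}\lambda_j + T} = 1.
$$

When all eigenvalues are present, $T = 0$ and the known percentages sum to 1, consistent with the `sum of all eigenvalue-variances= 1` line in the example output.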

## See Also

`pca.scree.plot`

## Examples

```r
library(bigpca)
library(irlba)
nsamp <- 100; nvar <- 300; subset.size <- 25; elbow <- 6
mat <- matrix(rnorm(nsamp*nvar),ncol=nsamp)
# or use:
# mat <- crimtab-rowMeans(crimtab) ; subset.size <- 10 # crimtab centred
prv.large(mat)
pca <- svd(mat,nv=subset.size,nu=0) # calculates subset of V, but all D
pca2 <- irlba(mat,nv=subset.size,nu=0) # calculates subset of V & D
pca3 <- princomp(mat,cor=TRUE) # calculates all
# number of eigenvalues for svd is the smaller dimension of the matrix
eig.varpc <- estimate.eig.vpcs(pca$d^2,M=mat)$variance.pcs
cat("sum of all eigenvalue-variances=",sum(eig.varpc),"\n")
print(eig.varpc[1:elbow])
# number of eigenvalues for irlba is the size of the subset if < min(dim(M))
eig.varpc <- estimate.eig.vpcs((pca2$d^2)[1:subset.size],M=mat)$variance.pcs
print(eig.varpc[1:elbow]) ## default model (linear=TRUE)
eig.varpc <- estimate.eig.vpcs((pca2$d^2)[1:subset.size],M=mat,linear=TRUE)$variance.pcs
print(eig.varpc[1:elbow]) ## linear model set explicitly, so identical to the default above
## the 1/x model underestimates total variance; to try it, set linear=FALSE:
# eig.varpc <- estimate.eig.vpcs((pca2$d^2)[1:subset.size],M=mat,linear=FALSE)$variance.pcs
eig.varpc <- estimate.eig.vpcs((pca3$sdev^2),M=mat)$variance.pcs
print(eig.varpc[1:elbow]) ## different analysis, but fairly similar var.pcs
```
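To judge how close an estimate is in this example, the exact values can be computed with base R alone. This sketch relies on the standard identity that the squared singular values of a matrix sum to its squared Frobenius norm, `sum(mat^2)`, so the exact total, and hence the exact sum of the uncalculated eigenvalues, is available for comparison with the tail estimate. The seed and the names `exact.varpc` and `tail.exact` are assumptions added for illustration, so the numbers will not match the output shown below.

```r
# Base-R check of the example: exact variance percentages and exact tail sum.
set.seed(1)                                    # assumption: seed added for reproducibility
nsamp <- 100; nvar <- 300; subset.size <- 25; elbow <- 6
mat <- matrix(rnorm(nsamp * nvar), ncol = nsamp)
d2 <- svd(mat, nu = 0, nv = 0)$d^2             # all 100 eigenvalues (squared singular values)
exact.varpc <- d2 / sum(d2)                    # exact variance percentages
total <- sum(mat^2)                            # equals sum(d2): squared Frobenius norm
tail.exact <- total - sum(d2[1:subset.size])   # exact sum of the 75 uncalculated eigenvalues
print(exact.varpc[1:elbow])
cat("exact eigenvalue sum of uncalculated eigenvalues:", tail.exact, "\n")
```

Comparing `tail.exact` with the estimated sum printed by `estimate.eig.vpcs()` for the same matrix shows how much of the total variance the extrapolation captures.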

### Example output

```Loading required package: reader

The following objects are masked from 'package:NCmisc':

cat.path, get.ext, rmv.ext

col#
row#        1        2  .....       100
1   -0.091  -0.3784   ...     1.3543
2  -0.5153   -0.148   ...     1.8725
3   -0.503   0.7059   ...     0.4792
...      ...      ...   ...        ...
300  -2.0728  -1.3992   ...    -0.9845
All eigenvalues present, estimate not required
sum of all eigenvalue-variances= 1
[1] 0.02390282 0.02373479 0.02255394 0.02214702 0.02172337 0.02091343
estimate of eigenvalue sum of 75 uncalculated eigenvalues: 7356.386
[1] 0.03391549 0.03367708 0.03200158 0.03142422 0.03082309 0.02967388
estimate of eigenvalue sum of 75 uncalculated eigenvalues: 7356.386
[1] 0.03391549 0.03367708 0.03200158 0.03142422 0.03082309 0.02967388
All eigenvalues present, estimate not required
Comp.1     Comp.2     Comp.3     Comp.4     Comp.5     Comp.6
0.02413550 0.02365197 0.02295332 0.02195104 0.02164209 0.02071138
```

bigpca documentation built on Nov. 22, 2017, 1:02 a.m.