estimate.eig.vpcs: Estimate the variance percentages for uncalculated...


View source: R/bigpca.R

Description

When using a function like irlba() to calculate PCA, you can choose (for speed) to calculate only a subset of the eigenvalues. As a result there is no exact percentage of variance explained by the decomposition, or by each component, of the kind that other routines report. This code fits either a linear model or a b*(1/x) model to the calculated eigenvalues and uses it to estimate the area under the curve (AUC) for the uncalculated eigenvalues, providing a reasonable estimate of the variance accounted for by each unknown eigenvalue and of the predicted eigenvalue sum of the unknown eigenvalues.
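The underlying idea can be sketched roughly as follows. This is only an illustrative approximation of the approach, not the package's own code, and the helper name 'extrapolate_tail' is made up for the example: fit a simple model to the observed eigenvalue decay and extrapolate it over the uncalculated indices.

# Illustrative sketch (not bigpca's implementation) of tail extrapolation.
# 'eigenv' holds the k calculated eigenvalues; 'min.dim' is the maximum possible number.
extrapolate_tail <- function(eigenv, min.dim, elbow = 2, linear = TRUE) {
  k <- length(eigenv)
  if (min.dim <= k) {                        # nothing to estimate: all eigenvalues present
    return(list(variance.pcs = eigenv/sum(eigenv), tail.auc = 0))
  }
  idx <- (elbow + 1):k                       # the 'noise' part of the observed eigenvalues
  if (linear) {
    fit <- lm(eigenv[idx] ~ idx)             # linear decay model
  } else {
    fit <- lm(eigenv[idx] ~ 0 + I(1/idx))    # b*(1/x) model with no intercept
  }
  pred <- predict(fit, newdata = data.frame(idx = (k + 1):min.dim))
  pred[pred < 0] <- 0                        # eigenvalues cannot be negative
  tail.auc <- sum(pred)                      # estimated sum of the uncalculated eigenvalues
  variance.pcs <- eigenv / (sum(eigenv) + tail.auc)
  list(variance.pcs = variance.pcs, tail.auc = tail.auc)
}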

Usage

estimate.eig.vpcs(eigenv = NULL, min.dim = length(eigenv), M = NULL,
  elbow = NA, linear = TRUE, estimated = FALSE, print.est = TRUE,
  print.coef = FALSE, add.fit.line = FALSE, col = "blue",
  ignore.warn = FALSE)

Arguments

eigenv

the vector of eigenvalues actually calculated

min.dim

the size of the smaller dimension of the matrix submitted to singular value decomposition (e.g., the number of samples), i.e., the maximum number of possible eigenvalues; alternatively use 'M'.

M

optionally, the original dataset 'M'; used only to derive its dimensions. Alternatively use 'min.dim'.

elbow

the number of components which you think explain the important portion of the variance of the dataset; further components are assumed to reflect noise or very subtle effects. For example, the number of components used is often decided by the 'elbow' in a scree plot (see 'pca.scree.plot').

linear

whether to use a linear model for the 'noise' eigenvalues; the alternative is a 1/x model with no intercept.

estimated

logical, whether to return the estimated variance percentages for unobserved eigenvalues along with the real data; will also generate a factor describing which values in the returned vector are observed versus estimated.

print.est

whether to output the estimate result to the console

print.coef

whether to output the estimate regression coefficients to the console

add.fit.line

logical; if there is an existing scree plot, adds the fit line from this estimate to the plot ('pca.scree.plot' can use this option via its parameter of the same name)

col

colour for the fit line

ignore.warn

ignore warnings when an estimate is not required (i.e., all eigenvalues present)
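Since 'min.dim' and 'M' are alternative ways of supplying the same information, the following two calls (a small sketch reusing the 'pca2' and 'mat' objects created in the Examples below) should be interchangeable:

estimate.eig.vpcs(pca2$d^2, M = mat)                  # dimensions taken from the matrix itself
estimate.eig.vpcs(pca2$d^2, min.dim = min(dim(mat)))  # the smaller dimension given explicitly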

Value

By default returns a list where the first element 'variance.pcs' gives the known variance percentages for each calculated eigenvalue based on the estimated divisor, and the second element 'tail.auc' is the area under the curve for the estimated eigenvalues. If estimated = TRUE then a third element is returned with separate variance percentages for each of the estimated eigenvalues.
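For instance, reusing the objects from the Examples below, the documented elements of the returned list could be inspected like this (a hedged sketch; only the element names given above are assumed):

res <- estimate.eig.vpcs(pca2$d^2, M = mat, print.est = FALSE)
res$variance.pcs   # variance percentage for each calculated eigenvalue
res$tail.auc       # estimated eigenvalue sum of the uncalculated tail
str(res)           # full structure of the returned list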

See Also

pca.scree.plot

Examples

nsamp <- 100; nvar <- 300; subset.size <- 25; elbow <- 6
mat <- matrix(rnorm(nsamp*nvar),ncol=nsamp) 
# or use: # mat <- crimtab-rowMeans(crimtab) ; subset.size <- 10 # crimtab centred
prv.large(mat)
pca <- svd(mat,nv=subset.size,nu=0) # calculates subset of V, but all D
require(irlba)
pca2 <- irlba(mat,nv=subset.size,nu=0) # calculates subset of V & D
pca3 <- princomp(mat,cor=TRUE) # calculates all
# number of eigenvalues for svd is the smaller dimension of the matrix
eig.varpc <- estimate.eig.vpcs(pca$d^2,M=mat)$variance.pcs
cat("sum of all eigenvalue-variances=",sum(eig.varpc),"\n")
print(eig.varpc[1:elbow])
# number of eigenvalues for irlba is the size of the subset if < min(dim(M))
eig.varpc <- estimate.eig.vpcs((pca2$d^2)[1:subset.size],M=mat,linear=FALSE)$variance.pcs
print(eig.varpc[1:elbow])  ## using 1/x model, underestimates total variance
eig.varpc <- estimate.eig.vpcs((pca2$d^2)[1:subset.size],M=mat,linear=TRUE)$variance.pcs
print(eig.varpc[1:elbow])  ## using linear model, closer to exact answer
eig.varpc <- estimate.eig.vpcs((pca3$sdev^2),M=mat)$variance.pcs
print(eig.varpc[1:elbow])  ## different analysis, but fairly similar var.pcs
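A further variation, not part of the shipped example (so its output is not reproduced below), compares the estimated tail sums from the two available tail models using only the documented arguments:

v.lin  <- estimate.eig.vpcs((pca2$d^2)[1:subset.size], M = mat, elbow = elbow,
                            linear = TRUE,  print.est = FALSE)
v.recp <- estimate.eig.vpcs((pca2$d^2)[1:subset.size], M = mat, elbow = elbow,
                            linear = FALSE, print.est = FALSE)
c(linear = v.lin$tail.auc, reciprocal = v.recp$tail.auc)  # estimated sums of the uncalculated eigenvalues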

Example output


      col# 
 row#        1        2  .....       100 
    1   -0.091  -0.3784   ...     1.3543 
    2  -0.5153   -0.148   ...     1.8725 
    3   -0.503   0.7059   ...     0.4792 
  ...      ...      ...   ...        ... 
  300  -2.0728  -1.3992   ...    -0.9845 
Loading required package: irlba
Loading required package: Matrix
All eigenvalues present, estimate not required
sum of all eigenvalue-variances= 1 
[1] 0.02390282 0.02373479 0.02255394 0.02214702 0.02172337 0.02091343
 estimate of eigenvalue sum of 75 uncalculated eigenvalues: 7356.386 
[1] 0.03391549 0.03367708 0.03200158 0.03142422 0.03082309 0.02967388
 estimate of eigenvalue sum of 75 uncalculated eigenvalues: 7356.386 
[1] 0.03391549 0.03367708 0.03200158 0.03142422 0.03082309 0.02967388
All eigenvalues present, estimate not required
    Comp.1     Comp.2     Comp.3     Comp.4     Comp.5     Comp.6 
0.02413550 0.02365197 0.02295332 0.02195104 0.02164209 0.02071138 
