Description Usage Arguments Details Value References Examples
View source: R/bootstrap_functions.R
Applies fast bootstrap PCA, using the method from (Fisher et al., 2014). Dimension of the sample is denoted by p, and sample size is denoted by n, with p>n.
1 2 3 4 |
Y |
initial data sample, which can be either a matrix or a |
K |
number of PCs to calculate the bootstrap distribution for. |
V |
(optional) the (p by n) full matrix of p-dimensional PCs for the sample data matrix. If |
d |
(optional) n-length vector of the singular values of |
U |
(optional) the (n by n) full set of n-dimensional singular vectors of |
B |
number of bootstrap samples to compute. |
output |
a vector telling which descriptions of the bootstrap distribution should be calculated. Can include any of the following: 'initial_SVD', 'HD_moments', 'full_HD_PC_dist', and 'HD_percentiles'. See below for explanations of these outputs. |
verbose |
if TRUE, the function will print progress during calculation procedure. |
bInds |
a (B by n) matrix of bootstrap indeces, where |
percentiles |
a vector containing percentiles to be used to calculate element-wise percentiles across the bootstrap distribution (both across the distribution of p-dimensional components and the distribution of n-dimensional components). For example, |
centerSamples |
whether each bootstrap sample should be centered before calculating the SVD. |
pattern_V |
if |
pattern_Vb |
if |
Users might also consider changing the global options ffbatchbytes
, from the ff
package, and mc.cores
, from the parallel
package. When ff
objects are entered as arguments for bootSVD
, the required matrix algebra is done using block matrix alebra. The ffbatchbytes
option determines the size of the largest block matrix that will be held in memory at any one time. The mc.cores
option (set to 1 by default) determines the level of parallelization to use when calculating the high dimensional distribution of the bootstrap PCs (see mclapply
).
bootSVD
returns a list that can include any of the following elements, depending on what is specified in the output
argument:
The singular value decomposition of the centered, original data matrix. initial_SVD
is a list containing V
, the matrix of p-dimensional principal components, d
, the vector of singular values of Y
, and U
, the matrix of n-dimensional singular vectors of Y
.
A list containing the bootstrap expected value (EPCs
), element-wise bootstrap variance (varPCs
), and element-wise bootstrap standard deviation (sdPCs
) for each of the p-dimensional PCs. Each of these three elements of HD_moments
is also a list, which contains K vectors, one for each PC. HD_moments
also contains momentCI
, a K-length list of (p by 2) matrices containing element-wise moment based confidence intervals for the PCs.
A B-length list of matrices (or ff
matrices), with the b^{th} list element equal to the (p by K) matrix of high dimensional PCs for the b^{th} bootstrap sample.
For especially high dimensional cases when the output is returned as ff
matrices, caution should be used if requesting 'full_HD_PC_dist' due to potential storage limitations.
To reindex these PCs by k (the PC index) as opposed to b (the bootstrap index), see reindexMatricesByK
. Again though, caution shoulded be used when reindexing PCs stored as ff
objects, as this will double the number of files stored.
A list of K matrices, each of dimension (p by q), where q is the number of percentiles requested (i.e. q = length(percentiles)
). The k^{th} matrix in HD_percentiles
contains element-wise percentiles for the k^{th}, p-dimensional PC.
In addition, the following results are always included in the output, regardless of what is specified in the output
argument:
full_LD_PC_dist |
A B-length list of matrices, with the b^{th} list element equal to the (p by K) matrix of PCs of the scores in the b^{th} bootstrap sample. To reindex these vectors by k (the PC index), see |
d_dist |
A |
U_dist |
A |
LD_moments |
A list that is comparable to |
LD_percentiles |
A list of K matrices, each of dimension (p by q), where q is the number of percentiles requested (i.e. q = |
Aaron Fisher, Brian Caffo, and Vadim Zipunnikov. Fast, Exact Bootstrap Principal Component Analysis for p>1 million. 2014. http://arxiv.org/abs/1405.0922
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 | #use small n, small B, for a quick illustration
set.seed(0)
Y<-simEEG(n=100, centered=TRUE, wide=TRUE)
b<-bootSVD(Y, B=50, K=2, output=
c('initial_SVD', 'HD_moments', 'full_HD_PC_dist',
'HD_percentiles'), verbose=interactive())
b
#explore results
matplot(b$initial_SVD$V[,1:4],type='l',main='Fitted PCs',lty=1)
legend('bottomright',paste0('PC',1:4),col=1:4,lty=1,lwd=2)
######################
# look specifically at 2nd PC
k<-2
######
#looking at HD variability
#plot several draws from bootstrap distribution
VsByK<-reindexMatricesByK(b$full_HD_PC_dist)
matplot(t(VsByK[[k]][1:20,]),type='l',lty=1,
main=paste0('20 Draws from bootstrap\ndistribution of HD PC ',k))
#plot pointwise CIs
matplot(b$HD_moments$momentCI[[k]],type='l',col='blue',lty=1,
main=paste0('CIs For HD PC ',k))
matlines(b$HD_percentiles[[k]],type='l',col='darkgreen',lty=1)
lines(b$initial_SVD$V[,k])
legend('topright',c('Fitted PC','Moment CIs','Percentile CIs'),
lty=1,col=c('black','blue','darkgreen'))
abline(h=0,lty=2,col='darkgrey')
######
# looking at LD variability
# plot several draws from bootstrap distribution
AsByK<-reindexMatricesByK(b$full_LD_PC_dist)
matplot(t(AsByK[[k]][1:50,]),type='l',lty=1,
main=paste0('50 Draws from bootstrap\ndistribution of LD PC ',k),
xlim=c(1,10),xlab='PC index (truncated)')
# plot pointwise CIs
matplot(b$LD_moments$momentCI[[k]],type='o',col='blue',
lty=1,main=paste0('CIs For LD PC ',k),xlim=c(1,10),
xlab='PC index (truncated)',pch=1)
matlines(b$LD_percentiles[[k]],type='o',pch=1,col='darkgreen',lty=1)
abline(h=0,lty=2,col='darkgrey')
legend('topright',c('Moment CIs','Percentile CIs'),lty=1,
pch=1,col=c('blue','darkgreen'))
#Note: variability is mostly due to rotations with the third and fourth PC.
# Bootstrap eigenvalue distribution
dsByK<-reindexVectorsByK(b$d_dist)
boxplot(dsByK[[k]]^2,main=paste0('Covariance Matrix Eigenvalue ',k),
ylab='Bootstrap Distribution',
ylim=range(c(dsByK[[k]]^2,b$initial_SVD$d[k]^2)))
points(b$initial_SVD$d[k]^2,pch=18,col='red')
legend('bottomright','Sample Value',pch=18,col='red')
##################
#Example with ff input
library(ff)
Yff<-as.ff(Y, pattern='Y_')
# If desired, change options in 'ff' package to
# adjust the size of matrix blocks held in RAM.
# For example:
# options('ffbatchbytes'=100000)
ff_dir<-tempdir()
pattern_V <- paste0(ff_dir,'/V_')
pattern_Vb <- paste0(ff_dir,'/Vb_')
bff <- bootSVD(Yff, B=50, K=2, output=c('initial_SVD', 'HD_moments',
'full_HD_PC_dist', 'HD_percentiles'), pattern_V= pattern_V,
pattern_Vb=pattern_Vb, verbose=interactive())
# Note that elements of full_HD_PC_dist and initial_SVD
# have class 'ff'
str(lapply(bff,function(x) class(x[[1]])))
#Show some results of bootstrap draws
plot(bff$full_HD_PC_dist[[1]][,k],type='l')
#Reindexing by K will create a new set of ff files.
VsByKff<-reindexMatricesByK(bff$full_HD_PC_dist,
pattern=paste0(ff_dir,'/Vk_'))
physical(bff$full_HD_PC_dist[[1]])$filename
physical(VsByKff[[1]])$filename
matplot(t(VsByKff[[k]][1:10,]),type='l',lty=1,
main=paste0('Bootstrap Distribution of PC',k))
# Saving and moving results:
saveRDS(bff,file=paste0(ff_dir,'/bff.rds'))
close(bff$initial_SVD$V)
physical(bff$initial_SVD$V)$filename
# If the 'ff' files on disk are moved or renamed,
# this filename attribute can be changed:
old_ff_path <- physical(bff$initial_SVD$V)$filename
new_ff_path <- paste0(tempdir(),'/new_V_file.ff')
file.rename(from= old_ff_path, to= new_ff_path)
physical(bff$initial_SVD$V)$filename <- new_ff_path
matplot(bff$initial_SVD$V[,1:4],type='l',lty=1)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.