Description Usage Arguments Details Value References Examples

View source: R/bootstrap_functions.R

Applies fast bootstrap PCA, using the method from (Fisher et al., 2014). Dimension of the sample is denoted by *p*, and sample size is denoted by *n*, with *p>n*.

1 2 3 4 |

`Y` |
initial data sample, which can be either a matrix or a |

`K` |
number of PCs to calculate the bootstrap distribution for. |

`V` |
(optional) the ( |

`d` |
(optional) |

`U` |
(optional) the ( |

`B` |
number of bootstrap samples to compute. |

`output` |
a vector telling which descriptions of the bootstrap distribution should be calculated. Can include any of the following: 'initial_SVD', 'HD_moments', 'full_HD_PC_dist', and 'HD_percentiles'. See below for explanations of these outputs. |

`verbose` |
if TRUE, the function will print progress during calculation procedure. |

`bInds` |
a ( |

`percentiles` |
a vector containing percentiles to be used to calculate element-wise percentiles across the bootstrap distribution (both across the distribution of |

`centerSamples` |
whether each bootstrap sample should be centered before calculating the SVD. |

`pattern_V` |
if |

`pattern_Vb` |
if |

Users might also consider changing the global options `ffbatchbytes`

, from the `ff`

package, and `mc.cores`

, from the `parallel`

package. When `ff`

objects are entered as arguments for `bootSVD`

, the required matrix algebra is done using block matrix alebra. The `ffbatchbytes`

option determines the size of the largest block matrix that will be held in memory at any one time. The `mc.cores`

option (set to 1 by default) determines the level of parallelization to use when calculating the high dimensional distribution of the bootstrap PCs (see `mclapply`

).

`bootSVD`

returns a list that can include any of the following elements, depending on what is specified in the `output`

argument:

- initial_SVD
The singular value decomposition of the centered, original data matrix.

`initial_SVD`

is a list containing`V`

, the matrix of*p*-dimensional principal components,`d`

, the vector of singular values of`Y`

, and`U`

, the matrix of*n*-dimensional singular vectors of`Y`

.- HD_moments
A list containing the bootstrap expected value (

`EPCs`

), element-wise bootstrap variance (`varPCs`

), and element-wise bootstrap standard deviation (`sdPCs`

) for each of the*p*-dimensional PCs. Each of these three elements of`HD_moments`

is also a list, which contains*K*vectors, one for each PC.`HD_moments`

also contains`momentCI`

, a*K*-length list of (*p*by 2) matrices containing element-wise moment based confidence intervals for the PCs.- full_HD_PC_dist
A

*B*-length list of matrices (or`ff`

matrices), with the*b^{th}*list element equal to the (*p*by*K*) matrix of high dimensional PCs for the*b^{th}*bootstrap sample.

For especially high dimensional cases when the output is returned as`ff`

matrices, caution should be used if requesting 'full_HD_PC_dist' due to potential storage limitations.

To reindex these PCs by*k*(the PC index) as opposed to*b*(the bootstrap index), see`reindexMatricesByK`

. Again though, caution shoulded be used when reindexing PCs stored as`ff`

objects, as this will double the number of files stored.- HD_percentiles
A list of

*K*matrices, each of dimension (*p*by*q*), where*q*is the number of percentiles requested (i.e.*q*=`length(percentiles)`

). The*k^{th}*matrix in`HD_percentiles`

contains element-wise percentiles for the*k^{th}*,*p*-dimensional PC.

In addition, the following results are always included in the output, regardless of what is specified in the `output`

argument:

`full_LD_PC_dist` |
A |

`d_dist` |
A |

`U_dist` |
A |

`LD_moments` |
A list that is comparable to |

`LD_percentiles` |
A list of |

Aaron Fisher, Brian Caffo, and Vadim Zipunnikov. *Fast, Exact Bootstrap Principal Component Analysis for p>1 million*. 2014. http://arxiv.org/abs/1405.0922

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 | ```
#use small n, small B, for a quick illustration
set.seed(0)
Y<-simEEG(n=100, centered=TRUE, wide=TRUE)
b<-bootSVD(Y, B=50, K=2, output=
c('initial_SVD', 'HD_moments', 'full_HD_PC_dist',
'HD_percentiles'), verbose=interactive())
b
#explore results
matplot(b$initial_SVD$V[,1:4],type='l',main='Fitted PCs',lty=1)
legend('bottomright',paste0('PC',1:4),col=1:4,lty=1,lwd=2)
######################
# look specifically at 2nd PC
k<-2
######
#looking at HD variability
#plot several draws from bootstrap distribution
VsByK<-reindexMatricesByK(b$full_HD_PC_dist)
matplot(t(VsByK[[k]][1:20,]),type='l',lty=1,
main=paste0('20 Draws from bootstrap\ndistribution of HD PC ',k))
#plot pointwise CIs
matplot(b$HD_moments$momentCI[[k]],type='l',col='blue',lty=1,
main=paste0('CIs For HD PC ',k))
matlines(b$HD_percentiles[[k]],type='l',col='darkgreen',lty=1)
lines(b$initial_SVD$V[,k])
legend('topright',c('Fitted PC','Moment CIs','Percentile CIs'),
lty=1,col=c('black','blue','darkgreen'))
abline(h=0,lty=2,col='darkgrey')
######
# looking at LD variability
# plot several draws from bootstrap distribution
AsByK<-reindexMatricesByK(b$full_LD_PC_dist)
matplot(t(AsByK[[k]][1:50,]),type='l',lty=1,
main=paste0('50 Draws from bootstrap\ndistribution of LD PC ',k),
xlim=c(1,10),xlab='PC index (truncated)')
# plot pointwise CIs
matplot(b$LD_moments$momentCI[[k]],type='o',col='blue',
lty=1,main=paste0('CIs For LD PC ',k),xlim=c(1,10),
xlab='PC index (truncated)',pch=1)
matlines(b$LD_percentiles[[k]],type='o',pch=1,col='darkgreen',lty=1)
abline(h=0,lty=2,col='darkgrey')
legend('topright',c('Moment CIs','Percentile CIs'),lty=1,
pch=1,col=c('blue','darkgreen'))
#Note: variability is mostly due to rotations with the third and fourth PC.
# Bootstrap eigenvalue distribution
dsByK<-reindexVectorsByK(b$d_dist)
boxplot(dsByK[[k]]^2,main=paste0('Covariance Matrix Eigenvalue ',k),
ylab='Bootstrap Distribution',
ylim=range(c(dsByK[[k]]^2,b$initial_SVD$d[k]^2)))
points(b$initial_SVD$d[k]^2,pch=18,col='red')
legend('bottomright','Sample Value',pch=18,col='red')
##################
#Example with ff input
library(ff)
Yff<-as.ff(Y, pattern='Y_')
# If desired, change options in 'ff' package to
# adjust the size of matrix blocks held in RAM.
# For example:
# options('ffbatchbytes'=100000)
ff_dir<-tempdir()
pattern_V <- paste0(ff_dir,'/V_')
pattern_Vb <- paste0(ff_dir,'/Vb_')
bff <- bootSVD(Yff, B=50, K=2, output=c('initial_SVD', 'HD_moments',
'full_HD_PC_dist', 'HD_percentiles'), pattern_V= pattern_V,
pattern_Vb=pattern_Vb, verbose=interactive())
# Note that elements of full_HD_PC_dist and initial_SVD
# have class 'ff'
str(lapply(bff,function(x) class(x[[1]])))
#Show some results of bootstrap draws
plot(bff$full_HD_PC_dist[[1]][,k],type='l')
#Reindexing by K will create a new set of ff files.
VsByKff<-reindexMatricesByK(bff$full_HD_PC_dist,
pattern=paste0(ff_dir,'/Vk_'))
physical(bff$full_HD_PC_dist[[1]])$filename
physical(VsByKff[[1]])$filename
matplot(t(VsByKff[[k]][1:10,]),type='l',lty=1,
main=paste0('Bootstrap Distribution of PC',k))
# Saving and moving results:
saveRDS(bff,file=paste0(ff_dir,'/bff.rds'))
close(bff$initial_SVD$V)
physical(bff$initial_SVD$V)$filename
# If the 'ff' files on disk are moved or renamed,
# this filename attribute can be changed:
old_ff_path <- physical(bff$initial_SVD$V)$filename
new_ff_path <- paste0(tempdir(),'/new_V_file.ff')
file.rename(from= old_ff_path, to= new_ff_path)
physical(bff$initial_SVD$V)$filename <- new_ff_path
matplot(bff$initial_SVD$V[,1:4],type='l',lty=1)
``` |

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.