bigpca-package: PCA, Transpose and Multicore Functionality for 'big.matrix'...


Description

This package provides wrappers that extend the functionality available for big.matrix objects (see the bigmemory project), enabling fast, scalable principal component analysis (PCA) and singular value decomposition (SVD). It also includes functions for transposing, multicore 'apply' functionality, data import, and compact display of big.matrix objects. Most functions also work on standard matrices if RAM is sufficient.
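As a rough illustration of the link between PCA and SVD mentioned above, the following base-R sketch runs both on an ordinary in-memory matrix. It uses only prcomp() and svd() from base R, not bigpca functions; the data and variable names are invented for illustration.

```r
set.seed(1)
x <- matrix(rnorm(200 * 10), nrow = 200, ncol = 10)  # 200 samples, 10 variables

# PCA via prcomp(), which centres the columns by default
pca <- prcomp(x, center = TRUE, scale. = FALSE)

# The same decomposition via SVD of the column-centred matrix
xc <- scale(x, center = TRUE, scale = FALSE)
sv <- svd(xc)

# Singular values map to PC standard deviations as d / sqrt(n - 1)
sdev_from_svd <- sv$d / sqrt(nrow(x) - 1)
max_sdev_diff <- max(abs(sdev_from_svd - pca$sdev))

# Right singular vectors match the PCA loadings up to sign
max_loading_diff <- max(abs(abs(sv$v) - abs(pca$rotation)))
```

Both differences should be at the level of floating-point noise, which is why PCA and SVD can be treated as two views of one computation.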

Details

Package: bigpca
Type: Package
Version: 1.1
Date: 2017-11-17
License: GPL (>= 2)

The bigmemory project has provided a useful new data structure 'big.matrix', which allows fast and efficient access to an object that is only limited by disk-space and not RAM capacity. This package provides wrappers to extend the library of functions available for big.matrix objects. The focus of this package are functions for multicore functionality and Principle Components Analysis (PCA)/Singular Value Decomposition (SVD). bmcapply() works similarly to mcapply but is for big.matrix objects. There is a transpose function (which is not super-fast, but can be run with multiple cores to improve speed). There are several functions dedicated to PCA/SVD. These operations still require a large amount of RAM for large matrices, but the speed is greatly increased and there are useful tools allowing PCA/SVD of much larger matrices than would be feasible otherwise. There are also functions for determining the 'elbow' of the data, making scree plots, estimating variance explained for incomplete sets of eigenvalues, and for using the derived principle components for correction of a dataset. The PC correction algorithm is fast and can be run with multiple cores simultaneously. There is also a new function prv.big.matrix() for compactly previewing large matrices, and get.big.matrix() for flexibly retrieving a big.matrix object from a range of different formats.
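To make the ideas of variance explained, the 'elbow', and PC correction concrete, here is a minimal base-R sketch on a small simulated matrix. It is an assumption-laden illustration, not the package's implementation: the data are invented, the elbow rule is a crude heuristic, and the correction shown (regressing every variable on the first k component scores and keeping the residuals) is one common approach that may differ from what PC.correct() does internally.

```r
set.seed(42)
n_samples <- 100
n_vars <- 30

# Hypothetical data: two hidden factors plus noise, samples in rows
hidden <- matrix(rnorm(n_samples * 2), n_samples, 2)
weights <- matrix(rnorm(2 * n_vars), 2, n_vars)
noise <- matrix(rnorm(n_samples * n_vars, sd = 0.5), n_samples, n_vars)
x <- hidden %*% weights + noise

pca <- prcomp(x, center = TRUE, scale. = FALSE)

# Proportion of variance explained per component (the values on a scree plot)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cum_explained <- cumsum(var_explained)

# Crude elbow heuristic: component followed by the largest drop in variance
elbow <- which.max(-diff(var_explained))

# PC correction sketch: remove the first k components from every variable
k <- 2
scores <- pca$x[, 1:k, drop = FALSE]
fit <- lm(x ~ scores)  # one regression per column of x
col_means <- matrix(colMeans(x), n_samples, n_vars, byrow = TRUE)
corrected <- residuals(fit) + col_means  # residuals with the means added back
```

After the correction, the columns of `corrected` are uncorrelated with the removed component scores, and the remaining total variance equals the variance carried by components k+1 onward.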

List of key functions:

Author(s)

Nicholas Cooper

Maintainer: Nicholas Cooper <njcooper@gmx.co.uk>

See Also

NCmisc

Examples

library(bigpca)
# create a test big.matrix object (file-backed)
orig.dir <- getwd(); setwd(tempdir())  # move to temporary dir
bM <- filebacked.big.matrix(20, 50,
       dimnames = list(paste("r",1:20,sep=""), paste("c",1:50,sep="")),
       backingfile = "test.bck",  backingpath = getwd(), descriptorfile = "test.dsc")
bM[1:20,] <- replicate(50,rnorm(20))
prv.big.matrix(bM)
# now transpose
tbM <- big.t(bM,dir=getwd(),verbose=TRUE)
prv.big.matrix(tbM,row=10,col=4)
colSDs <- bmcapply(tbM,2,sd,n.cores=10)
rowSDs <- bmcapply(bM,1,sd,n.cores=10) # use up to 10 cores if available
##  generate some data with reasonable intercorrelations ##
mat <- sim.cor(500,200,genr=function(n){ (runif(n)/2+.5) })
bmat <- as.big.matrix(mat)
# calculate PCA
result <- big.PCA(bmat)
# correct the data using the PCA result, on one core and then on several
corrected <- PC.correct(result,bmat)
corrected2 <- PC.correct(result,bmat,n.cores=5)
all.equal(corrected,corrected2)
rm(tbM); rm(bM); rm(result)
rm(corrected); rm(corrected2); rm(bmat)
clear_active_bms() # delete big.matrix objects in memory
unlink(c("test.bck","test.dsc"))
setwd(orig.dir)
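The bmcapply() calls in the example apply a function over rows or columns using several cores. As a rough sketch of that general pattern on an ordinary matrix, the block below splits the column indices into chunks and processes the chunks with mclapply() and splitIndices() from the base 'parallel' package. This is an illustration under stated assumptions, not bigpca's own code: the matrix, the chunking scheme, and the choice of two workers are all hypothetical.

```r
library(parallel)

set.seed(7)
m <- matrix(rnorm(20 * 50), nrow = 20, ncol = 50)

# Assumption: forking is unavailable on Windows, so use a single core there
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Split the column indices into contiguous chunks, one per worker
chunks <- splitIndices(ncol(m), n_cores)

# Each worker computes column standard deviations for its own chunk
col_sds_list <- mclapply(
  chunks,
  function(idx) apply(m[, idx, drop = FALSE], 2, sd),
  mc.cores = n_cores
)

# Chunks are contiguous and ordered, so concatenating restores column order
col_sds <- unlist(col_sds_list)
```

The result should match a plain `apply(m, 2, sd)`; the chunked version only pays off when the per-chunk work is large enough to outweigh the cost of starting the workers.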

Example output

Loading required package: reader
Loading required package: NCmisc
Loading required package: bigmemory
Loading required package: bigmemory.sri
Loading required package: biganalytics
Loading required package: foreach
Loading required package: biglm
Loading required package: DBI

bigpca documentation built on Nov. 22, 2017, 1:02 a.m.