Description
This package adds wrappers that extend the functionality of big.matrix objects (see the bigmemory project). These allow fast, scalable principal components analysis (PCA) and singular value decomposition (SVD). There are also functions for transposing, for multicore 'apply' functionality, for importing data, and for compactly displaying big.matrix objects. Most functions also work on standard matrices if RAM is sufficient.
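Because most functions also accept ordinary matrices, it can help to see what PCA via SVD amounts to for a small in-memory matrix. The following base-R sketch centres the columns, calls svd(), and derives component scores, eigenvalues and the proportion of variance explained, then cross-checks against prcomp(). It assumes rows are samples and columns are variables, uses only base R functions, and is a generic illustration rather than bigpca's own big.PCA() algorithm.

```r
# Illustrative base-R sketch, not bigpca's big.PCA(): PCA of an ordinary
# in-memory matrix (rows = samples, columns = variables) via the SVD.
set.seed(1)
X <- matrix(rnorm(100 * 8), nrow = 100, ncol = 8)

Xc <- scale(X, center = TRUE, scale = FALSE)  # centre each variable
s  <- svd(Xc)                                 # Xc = U D V'
scores   <- s$u %*% diag(s$d)                 # PC scores, equal to Xc %*% V
loadings <- s$v                               # one column of weights per PC
eigvals  <- s$d^2 / (nrow(X) - 1)             # eigenvalues of cov(X)
var.explained <- eigvals / sum(eigvals)       # proportion of variance per PC

# Cross-check with prcomp(), which is also SVD-based.
p <- prcomp(X, center = TRUE, scale. = FALSE)
all.equal(eigvals, p$sdev^2)
```

Working from the SVD of the centred data avoids forming the covariance matrix explicitly, which is the same reason prcomp() prefers it.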
Details

Package: bigpca
Type: Package
Version: 1.1
Date: 2017-11-17
License: GPL (>= 2)
The bigmemory project provides a useful data structure, 'big.matrix', which allows fast and efficient access to an object whose size is limited by disk space rather than by RAM. This package provides wrappers that extend the library of functions available for big.matrix objects, with a focus on multicore functionality and on Principal Components Analysis (PCA)/Singular Value Decomposition (SVD).

bmcapply() works similarly to mclapply() from the parallel package, but applies a function over the rows or columns of a big.matrix, optionally using multiple cores. big.t() transposes a big.matrix; it is not especially fast, but it can be run on multiple cores to improve speed.

Several functions are dedicated to PCA/SVD. These operations still require a large amount of RAM for large matrices, but they are considerably faster, and the package provides tools that make PCA/SVD feasible for much larger matrices than would otherwise be possible. There are also functions for determining the 'elbow' of the data, making scree plots, estimating the variance explained when only some of the eigenvalues have been calculated, and using the derived principal components to correct a dataset. The PC correction algorithm is fast and can be run on multiple cores simultaneously.

Finally, prv.big.matrix() compactly previews large matrices, and get.big.matrix() flexibly retrieves a big.matrix object from a range of different formats.
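The idea of correcting a dataset for principal components can be illustrated on an ordinary matrix: regress every variable on the scores of the first k components and keep the residuals, so that the variation captured by those components is removed. The base-R sketch below does this with qr.resid(). pc_correct_sketch() is a hypothetical name, rows are assumed to be samples, and this is not bigpca's PC.correct() implementation, whose data orientation and options may differ.

```r
# Hypothetical helper, for illustration only (not bigpca's PC.correct()).
# Rows of X are samples, columns are variables; returns column-centred
# residuals after removing the first k principal components.
pc_correct_sketch <- function(X, k) {
  Xc <- scale(X, center = TRUE, scale = FALSE)       # centre each variable
  s  <- svd(Xc, nu = k, nv = 0)                      # first k left singular vectors
  scores <- s$u %*% diag(s$d[seq_len(k)], nrow = k)  # scores of the first k PCs
  # qr.resid() gives the least-squares residuals of every column of Xc
  # regressed on the k score columns.
  qr.resid(qr(scores), Xc)
}

# Usage: simulate 30 variables sharing one strong hidden factor, then remove PC1.
set.seed(2)
n <- 200; p <- 30
latent <- rnorm(n)                                   # shared factor
X <- outer(latent, runif(p, 1, 2)) + matrix(rnorm(n * p, sd = 0.5), n, p)
Xcorr <- pc_correct_sketch(X, k = 1)

# The largest singular value of the corrected data equals the second one of
# the centred original data: the first component has been removed.
d.before <- svd(scale(X, center = TRUE, scale = FALSE))$d
d.after  <- svd(Xcorr)$d
all.equal(d.after[1], d.before[2])
```

Projecting out the score columns is equivalent to subtracting the rank-k SVD reconstruction of the centred data, so each column can be corrected independently, which is what makes this kind of correction easy to split across cores.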
List of key functions:
big.algebra.install.help: install the bigalgebra package, or provide tips if installation fails
big.PCA: PCA or SVD of a big.matrix object
big.select: select a subset of a big.matrix
bmcapply: multicore apply function for big.matrix objects
estimate.eig.vpcs: estimate uncalculated eigenvalues
generate.test.matrix: easily generate a random dataset for testing or simulation
get.big.matrix: obtain a big.matrix object via several possible methods
import.big.data: efficiently import data from text files into a big.matrix
PC.correct: correct a dataset (big.matrix) for n principal components
pca.scree.plot: draw a scree plot for a PCA/SVD
prv.big.matrix: compact preview of big.matrix objects
quick.elbow: calculate the elbow of a scree plot
quick.pheno.assocs: simple phenotype association test
select.least.assoc: choose the subset of big.matrix variables least associated with a phenotype
subcor.select: choose a subset of a big.matrix that is most/least correlated with other variables
subpc.select: choose a subset of a big.matrix that is most representative of the principal components
svn.bigalgebra.install: install the bigalgebra package from SVN if the svn command is available
big.t: transpose function for big.matrix (can be multicore)
thin: reduce the size of a big.matrix whilst preserving important data relationships
uniform.select: select a random or uniform subset of a big.matrix
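quick.elbow() and pca.scree.plot() both work with the scree curve of eigenvalues. The exact elbow rule used by quick.elbow() is not described here, so the following base-R sketch uses one common heuristic as a stand-in: rescale the curve to the unit square and pick the component furthest from the straight line joining its first and last points. elbow_sketch() is a hypothetical name, the eigenvalues are made up for illustration, and they are assumed to be sorted in decreasing order.

```r
# Hypothetical helper illustrating one common elbow heuristic; it is not
# necessarily the rule used by quick.elbow(). Assumes 'eigvals' is sorted in
# decreasing order, has at least three values, and the first exceeds the last.
elbow_sketch <- function(eigvals) {
  n <- length(eigvals)
  stopifnot(n >= 3, eigvals[1] > eigvals[n])
  x <- (seq_len(n) - 1) / (n - 1)                         # component index rescaled to [0, 1]
  y <- (eigvals - eigvals[n]) / (eigvals[1] - eigvals[n]) # eigenvalue rescaled to [0, 1]
  # After rescaling, the chord runs from (0, 1) to (1, 0), i.e. the line x + y = 1,
  # and the perpendicular distance of a point to it is |x + y - 1| / sqrt(2).
  d <- abs(x + y - 1) / sqrt(2)
  which.max(d)                                             # index of the elbow component
}

# Usage with a made-up scree curve that flattens after the fourth component.
ev <- c(10, 6, 3, 1.2, 1.0, 0.9, 0.8, 0.7, 0.6, 0.5)
elbow_sketch(ev)  # 4
```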
Author(s)

Nicholas Cooper
Maintainer: Nicholas Cooper <njcooper@gmx.co.uk>
See Also

NCmisc
Examples

```r
library(bigpca)
# create a test big.matrix object (file-backed)
orig.dir <- getwd(); setwd(tempdir()) # move to a temporary directory
bM <- filebacked.big.matrix(20, 50,
  dimnames = list(paste("r", 1:20, sep = ""), paste("c", 1:50, sep = "")),
  backingfile = "test.bck", backingpath = getwd(), descriptorfile = "test.dsc")
bM[1:20, ] <- replicate(50, rnorm(20))
prv.big.matrix(bM)
# now transpose
tbM <- big.t(bM, dir = getwd(), verbose = TRUE)
prv.big.matrix(tbM, row = 10, col = 4)
# column SDs of the transpose, i.e. the row SDs of bM
colSDs <- bmcapply(tbM, 2, sd, n.cores = 10)
rowSDs <- bmcapply(bM, 1, sd, n.cores = 10) # use up to 10 cores if available
## generate some data with reasonable intercorrelations ##
mat <- sim.cor(500, 200, genr = function(n) { (runif(n) / 2 + .5) })
bmat <- as.big.matrix(mat)
# calculate PCA
result <- big.PCA(bmat)
# correct the data for the principal components, on one core and on 5 cores
corrected <- PC.correct(result, bmat)
corrected2 <- PC.correct(result, bmat, n.cores = 5)
all.equal(corrected, corrected2) # both runs should give the same result
rm(tbM); rm(bM); rm(result)
rm(corrected); rm(corrected2); rm(bmat)
clear_active_bms() # delete big.matrix objects in memory
unlink(c("test.bck", "test.dsc"))
setwd(orig.dir)
```
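The bmcapply() calls in the example above spread a column- or row-wise computation across cores. As a rough illustration of that general idea with an ordinary matrix, the base-R sketch below splits the columns into one contiguous block per core, runs apply() on each block with parallel::mclapply(), and joins the results in the original order. chunked_col_apply() is a hypothetical name, it assumes FUN returns a single number per column, and it is not bigpca's bmcapply() code.

```r
library(parallel)  # mclapply() ships with base R's 'parallel' package

# Hypothetical helper, for illustration only (not bigpca's bmcapply()):
# apply FUN over the columns of an ordinary matrix, one contiguous block of
# columns per core, and return the results in the original column order.
# Assumes FUN returns a single number per column.
chunked_col_apply <- function(X, FUN, n.cores = 2) {
  idx <- seq_len(ncol(X))
  n.cores <- max(1L, min(as.integer(n.cores), length(idx)))
  # mclapply() relies on forking, which Windows does not support for mc.cores > 1.
  if (.Platform$OS.type == "windows") n.cores <- 1L
  size <- ceiling(length(idx) / n.cores)           # columns per block
  blocks <- split(idx, ceiling(idx / size))        # contiguous, ordered blocks
  parts <- mclapply(blocks, function(cols) {
    apply(X[, cols, drop = FALSE], 2, FUN)         # ordinary apply() within a block
  }, mc.cores = n.cores)
  setNames(unlist(parts, use.names = FALSE), colnames(X))
}

# Usage: column standard deviations of a 20 x 50 matrix.
set.seed(3)
m <- matrix(rnorm(20 * 50), nrow = 20, ncol = 50)
col.sds <- chunked_col_apply(m, sd, n.cores = 2)
all.equal(col.sds, apply(m, 2, sd))
```

Using contiguous blocks rather than one task per column keeps the per-task overhead low, at the cost of less even load balancing when columns differ greatly in cost.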