Fast correlation for large matrices
Description
fastCor is a helper function that computes the Pearson correlation matrix for the HiClimR and validClimR functions. It is similar to the cor function in R but uses a faster implementation on 64-bit machines (an optimized BLAS library is highly recommended). fastCor also uses a memory-efficient algorithm that can split the data matrix and compute only the upper-triangular part of the correlation matrix. It can be used to compute the correlation matrix for the columns of any data matrix.
Usage
Arguments
xt 
an ( 
nSplit 
integer number greater than or equal to one, to split the data matrix into

upperTri 
logical to compute only the upper-triangular half of the correlation
matrix if 
verbose 
logical to print processing information if 
Details
The fastCor function computes the correlation matrix by calling the cross product function in the Basic Linear Algebra Subroutines (BLAS) library used by R. A significant performance improvement can be achieved when building R on 64-bit machines with an optimized BLAS library, such as ATLAS, OpenBLAS, or the commercial Intel MKL.
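The cross-product approach can be illustrated with a minimal sketch. This is not the package's source code: cor_crossprod is a hypothetical helper written for illustration, and it assumes a numeric matrix with no missing values and no constant columns.

```r
# Hypothetical helper (not part of HiClimR): Pearson correlation of the
# columns of a matrix via a single BLAS-backed cross product.
cor_crossprod <- function(xt) {
  xt <- as.matrix(xt)
  # Center each column on its mean
  xc <- sweep(xt, 2, colMeans(xt))
  # Scale each centered column to unit Euclidean norm
  xs <- sweep(xc, 2, sqrt(colSums(xc^2)), "/")
  # crossprod(xs) is t(xs) %*% xs; its entries are the correlations
  crossprod(xs)
}

set.seed(1)
m <- matrix(rnorm(200), nrow = 20, ncol = 10)
agrees <- isTRUE(all.equal(cor_crossprod(m), cor(m)))
```

Because the heavy lifting is one matrix product, the speed of this sketch depends almost entirely on the BLAS library that R is linked against.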
For big data, the memory required to allocate the square matrix of correlations may exceed the total amount of physical memory available, resulting in “Error: cannot allocate vector of size...”. fastCor allows for splitting the data matrix into nSplit splits and computes only the upper-triangular part of the correlation matrix with upperTri = TRUE. This almost halves memory use, which can be very important for big data.
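A rough estimate of the memory involved can be sketched as follows. The helper cor_mem_gib is hypothetical, and it assumes 8-byte double-precision entries and that the upper-triangular form stores only the N(N-1)/2 entries above the diagonal; how fastCor actually lays out its result is not stated here.

```r
# Hypothetical helper (not part of HiClimR): approximate size in GiB of an
# N-by-N correlation matrix stored as doubles (8 bytes per entry).
cor_mem_gib <- function(N, upperTri = FALSE) {
  # Assumption: the upper-triangular form excludes the diagonal
  n_entries <- if (upperTri) N * (N - 1) / 2 else N^2
  8 * n_entries / 1024^3
}

full_gib  <- cor_mem_gib(50000)                   # full square matrix
upper_gib <- cor_mem_gib(50000, upperTri = TRUE)  # upper triangle only
```

Under these assumptions the upper triangle needs slightly less than half the memory of the full matrix, and the saving grows in absolute terms with N squared.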
If nSplit > 1, the correlation matrix (or the upper-triangular part if upperTri = TRUE) will be allocated and filled with the computed correlation submatrix for each split. The first n-1 splits (with n = nSplit) have equal size, while the last split may include any remaining columns.
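The splitting scheme can be sketched as below. Both split_cols and blockwise_cor are hypothetical names used only for illustration; the sketch assumes nSplit does not exceed the number of columns and that the data contain no missing values or constant columns. It fills a full square matrix for clarity and is not the package's implementation.

```r
# Hypothetical helper: column index blocks where the first nSplit - 1 blocks
# have equal size and the last block takes any remaining columns.
split_cols <- function(N, nSplit) {
  size   <- N %/% nSplit
  starts <- (seq_len(nSplit) - 1) * size + 1
  ends   <- c(if (nSplit > 1) starts[-1] - 1, N)
  Map(seq, starts, ends)
}

# Hypothetical helper: fill the correlation matrix one submatrix at a time.
blockwise_cor <- function(xt, nSplit = 1) {
  xt <- as.matrix(xt)
  N  <- ncol(xt)
  # Standardize columns so that cross products are correlations
  xc <- sweep(xt, 2, colMeans(xt))
  xs <- sweep(xc, 2, sqrt(colSums(xc^2)), "/")
  blocks <- split_cols(N, nSplit)
  out <- matrix(NA_real_, N, N)
  for (i in seq_along(blocks)) {
    for (j in i:length(blocks)) {   # only blocks on or above the diagonal
      bi  <- blocks[[i]]
      bj  <- blocks[[j]]
      sub <- crossprod(xs[, bi, drop = FALSE], xs[, bj, drop = FALSE])
      out[bi, bj] <- sub
      out[bj, bi] <- t(sub)         # mirror into the lower triangle
    }
  }
  out
}

set.seed(1)
m <- matrix(rnorm(200), nrow = 20, ncol = 10)
blocks3 <- split_cols(10, 3)        # blocks 1:3, 4:6, 7:10
same <- isTRUE(all.equal(blockwise_cor(m, nSplit = 3), cor(m)))
```

Only one submatrix is computed at a time, so the temporary working memory per step scales with the block size rather than with the full N-by-N result.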
Value
An (N rows by N columns) correlation matrix.
Author(s)
Hamada Badr <badr@jhu.edu>, Ben Zaitchik <zaitchik@jhu.edu>, and Amin Dezfuli <dez@jhu.edu>.
References
Hamada S. Badr, Zaitchik, B. F. and Dezfuli, A. K. (2015): A Tool for Hierarchical Climate Regionalization, Earth Science Informatics, 1-10, http://dx.doi.org/10.1007/s12145-015-0221-7.
Hamada S. Badr, Zaitchik, B. F. and Dezfuli, A. K. (2014): Hierarchical Climate Regionalization, CRAN, http://cran.r-project.org/package=HiClimR.
bigcor: Large correlation matrices in R, https://rmazing.wordpress.com.
See Also
HiClimR, validClimR, geogMask, coarseR, fastCor, grid2D, and minSigCor.
Examples
require(HiClimR)
## Load test case data
x <- TestCase$x
## Use fastCor function to compute the correlation matrix
t0 <- proc.time() ; xcor <- fastCor(t(x)) ; proc.time() - t0
## Compare with cor function
t0 <- proc.time() ; xcor0 <- cor(t(x)) ; proc.time() - t0
## Not run:
## Split the data into 10 splits and return upper-triangular half only
xcor10 <- fastCor(t(x), nSplit = 10, upperTri = TRUE)
## End(Not run)
