Create, store, access, and manipulate massive matrices. Matrices are, by default, allocated to shared memory and may use memory-mapped files. Packages biganalytics, synchronicity, bigalgebra, and bigtabulate provide advanced functionality. Access to and manipulation of a big.matrix object is exposed in R by an S4 class whose interface is similar to that of an R matrix. Use of these packages in parallel environments can provide substantial speed and memory efficiencies. bigmemory also provides a C++ framework for the development of new tools that can work both with big.matrix and native R matrix objects.
Package: | bigmemory |
Type: | Package |
Version: | 4.4.14 |
Date: | 2014-09-04 |
License: | LGPL-3 | APv2 |
Copyright: | (C) 2014 Michael J. Kane and John W. Emerson |
URL: | http://www.bigmemory.org |
LazyLoad: | yes |
Index of functions/methods (grouped in a friendly way):
big.matrix, filebacked.big.matrix, as.big.matrix
is.big.matrix, is.separated, is.filebacked
describe, attach.big.matrix, attach.resource
sub.big.matrix, is.sub.big.matrix
dim, dimnames, nrow, ncol, print, head, tail, typeof, length
read.big.matrix, write.big.matrix
mwhich
morder, mpermute
deepcopy
flush
Multi-gigabyte data sets challenge and frustrate R users, even on well-equipped hardware. Use of C/C++ can provide efficiencies, but is cumbersome for interactive data analysis and lacks the flexibility and power of R's rich statistical programming environment. The package bigmemory and sister packages biganalytics, synchronicity, bigtabulate, and bigalgebra bridge this gap, implementing massive matrices and supporting their manipulation and exploration. The data structures may be allocated to shared memory, allowing separate processes on the same computer to share access to a single copy of the data set. The data structures may also be file-backed, allowing users to easily manage and analyze data sets larger than available RAM and share them across nodes of a cluster. These features of the Bigmemory Project open the door for powerful and memory-efficient parallel analyses and data mining of massive data sets.
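The shared-memory and file-backed behavior described above can be sketched as follows. This is a minimal illustration, not the package's official example: the matrix sizes, the value 42, and the file names example.bin and example.desc are arbitrary choices, and both handles live in one R session here only to keep the sketch self-contained (in practice the descriptor would be passed to a second process).

```r
library(bigmemory)

# Shared-memory big.matrix: 5 x 2, integer type, filled with zeros.
x <- big.matrix(nrow = 5, ncol = 2, type = "integer", init = 0)
x[, 1] <- 1:5

# describe() returns a descriptor; attach.big.matrix() uses it to reach the
# same underlying data (normally from a second R process).
desc <- describe(x)
y <- attach.big.matrix(desc)

# x and y refer to one copy of the data: a write through y shows up in x.
y[1, 2] <- 42L
x[1, 2]

# File-backed big.matrix stored under the session's temporary directory.
# The file names are hypothetical, chosen only for this sketch.
fb <- filebacked.big.matrix(nrow = 3, ncol = 3, type = "double", init = 0,
                            backingfile = "example.bin",
                            descriptorfile = "example.desc",
                            backingpath = tempdir())
fb[2, 2] <- 3.14
is.filebacked(fb)
```

Because the data are not copied when attaching, the memory cost of sharing a matrix between processes on one machine stays close to that of a single copy.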
This project (bigmemory and its sister packages) is still actively developed, although the design and current features can be viewed as "stable." Please feel free to email us with any questions: bigmemoryauthors@gmail.com.
Various options are available. options(bigmemory.typecast.warning) can be set to avoid annoying warnings that might occur if, for example, you assign R objects (typically type double) to char, short, or integer big.matrix objects. options(bigmemory.print.warning) protects against extracting and printing a massive matrix (which would involve the creation of a second massive copy of the matrix). options(bigmemory.allow.dimnames) by default prevents the setting of dimnames attributes, because they aren't allocated to shared memory and changes will not be visible across processes. options(bigmemory.default.type) is "double" by default (a change in default behavior as of 4.1.1) but may be changed by the user.
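As a rough sketch of how these options are used, the snippet below turns off the typecast warning while a double is stored in a short matrix, then turns it back on. The matrix s and the value 7 are illustrative assumptions; the option names are the ones listed above.

```r
library(bigmemory)

# Suppress the warning raised when doubles are stored in a narrower type.
options(bigmemory.typecast.warning = FALSE)

s <- big.matrix(nrow = 2, ncol = 2, type = "short", init = 0)
s[1, 1] <- 7   # 7 is an R double; it is stored as a short without a warning

# Turn the warning back on and inspect the default element type.
options(bigmemory.typecast.warning = TRUE)
getOption("bigmemory.default.type")
```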
Versions >=4.0 represent a major redesign, with the mutexes (locking) abstracted to package synchronicity, the exploratory data analysis functionality relocated to package biganalytics, and new linear algebra support available in package bigalgebra. Package bigtabulate extends the bigmemory package with table-, tapply-, and split-like behavior. The functions may also be used with regular R matrices for speed and memory-efficiency gains. Package bigmemory itself is now minimalist, providing only the core functionality. As an example, the apply() method appears in biganalytics, supporting exploration and analysis, while mwhich, morder, and mpermute appear in bigmemory as fundamental tools for data manipulation.
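The three data-manipulation tools named above might be used as in the sketch below. The small 3 x 2 matrix and the threshold 2 are made-up values for illustration; argument names follow the package's documented signatures as best understood here.

```r
library(bigmemory)

# Small double big.matrix; column 1 is c(3, 1, 2), column 2 is c(10, 30, 20).
x <- as.big.matrix(matrix(c(3, 1, 2, 10, 30, 20), nrow = 3, ncol = 2))

# mwhich(): row indices where column 1 is >= 2, found without extracting
# the column into an ordinary R vector first.
rows <- mwhich(x, cols = 1, vals = 2, comps = "ge")

# morder(): the permutation that sorts the rows by column 1.
ord <- morder(x, cols = 1)

# mpermute(): reorder the rows of x in place using that permutation.
mpermute(x, order = ord)
x[, ]
```

Note that mpermute() modifies x itself rather than returning a sorted copy, which is the point when the matrix is too large to duplicate.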
Versions <4.0 supported a limited number of columns (due to mutex limitations): roughly 50,000 on a typical Linux system. This restriction has been removed in versions >=4.0. Versions <=3.8 were also limited to roughly 1 billion rows because of a bug, which has been fixed in versions >=3.82.
Note that you can't simply use a big.matrix with many (most) existing R functions (e.g. lm, kmeans). One nice exception is split, because this function only accesses subsets of the matrix.
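One common workaround, sketched here under the assumption that the rows of interest fit in RAM, is to extract a subset with the usual bracket indexing, which yields an ordinary R matrix that functions such as lm accept. The data values and the column names pred and resp are invented for the illustration.

```r
library(bigmemory)

# Toy data: column 1 is a predictor, column 2 a response.
x <- as.big.matrix(matrix(c(1, 2, 3, 4, 5,
                            2.1, 3.9, 6.2, 7.8, 10.1), ncol = 2))

# Bracket extraction copies the selected rows into a regular R matrix.
sub <- x[1:5, ]

# The regular matrix can then feed a standard modelling function.
d <- data.frame(pred = sub[, 1], resp = sub[, 2])
fit <- lm(resp ~ pred, data = d)
coef(fit)
```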
Michael J. Kane and John W. Emerson
Maintainers: Michael J. Kane <bigmemoryauthors@gmail.com>
The Bigmemory Project: http://www.bigmemory.org/.
For example, big.matrix, mwhich, read.big.matrix