R/hello-bigmemory.R

#' Manage massive matrices with shared memory and memory-mapped files.
#' 
#' Create, store, access, and manipulate massive matrices.  Matrices are, by
#' default, allocated to shared memory and may use memory-mapped files.
#' Packages \pkg{biganalytics}, \pkg{synchronicity}, \pkg{bigalgebra}, and
#' \pkg{bigtabulate} provide advanced functionality.  Access to and
#' manipulation of a \code{\link{big.matrix}} object is exposed in an S4
#' class whose interface is similar to that of a \code{\link{matrix}}.  Use of
#' these packages in parallel environments can provide substantial speed and
#' memory efficiencies.  \pkg{bigmemory} also provides a \acronym{C++}
#' framework for the development of new tools that can work both with
#' \code{big.matrix} and native \code{matrix} objects.
#' 
#' Index of functions/methods (grouped in a friendly way): \preformatted{
#' big.matrix, filebacked.big.matrix, as.big.matrix
#' 
#' is.big.matrix, is.separated, is.filebacked
#' 
#' describe, attach.big.matrix, attach.resource
#' 
#' sub.big.matrix, is.sub.big.matrix
#' 
#' dim, dimnames, nrow, ncol, print, head, tail, typeof, length
#' 
#' read.big.matrix, write.big.matrix
#' 
#' mwhich
#' 
#' morder, mpermute
#' 
#' deepcopy
#' 
#' flush }
#' 
#' Multi-gigabyte data sets challenge and frustrate users, even on
#' well-equipped hardware. Use of \acronym{C/C++} can provide efficiencies, but
#' is cumbersome for interactive data analysis and lacks the flexibility and
#' power of R's rich statistical programming environment.  The package
#' \pkg{bigmemory} and associated packages \pkg{biganalytics}, 
#' \pkg{synchronicity}, \pkg{bigtabulate}, and \pkg{bigalgebra} bridge 
#' this gap, implementing massive matrices and supporting their manipulation 
#' and exploration. The data
#' structures may be allocated to shared memory, allowing separate processes on
#' the same computer to share access to a single copy of the data set.  The
#' data structures may also be file-backed, allowing users to easily manage and
#' analyze data sets larger than available RAM and share them across nodes of a
#' cluster. These features of the Bigmemory Project open the door for powerful
#' and memory-efficient parallel analyses and data mining of massive data sets.
#' 
#' This project (\pkg{bigmemory} and its sister packages) is still actively
#' developed, although the design and current features can be viewed as
#' "stable."  Please feel free to email us with any questions:
#' bigmemoryauthors@gmail.com.
#' 
#' @name bigmemory-package
#' @aliases bigmemory-package bigmemory
#' @docType package
#' @note Various options are available.
#' \code{options(bigmemory.typecast.warning)} can be set to avoid annoying
#' warnings that might occur if, for example, you assign objects (typically
#' type double) to char, short, or integer \code{\link{big.matrix}} objects.
#' \code{options(bigmemory.print.warning)} protects against extracting and
#' printing a massive matrix (which would involve the creation of a second
#' massive copy of the matrix). \code{options(bigmemory.allow.dimnames)} by
#' default prevents the setting of \code{dimnames} attributes, because they
#' aren't allocated to shared memory and changes will not be visible across
#' processes. \code{options(bigmemory.default.type)} is \code{"double"} by
#' default (a change in default behavior as of 4.1.1) but may be changed by the
#' user.
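#' 
#' As a minimal sketch of how the options named above might be set in a
#' session (the values shown are illustrative assumptions, not
#' recommendations): \preformatted{
#' # silence warnings when values are cast to a narrower type
#' options(bigmemory.typecast.warning=FALSE)
#' # permit dimnames to be changed (changes are not shared across processes)
#' options(bigmemory.allow.dimnames=TRUE)
#' # make new matrices integer unless a type is given
#' options(bigmemory.default.type="integer")
#' getOption("bigmemory.default.type") }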
#' 
#' Note that a \code{big.matrix} cannot simply be passed to most existing
#' functions (e.g. \code{\link{lm}}, \code{\link{kmeans}}).  One nice exception
#' is \code{\link{split}}, because this function only accesses subsets of the
#' matrix.
#' 
#' @section Memory considerations:
#' 
#' The memory that a \code{big.matrix} uses is managed outside the R memory
#' pool available to the garbage collector, so the memory occupied by the
#' \code{big.matrix} is not visible to R.
#' This has subtle implications:
#' 
#' \itemize{
#' \item Memory usage is not visible via general R functions (e.g. the \code{gc()} function).
#' \item The garbage collector is misled by the very small memory footprint of the \code{big.matrix}
#'       object (which acts merely as a pointer to the external memory structure), which can make it
#'       much less eager to garbage-collect unused \code{big.matrix} objects.
#'       After removing the last reference to a large \code{big.matrix}, the user should manually run
#'       \code{gc()} to reclaim the memory.
#' \item Attaching the description of an already finalized \code{big.matrix} and accessing this object
#'       will result in undefined behavior, which in practice means it will crash the current R session
#'       with no hope of saving the data in it. To prevent R from de-allocating (finalizing) the
#'       matrices, the user should keep at least one \code{big.matrix} object somewhere in R memory in at
#'       least one R session on the current machine.
#' \item An abruptly closed R session (e.g. one killed via the task manager) will not have a chance to
#'       finalize the \code{big.matrix} objects, which will result in a memory leak, as the
#'       \code{big.matrix} objects will remain in memory (perhaps under obfuscated names) with no easy
#'       way to reconnect R to them.
#' 
#' } 
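#' 
#' A minimal sketch of the manual cleanup described in this section, assuming
#' \code{x} holds the last reference to the matrix in the current session:
#' \preformatted{
#' x <- big.matrix(5, 2, type="integer", init=0)
#' # drop the last reference, then ask R to collect so the finalizer can run
#' rm(x)
#' invisible(gc()) }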
#' 
#' @author Michael J. Kane, John W. Emerson, Peter Haverty, and Charles Determan Jr.
#' 
#' Maintainers: Michael J. Kane bigmemoryauthors@gmail.com
#' @seealso For example, \code{\link{big.matrix}}, \code{\link{mwhich}},
#' \code{\link{read.big.matrix}}
#'
#' @keywords package
#' @examples
#' 
#' 
#' # Our examples are all trivial in size, rather than burning huge amounts
#' # of memory.
#' 
#' x <- big.matrix(5, 2, type="integer", init=0,
#'                 dimnames=list(NULL, c("alpha", "beta")))
#' x
#' x[1:2,]
#' x[,1] <- 1:5
#' x[,"alpha"]
#' colnames(x)
#' options(bigmemory.allow.dimnames=TRUE)
#' colnames(x) <- NULL
#' x[,]
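#' 
#' # A minimal sketch of sharing a single copy of the data: describe() gives
#' # a descriptor and attach.big.matrix() returns a second handle to the same
#' # memory, so a write through one handle is visible through the other.
#' # The names y, desc, z and w are used only for this illustration.
#' y <- big.matrix(3, 2, type="integer", init=0)
#' desc <- describe(y)
#' z <- attach.big.matrix(desc)
#' z[1,1] <- 7L
#' y[1,1]
#' 
#' # mwhich() returns the indices of rows that satisfy a comparison,
#' # here rows whose first column is greater than or equal to 1.
#' mwhich(y, 1, 1, "ge")
#' 
#' # deepcopy() creates an independent copy, so changing w leaves y as it was.
#' w <- deepcopy(y)
#' w[1,1] <- 0L
#' y[1,1]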
#' 
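#' # A minimal sketch of a file-backed matrix, assuming tempdir() is writable;
#' # the file names "example.bin" and "example.desc" are hypothetical.
#' fb <- filebacked.big.matrix(3, 2, type="double", init=0,
#'                             backingfile="example.bin",
#'                             backingpath=tempdir(),
#'                             descriptorfile="example.desc")
#' is.filebacked(fb)
#' fb[2,] <- c(1.5, 2.5)
#' fb[,]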
#' 
NULL

bigmemory documentation built on May 3, 2022, 1:07 a.m.