refdata: subsettable reference to matrix or data.frame
In ref: References for R



Function refdata creates objects of class refdata, which behave much like matrices or data.frames (though not identically) but allow much more memory-efficient handling.

# -- usage for R CMD CHECK, see below for human readable version -----------
refdata(x)
derefdata(x)
derefdata(x) <- value
 ## S3 method for class 'refdata'
x[i = NULL, j = NULL, drop = FALSE, ref = FALSE]
 ## S3 replacement method for class 'refdata'
x[i = NULL, j = NULL, ref = FALSE] <- value
 ## S3 method for class 'refdata'
dim(x)
 ## S3 method for class 'refdata'
dimnames(x)
 ## S3 method for class 'refdata'
row.names(x)
 ## S3 method for class 'refdata'
names(x)

# -- most important usage for human beings  --------------------------------
# rd <- refdata(x)             # create reference
# derefdata(rd)                # retrieve original data
# derefdata(rd) <- value       # modify original data
# rd[]                         # get all (current) data
# rd[i, j]                     # get part of data
# rd[i, j, ref=TRUE]           # get new reference on part of data
# rd[i, j]           <- value  # modify / create local copy
# rd[i, j, ref=TRUE] <- value  # modify original data (respecting subsetting history)
# dim(rd)                      # dim of (subsetted) data
# dimnames(rd)                 # dimnames of (subsetted) data

`x`	a matrix or data.frame or any other 2-dimensional object that has operators "[" and "[<-" defined
`i`	row index
`j`	col index
`ref`	FALSE by default. In subsetting: FALSE returns data, TRUE returns new refdata object. In assignments: FALSE modifies a local copy and returns a refdata object embedding it, TRUE modifies the original.
`drop`	FALSE by default, i.e. returned data always have a dimension attribute. TRUE drops dimensions in some cases; the exact result depends on whether a `matrix` or a `data.frame` is embedded
`value`	some value to be assigned
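
The remark that the result of drop=TRUE depends on the embedded object can be illustrated with plain base R objects. This is only a sketch of the underlying base R rules, assuming that a refdata object passes drop on to the "[" operator of the embedded data; it does not use the ref package, and the objects m and df are made up for illustration.

```r
m  <- cbind(a = 1:3, b = 3:1)   # a small matrix
df <- as.data.frame(m)          # the same data as a data.frame

# drop = FALSE: both keep two dimensions, whatever is selected.
dim(m[, 1, drop = FALSE])       # 3 1
dim(df[, 1, drop = FALSE])      # 3 1

# drop = TRUE, single column: both collapse to a plain vector.
m[, 1, drop = TRUE]
df[, 1, drop = TRUE]

# drop = TRUE, single row: a matrix gives a named vector,
# a data.frame gives a plain list with one element per column.
row_m  <- m[1, , drop = TRUE]
row_df <- df[1, , drop = TRUE]
is.list(row_m)                  # FALSE
is.list(row_df)                 # TRUE
is.data.frame(row_df)           # FALSE
```

With drop=FALSE the two kinds of object behave alike, which is the consistency the default is meant to provide.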

Refdata objects store 2D-data in one environment and index information in another environment. Derived refdata objects usually share the data environment but not the index environment.
The index information is stored in a standardized and memory efficient form generated by optimal.index.
Thus refdata objects can be copied and subsetted and even modified without duplicating the data in memory.
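
The two-environment layout can be pictured with a small base R sketch. It is not the package's implementation: the helpers make_ref, subset_ref and deref are hypothetical, the index is kept as plain positive positions rather than in the form produced by optimal.index, and only the attribute names dat and ind are taken from the description of the return value.

```r
# Hypothetical helpers for illustration; not the ref package's code.
make_ref <- function(x) {
  dat <- new.env()                 # data environment, shared by derived references
  dat$x <- x
  ind <- new.env()                 # index environment, private to this reference
  ind$i <- seq_len(nrow(x))
  ind$j <- seq_len(ncol(x))
  structure(list(), dat = dat, ind = ind)
}

# Derive a reference to a subset: reuse the data environment, store new indices.
subset_ref <- function(r, i, j) {
  old <- attr(r, "ind")
  ind <- new.env()
  ind$i <- old$i[i]
  ind$j <- old$j[j]
  structure(list(), dat = attr(r, "dat"), ind = ind)
}

# Materialise the data that a reference currently points to.
deref <- function(r) {
  ind <- attr(r, "ind")
  attr(r, "dat")$x[ind$i, ind$j, drop = FALSE]
}

m  <- cbind(1:5, 5:1)
r1 <- make_ref(m)
r2 <- subset_ref(r1, -1, 1:2)

identical(attr(r1, "dat"), attr(r2, "dat"))  # TRUE: one shared data environment
deref(r2)                                    # rows 2 to 5 of m
```

Because environments are not copied when the handle is copied, deriving r2 costs only the memory for its index vectors.
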
Empty square bracket subsetting (rd[]) returns the data, square bracket subsetting (rd[i, j]) returns subsets of the data as expected.
An additional argument (rd[i, j, ref=TRUE]) returns a reference that stores the subsetting indices. Such a reference behaves transparently, as if a smaller matrix/data.frame were stored, and can be subsetted again recursively. With ref=TRUE, indices are always interpreted as row/col indices, i.e. x[i] and x[cbind(i, j)] are undefined (and raise stop errors).
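
Recursive subsetting of a reference can be thought of as composing indices before any data is touched. The base R sketch below assumes that the stored index is a vector of positive row positions; the helper compose is hypothetical and stands in for whatever the package does internally.

```r
m <- cbind(1:5, 5:1)

# Row positions held by a first reference to all of m.
rows0 <- seq_len(nrow(m))

# Subsetting a reference can be pictured as indexing the stored positions
# instead of the data (hypothetical helper).
compose <- function(stored, i) stored[i]

rows1 <- compose(rows0, -1)   # drop the first row of m        -> positions 2, 3, 4, 5
rows2 <- compose(rows1, -1)   # drop the first row of the view -> positions 3, 4, 5

# Only now is the data accessed, once, with the composed index.
m[rows2, , drop = FALSE]
```

The result is the same as m[-1, ][-1, ], but no intermediate copy of the data is created along the way.
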
Standard square bracket assignment (rd[i, j] <- value) creates a reference to a locally modified copy of the (potentially subsetted) data.
An additional argument (rd[i, j, ref=TRUE] <- value) modifies the original data, properly respecting the subsetting history.
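
The difference between the two kinds of assignment can be sketched in base R. The environment dat and the vector rows are stand-ins chosen for illustration (a shared data environment and a stored row index); the sketch shows the idea of translating view indices to original indices, not the package's code.

```r
dat <- new.env()                      # stands in for the shared data environment
dat$x <- cbind(1:5, 5:1)

rows <- seq_len(nrow(dat$x))[-1]      # a view that has dropped the first row

# Local-copy style: extract the view, then modify the extracted copy.
local_copy <- dat$x[rows, , drop = FALSE]
local_copy[1, 1] <- 0L

# Write-through style: translate view row 1 to the original row,
# then assign inside the shared environment.
dat$x[rows[1], 1] <- 0L

local_copy[1, 1]   # 0
dat$x[2, 1]        # 0: original row 2 is the first row of the view
dat$x[1, 1]        # 1: row 1 lies outside the view and is unchanged
```

Every reference that shares dat sees the write-through change, whereas the local copy affects only the object that holds it.
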
A method dim(refdata) returns the dim of the (indexed) data.
A method dimnames(refdata) returns the dimnames of the (indexed) data.

an object of class refdata (appended to class attributes of data), which is an empty list with two attributes

`dat`	the environment where the data x and its dimension dim are stored
`ind`	the environment where the indexes i, j and the effective subset sizes ni, nj are stored

The refdata code is currently R only (not implemented for S+).
Please note the following differences to matrices and data.frames:

x[]: you need to write x[] instead of x in order to get all current data
drop=FALSE: by default drop=FALSE, which gives consistent behaviour for matrices and data.frames. You can use the $ or [[ operator to extract single column vectors, which are guaranteed to be of a consistent data type. However, $ and [[ are currently only wrappers to [. They might be performance-tuned in later versions.
x[i]: single index subsetting is not defined; use x[][i] instead, but beware of differences between matrices and data.frames
x[cbind()]: matrix index subsetting is not defined, use x[][cbind(i, j)] instead
ref=TRUE: parameter ref needs to be used sensibly to exploit the advantages of refdata objects
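
The warning about x[][i] can be made concrete with base R objects. Since x[] returns the plain embedded data, single-index and matrix-index subsetting then follow the usual base R rules, which differ between a matrix and a data.frame. The objects m, df and ij below are made up for illustration and the ref package is not used.

```r
m  <- cbind(a = 1:3, b = 3:1)
df <- as.data.frame(m)

# Single index: a matrix is treated as a vector in column-major order,
# a data.frame as a list of columns.
m[2]     # second element of the first column: 2
df[2]    # the second column, still a data.frame

# Matrix index: one (row, col) pair per row of the index matrix.
ij <- cbind(c(1, 3), c(2, 1))
m[ij]    # elements m[1, 2] and m[3, 1]: 3 3
```

The same index i therefore selects a single element from a matrix but a whole column from a data.frame.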

Jens Oehlschlägel

Extract, matrix, data.frame, optimal.index, ref

  ## Simple usage Example
  x <- cbind(1:5, 5:1)       # take a matrix or data frame
  rx <- refdata(x)           # wrap it into a refdata object
  rx                         # see the autoprinting
  rm(x)                      # delete original to save memory
  rx[]                       # extract all data
  rx[-1, ]                   # extract part of data
  rx2 <- rx[-1, , ref=TRUE]  # create refdata object referencing part of data 
                             # (only index, no data is duplicated)
  rx2                        # compare autoprinting
  rx2[]                      # extract 'all' data
  rx2[-1, ]                  # extract part of (part of) data
  cat("for more examples look at the help pages\n")

 ## Not run: 
  # Memory saving demos
  square.matrix.size <- 1000
  recursion.depth.limit <- 10
  non.referenced.matrix <- matrix(1:(square.matrix.size*square.matrix.size)
  , nrow=square.matrix.size, ncol=square.matrix.size)
  rownames(non.referenced.matrix) <- paste("a", seq(length=square.matrix.size), sep="")
  colnames(non.referenced.matrix) <- paste("b", seq(length=square.matrix.size), sep="")
  referenced.matrix <- refdata(non.referenced.matrix)
  recurse.nonref <- function(m, depth.limit=10){
    x <- m[1,1]   # need read access here to create local copy
    gc()
    cat("depth.limit=", depth.limit, "  memory.size=", memsize.wrapper(), "\n", sep="")
    if (depth.limit)
      Recall(m[-1, -1, drop=FALSE], depth.limit=depth.limit-1)
    invisible()
  }
  recurse.ref <- function(m, depth.limit=10){
    x <- m[1,1]   # read access, otherwise nothing happens
    gc()
    cat("depth.limit=", depth.limit, "  memory.size=",  memsize.wrapper(), "\n", sep="")
    if (depth.limit)
      Recall(m[-1, -1, ref=TRUE], depth.limit=depth.limit-1)
    invisible()
  }
  gc()
  memsize.wrapper()
  recurse.ref(referenced.matrix, recursion.depth.limit)
  gc()
  memsize.wrapper()
  recurse.nonref(non.referenced.matrix, recursion.depth.limit)
  gc()
  memsize.wrapper()
  rm(recurse.nonref, recurse.ref, non.referenced.matrix
  , referenced.matrix, square.matrix.size, recursion.depth.limit)
  
## End(Not run)
  cat("for even more examples look at regression.test.refdata()\n")
  regression.test.refdata()  # testing correctness of refdata functionality