refdata: subsettable reference to matrix or data.frame

Description

Function refdata creates objects of class refdata, which behave much like matrices or data.frames but allow far more memory-efficient handling.

Usage

# -- usage for R CMD CHECK, see below for human readable version -----------
refdata(x)
derefdata(x)
derefdata(x) <- value
 ## S3 method for class 'refdata'
x[i = NULL, j = NULL, drop = FALSE, ref = FALSE]
 ## S3 replacement method for class 'refdata'
x[i = NULL, j = NULL, ref = FALSE] <- value
 ## S3 method for class 'refdata'
dim(x)
 ## S3 method for class 'refdata'
dimnames(x)
 ## S3 method for class 'refdata'
row.names(x)
 ## S3 method for class 'refdata'
names(x)

# -- most important usage for human beings  --------------------------------
# rd <- refdata(x)             # create reference
# derefdata(rd)                # retrieve original data
# derefdata(rd) <- value       # modify original data
# rd[]                         # get all (current) data
# rd[i, j]                     # get part of data
# rd[i, j, ref=TRUE]           # get new reference on part of data
# rd[i, j]           <- value  # modify / create local copy
# rd[i, j, ref=TRUE] <- value  # modify original data (respecting subsetting history)
# dim(rd)                      # dim of (subsetted) data
# dimnames(rd)                 # dimnames of (subsetted) data

Arguments

x

a matrix or data.frame or any other 2-dimensional object that has operators "[" and "[<-" defined

i

row index

j

col index

ref

FALSE by default. In subsetting: FALSE returns data, TRUE returns new refdata object. In assignments: FALSE modifies a local copy and returns a refdata object embedding it, TRUE modifies the original.

drop

FALSE by default, i.e. returned data always have a dimension attribute. TRUE drops dimensions in some cases; the exact result depends on whether a matrix or a data.frame is embedded.

value

some value to be assigned
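
The dependence of drop=TRUE on the embedded object can be seen with plain base R objects. The following is a minimal sketch that does not use refdata at all; it assumes that a refdata object passes drop on to the `[` method of whatever it embeds, which is how the description of drop above reads.

```r
# Base R only: the same call with drop = TRUE gives different kinds of
# results for a matrix and for a data.frame.
m  <- cbind(a = 1:3, b = 4:6)
df <- as.data.frame(m)

row_m  <- m[1, , drop = TRUE]    # matrix row: named vector, no dim attribute
row_df <- df[1, , drop = TRUE]   # data.frame row: plain list, not a data.frame
col_m  <- m[, 1, drop = TRUE]    # matrix column: vector
col_df <- df[, 1, drop = TRUE]   # data.frame column: vector

keep_m <- m[1, , drop = FALSE]   # drop = FALSE keeps the 1 x 2 matrix shape
```

With drop=FALSE, the refdata default, both kinds of object keep two dimensions, which is why that default gives consistent results.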

Details

Refdata objects store 2D data in one environment and index information in another environment. Derived refdata objects usually share the data environment but not the index environment.
The index information is stored in a standardized and memory-efficient form generated by optimal.index.
Thus refdata objects can be copied, subsetted and even modified without duplicating the data in memory.
Empty square bracket subsetting (rd[]) returns the data; square bracket subsetting (rd[i, j]) returns subsets of the data as expected.
An additional argument (rd[i, j, ref=TRUE]) returns a reference that stores the subsetting indices. Such a reference behaves transparently as if a smaller matrix/data.frame were stored, and it can be subsetted again recursively. With ref=TRUE, indices are always interpreted as row/col indices, i.e. x[i] and x[cbind(i, j)] are undefined (and raise stop errors).
Standard square bracket assignment (rd[i, j] <- value) creates a reference to a locally modified copy of the (potentially subsetted) data.
An additional argument (rd[i, j, ref=TRUE] <- value) modifies the original data, properly respecting the subsetting history.
A method dim(refdata) returns the dim of the (indexed) data.
A method dimnames(refdata) returns the dimnames of the (indexed) data.
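
The split into a shared data environment and a per-reference index environment can be illustrated in base R. This is a rough sketch of the idea only, not the package's implementation: the helpers make_ref, sub_ref and get_data are hypothetical names invented here, and plain integer vectors stand in for the form produced by optimal.index.

```r
# Hypothetical helpers, not part of the ref package.
make_ref <- function(x) {
  dat <- new.env()               # data environment, shared by derived references
  dat$x <- x
  ind <- new.env()               # index environment, private to this reference
  ind$i <- seq_len(nrow(x))
  ind$j <- seq_len(ncol(x))
  list(dat = dat, ind = ind)
}

sub_ref <- function(r, i, j) {
  ind <- new.env()
  ind$i <- r$ind$i[i]            # compose new indices with the subsetting history
  ind$j <- r$ind$j[j]
  list(dat = r$dat, ind = ind)   # reuse the data environment, copy nothing
}

get_data <- function(r) r$dat$x[r$ind$i, r$ind$j, drop = FALSE]

x  <- cbind(1:5, 5:1)
r1 <- make_ref(x)
r2 <- sub_ref(r1, -1, 1:2)       # drop the first row, by reference
r3 <- sub_ref(r2, -1, 1:2)       # drop the first of the remaining rows

same_data <- identical(r1$dat, r3$dat)   # all three share one data environment
rows_3_to_5 <- get_data(r3)              # indices resolve against the original x
```

Only the small index vectors differ between r1, r2 and r3; the matrix itself exists once, which is the memory saving described above.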

Value

an object of class refdata (appended to class attributes of data), which is an empty list with two attributes

dat

the environment where the data x and its dimension dim are stored

ind

the environment where the indexes i, j and the effective subset sizes ni, nj are stored

Note

The refdata code is currently R only (not implemented for S+).
Please note the following differences from matrices and data.frames:

x[]

you need to write x[] instead of x in order to get all current data

drop=FALSE

by default drop=FALSE, which gives consistent behaviour for matrices and data.frames. You can use the $ or [[ operator to extract single column vectors, which are guaranteed to be of a consistent data type. However, currently $ and [[ are only wrappers to [. They might be performance-tuned in later versions.

x[i]

single index subsetting is not defined; use x[][i] instead, but beware of differences between matrices and data.frames

x[cbind()]

matrix index subsetting is not defined; use x[][cbind(i, j)] instead

ref=TRUE

parameter ref needs to be used sensibly to exploit the advantages of refdata objects
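
The warning about x[][i] refers to ordinary base R behaviour: once x[] has returned the embedded object, a single index means different things for a matrix and for a data.frame. A small sketch with plain base R objects, standing in for what x[] would return:

```r
# Base R only: m and df stand in for the result of x[] on a refdata
# object that embeds a matrix or a data.frame, respectively.
m  <- cbind(a = 1:3, b = 4:6)
df <- as.data.frame(m)

elem_m <- m[2]             # matrix: second element in column-major order
col_df <- df[2]            # data.frame: second column, still a data.frame
cell_m <- m[cbind(2, 2)]   # matrix index: element at row 2, column 2
```

Code meant to work with both kinds of embedded data is therefore safer with two-index subsetting such as x[i, j].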

Author(s)

Jens Oehlschlägel

See Also

Extract, matrix, data.frame, optimal.index, ref

Examples

  ## Simple usage Example
  x <- cbind(1:5, 5:1)       # take a matrix or data frame
  rx <- refdata(x)           # wrap it into a refdata object
  rx                         # see the autoprinting
  rm(x)                      # delete original to save memory
  rx[]                       # extract all data
  rx[-1, ]                   # extract part of data
  rx2 <- rx[-1, , ref=TRUE]  # create refdata object referencing part of data 
                             # (only index, no data is duplicated)
  rx2                        # compare autoprinting
  rx2[]                      # extract 'all' data
  rx2[-1, ]                  # extract part of (part of) data
  cat("for more examples look at the help pages\n")

 ## Not run: 
  # Memory saving demos
  square.matrix.size <- 1000
  recursion.depth.limit <- 10
  non.referenced.matrix <- matrix(1:(square.matrix.size*square.matrix.size)
  , nrow=square.matrix.size, ncol=square.matrix.size)
  rownames(non.referenced.matrix) <- paste("a", seq(length=square.matrix.size), sep="")
  colnames(non.referenced.matrix) <- paste("b", seq(length=square.matrix.size), sep="")
  referenced.matrix <- refdata(non.referenced.matrix)
  recurse.nonref <- function(m, depth.limit=10){
    x <- m[1,1]   # need read access here to create local copy
    gc()
    cat("depth.limit=", depth.limit, "  memory.size=", memsize.wrapper(), "\n", sep="")
    if (depth.limit)
      Recall(m[-1, -1, drop=FALSE], depth.limit=depth.limit-1)
    invisible()
  }
  recurse.ref <- function(m, depth.limit=10){
    x <- m[1,1]   # read access, otherwise nothing happens
    gc()
    cat("depth.limit=", depth.limit, "  memory.size=",  memsize.wrapper(), "\n", sep="")
    if (depth.limit)
      Recall(m[-1, -1, ref=TRUE], depth.limit=depth.limit-1)
    invisible()
  }
  gc()
  memsize.wrapper()
  recurse.ref(referenced.matrix, recursion.depth.limit)
  gc()
  memsize.wrapper()
  recurse.nonref(non.referenced.matrix, recursion.depth.limit)
  gc()
  memsize.wrapper()
  rm(recurse.nonref, recurse.ref, non.referenced.matrix
  , referenced.matrix, square.matrix.size, recursion.depth.limit)
  
## End(Not run)
  cat("for even more examples look at regression.test.refdata()\n")
  regression.test.refdata()  # testing correctness of refdata functionality

ref documentation built on May 2, 2019, 6:08 p.m.
