knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library("AltWrapper")

Introduction

There are numerous data sources in R. When data is not in a standard R format, R provide a variaty of methods to read it. One can write a function to convert the data into an R data structure, or implement an S3/S4 class and keep the data in its original format. However, this flexibility introduces a lot of troubles when the data is interacting with the other R packages. While one package may assume the data is an R vector, the other package may expect the data is an S4 object with [ and [<- implemented. Without a standard, it will waste a lot of time to handle the data structure, instead of doing the data analysis.

ALTREP is a set of new APIs in R provided since R 3.5. It is designed to represent a non-standard data structure to a standard R vector using the unified APIs. At R level, there is no difference between an R vector and an ALTREP vector. This mechanism provides a possibility to wrap any data to a lightweight R vector without using S3/S4 class while still preserving it original data format. While ALTREP has the potential to be an official way to read any non-standard data in R, the requirement of knowing C++ language and R's C interfact and lacking of official document impede its development. Even for those who is familar with C++ language, it will takes a lot of energy to learn ALTREP and manage their C++ code. AltWrapper is a package for developers to make the use of ALTREP simpler in R. Developers are able to define all ALTREP APIs using R functions, and to debug and replace an ALTREP function at any time without doing compilation. While R code is generally slower compared to C++ code, a special consideration is token to minimize the overhead of calling an user-defined R function. With the help of AltWrapper, Users are not required to know C++ language to use ALTREP.

Example

We will illustrate the package with a simple example. Suppose we want to represent a sequence vector in a compact format. The sequence only has a starting value and its increment as the data. We will create a compacted R vecotr by defining ALTREP APIS.

#sequence 1:10
A_compact = list(start = 1L, by = 1L, length = 10L)

#Define ALTREP APIs
lengthFunc <- function(x) {
  return(x$length)
}
getElementFunc <- function(x, i) {
  return(x$start + x$by * (i - 1))
}

## Register ALTREP class and functions
## The type of the vector is integer
setAltClass(className = "compactSeq", "integer")
setAltMethod(className = "compactSeq", getLength = lengthFunc)
setAltMethod(className = "compactSeq", getElement = getElementFunc)

#Create altWrapper object
A_vector = newAltrep(className = "compactSeq", x = A_compact)

#Check the result
is.vector(A_vector)

We claim that the altWrapper class is an integer class. Therefore, the return value of the function newAltrep is an integer R vector. The value can be accessed via regular [ operator. An altWrapper object must have getLength defined. For most operations, altWrapper API getDataptr is required. Since there is no actual data for a compact sequence, we define the API getElement instead. Please refer to ?setAltMethod to see a full list of available APIs.

Please note that the variable A_vector cannot be printed just by typing its variable name in the console for ALTREP is still underdevelopment and some R functions do not catch up with it. At the time of this writting, only R 3.7 devel version can correctly print the object. Before R 3.7, the variable can be printed out either by using [ operator to force R to use get_Elt method, or by using printAltWrapper function provided by AltWrapper package.

## Use [ operator
A_vector[1L:10L]

## Use print method
printAltWrapper(A_vector)

Alternatively, there is an S3/S4 print method defined for the altWrapper class, you can create an S3 object by calling the function newAltrep with the argument S3Class = TRUE.

#Create an S3 version of altWrapper object
A_vector_s3 = newAltrep(className = "compactSeq", x = A_compact, S3Class = TRUE)
A_vector_s3
#Check class type
class(A_vector)
class(A_vector_s3)

Manage AltWrapper Class

Function Define/Redefine

The definition of an API of an altWrapper class is ad hoc. You can change the API even after an object is created. For example, we can define the sum function of the class compactSeq given that we have created an object A_vector for the class.

sumFunc <- function(x, na.rm) {
  message("Sum function")
  a1 = x$start
  n = x$length
  d = x$by
  return(n * a1 + n * (n - 1) * d / 2)
}

setAltMethod(className = "compactSeq", sum = sumFunc)

sum(A_vector)

The variable A_vector has been made before we define the sum function for its class, but we still see the message from our customized sum function. We can also redefine the sum function at any moment.

sumFunc <- function(x, na.rm) {
  message("Sum function V2")
  a1 = x$start
  n = x$length
  d = x$by
  return(n * a1 + n * (n - 1) * d / 2)
}

setAltMethod(className = "compactSeq", sum = sumFunc)

sum(A_vector)

A warning will be shown to indicate the sum function has been defined previously. It can be suppressed by calling setAltWrapperOptions(redefineWarning=FALSE). A function can be deleted by setting its value to NULL. For example, The sum function can be removed from the class compactSeq via setAltMethod(className = "compactSeq", sum = NULL).

Check altWrapper data

you can get the altWrapper data from an altWrapper object by calling getAltWrapperData function on A_vector

x = getAltWrapperData(A_vector)
x

Naturally, the altWrapper data can be set via setAltWrapperData

x$start = 2
A_vector = setAltWrapperData(A_vector, x)
printAltWrapper(A_vector)

By default, the function setAltWrapperData will duplicate the object. If you only want to memorize some data(e.g. caching on-disk data), you can call setAltWrapperData with argument duplicate = FALSE. Please do not set duplicate = FALSE if your new x will change the value of A_vector, because one R object can have multiple variable names, changing the value of an immutable object will change the value of all associated variables.

Check Class Functions

The following example shows how to retrieve a function from an altWrapper class.

sumFunc = getAltMethod(className = "compactSeq", methodName = "sum")
sumFunc

A NULL value will be returned if the function is not found. You can also get a summary report of an altWrapper class, the report shows the type of the ALTREP class and status of all ALTREP APIs.

showAltClass(className = "compactSeq")

ALTREP support

Because the altWrapper class is built upon ALTREP, the package exports a few ALTREP APIs. Please note that it is not recommended to call an ALTREP function directly to get/set the value of an altWrapper object since the data structure of an altWrapper object is subjected to change. These functions are served as development tools only. Here we simply list the ALTREP APIs, please call ?function in R to see their documents.

is.altrep(A_vector)
.getAltData1(A_vector)
.getAltData2(A_vector)
.setAltData1(A_vector, .getAltData1(A_vector))
.setAltData2(A_vector, .getAltData2(A_vector))

warnings

Although there is no real limitation on how an altWrapper function is constructed, there are certain expectations of the altWrapper function and it is a good practice to follow them:

Session Info

sessionInfo()


Jiefei-Wang/AltWrapper documentation built on Oct. 30, 2019, 7:43 p.m.