
A primer on package folio

This vignette introduces the ideas that motivate the development of package folio.

The problem

In base R, there is no built-in way to enforce a storage mode and/or a class (such as data.table) and to specify which columns can live in an instance of that class. Nor is it possible to specify a prototype (column names and classes) for these columns. Finally, there is no way to force an object to conform to that prototype for as long as it exists.

Usually, objects that require a specific structure and storage mode are implemented as S3 objects through a pseudo-constructor: a function that receives the required inputs and returns the desired structure. A class attribute is usually attached to that object so that methods can later be dispatched on it.

# Assume your application requires a data.frame with three columns: "x", "y"
# and "z". These columns respectively hold integer, numeric and character values.
# We will call this specific structure `myClass`. Most people would use the
# basic R S3 class system (the one you always use, without necessarily knowing
# it) and create a constructor function that returns just that.

new_myClass <- function(x = integer(), y = numeric(), z = character())
{
    # Some checks on x, y, z...

    return(
        base::structure(
            data.frame(x, y, z, stringsAsFactors = FALSE),
            names = c("x", "y", "z"),
            class = c("myClass", "data.frame")
        )
    )
}

# Instances of that (S3) class can be generated by using the constructor defined
# above.

myObject <- new_myClass(
    x = c(1L, 2L, 3L),
    y = c(4.1, 5.2, 6.3),
    z = c("obs1", "obs2", "obs3")
)

# Methods can be dispatched on myClass. As an example, one could implement a
# method that computes the mean of columns x and y of a myClass instance. This
# requires an S3 method that is called by the generic function mean().

mean.myClass <- function(x, ...)
{
    # Pool the values of the integer and numeric columns, then average them.
    return(mean(c(x$x, x$y), na.rm = TRUE))
}

mean(myObject) # Outputs 3.6.

# This is fine. But nothing prevents the user from later bypassing the implicit
# conventions of myClass and breaking things. Here are some examples:

myObject$z <- NULL
myObject$x <- "Do not tell me what to do, YOLO!"

mean(myObject) # Now returns NA with a warning: column x holds character data.

# Since R S3 classes are not based on a formal object-oriented programming (OOP)
# paradigm, the statements above are perfectly valid R expressions. This is a
# problem for complex programs. Ideally, the structure of the object should
# remain unchanged, and only the data within it should be allowed to change, as
# long as it respects the conventions of the class. In other words, we need a
# clear mechanism that enforces validity every time the object is used, rather
# than ad-hoc implementations.
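The validity mechanism called for above can be approximated in plain S3 with a validator function. The sketch below is only an illustration: `validate_myClass()` and `is_valid()` are hypothetical helpers, not part of folio, and the validator simply compares the names and classes of the columns against an expected prototype. Its limitation is the point: R never calls it automatically, so an invalid object goes unnoticed until someone remembers to check it.

```r
# Hypothetical validator for myClass (not part of folio): compares the names
# and classes of the columns against an expected prototype.
validate_myClass <- function(obj)
{
    expected <- c(x = "integer", y = "numeric", z = "character")
    actual   <- vapply(obj, function(col) class(col)[1L], character(1L))

    if (!identical(actual, expected)) {
        stop("invalid myClass object: column names or classes do not match")
    }

    invisible(obj)
}

# Hypothetical helper: TRUE if the validator accepts the object, FALSE otherwise.
is_valid <- function(obj)
{
    tryCatch({ validate_myClass(obj); TRUE }, error = function(e) FALSE)
}

# A valid instance, built the same way new_myClass() builds one.
good <- structure(
    data.frame(
        x = c(1L, 2L, 3L),
        y = c(4.1, 5.2, 6.3),
        z = c("obs1", "obs2", "obs3"),
        stringsAsFactors = FALSE
    ),
    class = c("myClass", "data.frame")
)

bad <- good
bad$x <- "Do not tell me what to do, YOLO!"  # R accepts this assignment...

is_valid(good)  # TRUE
is_valid(bad)   # FALSE: ...and only an explicit check catches the problem.
```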

Freedom can be a bad thing

In R, freedom is usually a good thing: it allows flexibility in interactive sessions and gives users complete control over their data, which can yield faster results and stronger analyses. However, this flexibility can also be counter-productive in many contexts, such as complex programs where objects must keep a known structure.

The folio solution

The goal is to provide an easy-to-use and efficient API to create super-classes (and objects) that (1) can be used in any project requiring OOP formality and (2) inherit strong foundations from a common low-level architecture.

Package folio tries to solve these problems by implementing robust low-level containers that can hold any type of rectangular data with a focus on formality and performance.
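A container of this kind can be sketched with the methods package. The class `SketchContainer`, its `data` and `schema` slots and the `set_column()` setter below are hypothetical names chosen for illustration; they are not folio's API. The idea is to store the rectangular data in a typed slot, record the expected column classes next to it, and route every modification through a function that calls `validObject()`.

```r
library(methods)

# Hypothetical container (not folio's API): a data.frame slot plus a `schema`
# slot giving the expected class of each column, by name and in order.
setClass(
    "SketchContainer",
    slots = c(data = "data.frame", schema = "character"),
    validity = function(object) {
        actual <- vapply(object@data, function(col) class(col)[1L], character(1L))
        if (!identical(actual, object@schema)) {
            return("columns of @data do not match @schema")
        }
        TRUE
    }
)

# Hypothetical setter: every modification is followed by validObject(),
# which raises an error if the result no longer matches the schema.
set_column <- function(object, name, value)
{
    object@data[[name]] <- value
    validObject(object)
    object
}

cont <- new(
    "SketchContainer",
    data = data.frame(
        x = c(1L, 2L, 3L),
        y = c(4.1, 5.2, 6.3),
        z = c("obs1", "obs2", "obs3"),
        stringsAsFactors = FALSE
    ),
    schema = c(x = "integer", y = "numeric", z = "character")
)

# Allowed: the data changes, the structure does not.
cont <- set_column(cont, "y", c(7.5, 8.5, 9.5))

# Rejected: a character value for the integer column x.
status <- tryCatch(
    { set_column(cont, "x", "Do not tell me what to do, YOLO!"); "accepted" },
    error = function(e) "rejected"
)
```

The design choice here is to validate on write. The sketch is not watertight: a direct assignment such as `cont@data$x <- "YOLO"` bypasses `set_column()` and passes the slot check (the value is still a data.frame), so a real container would also need to restrict direct slot access.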

Developers can use and/or extend the classes defined in folio and adapt them to their specific needs without ever having to care about the underlying implementation of the API. Since the API itself would be common, sharing, reusing, maintaining and debugging applications should become much easier. This, of course, assumes that folio is robust enough.

The foundations of folio

Package folio builds only on well-established R packages and uses a minimal set of dependencies.

Object-oriented programming (OOP)

Package folio is itself based on the well-established methods package (written by John Chambers himself), which implements formal OOP in S and R. The package is written in R and C, and much of its code is included in the source code of R itself for efficiency. It is a very mature package, first included in R version 1.6 in 2001.
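To illustrate the kind of formality the methods package provides, here is a minimal sketch with two illustrative classes, `Base` and `Derived`, and a generic `describe()`; all three names are hypothetical. Slots have declared classes, subclasses inherit them through `contains`, methods are dispatched along the class hierarchy, and the `@<-` operator refuses a value whose class does not match the slot declaration.

```r
library(methods)

# Hypothetical classes: Base declares typed slots, Derived inherits them
# and adds one more.
setClass("Base", slots = c(x = "integer", y = "numeric"))
setClass("Derived", contains = "Base", slots = c(z = "character"))

# Hypothetical generic with a method defined on the parent class only.
setGeneric("describe", function(object) standardGeneric("describe"))
setMethod("describe", "Base", function(object) length(object@x))

obj <- new(
    "Derived",
    x = c(1L, 2L, 3L),
    y = c(4.1, 5.2, 6.3),
    z = c("obs1", "obs2", "obs3")
)

inherited  <- is(obj, "Base")      # TRUE: Derived extends Base
slot_types <- getSlots("Derived")  # named vector: slot name -> declared class
describe(obj)                      # 3: the Base method is inherited

# `@<-` checks the declared class of slot x, so this assignment fails
# and obj is left unchanged.
type_check <- tryCatch(
    { obj@x <- "Do not tell me what to do, YOLO!"; "accepted" },
    error = function(e) "rejected"
)
```

Unlike the S3 example earlier, the wrong-type assignment is refused at the moment it happens rather than surfacing later as a puzzling result.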

The methods package has had wide success over the last 20 years and underpins well-known projects such as Bioconductor, spatial, raster and Matrix (among others). These projects have produced complex networks of hundreds of inter-dependent classes, and they remain among the most popular R packages.

The features of methods are much needed in production environments requiring formality.

Data manipulations

Package folio is based on the very mature data.table package for data manipulations. In the last 10 years, few R packages, if any, have come close to data.table in speed, efficiency and robustness. This is largely because the package is backed by thousands of unit tests, a strong community of developers and solid C foundations (most of its code is written in C for efficiency).
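One visible consequence of data.table's design is modification by reference. The snippet below is ordinary data.table usage, not folio code, and assumes the data.table package is installed: `:=` adds a column to the table in place (comparing `address()` before and after shows whether a copy was made), and the same bracket syntax performs grouped aggregation.

```r
library(data.table)

dt <- data.table(
    x = c(1L, 2L, 3L),
    y = c(4.1, 5.2, 6.3),
    z = c("a", "a", "b")
)

# Record the memory address of dt before the update.
before <- address(dt)

# `:=` adds column w by reference instead of building a modified copy.
dt[, w := x * y]

# Compare addresses to confirm that dt was updated in place.
same_object <- identical(before, address(dt))

# Grouped aggregation with the same bracket syntax: mean of y for each z.
agg <- dt[, .(mean_y = mean(y)), by = z]
```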

The features of data.table are much needed in production environments requiring efficiency and robustness.

Other dependencies

Package folio minimizes dependencies by relying solely on the base packages installed by default with R and on a minimal number of external packages such as data.table, sp and rgdal. The latter two are used to manipulate spatial data (and the underlying metadata).

Package folio is built with well-established developer tools: devtools, roxygen2, desc, microbenchmark, testthat and usethis. Packages knitr and rmarkdown are used for vignettes.

Using well-established packages in production environments minimizes bugs and undefined behavior. It also eases maintenance.

The vision

The obvious long-term intent is to build an appealing architecture from which large teams can benefit. Package folio needs to find a community of developers so that it can properly grow and scale.

If you are interested in developing unsexy low-level things, please reach out to the maintainer!



jeanmathieupotvin/cargo documentation built on Oct. 27, 2020, 5:22 p.m.