This vignette introduces the ideas that motivate the development process of package folio.
In R, it is not possible to enforce a storage mode and/or a class such as data.table and specify which columns can live in an instance of that class. It is also not possible to specify a prototype (names, classes) for these columns. Finally, it is not possible to force an object to always conform to that prototype while it exists.
Usually, objects that require a specific structure and storage mode are implemented as S3 objects through a pseudo-constructor function that receives the required inputs and outputs the desired structure. A class attribute is usually attached to that object, so that methods can be dispatched on it later on.
# Assume your application requires a data.frame with three fields: "x", "y"
# and "z". These fields respectively hold integer, numeric and character
# values. We will call this specific structure `myClass`. Most people would use
# the basic R S3 class system (the one you always use, without necessarily
# knowing it) and create a constructor function that returns just that.
new_myClass <- function(x = integer(), y = numeric(), z = character())
{
    # Some checks on x, y, z...
    return(
        base::structure(
            data.frame(x, y, z, stringsAsFactors = FALSE),
            names = c("x", "y", "z"),
            class = c("myClass", "data.frame")
        )
    )
}
# Instances of that (S3) class can be generated by using the constructor defined
# above.
myObject <- new_myClass(
    x = c(1L, 2L, 3L),
    y = c(4.1, 5.2, 6.3),
    z = c("obs1", "obs2", "obs3")
)
# Methods can be dispatched on myClass. As an example, one could implement a
# method to compute the mean of the numeric values held by instances of class
# myClass. This requires an S3 method that is called from the generic function
# mean().
mean.myClass <- function(x, ...)
{
    return(mean(c(x$x, x$y), na.rm = TRUE))
}
mean(myObject) # Outputs 3.6.
# This is fine. But nothing prevents the user from bypassing the implicit
# conventions of myClass later on by writing something that could break things.
# Here are some examples:
myObject$z <- NULL
myObject$x <- "Do not tell me what to do, YOLO!"
mean(myObject) # Returns NA with a warning: c(x$x, x$y) is now a character vector.
# Since R S3 classes are not based on a formal object-oriented programming (OOP)
# paradigm, the statements above are perfectly valid R expressions. This is a
# problem for complex programs. Ideally, the structure should be left unchanged
# and only the data within the object should be allowed to change, as long as
# it respects the conventions of the class. In other words, we need a clear
# mechanism to enforce validity every time the object is used. For that, we
# need to avoid ad-hoc implementations.
Freedom is usually a good thing in R, since it allows flexibility in interactive sessions. It also gives users complete control over their data, which can yield faster results and stronger analyses. However, this flexibility can also be counter-productive in many contexts.
In automated applications that require robustness and formality, code works precisely because clear conventions were established (the code is based on them). Bypassing these conventions leads to undefined behavior: errors, bugs, performance issues, fallacious results, etc.
In large teams of developers who all use R but work on many different projects in parallel, flexibility inevitably leads to ad-hoc implementations that are of questionable quality, not portable, or both. The code is therefore harder to share, reuse, maintain and debug.
In large teams of developers maintaining many different APIs and scripts in production, performance becomes critical because resources are limited (CPU usage, RAM usage, I/O overhead, etc.). Flexibility leads to variable performance that can impact the whole production chain.
The objective is to provide an easy-to-use and efficient API to create super-classes (and objects) that (1) can be used in any project requiring OOP formality and (2) inherit strong foundations from a common low-level architecture.
Package folio tries to solve these problems by implementing robust low-level containers that can hold any type of rectangular data with a focus on formality and performance.
Developers can use and/or extend the classes defined in folio and adapt them to their specific needs without ever needing to care about the underlying API implementation. Since the API itself would be common, sharing, reusing, maintaining and debugging applications should become much easier. This assumes, of course, that folio itself is robust enough.
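As an illustration of this idea, and not of folio's actual classes, the sketch below assumes a hypothetical base class `BaseContainer` that stores a data.frame alongside the expected class of each column, and a hypothetical derived class `myContainer` that only declares its column prototype. The validation rule is written once in the base class and inherited by every subclass. Note that S4 slot assignment only checks the class of the slot itself (here, that `@data` is still a data.frame), so column-level rules must be re-checked with `validObject()`; one way for a low-level API to enforce validity every time is to call it inside its own accessor and replacement methods.

```r
library(methods)

# Hypothetical low-level container (an illustration, NOT folio's actual
# classes): it stores rectangular data together with the expected class of
# every column.
setClass(
  "BaseContainer",
  slots = c(data = "data.frame", columns = "character"),
  validity = function(object) {
    # Observed class of each column, e.g. c(x = "integer", y = "numeric").
    actual <- vapply(object@data, function(col) class(col)[1L], character(1L))
    if (!identical(names(actual), names(object@columns)) ||
        !identical(unname(actual), unname(object@columns))) {
      return("columns of @data do not match the declared prototype")
    }
    TRUE
  }
)

# A derived class only declares its column prototype; the validity check is
# inherited from BaseContainer.
setClass(
  "myContainer",
  contains = "BaseContainer",
  prototype = prototype(
    data = data.frame(x = integer(), y = numeric(), z = character(),
                      stringsAsFactors = FALSE),
    columns = c(x = "integer", y = "numeric", z = "character")
  )
)

# new() fills @data (@columns comes from the prototype) and validates it.
obj <- new(
  "myContainer",
  data = data.frame(
    x = c(1L, 2L, 3L),
    y = c(4.1, 5.2, 6.3),
    z = c("obs1", "obs2", "obs3"),
    stringsAsFactors = FALSE
  )
)

# Slot assignment only checks that @data is still a data.frame, so the
# column prototype has to be re-checked explicitly with validObject().
broken <- obj
broken@data$x <- "Do not tell me what to do, YOLO!"
is_valid <- tryCatch({ validObject(broken); TRUE }, error = function(e) FALSE)
is_valid # FALSE
```

Here `is_valid` is `FALSE` because column `x` now holds character values instead of integers.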
Package folio builds on recognized R packages only and uses a minimal set of dependencies.
Package folio is itself based on the recognized methods package (written by John Chambers himself), which implements formal Object-Oriented Programming (OOP) in S and R. The implementation is written in R and C, and much of the code is included in the source code of R itself, for efficiency. It is a very mature package, first included in R version 1.6 in 2001.
The methods package has been widely successful over the last 20 years and has led to recognized worldwide projects such as Bioconductor, spatial, raster and Matrix (among others). All these projects led to complex networks of hundreds of inter-dependent classes, and they are all still among the most popular R packages.
The features of methods are much needed in production environments requiring formality.
Package folio is based on the very mature data.table package for data manipulation. In the last 10 years, no other R package has come even close to being as fast, efficient and robust as data.table. This is because data.table is backed by thousands of unit tests, a strong community of developers and strong C foundations (most of the code is written in C for efficiency).
The features of data.table are much needed in production environments requiring efficiency and robustness.
Package folio minimizes dependencies by relying only on the packages installed by default with R and on a minimal number of external packages such as data.table, sp and rgdal. The latter two are used to manipulate spatial data (and underlying metadata).
Package folio is built by using recognized developers’ tools: devtools, roxygen2, desc, microbenchmark, testthat and usethis. Packages knitr and rmarkdown are used for vignettes.
Using recognized packages in production environments minimizes bugs and undefined behavior. It also eases maintenance.
The obvious long-term intent is to have an appealing architecture from which large teams can benefit. Package folio needs to find a community of developers, so that it can properly grow and scale.
If you are interested in developing unsexy low-level things, please reach out to the maintainer!