
fastverse


The fastverse is a suite of complementary high-performance packages for statistical computing and data manipulation in R. Developed independently by various people, fastverse packages jointly contribute to the objectives of:

The fastverse package is a meta-package providing utilities for easy installation, loading and management of these packages. It is an extensible framework that allows users to (permanently) add or remove packages to create a 'verse' of packages suiting their general needs, or even create separate 'verses' of their own.

fastverse packages are jointly attached with library(fastverse), and several functions starting with fastverse_ help manage dependencies, detect namespace conflicts, add/remove packages from the fastverse and update packages. The vignette provides a concise overview of the package.
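As a brief illustration of the workflow described above, the sketch below attaches the fastverse and calls two of the `fastverse_` helpers. It assumes the package is installed with its default core packages; the variable name `pkgs` is only illustrative.

```r
# Attach all core fastverse packages in one call
library(fastverse)

# Names of the packages that currently make up the fastverse
pkgs <- fastverse_packages()
print(pkgs)

# Report function-name clashes among the attached fastverse packages
fastverse_conflicts()

# These helpers query package repositories, so they need internet access:
# fastverse_deps()    # dependencies of fastverse packages
# fastverse_update()  # check for newer package versions
```

The output depends on which packages have been added to or removed from your fastverse.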

Core Packages

The fastverse installs with 4 core packages^[Before v0.3.0, matrixStats and fst were part of the core fastverse, but they were removed following a poll in November 2022, which established that more than 50% of users don't use them actively.] (5 dependencies in total), which provide broad C/C++-based statistical and data manipulation functionality and have carefully managed APIs.

Installation

# Install the CRAN version
install.packages("fastverse")

# Install (Windows/Mac binaries) from R-universe
install.packages("fastverse", repos = "https://fastverse.r-universe.dev")

# Install from GitHub (requires compilation)
remotes::install_github("fastverse/fastverse")

Note that the GitHub/r-universe version is not a development version; development takes place in the 'development' branch.

Extending the fastverse

Users can freely attach extension packages via the fastverse_extend() function. Setting permanent = TRUE adds these packages to the core fastverse. Another option is to add a .fastverse config file listing packages to the project directory. Separate verses can be created with fastverse_child(). See the vignette for details.
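A minimal sketch of extending the fastverse for one session is shown below. The choice of xts and roll is an assumption made for illustration, and both packages are assumed to be installed already.

```r
library(fastverse)

# Attach extension packages for the current session only.
# xts and roll are illustrative choices; any installed packages work.
fastverse_extend(xts, roll)

# Add them permanently, so that library(fastverse) attaches them in
# future sessions as well (commented out to avoid side effects):
# fastverse_extend(xts, roll, permanent = TRUE)

# Restore the default set of core packages after permanent changes:
# fastverse_reset()
```

Session-level extension is the safer default when experimenting, since permanent changes affect every later `library(fastverse)` call.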

Suggested Extensions

High-performing packages for different data manipulation and statistical computing topics are suggested below. The total (recursive) dependency count is indicated for each package.

Time Series

Notes: xts/zoo objects are preserved by roll functions and by collapse's time series and data transformation functions^[collapse functions can also handle irregular time series.]. As xts/zoo objects are matrices, all matrixStats functions apply to them as well. xts objects can also easily be converted to and from data.table, which also has some fast rolling functions like frollmean and frollapply.
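The conversion between xts and data.table mentioned above can be sketched as follows. The series values and the column names `value` and `roll3` are made up for illustration.

```r
library(data.table)
library(xts)

# Small daily series with one named column (made-up values)
x <- xts(cbind(value = c(1, 2, 3, 4, 5)),
         order.by = as.Date("2023-01-01") + 0:4)

# xts -> data.table: the time index becomes a column named "index"
dt <- as.data.table(x)

# Fast rolling mean over a 3-observation window;
# the first two results are NA because the window is incomplete
dt[, roll3 := frollmean(value, 3)]

# data.table -> xts: the first column is used as the time index
x2 <- as.xts(dt)
```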

Dates and Times

Notes: Date and time variables are preserved in many data.table and collapse operations. data.table additionally offers an efficient integer based date class 'IDate' with some supporting functionality. xts and zoo also provide various functions to transform dates, and zoo provides classes 'yearmon' and 'yearqtr' for convenient computation with monthly and quarterly data. Package mondate also provides a class 'mondate' for monthly data.
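As a short, hedged illustration of the classes named above, the sketch below creates an 'IDate' vector with data.table and converts ordinary dates to zoo's 'yearmon' and 'yearqtr' classes. The dates are arbitrary examples.

```r
library(data.table)
library(zoo)

# IDate stores dates as integers (days since 1970-01-01)
d <- as.IDate(c("2023-01-15", "2023-04-30"))
is.integer(unclass(d))  # TRUE

# data.table accessors for date components
yr  <- year(d)
mon <- month(d)
qtr <- quarter(d)

# zoo classes for monthly and quarterly data
ym <- as.yearmon(as.Date(d))
yq <- as.yearqtr(as.Date(d))
```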

Strings

Statistics and Computing

Notes: Rfast has a number of functions named like those in matrixStats. These are simpler but typically faster and support multi-threading. Some highly efficient statistical functions can also be found scattered across various other packages, notably Hmisc (60 dependencies) and DescTools (17 dependencies).

Spatial

Notes: collapse can be used for efficient manipulation and computations on sf data frames. sf also offers tight integration with dplyr.

Visualization

Notes: latticeExtra provides extra graphical utilities based on lattice. gridExtra provides miscellaneous functions for grid graphics (and consequently for ggplot2, which is based on grid). gridtext provides improved text rendering support for grid graphics. Many packages offer ggplot2 extensions (typically starting with 'gg'), such as ggExtra, ggalt, ggforce, ggmap, ggtext, ggthemes, ggrepel, ggridges, ggfortify, ggstatsplot, ggeffects, ggsignif, GGally, ggcorrplot, ggdendro, etc. Users in desperate need of greater performance may also find the (unmaintained) lwplot package useful, which provides a faster and lighter version of ggplot2 with a data.table backend.

Tidyverse-like Data Manipulation built on data.table

Data Manipulation in R Based on Faster Languages

R-like Data Manipulation in Faster Languages

Data Input-Output, Serialization, and Larger-Than-Memory Processing (IO)

Notes: data.table provides fread and fwrite for fast reading and writing of delimited files.
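A minimal round trip with these two functions might look like this. The table contents and the temporary file are made up for illustration.

```r
library(data.table)

# Small example table (made-up values)
dt <- data.table(id = 1:3, value = c(0.5, 1.5, 2.5))

# Write to a temporary CSV file, then read it back
path <- tempfile(fileext = ".csv")
fwrite(dt, path)
dt2 <- fread(path)

# Column types are detected automatically on reading
all.equal(dt, dt2)

# Clean up the temporary file
unlink(path)
```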

Compiling R

Notes: Many of these projects are experimental and not available as CRAN packages.

R Bindings to Faster Languages

Notes: There are many Rcpp extension packages binding R to powerful C++ libraries, such as RcppArmadillo and RcppEigen for linear algebra and RcppParallel for thread-safe parallelism.
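To give a sense of what such a binding looks like in practice, here is a small sketch using Rcpp itself. It assumes Rcpp and a working C++ toolchain are installed; the function `sum_squares` is a hypothetical example, not part of any package.

```r
library(Rcpp)

# Compile a small C++ function and make it callable from R.
# sum_squares is a hypothetical name chosen for this example.
cppFunction("
double sum_squares(NumericVector x) {
  double total = 0;
  for (int i = 0; i < x.size(); ++i) total += x[i] * x[i];
  return total;
}")

res <- sum_squares(c(1, 2, 3))  # 1 + 4 + 9 = 14
```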

Parallelization, High-Performance Computing and Out-Of-Memory Data

Adding to this list

Please notify me of any other packages you think should be included here. Such packages should be well designed, top-performing, low-dependency, and, with few exceptions, provide their own compiled code. Please note that the fastverse focuses on general-purpose statistical computing and data manipulation, so I won't include fast packages for estimating specific kinds of models here (of which R also has a great many).




fastverse documentation built on Sept. 20, 2023, 9:07 a.m.