knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)

fsttable

License: AGPL v3. Lifecycle: maturing.

The R package fsttable aims to provide a fully functional data.table interface to on-disk fst files. The package focuses on keeping memory usage as low as possible without sacrificing the features of in-memory data.table operations.

Installation

You can install the latest package version with:

devtools::install_github("fstpackage/fsttable")

Example

First, we create an on-disk fst file containing a medium-sized dataset:

library(fsttable)

# write some sample data to disk
nr_of_rows <- 1e6
x <- data.table::data.table(X = 1:nr_of_rows, Y = LETTERS[1 + (1:nr_of_rows) %% 26])
fst::write_fst(x, "1.fst")

Then we create an fst_table that references this file:

ft <- fst_table("1.fst")

This fst_table can be used like a regular data.table object. For example, we can print it:

ft

we can select columns:

ft[, .(Y)]

and rows:

ft[1:4,]

or both at the same time:

ft[1:4, .(X)]

Memory

During the operations shown above, the data was never fully loaded from the file. That is because fsttable's philosophy is to keep RAM usage as low as possible. Printing a few lines of a table doesn't require knowledge of the remaining lines, so fsttable never actually loads them.
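To make the idea of a partial read concrete, here is a minimal sketch that uses the underlying fst package directly. It is not a description of fsttable's internals, which this README does not cover; it only shows that an fst file can be read by column and row range, which is what makes this kind of low-memory access possible. The temporary file path and the variable names are assumptions made for illustration.

```r
library(fst)

# Recreate the sample data in a temporary file (hypothetical path)
nr_of_rows <- 1e6
x <- data.frame(
  X = 1:nr_of_rows,
  Y = LETTERS[1 + (1:nr_of_rows) %% 26],
  stringsAsFactors = FALSE
)
path <- tempfile(fileext = ".fst")
write_fst(x, path)

# Read only column X and only rows 1 to 4
head_rows <- read_fst(path, columns = "X", from = 1, to = 4)
print(head_rows)

# For comparison: a full read brings every row and column into memory
full <- read_fst(path)
print(object.size(head_rows), units = "Kb")
print(object.size(full), units = "Kb")
```

The partial read returns a four-row data frame, while the full read holds all one million rows in RAM.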

Even when you create a new subset:

ft2 <- ft[1:4, .(X)]

no actual data is loaded into RAM. The new object still refers to the original fst file, so the data stays on disk:

# small size because actual data is still on disk
object.size(ft2)
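One way to put that number in context is to compare it with the size of the in-memory table the file was written from. The sketch below assumes fsttable is installed as described under Installation and recreates the sample file in a temporary location. The path and the variable names `size_in_memory` and `size_reference` are introduced here for illustration only.

```r
library(fsttable)

# Recreate the sample data in a temporary file (hypothetical path)
nr_of_rows <- 1e6
x <- data.table::data.table(X = 1:nr_of_rows, Y = LETTERS[1 + (1:nr_of_rows) %% 26])
path <- tempfile(fileext = ".fst")
fst::write_fst(x, path)

# Reference the file and take a subset, as in the example above
ft <- fst_table(path)
ft2 <- ft[1:4, .(X)]

# The in-memory table holds every row in RAM
size_in_memory <- object.size(x)

# The fst_table subset keeps its data on disk, as the text above states
size_reference <- object.size(ft2)

print(size_in_memory, units = "Kb")
print(size_reference, units = "Kb")
```

The exact sizes depend on the R and package versions, so no expected output is given here.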


fstpackage/fsttable documentation built on Sept. 10, 2019, 9:18 p.m.