new_prt: Methods for creating and inspecting prt objects

new_prt R Documentation

Methods for creating and inspecting prt objects

Description

The constructor new_prt() creates a prt object from one or several fst files, making sure that each table consists of identically named, ordered and typed columns. To create a prt object from an in-memory table, as_prt() coerces objects inheriting from data.frame to prt by first splitting rows into n_chunks, writing the chunks as fst files to the directory dir and calling new_prt() on the resulting files. If this default splitting of rows (which might affect the efficiency of subsequent queries on the data) is not optimal, a list of objects inheriting from data.frame is a valid x argument as well.
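As a minimal sketch of the two construction routes described above: the first call passes a list of data frames to as_prt(), the second writes fst files by hand and passes them to new_prt(). Splitting mtcars by cyl, the cyl_*.fst file names and the variable names are illustrative assumptions, and the sketch assumes that each list element ends up as one partition.

```r
library(prt)

# Illustrative split: one data.frame per value of `cyl` (an arbitrary choice).
parts <- unname(split(mtcars, mtcars$cyl))

# Route 1: hand the list to as_prt(); n_chunks is left at its default.
by_cyl <- as_prt(parts)
n_part(by_cyl)
part_nrow(by_cyl)

# Route 2: write the fst files manually, then call new_prt() on them.
# The directory and file names below are hypothetical.
out_dir <- tempfile()
dir.create(out_dir)
files <- file.path(out_dir, paste0("cyl_", seq_along(parts), ".fst"))
for (i in seq_along(parts)) {
  fst::write_fst(parts[[i]], files[i])
}
manual <- new_prt(files)
nrow(manual)
```

Choosing the split by hand can be useful when later queries mostly touch one group at a time, since the rows of a group then sit together in one file.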

Usage

new_prt(files)

as_prt(x, n_chunks = NULL, dir = tempfile())

is_prt(x)

n_part(x)

part_nrow(x)

## S3 method for class 'prt'
head(x, n = 6L, ...)

## S3 method for class 'prt'
tail(x, n = 6L, ...)

## S3 method for class 'prt'
as.data.table(x, ...)

## S3 method for class 'prt'
as.list(x, ...)

## S3 method for class 'prt'
as.data.frame(x, row.names = NULL, optional = FALSE, ...)

## S3 method for class 'prt'
as.matrix(x, ...)

Arguments

files

Character vector of file name(s).

x

A prt object. For as_prt(), an object inheriting from data.frame or a list of such objects.

n_chunks

Count variable specifying the number of chunks x is split into.

dir

Directory in which the chunked fst files are written.

n

Count variable indicating the number of rows to return.

...

Generic consistency: additional arguments are ignored and a warning is issued.

row.names, optional

Generic consistency: passing anything other than the default value issues a warning.

Details

To check whether an object inherits from prt, the function is_prt() is exported. The number of partitions can be queried by calling n_part(), and the number of rows per partition is available as part_nrow().

The base R S3 generic functions dim(), length(), dimnames() and names() have prt-specific implementations: dim() returns the overall table dimensions, length() is synonymous with ncol(), dimnames() returns a length 2 list containing NULL and the column names as a character vector, and names() is synonymous with colnames(). Neither setting nor getting row names on prt objects is supported and, more generally, calling replacement functions such as names<-() or dimnames<-() leads to an error, as prt objects are immutable. The base R S3 generic functions head() and tail() are available as well and are used internally to provide an extensible mechanism for printing (see format_dt()).
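The behavior of these generics can be illustrated with a short sketch. The object name cars and the use of try() to capture the error are illustrative choices; the attempted renaming is expected to fail because prt objects are immutable.

```r
library(prt)

cars <- as_prt(mtcars, n_chunks = 2L)

dim(cars)       # overall rows and columns, across all partitions
length(cars)    # number of columns, as with ncol(cars)
dimnames(cars)  # length 2 list: NULL, then the column names
identical(names(cars), colnames(cars))

# Replacement functions are expected to raise an error; try() captures it
# so that the rest of the script keeps running.
res <- try(names(cars) <- toupper(names(cars)), silent = TRUE)
inherits(res, "try-error")
```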

Coercion to other base R objects is possible via as.list(), as.data.frame() and as.matrix(); for coercion to data.table, its generic function data.table::as.data.table() is available to prt objects. All coercion involves reading the full data into memory at once, which might be problematic for large data sets.
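A brief sketch of the coercion methods follows, assuming the data.table package is installed. The variable names are illustrative, and each call reads every partition into memory.

```r
library(prt)

cars <- as_prt(mtcars, n_chunks = 2L)

# Each call below materializes the full table in memory.
df  <- as.data.frame(cars)
dt  <- data.table::as.data.table(cars)
mat <- as.matrix(cars)
lst <- as.list(cars)

# In-memory size of the coerced copy; for large data this is the cost to watch.
object.size(df)
```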

Examples

cars <- as_prt(mtcars, n_chunks = 2L)

is_prt(cars)
n_part(cars)
part_nrow(cars)

nrow(cars)
ncol(cars)

colnames(cars)
names(cars)

head(cars)
tail(cars, n = 2)

str(as.list(cars))
str(as.data.frame(cars))


prt documentation built on April 9, 2023, 5:07 p.m.