rquery-package: 'rquery': Relational Query Generator for Data Manipulation

rquery-packageR Documentation

rquery: Relational Query Generator for Data Manipulation

Description

rquery supplies a piped query generator based on Edgar F. Codd's relational algebra and operator names (plus experience using SQL and dplyr at big data scale). The design represents an attempt to make SQL more teachable by denoting composition a sequential pipeline notation instead of nested queries or functions. Package features include: data processing trees or pipelines as observable objects (able to report both columns produced and columns used), optimized SQL generation as an explicit user visible modeling step, and convenience methods for applying query trees to in-memory data.frames.

Details

Note: rquery is a "database first" design. This means choices are made that favor database implementation. These include: capturing the entire calculation prior to doing any work (and using recursive methods to inspect this object, which can limit the calculation depth to under 1000 steps at a time), preferring "tame column names" (which isn't a bad idea in 'R' anyway as columns and variables are often seen as cousins), and not preserving row or column order (or supporting numeric column indexing). Also, rquery does have a fast in-memory implementation: rqdatatable (thanks to the data.table, so one can in fact use 'rquery' without a database.

Author(s)

Maintainer: John Mount jmount@win-vector.com

Other contributors:

  • Win-Vector LLC [copyright holder]

See Also

Useful links:


rquery documentation built on Aug. 20, 2023, 9:06 a.m.