README.md

dplyr-calcite

dplyrcalcite is a database connector that lets dplyr work with Apache Calcite. dplyr is the next iteration of plyr (from Hadley Wickham), focused on tools for working with data frames (hence the d in the name).

Installing dependencies

The interface to Apache Calcite is driven by the Calcite JDBC driver, so the driver must be installed. One of the easiest ways to do this is to follow the installation instructions at http://calcite.apache.org/docs/howto.html and the tutorial at http://calcite.apache.org/docs/tutorial.html. The SQL reference is at http://calcite.apache.org/docs/reference.html.

In short, build Calcite from source:

$ git clone https://github.com/apache/calcite.git
$ cd calcite
$ mvn install -DskipTests -Dcheckstyle.skip=true

The examples in the data directory depend on Calcite's sample CSV file driver implementation.
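For orientation, a Calcite model file for the CSV adapter generally has the shape sketched below. This is a minimal sketch, not the model.json shipped in this repository: the schema name "LAHMAN" and the directory "data/csv" are hypothetical, and the factory class is the one used in the Calcite tutorial. The R code only writes the file to a temporary location so that nothing in the repository is overwritten.

```r
# Sketch of a Calcite model file for the sample CSV adapter.
# Assumptions: the schema name "LAHMAN" and the directory "data/csv"
# are placeholders; the factory class comes from the Calcite tutorial.
model_json <- '{
  "version": "1.0",
  "defaultSchema": "LAHMAN",
  "schemas": [
    {
      "name": "LAHMAN",
      "type": "custom",
      "factory": "org.apache.calcite.adapter.csv.CsvSchemaFactory",
      "operand": { "directory": "data/csv" }
    }
  ]
}'

# Write the model to a temporary path rather than into ./data.
model_path <- file.path(tempdir(), "model.json")
writeLines(model_json, model_path)
```

In the tutorial's setup, the CSV adapter exposes each CSV file in the operand directory as a table, which is why the examples need that driver on the class path.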

Install dependent R packages

Install RJDBC and assertthat first, then lazyeval, then dplyr.
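A minimal sketch of that install order, assuming all four packages are taken from CRAN. The project may originally have used development versions from GitHub, and current CRAN releases may differ from the versions this package was developed against. RJDBC depends on rJava, so a working Java installation is needed first.

```r
# Assumption: every package is installed from CRAN, in the order given above.
install.packages(c("RJDBC", "assertthat"))
install.packages("lazyeval")
install.packages("dplyr")
```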

To get started, read the notes below, then read help(src_calcite).

If you encounter a clear bug, please file a minimal reproducible example on GitHub.

src_calcite

Connect to the database:

library(dplyrcalcite)

# Optionally set the class path for the Calcite JDBC connector; alternatively, use the CLASSPATH environment variable
options(dplyr.jdbc.classpath = "~/.m2/repository")

# To connect to a database first create a src:
lhm <- src_calcite('./data/model.json')
lhm

# Simple query:
batting <- tbl(lhm, "Batting")
dim(batting)
colnames(batting)
head(batting)

See the dplyr documentation for many more examples.
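As one illustration of the kind of dplyr pipeline that can be built on a tbl, here is a minimal sketch. The helper top_hitters is hypothetical, the column names (playerID, yearID, HR) are assumed from the Lahman Batting table, and the demo rows are made up so that the code runs without a Calcite connection.

```r
library(dplyr)

# Hypothetical helper: works on any dplyr tbl that has
# playerID, yearID and HR columns (assumed Lahman-style names).
top_hitters <- function(batting_tbl, since = 2000) {
  batting_tbl %>%
    filter(yearID >= since) %>%
    select(playerID, yearID, HR) %>%
    arrange(desc(HR))
}

# Local stand-in with made-up rows, used only to exercise the pipeline.
demo <- data.frame(
  playerID = c("a01", "b02", "c03"),
  yearID = c(1998, 2001, 2005),
  HR = c(40, 12, 31),
  stringsAsFactors = FALSE
)

top_hitters(demo)
```

With a live connection, top_hitters(batting) followed by collect() would run the same pipeline against the Calcite table and pull the result into a local data frame, provided the connector translates these verbs; that has not been verified here.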



piersharding/dplyr-calcite documentation built on May 25, 2019, 6:10 a.m.