sergeant.caffeinated: Tools to Transform and Query Data with 'Apache' 'Drill'


Apache Drill is a low-latency distributed query engine designed for data exploration and analytics on both relational and non-relational datastores, scaling to petabytes of data. Users can query the data with standard SQL and BI tools without having to create and manage schemas. Some of the key features are:

Schema-free JSON document model similar to MongoDB and Elasticsearch
Industry-standard APIs: ANSI SQL, ODBC/JDBC, RESTful APIs
Extremely user and developer friendly
Pluggable architecture enables connectivity to multiple datastores

Drill includes a distributed execution environment, purpose-built for large-scale data processing. At the core of Drill is the "Drillbit" service, which is responsible for accepting requests from the client, processing the queries, and returning results to the client.
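As a rough illustration of that request/response cycle, the sketch below builds (but does not send) an HTTP request for Drill's REST query endpoint. It assumes a Drillbit whose web interface listens on the default port 8047 and accepts SQL at `/query.json`; the function name `build_drill_query_request` and the host value are hypothetical, chosen only for this example, and are not part of this package.

```python
import json
from urllib.request import Request


def build_drill_query_request(sql, host="localhost", port=8047):
    """Build a POST request that submits one SQL query to a Drillbit.

    Assumption: the Drillbit exposes its REST API on `port` and accepts
    a JSON body with "queryType" and "query" at /query.json.
    """
    body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return Request(
        url=f"http://{host}:{port}/query.json",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Example: a query against the sample data bundled on Drill's classpath.
req = build_drill_query_request("SELECT * FROM cp.`employee.json` LIMIT 3")
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would submit it to a running Drillbit; that step is left out here so the sketch runs without a server.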

You can install and run a drillbit service on one node or on many nodes to form a distributed cluster environment. When a drillbit runs on each data node in a cluster, Drill can maximize data locality during query execution without moving data over the network or between nodes. Drill uses ZooKeeper to maintain cluster membership and health check information.

An 'RJDBC' interface with a thin set of 'dbplyr' helper functions is provided.
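A JDBC client reaches Drill through a connection URL that either names the ZooKeeper quorum tracking the cluster or points directly at a single Drillbit. The sketch below assembles both forms. It assumes Drill's usual defaults (ZooKeeper directory `drill`, cluster ID `drillbits1`); the helper `drill_jdbc_url` and its parameters are hypothetical, written for illustration, and do not mirror this package's functions.

```python
def drill_jdbc_url(nodes, use_zk=True, zk_dir="drill",
                   cluster_id="drillbits1", schema=None):
    """Assemble a Drill JDBC connection URL.

    nodes      -- list of "host:port" strings (ZooKeeper nodes or Drillbits)
    use_zk     -- True to connect via ZooKeeper, False for a direct Drillbit
    zk_dir     -- ZooKeeper directory (assumed default: "drill")
    cluster_id -- Drill cluster ID (assumed default: "drillbits1")
    schema     -- optional default schema / storage plugin, e.g. "dfs"
    """
    hosts = ",".join(nodes)
    if use_zk:
        url = f"jdbc:drill:zk={hosts}/{zk_dir}/{cluster_id}"
    else:
        url = f"jdbc:drill:drillbit={hosts}"
    if schema:
        url += f";schema={schema}"
    return url


# ZooKeeper-managed cluster versus a single local Drillbit.
print(drill_jdbc_url(["zk1:2181", "zk2:2181"]))
print(drill_jdbc_url(["localhost:31010"], use_zk=False, schema="dfs"))
```

Connecting through ZooKeeper lets the driver pick any live Drillbit in the cluster, while the direct form is convenient for a single-node install.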

Author(s): Bob Rudis (bob@rud.is)

References: Drill documentation

hrbrmstr/sergeant-caffeinated documentation built on Nov. 21, 2020, 9:40 p.m.