sergeant: Tools to Transform and Query Data with 'Apache' 'Drill'



Drill is a low-latency distributed query engine designed for data exploration and analytics on both relational and non-relational datastores, scaling to petabytes of data. Users can query the data with standard SQL and BI tools without having to create and manage schemas.


Drill includes a distributed execution environment, purpose-built for large-scale data processing. At the core of Drill is the "Drillbit" service, which is responsible for accepting requests from the client, processing the queries, and returning results to the client.

You can install and run a Drillbit service on one node or on many nodes to form a distributed cluster environment. When a Drillbit runs on each data node in a cluster, Drill can maximize data locality during query execution and avoid moving data over the network between nodes. Drill uses ZooKeeper to maintain cluster membership and health-check information.

Methods are provided to work with Drill via its REST API, as well as through R DBI and dplyr interfaces.
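As a rough illustration of two of these interfaces, the sketch below wraps a REST query and a dplyr query in small helper functions. It assumes that sergeant and dplyr are installed, that a Drillbit is reachable on `localhost` at Drill's default REST port (8047), and that the `cp.employee.json` sample table bundled with Drill is available. The helper names `query_via_rest` and `count_via_dplyr` are hypothetical and are not part of the package.

```r
# Sample table shipped on Drill's classpath; used here purely for illustration.
query <- "SELECT full_name, salary FROM cp.`employee.json` LIMIT 5"

# Hypothetical helper (not part of sergeant): run one SQL statement over the
# REST interface. Returns NULL when no Drillbit answers on `host`.
query_via_rest <- function(sql, host = "localhost") {
  dc <- sergeant::drill_connection(host)  # assumes the default REST port
  if (!sergeant::drill_active(dc)) {
    return(NULL)
  }
  sergeant::drill_query(dc, sql)
}

# Hypothetical helper: query the same table through the dplyr interface.
# The verbs are translated to SQL and run by Drill; collect() fetches the result.
count_via_dplyr <- function(host = "localhost") {
  db  <- sergeant::src_drill(host)
  emp <- dplyr::tbl(db, "cp.`employee.json`")
  dplyr::collect(dplyr::count(emp, gender, marital_status))
}
```

With a Drillbit running, `query_via_rest(query)` should return the query result as a data frame, and `count_via_dplyr()` should return one row per gender and marital-status combination. The DBI interface is not shown here.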


Bob Rudis


Drill documentation

hrbrmstr/sergeant documentation built on Jan. 13, 2019, 9:55 p.m.