sergeant.caffeinated: Tools to Transform and Query Data with 'Apache' 'Drill'

Description

Apache Drill is a low-latency distributed query engine designed for data exploration and analytics on both relational and non-relational datastores, scaling to petabytes of data. Users can query the data with standard SQL and BI tools without having to create and manage schemas.

Details

Drill includes a distributed execution environment, purpose-built for large-scale data processing. At the core of Drill is the "Drillbit" service, which accepts requests from the client, processes the queries, and returns results to the client.

You can install and run a drillbit service on one node or on many nodes to form a distributed cluster environment. When a drillbit runs on each data node in a cluster, Drill can maximize data locality during query execution without moving data over the network or between nodes. Drill uses ZooKeeper to maintain cluster membership and health check information.

An 'RJDBC' interface with a thin set of 'dbplyr' helper functions is provided.
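
The two deployment styles described above (a single drillbit versus a ZooKeeper-coordinated cluster) correspond to the two connection URL forms accepted by the Drill JDBC driver. The following is a minimal sketch of how such URLs are assembled, not the package's own implementation: `drill_jdbc_url()` is a hypothetical helper written for illustration, the host names are placeholders, and the defaults assume Drill's stock settings (ZooKeeper root `drill`, cluster ID `drillbits1`, ZooKeeper port 2181, drillbit user port 31010).

```r
# Hypothetical helper, NOT part of this package: build a Drill JDBC URL.
# Assumes Drill's default ZooKeeper root ("drill") and cluster ID ("drillbits1").
drill_jdbc_url <- function(nodes, cluster_id = "drillbits1",
                           zk_root = "drill", use_zk = TRUE) {
  stopifnot(is.character(nodes), length(nodes) >= 1)

  # Multiple nodes are given to the driver as a comma-separated list.
  hosts <- paste(nodes, collapse = ",")

  if (use_zk) {
    # Cluster mode: the driver asks ZooKeeper which drillbits are available.
    sprintf("jdbc:drill:zk=%s/%s/%s", hosts, zk_root, cluster_id)
  } else {
    # Direct mode: connect straight to one or more named drillbits.
    sprintf("jdbc:drill:drillbit=%s", hosts)
  }
}

# Placeholder host names, for illustration only.
zk_url <- drill_jdbc_url(c("zk1.example.com:2181", "zk2.example.com:2181"))
direct_url <- drill_jdbc_url("localhost:31010", use_zk = FALSE)

print(zk_url)
print(direct_url)
```

Going through ZooKeeper is usually preferable on a multi-node cluster, because the client does not need to know which drillbits are currently alive. The direct form is convenient for a single-node or embedded installation.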

Author(s)

Bob Rudis (bob@rud.is)

References

Drill documentation


hrbrmstr/sergeant-caffeinated documentation built on Nov. 21, 2020, 9:40 p.m.