RJDBC Interface for Apache Drill
Apache Drill is a low-latency distributed query engine designed to enable data exploration and analytics on both relational and non-relational datastores, scaling to petabytes of data. An RDBC interface with a thin set of ’dbplyr` helper functions is provided.
The following functions are implemented:
be_quiet: Silence Java reflection warningsdbConnect,DrillJDBCDriver-method: Connect to a schema/tabledbGetRowsAffected,DrillJDBCResult-method: Drill JDBC
    dbGetRowsAffecteddbQuoteIdentifier,DrillJDBCConnection,character-method: Thin
    wrapper for dbQuoteIdentifierdbSendQuery,DrillJDBCConnection,character-method: Thin wrapper for
    dbSendQueryDrillJDBCDriver-class: Create a Drill JDBC connectionsergeant-caffeinated-exports: sergeant exported operatorssergeant.caffeinated: Tools to Transform and Query Data with
    ‘Apache’ ‘Drill’sql_translate_env.DrillJDBCConnection: Thin wrapper for
    sql_translate_envremotes::install_github("hrbrmstr/sergeant-caffeinated")
library(sergeant.caffeinated)
# current version
packageVersion("sergeant.caffeinated")
## [1] '0.3.0'
library(tidyverse)
# use localhost if running standalone on same system otherwise the host or IP of your Drill server
test_host <- Sys.getenv("DRILL_TEST_HOST", "localhost")
be_quiet()
con <- dbConnect(drv = DrillJDBC(), sprintf("jdbc:drill:zk=%s", test_host))
db <- tbl(con, "cp.`employee.json`")
# without `collect()`:
db %>% 
  count(
    gender, 
    marital_status
  )
## # Source:   lazy query [?? x 3]
## # Database: DrillJDBCConnection
## # Groups:   gender
##   gender marital_status     n
##   <chr>  <chr>          <dbl>
## 1 F      S                297
## 2 M      M                278
## 3 M      S                276
## 4 F      M                304
db %>% 
  count(
    gender, 
    marital_status
  ) %>% 
  collect()
## # A tibble: 4 x 3
## # Groups:   gender [2]
##   gender marital_status     n
##   <chr>  <chr>          <dbl>
## 1 F      S                297
## 2 M      M                278
## 3 M      S                276
## 4 F      M                304
db %>% 
  group_by(position_title) %>% 
  count(gender) -> tmp2
group_by(db, position_title) %>% 
  count(gender) %>% 
  ungroup() %>% 
  mutate(
    full_desc = ifelse(gender=="F", "Female", "Male")
  ) %>% 
  collect() %>% 
  select(
    Title = position_title, 
    Gender = full_desc, 
    Count = n
  )
## # A tibble: 30 x 3
##    Title                  Gender Count
##    <chr>                  <chr>  <dbl>
##  1 President              Female     1
##  2 VP Country Manager     Male       3
##  3 VP Country Manager     Female     3
##  4 VP Information Systems Female     1
##  5 VP Human Resources     Female     1
##  6 Store Manager          Female    13
##  7 VP Finance             Male       1
##  8 Store Manager          Male      11
##  9 HQ Marketing           Female     2
## 10 HQ Information Systems Female     4
## # … with 20 more rows
arrange(db, desc(employee_id)) %>% print(n=20)
## # Source:     table<cp.`employee.json`> [?? x 16]
## # Database:   DrillJDBCConnection
## # Ordered by: desc(employee_id)
##    employee_id full_name first_name last_name position_id position_title store_id department_id birth_date hire_date
##          <dbl> <chr>     <chr>      <chr>           <dbl> <chr>             <dbl>         <dbl> <chr>      <chr>    
##  1        1156 Kris Sta… Kris       Stand              18 Store Tempora…       18            18 1914-02-02 1998-01-…
##  2        1155 Vivian B… Vivian     Burnham            18 Store Tempora…       18            18 1914-02-02 1998-01-…
##  3        1154 Judy Doo… Judy       Doolittle          18 Store Tempora…       18            18 1914-02-02 1998-01-…
##  4        1153 Gail Pir… Gail       Pirnie             18 Store Tempora…       18            18 1914-02-02 1998-01-…
##  5        1152 Barbara … Barbara    Younce             17 Store Permane…       18            17 1914-02-02 1998-01-…
##  6        1151 Burnis B… Burnis     Biltoft            17 Store Permane…       18            17 1914-02-02 1998-01-…
##  7        1150 Foster D… Foster     Detwiler           17 Store Permane…       18            17 1914-02-02 1998-01-…
##  8        1149 Bertha C… Bertha     Ciruli             17 Store Permane…       18            17 1914-02-02 1998-01-…
##  9        1148 Sharon B… Sharon     Bishop             16 Store Tempora…       18            16 1914-02-02 1998-01-…
## 10        1147 Jacqueli… Jacqueline Cutwright          16 Store Tempora…       18            16 1914-02-02 1998-01-…
## 11        1146 Elizabet… Elizabeth  Anderson           16 Store Tempora…       18            16 1914-02-02 1998-01-…
## 12        1145 Michael … Michael    Swartwood          16 Store Tempora…       18            16 1914-02-02 1998-01-…
## 13        1144 Shirley … Shirley    Curtsing…          15 Store Permane…       18            15 1914-02-02 1998-01-…
## 14        1143 Ana Quick Ana        Quick              15 Store Permane…       18            15 1914-02-02 1998-01-…
## 15        1142 Hazel So… Hazel      Souza              15 Store Permane…       18            15 1914-02-02 1998-01-…
## 16        1141 James Co… James      Compagno           15 Store Permane…       18            15 1914-02-02 1998-01-…
## 17        1140 Mona Jar… Mona       Jaramillo          13 Store Shift S…       18            11 1961-09-24 1998-01-…
## 18        1139 Jeanette… Jeanette   Belsey             12 Store Assista…       18            11 1972-05-12 1998-01-…
## 19        1138 James Ei… James      Eichorn            18 Store Tempora…       12            18 1914-02-02 1998-01-…
## 20        1137 Heather … Heather    Geiermann          18 Store Tempora…       12            18 1914-02-02 1998-01-…
## # … with more rows, and 6 more variables: salary <dbl>, supervisor_id <dbl>, education_level <chr>,
## #   marital_status <chr>, gender <chr>, management_role <chr>
db %>% 
  mutate(
    position_title = tolower(position_title),
    salary = as.numeric(salary),
    gender = ifelse(gender == "F", "Female", "Male"),
    marital_status = ifelse(marital_status == "S", "Single", "Married")
  ) %>%
  group_by(supervisor_id) %>% 
  summarise(
    underlings_count = n()
  ) %>% 
  collect()
## # A tibble: 112 x 2
##    supervisor_id underlings_count
##            <dbl>            <dbl>
##  1             0                1
##  2             1                7
##  3             5                9
##  4             4                2
##  5             2                3
##  6            20                2
##  7            21                4
##  8            22                7
##  9             6                4
## 10            36                2
## # … with 102 more rows
| Lang | # Files | (%) | LoC | (%) | Blank lines | (%) | # Lines | (%) | |:------|---------:|-----:|----:|-----:|------------:|-----:|---------:|-----:| | R | 5 | 0.23 | 161 | 0.21 | 31 | 0.20 | 114 | 0.37 | | XML | 1 | 0.05 | 66 | 0.09 | 0 | 0.00 | 0 | 0.00 | | Maven | 1 | 0.05 | 65 | 0.08 | 6 | 0.04 | 4 | 0.01 | | Rmd | 1 | 0.05 | 52 | 0.07 | 26 | 0.17 | 31 | 0.10 | | Java | 2 | 0.09 | 32 | 0.04 | 11 | 0.07 | 7 | 0.02 | | make | 1 | 0.05 | 10 | 0.01 | 4 | 0.03 | 0 | 0.00 | | SUM | 11 | 0.50 | 386 | 0.50 | 78 | 0.50 | 156 | 0.50 |
clock Package Metrics for sergeant.caffeinated
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.