RJDBC Interface for Apache Drill
Apache Drill is a low-latency distributed query engine designed to enable data exploration and analytics on both relational and non-relational datastores, scaling to petabytes of data. An RDBC interface with a thin set of ’dbplyr` helper functions is provided.
The following functions are implemented:
be_quiet
: Silence Java reflection warningsdbConnect,DrillJDBCDriver-method
: Connect to a schema/tabledbGetRowsAffected,DrillJDBCResult-method
: Drill JDBC
dbGetRowsAffecteddbQuoteIdentifier,DrillJDBCConnection,character-method
: Thin
wrapper for dbQuoteIdentifierdbSendQuery,DrillJDBCConnection,character-method
: Thin wrapper for
dbSendQueryDrillJDBCDriver-class
: Create a Drill JDBC connectionsergeant-caffeinated-exports
: sergeant exported operatorssergeant.caffeinated
: Tools to Transform and Query Data with
‘Apache’ ‘Drill’sql_translate_env.DrillJDBCConnection
: Thin wrapper for
sql_translate_envremotes::install_github("hrbrmstr/sergeant-caffeinated")
library(sergeant.caffeinated)
# current version
packageVersion("sergeant.caffeinated")
## [1] '0.3.0'
library(tidyverse)
# use localhost if running standalone on same system otherwise the host or IP of your Drill server
test_host <- Sys.getenv("DRILL_TEST_HOST", "localhost")
be_quiet()
con <- dbConnect(drv = DrillJDBC(), sprintf("jdbc:drill:zk=%s", test_host))
db <- tbl(con, "cp.`employee.json`")
# without `collect()`:
db %>%
count(
gender,
marital_status
)
## # Source: lazy query [?? x 3]
## # Database: DrillJDBCConnection
## # Groups: gender
## gender marital_status n
## <chr> <chr> <dbl>
## 1 F S 297
## 2 M M 278
## 3 M S 276
## 4 F M 304
db %>%
count(
gender,
marital_status
) %>%
collect()
## # A tibble: 4 x 3
## # Groups: gender [2]
## gender marital_status n
## <chr> <chr> <dbl>
## 1 F S 297
## 2 M M 278
## 3 M S 276
## 4 F M 304
db %>%
group_by(position_title) %>%
count(gender) -> tmp2
group_by(db, position_title) %>%
count(gender) %>%
ungroup() %>%
mutate(
full_desc = ifelse(gender=="F", "Female", "Male")
) %>%
collect() %>%
select(
Title = position_title,
Gender = full_desc,
Count = n
)
## # A tibble: 30 x 3
## Title Gender Count
## <chr> <chr> <dbl>
## 1 President Female 1
## 2 VP Country Manager Male 3
## 3 VP Country Manager Female 3
## 4 VP Information Systems Female 1
## 5 VP Human Resources Female 1
## 6 Store Manager Female 13
## 7 VP Finance Male 1
## 8 Store Manager Male 11
## 9 HQ Marketing Female 2
## 10 HQ Information Systems Female 4
## # … with 20 more rows
arrange(db, desc(employee_id)) %>% print(n=20)
## # Source: table<cp.`employee.json`> [?? x 16]
## # Database: DrillJDBCConnection
## # Ordered by: desc(employee_id)
## employee_id full_name first_name last_name position_id position_title store_id department_id birth_date hire_date
## <dbl> <chr> <chr> <chr> <dbl> <chr> <dbl> <dbl> <chr> <chr>
## 1 1156 Kris Sta… Kris Stand 18 Store Tempora… 18 18 1914-02-02 1998-01-…
## 2 1155 Vivian B… Vivian Burnham 18 Store Tempora… 18 18 1914-02-02 1998-01-…
## 3 1154 Judy Doo… Judy Doolittle 18 Store Tempora… 18 18 1914-02-02 1998-01-…
## 4 1153 Gail Pir… Gail Pirnie 18 Store Tempora… 18 18 1914-02-02 1998-01-…
## 5 1152 Barbara … Barbara Younce 17 Store Permane… 18 17 1914-02-02 1998-01-…
## 6 1151 Burnis B… Burnis Biltoft 17 Store Permane… 18 17 1914-02-02 1998-01-…
## 7 1150 Foster D… Foster Detwiler 17 Store Permane… 18 17 1914-02-02 1998-01-…
## 8 1149 Bertha C… Bertha Ciruli 17 Store Permane… 18 17 1914-02-02 1998-01-…
## 9 1148 Sharon B… Sharon Bishop 16 Store Tempora… 18 16 1914-02-02 1998-01-…
## 10 1147 Jacqueli… Jacqueline Cutwright 16 Store Tempora… 18 16 1914-02-02 1998-01-…
## 11 1146 Elizabet… Elizabeth Anderson 16 Store Tempora… 18 16 1914-02-02 1998-01-…
## 12 1145 Michael … Michael Swartwood 16 Store Tempora… 18 16 1914-02-02 1998-01-…
## 13 1144 Shirley … Shirley Curtsing… 15 Store Permane… 18 15 1914-02-02 1998-01-…
## 14 1143 Ana Quick Ana Quick 15 Store Permane… 18 15 1914-02-02 1998-01-…
## 15 1142 Hazel So… Hazel Souza 15 Store Permane… 18 15 1914-02-02 1998-01-…
## 16 1141 James Co… James Compagno 15 Store Permane… 18 15 1914-02-02 1998-01-…
## 17 1140 Mona Jar… Mona Jaramillo 13 Store Shift S… 18 11 1961-09-24 1998-01-…
## 18 1139 Jeanette… Jeanette Belsey 12 Store Assista… 18 11 1972-05-12 1998-01-…
## 19 1138 James Ei… James Eichorn 18 Store Tempora… 12 18 1914-02-02 1998-01-…
## 20 1137 Heather … Heather Geiermann 18 Store Tempora… 12 18 1914-02-02 1998-01-…
## # … with more rows, and 6 more variables: salary <dbl>, supervisor_id <dbl>, education_level <chr>,
## # marital_status <chr>, gender <chr>, management_role <chr>
db %>%
mutate(
position_title = tolower(position_title),
salary = as.numeric(salary),
gender = ifelse(gender == "F", "Female", "Male"),
marital_status = ifelse(marital_status == "S", "Single", "Married")
) %>%
group_by(supervisor_id) %>%
summarise(
underlings_count = n()
) %>%
collect()
## # A tibble: 112 x 2
## supervisor_id underlings_count
## <dbl> <dbl>
## 1 0 1
## 2 1 7
## 3 5 9
## 4 4 2
## 5 2 3
## 6 20 2
## 7 21 4
## 8 22 7
## 9 6 4
## 10 36 2
## # … with 102 more rows
| Lang | # Files | (%) | LoC | (%) | Blank lines | (%) | # Lines | (%) | |:------|---------:|-----:|----:|-----:|------------:|-----:|---------:|-----:| | R | 5 | 0.23 | 161 | 0.21 | 31 | 0.20 | 114 | 0.37 | | XML | 1 | 0.05 | 66 | 0.09 | 0 | 0.00 | 0 | 0.00 | | Maven | 1 | 0.05 | 65 | 0.08 | 6 | 0.04 | 4 | 0.01 | | Rmd | 1 | 0.05 | 52 | 0.07 | 26 | 0.17 | 31 | 0.10 | | Java | 2 | 0.09 | 32 | 0.04 | 11 | 0.07 | 7 | 0.02 | | make | 1 | 0.05 | 10 | 0.01 | 4 | 0.03 | 0 | 0.00 | | SUM | 11 | 0.50 | 386 | 0.50 | 78 | 0.50 | 156 | 0.50 |
clock Package Metrics for sergeant.caffeinated
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.