src_drill: Connect to Drill (dplyr)

View source: R/dplyr.r

Description

Use src_drill() to connect to a Drill cluster and tbl() to reference a fully qualified table, such as cp.`employee.json`. Most Drill SQL functions are also available through the dplyr interface. If you need a custom Drill SQL function that is not yet implemented, please file an issue on GitHub.
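As a rough illustration of what a fully qualified table reference looks like, the sketch below assembles one from a schema name and a file path. The helper `drill_ref()` and the path `/data/example.parquet` are hypothetical names used only for illustration; they are not part of sergeant. The resulting string is the kind of value passed as the `from` argument of tbl().

```r
# Hypothetical helper (not part of sergeant): joins a schema/workspace name
# and a backtick-quoted file path into a single table reference string.
drill_ref <- function(schema, path) {
  sprintf("%s.`%s`", schema, path)
}

# The classpath sample file used in the examples on this page
emp_ref <- drill_ref("cp", "employee.json")

# A made-up file in the dfs.tmp workspace, for illustration only
tmp_ref <- drill_ref("dfs.tmp", "/data/example.parquet")

print(emp_ref)
print(tmp_ref)
```

Backticks around the file name keep the dot in `employee.json` from being read as another schema separator.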

Usage

src_drill(
  host = Sys.getenv("DRILL_HOST", "localhost"),
  port = as.integer(Sys.getenv("DRILL_PORT", 8047L)),
  ssl = FALSE,
  username = NULL,
  password = NULL
)

## S3 method for class 'src_drill'
tbl(src, from, ...)

Arguments

host

Drill host. Defaults to the value of the DRILL_HOST environment variable, or "localhost" if it is unset.

port

Drill port. Defaults to the value of the DRILL_PORT environment variable, or 8047 if it is unset.

ssl

Whether to use SSL for the connection. Defaults to FALSE.

username, password

Credentials for the Drill service. They are used only if not NULL.

src

A Drill "src" created with src_drill()

from

A Drill view or table specification

...

Extra parameters
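
The defaults for host and port can be checked without a running Drill server. The sketch below evaluates the same expressions that appear in the usage signature, first with the environment variables set and then with them unset. The host name `drill.example.internal` and port 8048 are made-up values for illustration.

```r
# Made-up values, for illustration only
Sys.setenv(DRILL_HOST = "drill.example.internal", DRILL_PORT = "8048")

# Same expressions as the defaults in the usage signature
host <- Sys.getenv("DRILL_HOST", "localhost")
port <- as.integer(Sys.getenv("DRILL_PORT", 8047L))

# With the variables unset, the fallback values apply
Sys.unsetenv(c("DRILL_HOST", "DRILL_PORT"))
default_host <- Sys.getenv("DRILL_HOST", "localhost")
default_port <- as.integer(Sys.getenv("DRILL_PORT", 8047L))

print(c(host = host, default_host = default_host))
print(c(port = port, default_port = default_port))
```

Setting the two variables once, for example in `.Renviron`, lets src_drill() be called with no arguments.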

Note

This is a DBI wrapper around the Drill REST API.

See Also

Other Drill REST API (dplyr): drill_custom_functions, src_tbls.src_drill()

Examples

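library(sergeant)

# try() keeps the example from stopping with an error when no Drill server is reachable
try({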
db <- src_drill("localhost", 8047L)

print(db)
## src:  DrillConnection
## tbls: INFORMATION_SCHEMA, cp.default, dfs.default, dfs.root, dfs.tmp, sys

emp <- tbl(db, "cp.`employee.json`")

count(emp, gender, marital_status)
## # Source:   lazy query [?? x 3]
## # Database: DrillConnection
## # Groups:   gender
##   marital_status gender     n
##            <chr>  <chr> <int>
## 1              S      F   297
## 2              M      M   278
## 3              S      M   276

# Drill-specific SQL functions are also available
select(emp, full_name) %>%
  mutate(        loc = strpos(full_name, "a"),
         first_three = substr(full_name, 1L, 3L),
                 len = length(full_name),
                  rx = regexp_replace(full_name, "[aeiouAEIOU]", "*"),
                 rnd = rand(),
                 pos = position("en", full_name),
                 rpd = rpad(full_name, 20L),
                rpdw = rpad_with(full_name, 20L, "*"))
## # Source:   lazy query [?? x 9]
## # Database: DrillConnection
##      loc         full_name   len                 rpdw   pos                rx
##    <int>             <chr> <int>                <chr> <int>             <chr>
##  1     0      Sheri Nowmer    12 Sheri Nowmer********     0      Sh*r* N*wm*r
##  2     0   Derrick Whelply    15 Derrick Whelply*****     0   D*rr*ck Wh*lply
##  3     5    Michael Spence    14 Michael Spence******    11    M*ch**l Sp*nc*
##  4     2    Maya Gutierrez    14 Maya Gutierrez******     0    M*y* G*t**rr*z
##  5     7   Roberta Damstra    15 Roberta Damstra*****     0   R*b*rt* D*mstr*
##  6     7  Rebecca Kanagaki    16 Rebecca Kanagaki****     0  R*b*cc* K*n*g*k*
##  7     0       Kim Brunner    11 Kim Brunner*********     0       K*m Br*nn*r
##  8     6   Brenda Blumberg    15 Brenda Blumberg*****     3   Br*nd* Bl*mb*rg
##  9     2      Darren Stanz    12 Darren Stanz********     5      D*rr*n St*nz
## 10     4 Jonathan Murraiin    17 Jonathan Murraiin***     0 J*n*th*n M*rr***n
## # ... with more rows, and 3 more variables: rpd <chr>, rnd <dbl>, first_three <chr>
}, silent=TRUE)

Example output

Loading required package: DBI
Loading required package: dplyr

Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

Loading required package: dbplyr

Attaching package: 'dbplyr'

The following objects are masked from 'package:dplyr':

    ident, sql

sergeant documentation built on Nov. 30, 2021, 1:06 a.m.