The dbcooper package turns a database connection into a collection of functions, handling logic for keeping track of connections and letting you take advantage of autocompletion when exploring a database.
It's especially helpful to use when authoring database-specific R packages, for instance in an internal company package or one wrapping a public data source.
The package's name is a reference to the bandit D.B. Cooper.
You can install the development version from GitHub with:
# install.packages("devtools")
devtools::install_github("pipeline-tools/dbcooper")
The dbcooper package asks you to create the connection first. As an example, we'll use the Lahman baseball database packaged with dbplyr.
library(dplyr)
lahman_db <- dbplyr::lahman_sqlite()
lahman_db
#> <SQLiteConnection>
#> Path: /tmp/RtmpeApR4v/lahman.sqlite
#> Extensions: TRUE
You set up dbcooper with the dbc_init
function, passing it a prefix lahman
that will apply to all the functions it creates.
library(dbcooper)
dbc_init(lahman_db, "lahman")
dbc_init
then creates user-friendly accessor functions in your global environment. (You could also pass it an environment in which the functions will be created).
dbc_init
adds several functions when it initializes a database source. In this case, each will start with the lahman_
prefix.
_list
: Get a list of tables_tbl
: Access a table that can be worked with in dbplyr_query
: Perform of a SQL query and work with the result_execute
: Execute a query (such as a CREATE
or DROP
)_src
: Retrieve a dbi_src
for the databaseFor instance, we could start by finding the names of the tables in the Lahman database.
lahman_list()
#> [1] "AllstarFull" "Appearances" "AwardsManagers" "AwardsPlayers" "AwardsShareManagers" "AwardsSharePlayers" "Batting" "BattingPost"
#> [9] "CollegePlaying" "Fielding" "FieldingOF" "FieldingOFsplit" "FieldingPost" "HallOfFame" "HomeGames" "LahmanData"
#> [17] "Managers" "ManagersHalf" "Parks" "People" "Pitching" "PitchingPost" "Salaries" "Schools"
#> [25] "SeriesPost" "Teams" "TeamsFranchises" "TeamsHalf" "sqlite_stat1" "sqlite_stat4"
We can access one of these tables with lahman_tbl()
, then put it through any kind of dplyr operation.
lahman_tbl("Batting")
#> # Source: SQL [?? x 22]
#> # Database: sqlite 3.39.2 [/tmp/RtmpeApR4v/lahman.sqlite]
#> playerID yearID stint teamID lgID G AB R H X2B X3B HR RBI SB CS BB SO IBB HBP SH SF GIDP
#> <chr> <int> <int> <chr> <chr> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
#> 1 abercda01 1871 1 TRO NA 1 4 0 0 0 0 0 0 0 0 0 0 NA NA NA NA 0
#> 2 addybo01 1871 1 RC1 NA 25 118 30 32 6 0 0 13 8 1 4 0 NA NA NA NA 0
#> 3 allisar01 1871 1 CL1 NA 29 137 28 40 4 5 0 19 3 1 2 5 NA NA NA NA 1
#> 4 allisdo01 1871 1 WS3 NA 27 133 28 44 10 2 2 27 1 1 0 2 NA NA NA NA 0
#> 5 ansonca01 1871 1 RC1 NA 25 120 29 39 11 3 0 16 6 2 2 1 NA NA NA NA 0
#> 6 armstbo01 1871 1 FW1 NA 12 49 9 11 2 1 0 5 0 1 0 1 NA NA NA NA 0
#> 7 barkeal01 1871 1 RC1 NA 1 4 0 1 0 0 0 2 0 0 1 0 NA NA NA NA 0
#> 8 barnero01 1871 1 BS1 NA 31 157 66 63 10 9 0 34 11 6 13 1 NA NA NA NA 1
#> 9 barrebi01 1871 1 FW1 NA 1 5 1 1 1 0 0 1 0 0 0 0 NA NA NA NA 0
#> 10 barrofr01 1871 1 BS1 NA 18 86 13 13 2 1 0 11 1 0 0 0 NA NA NA NA 0
#> # … with more rows
lahman_tbl("Batting") %>%
count(teamID, sort = TRUE)
#> # Source: SQL [?? x 2]
#> # Database: sqlite 3.39.2 [/tmp/RtmpeApR4v/lahman.sqlite]
#> # Ordered by: desc(n)
#> teamID n
#> <chr> <int>
#> 1 CHN 5129
#> 2 PHI 5026
#> 3 PIT 4984
#> 4 SLN 4904
#> 5 CIN 4786
#> 6 CLE 4731
#> 7 BOS 4571
#> 8 NYA 4530
#> 9 CHA 4523
#> 10 DET 4462
#> # … with more rows
If we'd rather write SQL than dplyr, we could also run lahman_query()
(which can also take a filename).
lahman_query("SELECT
playerID,
sum(AB) as AB
FROM Batting
GROUP BY playerID")
#> # Source: SQL [?? x 2]
#> # Database: sqlite 3.39.2 [/tmp/RtmpeApR4v/lahman.sqlite]
#> playerID AB
#> <chr> <int>
#> 1 aardsda01 4
#> 2 aaronha01 12364
#> 3 aaronto01 944
#> 4 aasedo01 5
#> 5 abadan01 21
#> 6 abadfe01 9
#> 7 abadijo01 49
#> 8 abbated01 3044
#> 9 abbeybe01 225
#> 10 abbeych01 1756
#> # … with more rows
Finally, lahman_execute()
is for commands like CREATE
and DROP
that don't return a table, but rather execute a command on the database.
lahman_execute("CREATE TABLE Players AS
SELECT playerID, SUM(AB) AS AB
FROM Batting
GROUP BY playerID")
#> [1] 0
lahman_tbl("Players")
#> # Source: SQL [?? x 2]
#> # Database: sqlite 3.39.2 [/tmp/RtmpeApR4v/lahman.sqlite]
#> playerID AB
#> <chr> <int>
#> 1 aardsda01 4
#> 2 aaronha01 12364
#> 3 aaronto01 944
#> 4 aasedo01 5
#> 5 abadan01 21
#> 6 abadfe01 9
#> 7 abadijo01 49
#> 8 abbated01 3044
#> 9 abbeybe01 225
#> 10 abbeych01 1756
#> # … with more rows
lahman_execute("DROP TABLE Players")
#> [1] 0
Besides the _list
, _tbl
, _query
, and _execute
functions, the package also creates auto-completed table accessors.
# Same result as lahman_tbl("Batting")
lahman_batting()
#> # Source: SQL [?? x 22]
#> # Database: sqlite 3.39.2 [/tmp/RtmpeApR4v/lahman.sqlite]
#> playerID yearID stint teamID lgID G AB R H X2B X3B HR RBI SB CS BB SO IBB HBP SH SF GIDP
#> <chr> <int> <int> <chr> <chr> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
#> 1 abercda01 1871 1 TRO NA 1 4 0 0 0 0 0 0 0 0 0 0 NA NA NA NA 0
#> 2 addybo01 1871 1 RC1 NA 25 118 30 32 6 0 0 13 8 1 4 0 NA NA NA NA 0
#> 3 allisar01 1871 1 CL1 NA 29 137 28 40 4 5 0 19 3 1 2 5 NA NA NA NA 1
#> 4 allisdo01 1871 1 WS3 NA 27 133 28 44 10 2 2 27 1 1 0 2 NA NA NA NA 0
#> 5 ansonca01 1871 1 RC1 NA 25 120 29 39 11 3 0 16 6 2 2 1 NA NA NA NA 0
#> 6 armstbo01 1871 1 FW1 NA 12 49 9 11 2 1 0 5 0 1 0 1 NA NA NA NA 0
#> 7 barkeal01 1871 1 RC1 NA 1 4 0 1 0 0 0 2 0 0 1 0 NA NA NA NA 0
#> 8 barnero01 1871 1 BS1 NA 31 157 66 63 10 9 0 34 11 6 13 1 NA NA NA NA 1
#> 9 barrebi01 1871 1 FW1 NA 1 5 1 1 1 0 0 1 0 0 0 0 NA NA NA NA 0
#> 10 barrofr01 1871 1 BS1 NA 18 86 13 13 2 1 0 11 1 0 0 0 NA NA NA NA 0
#> # … with more rows
# Same result as lahman_tbl("Managers") %>% count()
lahman_managers() %>%
count()
#> # Source: SQL [1 x 1]
#> # Database: sqlite 3.39.2 [/tmp/RtmpeApR4v/lahman.sqlite]
#> n
#> <int>
#> 1 3684
These are useful because they let you use auto-complete to complete table names as you're exploring a data source.
Please note that the 'dbcooper' project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.