tbl.src_dbi: Use dplyr verbs with a remote database table

View source: R/src_dbi.R

tbl.src_dbiR Documentation

Use dplyr verbs with a remote database table

Description

All data manipulation on SQL tbls are lazy: they will not actually run the query or retrieve the data unless you ask for it: they all return a new tbl_dbi object. Use dplyr::compute() to run the query and save the results in a temporary table in the database, or use dplyr::collect() to retrieve the results to R. You can see the query with dplyr::show_query().

Usage

## S3 method for class 'src_dbi'
tbl(src, from, ...)

Arguments

src

A DBIConnection object produced by DBI::dbConnect().

from

Either a table identifier or a literal sql() string.

Use a string to identify a table in the current schema/catalog. We recommend using I() to identify a table outside the default catalog or schema, e.g. I("schema.table") or I("catalog.schema.table"). You can also use in_schema()/in_catalog() or DBI::Id().

...

Passed on to tbl_sql()

Details

For best performance, the database should have an index on the variables that you are grouping by. Use dplyr::explain() to check that the database is using the indexes that you expect.

There is one verb that is not lazy: dplyr::do() is eager because it must pull the data into R.

Examples

library(dplyr)

# Connect to a temporary in-memory SQLite database
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

# Add some data
copy_to(con, mtcars)
DBI::dbListTables(con)

# To retrieve a single table from a source, use `tbl()`
con %>% tbl("mtcars")

# Use `I()` for qualified table names
con %>% tbl(I("temp.mtcars")) %>% head(1)

# You can also use pass raw SQL if you want a more sophisticated query
con %>% tbl(sql("SELECT * FROM mtcars WHERE cyl = 8"))

# If you just want a temporary in-memory database, use src_memdb()
src2 <- src_memdb()

# To show off the full features of dplyr's database integration,
# we'll use the Lahman database. lahman_sqlite() takes care of
# creating the database.

if (requireNamespace("Lahman", quietly = TRUE)) {
batting <- copy_to(con, Lahman::Batting)
batting

# Basic data manipulation verbs work in the same way as with a tibble
batting %>% filter(yearID > 2005, G > 130)
batting %>% select(playerID:lgID)
batting %>% arrange(playerID, desc(yearID))
batting %>% summarise(G = mean(G), n = n())

# There are a few exceptions. For example, databases give integer results
# when dividing one integer by another. Multiply by 1 to fix the problem
batting %>%
  select(playerID:lgID, AB, R, G) %>%
  mutate(
   R_per_game1 = R / G,
   R_per_game2 = R * 1.0 / G
 )

# All operations are lazy: they don't do anything until you request the
# data, either by `print()`ing it (which shows the first ten rows),
# or by `collect()`ing the results locally.
system.time(recent <- filter(batting, yearID > 2010))
system.time(collect(recent))

# You can see the query that dplyr creates with show_query()
batting %>%
  filter(G > 0) %>%
  group_by(playerID) %>%
  summarise(n = n()) %>%
  show_query()
}

dbplyr documentation built on Sept. 10, 2025, 10:29 a.m.