New support for Snowflake (@edgararuiz)
sql_query_wrap() now work with
schemas (@mgirlich, #595).
if_all() are now translated.
group_by() now ungroups when the dots argument is empty and
sql_escape_datetime gain methods for MS Access
Thanks to @mgirlich, dbplyr gains support for key verbs from tidyr:
@mgirlich is now a dbplyr author in recognition of his significant and sustained contributions.
across() implementation has been rewritten to support more inputs:
it now translates formulas (#525), works with SQL functions that don't have
R translations (#534), and work with
summarise() now supports argument
.groups (@mgirlich, #584).
substring() get better
translations (#577). Most importantly, the results of using negative
locations should match the underlying R implementations more closely.
as.integer64() translations cast first to
to avoid CASTing weirdness (@DavidPatShuiFong, #496).
Assumes a boolean context inside of
end = -1 now works (#577).
lead() lose the
default parameter since it's
not supported (@hdplsa, #548).
SQLite: custom translation of
RPostgreSQL backend warns if
temporary = TRUE since temporary tables are
not supported by
count() method provides closer match to dplyr semantics (#347).
distinct() now respects grouping (@mgirlich, #535).
db_connection_describe() no longer uses partial matching (@mgirlich, #564).
pull() no longer
select()s the result when there's already only
one variable (#562).
select() no longer relocates grouping variables to the front
(@mgirlich, #568). and informs when adding missing grouping variables
tbl.src_dbi(...) now passed on to
across() is now translated into individual SQL statements (#480).
select() support dplyr 1.0.0 tidyselect syntax (apart from
predicate functions which can't easily work on computed queries) (#502).
relocate() makes it easy to move columns (#494) and
it easy to rename columns programmatically (#502).
slice_order() are now supported.
slice_tail() throw clear error messages (#394)
Documentation has been radically improved with new topics for each major verb and each backend giving more details about the SQL translation.
setdiff() gain an
all argument to add the
ALL argument (#414).
Join functions gains a
na_matches argument that allows you to control
whether or not
NULL) values match other
NA values. The default is
"never", which is the usual behaviour in databases. You can set
na_matches = "na" to match R's usual join behaviour (#180). Additional
arguments error (instead of being silently swallowed) (#382).
Joins now only use aliases where needed to disambiguate columns; this should make generated queries more readable.
Subqueries no longer include an
ORDER BY clause. This is not part of the
SQL spec, and has very limited support across databases. Now such queries
generate a warning suggesting that you move your
arrange() call later in
the pipeline (#276). (There's one exception:
ORDER BY is still generated
LIMIT is present; this tends to affect the returns rows but not
necessarily their order).
Subquery names are now scoped within the query. This makes query text deterministic which helps some query optimisers/cachers (#336).
sql_optimise() now can partially optimise a pipeline; due to an unfortunate
bug it previously gave up too easily.
in_schema() quotes each input individually (#287) (use
sql() to opt out
of quoting, if needed). And
DBI::Id() should work anywhere that
Experimental new SAP HANA backend (#233). Requires the latest version of odbc.
You can now use
:: in translations, so that (e.g.)
[[ can now also translate numeric indices (#520).
%/% now generates a clear error message; previously it was translated to
/ which is not correct (#108).
n() is translated to
count(*) instead of
sub_str() translation is more consistent in edge cases (@ianmcook).
median() (@lorenzwalthert, #483),
var() functions have an
na.rm argument that warns once when not
TRUE. This makes them consistent with
substring() is now translated the same way as
blob vectors can now be used with
!!! operators, for example in
filter() (@okhoma, #433)
MySQL uses standard SQL for index creation.
MS SQL translation does better a distinguishing between bit and boolean
ifelse once again generate
as.*() function uses
CAST() for version 11+ (2012+) (@DavidPatShuiFong, #380).
odbc no longer translates
count(); this was an accidental inclusion.
Oracle translation now depends on Oracle 12c, and uses a "row-limiting"
head(). It gains translations for
as.Date() translation (@rlh1994, #267).
PostgreSQL: new translations for lubridate period functions
floor_date() (@bkkkk, #333) and stringr functions
New RedShift translations when used with
str_replace() errors since there's no Redshift translation,
as.double() cast to
SQLite gains translations for lubridate functions
(#262), and correct translation for
If you are the author of a dbplyr backend, please see
vignette("backend-2") for details.
dbplyr_edition() generic allows you to opt-in to the 2nd edition of
the dbplyr API.
db_write_table() now calls
DBI::dbWriteTable() instead of nine generics
that formerly each did a small part:
db_data_types(). You can
now delete the methods for these generics.
db_query_rows() is no longer used; it appears that it hasn't been used
for some time, so if you have a method, you can delete it.
DBI::dbQuoteIdentifier() is now used instead of
DBI::dbQuoteString() instead of
A number of
db_* generics have been replaced with new SQL generation
This makes them easier to test and is an important part of the process of moving all database generics in dbplyr (#284).
A number of other generics have been renamed to facilitate the move from dplyr to dbplyr:
db_temporary_table() generic makes it easier to work with databases
that require temporary tables to be specially named.
sql_expr_matches() generic allows databases to use more efficient
alternatives when determine if two values "match" (i.e. like equality but
a pair of
NULLs will also match). For more details, see
sql_join_suffix() allows backends to control the default suffixes
All old lazy eval shims have been removed. These have been deprecated for some time.
Date-time escaping methods for Athena and Presto have moved to the packages where they belong.
Attempting to embed a Shiny reactive in a query now gives a helpful error (#439).
copy_nycflights13() (and hence
and friends now return DBI connections rather than the now deprecated
copy_to() can now
overwrite when table is specified with schema (#489),
and gains an
in_transaction argument used to optionally suppress the
transaction wrapper (#368).
distinct() no longer duplicates column if grouped (#354).
transmute() now correctly tracks variables it needs when creating
mutate() grouping variables no longer generates a downstream error (#396)
mutate() correctly generates subqueries when you re-use the same variable
three or more times (#412).
window_order() overrides ordering, rather than appending to it.
DBI::dbExecute() now uses
immediate = TRUE; this improves
support for session-scoped temporary tables in MS SQL (@krlmlr, #438).
ORDER BY use
TOP 9223372036854775807 instead of
TOP 100 PERCENT on SQL Server for compatibility with Azure Data Warehouse
escape() now supports
blob vectors using new
generic. It enables using blob variables in
dplyr verbs, for example to filter nvarchar values by UTF-16 blobs
setOldClass() calls for
"ident_q" classes for
compatibility with dplyr 1.0.0 (#448, @krlmlr).
str_detect() translation uses same argument names as stringr,
and gains a
negate argument (#444).
anti_join() now correctly support the
dbplyr now uses RPostgres (instead of RPostgreSQL) and RMariaDB (instead of RMySQL) for its internal tests and data functions (#427).
The Date and POSIXt methods for
escape() now use exported
sql_escape_datetime() generics to allow backend
specific formatting of date and datetime literals. These are used to
provide methods for Athena and Presto backends (@OssiLehtinen, #384, #391).
lag() now respect the
window_frame() (@krlmlr, #366).
SQL server: new translations for
str_flatten() (@PauloJhonny, #405).
SQL server: temporary datasets are now session-local, not global (#401).
translation (@shosaco, #362).
Fix bug when partially evaluating unquoting quosure containing a single symbol (#317)
Fixes for rlang and dpylr compatibility.
Minor improvements to SQL generation
x %in% y strips names of
Enhancements for scoped verbs (
filter_at() etc) (#296, #306).
MS SQL use
TOP 100 PERCENT as stop-gap to allow subqueries with
ORDER BY (#277).
Window functions now translated correctly for Hive (#293, @cderv).
Error: `con` must not be NULL: If you see this error, it probably means
that you have forgotten to pass
con down to a dbplyr function.
Previously, dbplyr defaulted to using
simulate_dbi() which introduced
subtle escaping bugs. (It's also possible I have forgotten to pass it
somewhere that the dbplyr tests don't pick up, so if you can't figure it
out, please let me know).
[) functions are no longer evaluated locally.
This makes the translation more consistent and enables useful new idioms
for modern databases (#200).
MySQL/MariaDB (https://mariadb.com/kb/en/library/window-functions/) and SQLite (https://www.sqlite.org/windowfunctions.html) translations gain support for window functions, available in Maria DB 10.2, MySQL 8.0, and SQLite 3.25 (#191).
Overall, dplyr generates many fewer subqueries:
Joins and semi-joins no longer add an unneeded subquery (#236). This is
facilitated by the new
bare_identifier_ok argument to
the previous argument was called
root and confused me.
Many sequences of
be collapsed into a single query, instead of always generating a subquery
vignette("sql") describes some advantages of dbplyr over SQL (#205) and
gives some advice about writing literal SQL inside of dplyr, when you need
vignette("reprex") gives some hints on creating reprexes that work
anywhere (#117). This is supported by a new
tbl_memdb() that matches the
..._join() functions gain an
sql_on argument that allows specifying
arbitrary join predicates in SQL code (#146, @krlmlr).
New translations for some lubridate functions:
yday() (@colearendt, @derekmorr). Also added new
New translations for stringr functions:
(@colearendt). Non-translated stringr functions throw a clear error.
New translations for bitwise operations:
bitwShiftR(). Unlike the base R functions,
the translations do not coerce arguments to integers (@davidchall, #235).
New translation for
CASE WHEN y THEN x END. This enables
sum(a[b == 0]) to work as you expect from R (#202).
y needs to be
a logical expression; if not you will likely get a type error from your
New translations for
x.y, enabling you to index
into nested fields in databases that provide them (#158).
.env pronouns of tidy evaluation are correctly translated
New translation for
quantile(). Works for all ANSI compliant
databases (SQL Server, Postgres, MariaDB, Teradata) and has custom
translations for Hive. Thanks to @edavidaja for researching the SQL variants!
na_if() is correct translated to
NULLIF() (rather than
n_distinct() translation throws an error when given more than one argument.
New default translations for
paste0(), and the hyperbolic
functions (these previously were only available for ODBC databases).
Corrected translations of
for ANSI compliant databases (#118), to
MAX() for SQLite, and
to an error for SQL server.
New translation for
switch() to the simple form of
CASE WHEN (#192).
SQL simulation makes it possible to see what dbplyr will translate SQL to, without having an active database connection, and is used for testing and generating reprexes.
SQL simulation has been overhauled. It now works reliably, is better
documented, and always uses ANSI escaping (i.e.
` for field
' for strings).
tbl_lazy() now actually puts a
dbplyr::src in the
$src field. This
shouldn't affect any downstream code unless you were previously working
around this weird difference between
It also includes the
src class in its class, and when printed,
shows the generated SQL (#111).
Translations also applied to connections via the odbc package (@colearendt, #238)
Basic support for regular expressions via
str_replace_all() (@colearendt, #168).
Improved translation for
IF(x, TRUE, FALSE).
New custom translation for
paste0() (@cderv, #221)
Basic support for regular expressions via
str_replace_all() (@colearendt, #168).
explain() translation now generates
EXPLAIN QUERY PLAN which
generates a higher-level, more human friendly explanation.
Improved translation for
CAST(x as BIT) (#250).
copy_to() method applies temporary table name transformation
earlier so that you can now overwrite temporary tables (#258).
db_write_table() method uses correct argument name for
passing along field types (#251).
Aggregation functions only warn once per session about the use of
na.rm = TRUE (#216).
table names generated by
random_table_name() have the prefix
"dbplyr_", which makes it easier to find them programmatically
Functions that are only available in a windowed (
mutate()) query now
throw an error when called in a aggregate (
summarise()) query (#129)
arrange() understands the
.by_group argument, making it possible
sort by groups if desired. The default is
distinct() now handles computed variables like
distinct(df, y = x + y)
build_sql() no longer accept
con = NULL as
a shortcut for
con = simulate_dbi(). This made it too easy to forget to
con along, introducing extremely subtle escaping bugs.
con argument for the same reason.
escape_ansi() always uses ANSI SQL 92 standard escaping (for use
in examples and documentation).
mutate(df, x = NULL) drops
x from the output, just like when working with
local data frames (#194).
partial_eval() processes inlined functions (including rlang lambda
functions). This makes dbplyr work with more forms of scoped verbs like
df %>% summarise_all(~ mean(.)),
df %>% summarise_all(list(mean)) (#134).
sql_aggregate() now takes an optional argument
f_r for passing to
check_na_rm(). This allows the warning to show the R function name rather
than the SQL function name (@sverchkov, #153).
sql_infix() gains a
pad argument for the rare operator that doesn't
need to be surrounded by spaces.
sql_prefix() no longer turns SQL functions into uppercase, allowing for
correct translation of case-sensitive SQL functions (#181, @mtoto).
summarise() gives a clear error message if you refer to a variable
created in that same
sql_call2() which is to
sql_expr() is to
cat() rather than message.
intersect() do a better job
of matching columns across backends (#183).
Calls of the form
dplyr::foo() are now evaluated in the database,
rather than locally (#197).
vars argument to
tbl_sql() has been formally deprecated; it hasn't
actually done anything for a while (#3254).
tbl objects now include a class generated from the class of
the underlying connection object. This makes it possible for dplyr backends
to implement different behaviour at the dplyr level, when needed. (#2293)
x %in% y is now translated to
y is empty (@mgirlich, #160).
as.integer64(x) translation to
CAST(x AS BIGINT) (#3305)
case_when now translates with a ELSE clause if a formula of the form
TRUE~<RHS> is provided . (@cderv, #112)
cummean() now generates
str_detect() now uses correct parameter order (#3397)
CASE WHENinstead of
IIF; this allows more complex operations, such as
%in%, to work properly (#93)
db_drop_table()now only drops tables if they exist (#3306)
Correct translation for
as.double() (@chris-park, #171).
substr()translation improved (#3339)
copy_to() will only remove existing table when
overwrite = TRUE and the
table already exists, eliminating a confusing "NOTICE" from PostgreSQL
partial_eval() handles unevaluated formulas (#184).
pull.tbl_sql() now extracts correctly from grouped tables (#3562).
sql_render.op() now correctly forwards the
con argument (@kevinykuo, #73).
New translations for
dbplyr now supplies appropriate translations for the RMariaDB and RPostgres packages (#3154). We generally recommend using these packages in favour of the older RMySQL and RPostgreSQL packages as they are fully DBI compliant and tested with DBItest.
copy_to() can now "copy" tbl_sql in the same src, providing another
way to cache a query into a temporary table (#3064). You can also
copy_to tbl_sqls from another source, and
copy_to() will automatically
collect then copy.
Initial support for stringr functions:
Regular expression support varies from database to database, but most
simple regular expressions should be ok.
db_compute() gains an
analyze argument to match
remote_query_plan() provide a standard API for get metadata about a
remote tbl (#3130, #2923, #2824).
sql_expr() is a more convenient building block for low-level SQL
win_aggregate() for generating SQL and windowed
SQL functions for aggregates. These take one argument,
x, and warn if
na.rm is not
win_recycled() is equivalent to
win_aggregate() and has been soft-deprecated.
db_write_table now needs to return the table name
head() calls in a row now collapse to a single call. This avoids
a printing problem with MS SQL (#3084).
escape() now works with integer64 values from the bit64 package (#3230)
if_else() now correctly scope the false condition
so that it only applies to non-NULL conditions (#3157)
ident_q() handle 0-length inputs better, and should
be easier to use with S3 (#3212)
in_schema() should now work in more places, particularly in
SQL generation for joins no longer gets stuck in a endless loop if you request an empty suffix (#3220).
mutate() has better logic for splitting a single mutate into multiple
paste0() support in MySQL, PostgreSQL (#3168),
and RSQLite (#3176). MySQL and PostgreSQL gain support for
which behaves like
paste(x, collapse = "-") (but for technical reasons
can't be implemented as a straightforward translation of
same_src.tbl_sql() now performs correct comparison instead of always
TRUE. This means that
copy = TRUE once again allows you to
perform cross-database joins (#3002).
select() queries no longer alias column names unnecessarily
rename() are now powered by tidyselect,
fixing a few renaming bugs (#3132, #2943, #2860).
summarise() once again performs partial evaluation before database
test_src() makes it easier to access a single test source.
Better support for temporary tables (@Hong-Revo)
Different translations for filter/mutate contexts for:
is.null()), logical operators (
and comparison operators (
db_write_table()) correctly translates logical
variables to integers (#3151).
n() translation in windowed context.
na_if translation (@cwarden)
PostgreSQL: translation for
grepl() added (@zozlak)
Oracle: changed VARVHAR to VARCHAR2 datatype (@washcycle, #66)
full_join() over non-overlapping columns
by = character() translated to
CROSS JOIN (#2924).
case_when() now translates to SQL "CASE WHEN" (#2894)
x %in% c(1) now generates the same SQL as
x %in% 1 (#2898).
window_frame() give you finer control over
the window functions that dplyr creates (#2874, #2593).
Added SQL translations for Oracle (@edgararuiz).
x %in% c(1) now generates the same SQL as
x %in% 1 (#2898).
head(tbl, 0) is now supported (#2863).
select()ing zero columns gives a more information error message (#2863).
Variables created in a join are now disambiguated against other variables in the same table, not just variables in the other table (#2823).
PostgreSQL gains a better translation for
db_analyze_table() for MS SQL, Oracle, Hive and Impala (@edgararuiz)
Added support for
sd() for aggregate and window functions (#2887) (@edgararuiz)
You can now use the magrittr pipe within expressions,
mutate(mtcars, cyl %>% as.character()).
If a translation was supplied for a summarise function, but not for the
equivalent windowed variant, the expression would be translated to
with a warning. Now
sql_variant() checks that all aggregate functions
have matching window functions so that correct translations or clean errors
will be generated (#2887)
copy_to() now work directly with DBI connections (#2423, #2576),
so there is no longer a need to generate a dplyr src.
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:") copy_to(con, mtcars)
mtcars2 <- tbl(con, "mtcars") mtcars2 ```
glimpse() now works with remote tables (#2665)
dplyr has gained a basic SQL optimiser, which collapses certain nested SELECT queries into a single query (#1979). This will improve query execution performance for databases with less sophisticated query optimisers, and fixes certain problems with ordering and limits in subqueries (#1979). A big thanks goes to @hhoeflin for figuring out this optimisation.
collapse() now preserve the "ordering" of rows.
This only affects the computation of window functions, as the rest
of SQL does not care about row order (#2281).
copy_to() gains an
overwrite argument which allows you to overwrite
an existing table. Use with care! (#2296)
in_schema() function makes it easy to refer to tables in schema:
query()is no longer exported. It hasn't been useful for a while so this shouldn't break any code.
Partial evaluation occurs immediately when you execute a verb (like
mutate()) rather than happening when the query is executed
mutate.tbl_sql() will now generate as many subqueries as necessary so
that you can refer to variables that you just created (like in mutate
with regular dataframes) (#2481, #2483).
SQL joins have been improved:
SQL joins always use the
ON ... syntax, avoiding
USING ... even for
natural joins. Improved handling of tables with columns of the same name
(#1997, @javierluraschi). They now generate SQL more similar to what you'd
write by hand, eliminating a layer or two of subqueries (#2333)
[API] They now follow the same rules for including duplicated key variables
that the data frame methods do, namely that key variables are only
x, and never from
sql_join() generic now gains a
vars argument which lists
the variables taken from the left and right sides of the join. If you
have a custom
sql_join() method, you'll need to update how your
code generates joins, following the template in
full_join() throws a clear error when you attempt to use it with a
MySQL backend (#2045)
full_join() now return results consistent with
local data frame sources when there are records in the right table with
no match in the left table.
right_join() returns values of
from the right table.
full_join() returns coalesced values of
columns from the left and right tables (#2578, @ianmcook)
group_by() can now perform an inline mutate for database backends (#2422).
The SQL generation set operations (
union_all()) have been considerably improved.
By default, the component SELECT are surrounded with parentheses, except on SQLite. The SQLite backend will now throw an error if you attempt a set operation on a query that contains a LIMIT, as that is not supported in SQLite (#2270).
All set operations match column names across inputs, filling in non-matching variables with NULL (#2556).
group_by() now combine correctly (#1962)
lazy_tbl() have been exported. These help you test
generated SQL with out an active database connection.
ungroup() correctly resets grouping variables (#2704).
as.sql() safely coerces an input to SQL.
More translators for
ident_q() makes it possible to specifier identifiers that do not
need to be quoted.
Translation of inline scalars:
Logical values are now translated differently depending on the backend. The default is to use "true" and "false" which is the SQL-99 standard, but not widely support. SQLite translates to "0" and "1" (#2052).
-Inf are correctly escaped
Better test for whether or not a double is similar to an integer and hence needs a trailing 0.0 added (#2004).
Quoting defaults to
::: are handled correctly (#2321)
x %in% 1 is now correctly translated to
x IN (1) (#511).
if_else() use correct argument names in SQL translation
ident() now returns an object with class
c("ident", "character"). It
no longer contains "sql" to indicate that this is not already escaped.
is.null() gain extra parens in SQL translation to preserve
correct precedence (#2302).
log(x, b) is now correctly translated to the SQL
log(b, x) (#2288).
SQLite does not support the 2-argument log function so it is translated
log(x) / log(b).
nth(x, i) is now correctly translated to
n_distinct() now accepts multiple variables (#2148).
substr() is now translated to SQL, correcting for the difference
in the third argument. In R, it's the position of the last character,
in SQL it's the length of the string (#2536).
win_over() escapes expression using current database rules.
copy_to() now uses
db_write_table() instead of
db_collect() allow backends to
override the entire database process behind
db_sql_render() allow additional control over the SQL
All generics whose behaviour can vary from database to database now provide a DBIConnection method. That means that you can easily scan the NAMESPACE to see the extension points.
sql_escape_logical() allows you to control the translation of
literal logicals (#2614).
src_desc() has been replaced by
db_desc() and now dispatches on the
connection, eliminating the last method that required dispatch on the class
of the src.
win_current_order() are now exported. This
should make it easier to provide customised SQL for window functions
SQL translation for Microsoft SQL Server (@edgararuiz)
SQL translation for Apache Hive (@edgararuiz)
SQL translation for Apache Impala (@edgararuiz)
collect() once again defaults to return all rows in the data (#1968).
This makes it behave the same as
collect() only regroups by variables present in the data (#2156)
collect() will automatically LIMIT the result to the
n, the number of
rows requested. This will provide the query planner with more information
that it may be able to use to improve execution time (#2083).
common_by() gets a better error message for unexpected inputs (#2091)
copy_to() no longer checks that the table doesn't exist before creation,
instead preferring to fall back on the database for error messages. This
should reduce both false positives and false negative (#1470)
copy_to() now succeeds for MySQL if a character column contains
(#1975, #2256, #2263, #2381, @demorenoc, @eduardgrebe).
copy_to() now returns it's output invisibly (since you're often just
calling for the side-effect).
distinct() reports improved variable information for SQL backends. This
means that it is more likely to work in the middle of a pipeline (#2359).
do() on database backends now collects all data locally first
dbFetch() instead of the deprecated
DBI::dbExecute() for non-query SQL commands (#1912)
show_query() now invisibly return the first argument,
making them easier to use inside a pipeline.
print.tbl_sql() displays ordering (#2287) and prints table name, if known.
print(df, n = Inf) and
head(df, n = Inf) now work with remote tables
sql_translate_env() get defaults for DBIConnection.
Formatting now works by overriding the
tbl_sum() generic instead of
print(). This means that the output is more consistent with tibble, and that
format() is now supported also for SQL sources (tidyverse/dbplyr#14).
[API] The signature of
op_base has changed to
op_base(x, vars, class)
partial_eval() have been refined:
translate_sql() no longer takes a vars argument; instead call
Because it no longer needs the environment
works with a list of dots, rather than a
partial_eval() now takes a character vector of variable names
rather than a tbl.
This leads to a simplification of the
op data structure:
dots is now a list of expressions rather than a
op_vars() now returns a list of quoted expressions. This
enables escaping to happen at the correct time (i.e. when the connection
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.