NEWS.md

dplyr (development version)

dplyr 1.1.4

dplyr 1.1.3

dplyr 1.1.2

dplyr 1.1.1

We've accomplished this in two steps:

Unfortunately, this change means that if you set multiple = "all" to avoid a warning and you were doing a many-to-many style join, you will need to replace multiple = "all" with relationship = "many-to-many" to silence the new warning. We believe this should be rare, since many-to-many relationships are fairly uncommon.
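As a minimal sketch of the replacement described above, the following uses two small hypothetical tables (x and y are illustrative names, not from the release notes) in which the key 1 is duplicated on both sides:

```r
library(dplyr)

# Hypothetical toy data: key 1 appears twice in both tables,
# so matching on `key` is a many-to-many relationship.
x <- tibble(key = c(1, 1, 2), x_val = c("a", "b", "c"))
y <- tibble(key = c(1, 1, 3), y_val = c("p", "q", "r"))

# Declaring the relationship explicitly signals that the row
# duplication is intentional, which silences the warning.
out <- left_join(x, y, by = "key", relationship = "many-to-many")
out
```

Each of the two key-1 rows in x matches both key-1 rows in y, and the unmatched key-2 row is kept with a missing y_val.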

dplyr 1.1.0

New features

Rather than:

starwars %>%
  group_by(species, homeworld) %>%
  summarise(mean_height = mean(height))

You can now write:

starwars %>%
  summarise(
    mean_height = mean(height),
    .by = c(species, homeworld)
  )

The main benefit is that .by only affects a single operation. In the example above, an ungrouped data frame went into the summarise() call, so an ungrouped data frame will come out; with .by, you never need to remember to ungroup() afterwards and you never need to use the .groups argument.

Additionally, using summarise() with .by will never sort the results by the group key, unlike with group_by(). Instead, the results are returned using the existing ordering of the groups from the original data. We feel this is more predictable, better maintains any ordering you might have already applied with a previous call to arrange(), and provides a way to maintain the current ordering without having to resort to factors.
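The ordering difference can be seen on a small example. This is only a sketch on hypothetical toy data (the df, g, and x names are made up for illustration), with the groups deliberately out of alphabetical order:

```r
library(dplyr)

# Hypothetical toy data; group "b" appears before group "a"
df <- tibble(g = c("b", "a", "b", "a"), x = c(1, 2, 3, 4))

# Per-operation grouping with .by
by_result <- df %>% summarise(mean_x = mean(x), .by = g)

# Persistent grouping with group_by()
gb_result <- df %>% group_by(g) %>% summarise(mean_x = mean(x))

by_result$g               # groups in order of first appearance
gb_result$g               # groups sorted by the group key
is_grouped_df(by_result)  # .by never leaves a grouped data frame behind
```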

This feature was inspired by data.table, where the equivalent syntax looks like:

starwars[, .(mean_height = mean(height)), by = .(species, homeworld)]

with_groups() is superseded in favor of .by (#6582).

reframe() has been added in response to valid concerns from the community that allowing summarise() to return any number of rows per group increases the chance of accidental bugs. We still feel that this is a powerful technique, and a principled replacement for do(), so we have moved these features to reframe() (#6382).
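A short sketch of the kind of multi-row result reframe() is meant for, using hypothetical toy data and two quantiles per group (the column names are illustrative):

```r
library(dplyr)

# Hypothetical toy data with two groups of three values each
df <- tibble(
  g = c("a", "a", "a", "b", "b", "b"),
  x = c(1, 2, 3, 10, 20, 30)
)

# Each group contributes two rows, one per requested quantile
probs <- c(0.25, 0.75)
out <- df %>%
  reframe(prob = probs, q = quantile(x, probs), .by = g)
out
```

Unlike summarise(), reframe() always returns an ungrouped data frame, so there is no .groups argument to think about.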

Lifecycle changes

Breaking changes

Newly deprecated

Newly superseded

Newly stable

vctrs

Many of dplyr's vector functions have been rewritten to make use of the vctrs package, bringing greater consistency and improved performance.

Additionally, they have all gained an na_rm argument since they are summary functions (#6242, with contributions from @tnederlof).

if_else() also no longer allows you to supply NULL for either true or false, which was an undocumented usage that we consider to be off-label, because true and false are intended to be (and documented to be) vector inputs (#6730).

You can also now replace NaN values in x with na_if(x, NaN).
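For instance, a minimal illustration on a made-up vector:

```r
library(dplyr)

# Hypothetical input with one NaN
x <- c(1, NaN, 3)

# The NaN becomes a regular missing value; other elements are untouched
out <- na_if(x, NaN)
out
```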

Minor improvements and bug fixes

This fixes performance issues when thousands of warnings are emitted with rowwise and grouped data frames (#6005, #6236).

dplyr 1.0.10

Hot patch release to resolve R CMD check failures.

dplyr 1.0.9

dplyr 1.0.8

dplyr 1.0.7

dplyr 1.0.6

dplyr 1.0.5

dplyr 1.0.4

dplyr 1.0.3

dplyr 1.0.2

dplyr 1.0.1

dplyr 1.0.0

Breaking changes

Fix by prefixing with dplyr:: as in dplyr::mutate(mtcars, x = dplyr::n())

Input must be a vector, not a `<data.frame/...>` object

New features

Experimental features

across()

rowwise()

vctrs

Grouping

Lifecycle changes

Removed

Deprecated

Superseded

Questioning

Stable

Documentation improvements

Minor improvements and bug fixes

dplyr 0.8.5 (2020-03-07)

dplyr 0.8.4 (2020-01-30)

dplyr 0.8.3 (2019-07-04)

dplyr 0.8.2 (2019-06-28)

New functions

colwise changes

Hybrid evaluation changes

Minor changes

dplyr 0.8.1 (2019-05-14)

Breaking changes

New functions

Minor changes

dplyr 0.8.0.1 (2019-02-15)

dplyr 0.8.0 (2019-02-14)

Breaking changes

indicates when functions like n(), row_number(), ... are not imported or prefixed.

The easiest fix is to import dplyr with import(dplyr) in your NAMESPACE or #' @import dplyr in a roxygen comment. Alternatively, such functions can be imported selectively, like any other function, with importFrom(dplyr, n) in the NAMESPACE or #' @importFrom dplyr n in a roxygen comment. The third option is to prefix them, i.e. use dplyr::n().
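As a rough sketch of how the selective-import and prefix options might look in package code, consider the hypothetical function below (count_by_cyl is an invented name used only for illustration). The roxygen tag generates the importFrom() directive, and the calls are also fully prefixed so the snippet runs outside a package:

```r
#' Count rows per cylinder group (hypothetical package function)
#'
#' @importFrom dplyr group_by summarise n
#' @export
count_by_cyl <- function(data) {
  # Prefixed calls resolve correctly even if nothing is imported
  grouped <- dplyr::group_by(data, cyl)
  dplyr::summarise(grouped, n = dplyr::n())
}

counts <- count_by_cyl(mtcars)
counts
```

In a real package you would normally pick one style: either import the functions and call them unprefixed, or skip the import and prefix every call.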

New functions

mtcars %>%
  group_by(cyl) %>%
  group_map(~ head(.x, 2L))

Major changes

The default behaviour drops the empty groups as in the previous versions.

tibble(
  x = 1:2,
  f = factor(c("a", "b"), levels = c("a", "b", "c"))
) %>%
  group_by(f)
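To make the effect visible, the sketch below counts rows per group for the same data, first with the default and then with the .drop = FALSE argument of group_by(). The object names dropped and kept are illustrative only:

```r
library(dplyr)

df <- tibble(
  x = 1:2,
  f = factor(c("a", "b"), levels = c("a", "b", "c"))
)

# Default: the unused level "c" produces no group
dropped <- df %>% group_by(f) %>% summarise(n = n())

# With .drop = FALSE the empty group is kept and has zero rows
kept <- df %>% group_by(f, .drop = FALSE) %>% summarise(n = n())

dropped
kept
```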

Minor changes

mtcars %>%
  filter_at(vars(hp, vs), ~ . %% 2 == 0)

Lifecycle

Changes to column wise functions

Performance

Unwind-protection also makes dplyr more robust in corner cases because it ensures the C++ destructors are correctly called in all circumstances (debugger exit, captured condition, restart invocation).

Internal

Documentation

Deprecated and defunct functions

dplyr 0.7.6

dplyr 0.7.5 (2018-04-14)

Breaking changes for package developers

Bug fixes

Major changes

Following the switch to tidyselect, select() and rename() fully support character vectors. You can now unquote variables like this:

vars <- c("disp", "cyl")
select(mtcars, !! vars)
select(mtcars, -(!! vars))

Note that this only works in selecting functions because in other contexts strings and character vectors are ambiguous. For instance strings are a valid input in mutating operations and mutate(df, "foo") creates a new column by recycling "foo" to the number of rows.
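The contrast between the two contexts can be sketched as follows, with a small hypothetical data frame for the mutate() case:

```r
library(dplyr)

# Selecting context: the character vector is treated as column names
vars <- c("disp", "cyl")
selected <- select(mtcars, !!vars)
names(selected)

# Mutating context: the string is treated as a value and recycled
df <- tibble(x = 1:3)   # hypothetical toy data
recycled <- mutate(df, "foo")
recycled
```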

Minor changes

Documentation

Error messages

Performance

Internal

dplyr 0.7.4

dplyr 0.7.3

dplyr 0.7.2

dplyr 0.7.1

dplyr 0.7.0

New data, functions, and features

This verb is powered by the new select_var() internal helper, which is exported as well. It is like select_vars() but returns a single variable.

Deprecated and defunct

Databases

This version of dplyr includes some major changes to how database connections work. By and large, you should be able to continue using your existing dplyr database code without modification, but there are two big changes that you should be aware of:

You can continue to use src_mysql(), src_postgres(), and src_sqlite(), but I recommend a new style that makes the connection to DBI more clear:

library(dplyr)

con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(con, "mtcars", mtcars)

mtcars2 <- tbl(con, "mtcars")
mtcars2

This is particularly useful if you want to perform non-SELECT queries as you can do whatever you want with DBI::dbGetQuery() and DBI::dbExecute().
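For example, a non-SELECT statement can be run directly on the connection. This is a minimal sketch that assumes the DBI and RSQLite packages are installed; the DELETE statement is an arbitrary illustration:

```r
# Same in-memory SQLite setup as in the example above
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(con, "mtcars", mtcars)

# dbExecute() runs a statement and returns the number of affected rows
n_deleted <- DBI::dbExecute(con, "DELETE FROM mtcars WHERE cyl = 4")

# dbGetQuery() runs a query and returns the result as a data frame
remaining <- DBI::dbGetQuery(con, "SELECT COUNT(*) AS n FROM mtcars")

DBI::dbDisconnect(con)
```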

If you've implemented a database backend for dplyr, please read the backend news to see what's changed from your perspective (not much). If you want to ensure your package works with both the current and previous version of dplyr, see wrap_dbplyr_obj() for helpers.

UTF-8

Colwise functions

Tidyeval

dplyr has a new approach to non-standard evaluation (NSE) called tidyeval. It is described in detail in vignette("programming") but, in brief, gives you the ability to interpolate values in contexts where dplyr usually works with expressions:

my_var <- quo(homeworld)

starwars %>%
  group_by(!!my_var) %>%
  summarise_at(vars(height:mass), mean, na.rm = TRUE)

This means that the underscored version of each main verb is no longer needed, and so these functions have been deprecated (but remain around for backward compatibility).
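The same mechanism can be used inside a function, which is the main case the underscored verbs used to cover. The sketch below uses enquo() to capture the argument; mean_height_by is a hypothetical helper name, not part of dplyr:

```r
library(dplyr)

# Hypothetical wrapper: group by whichever column the caller names
mean_height_by <- function(data, group_var) {
  group_var <- enquo(group_var)   # capture the expression the user typed
  data %>%
    group_by(!!group_var) %>%     # interpolate it into the verb
    summarise(mean_height = mean(height, na.rm = TRUE))
}

out <- mean_height_by(starwars, homeworld)
out
```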

Verbs

Joins

Select

Other

Combining and comparing

Vector functions

Other minor changes and bug fixes

dplyr 0.5.0

Breaking changes

Existing functions

Deprecated and defunct functions

New functions

Local backends

dtplyr

All data table related code has been separated out into a new dtplyr package. This decouples the development of the data.table interface from the development of the dplyr package. If both data.table and dplyr are loaded, you'll get a message reminding you to load dtplyr.

Tibble

Functions related to the creation and coercion of tbl_dfs now live in their own package: tibble. See vignette("tibble") for more details.

tbl_cube

Remote backends

SQLite

SQL translation

Internals

This version includes an almost total rewrite of how dplyr verbs are translated into SQL. Previously, I used a rather ad-hoc approach, which tried to guess when a new subquery was needed. Unfortunately this approach was fraught with bugs, so in this version I've implemented a much richer internal data model. Now there is a three-step process:

  1. When applied to a tbl_lazy, each dplyr verb captures its inputs and stores them in an op (short for operation) object.

  2. sql_build() iterates through the operations to build up an object that represents a SQL query. These objects are convenient for testing as they are lists, and are backend agnostic.

  3. sql_render() iterates through the queries and generates the SQL, using generics (like sql_select()) that can vary based on the backend.
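The three steps above can be sketched roughly as follows. Note the assumptions: this uses the present-day dbplyr package, where the SQL translation code now lives, rather than dplyr 0.5.0 itself, and lazy_frame() is dbplyr's helper for building a tbl_lazy without a real database connection. The exact SQL text produced may differ between versions:

```r
library(dplyr)
library(dbplyr)

# Step 1: verbs applied to a tbl_lazy are recorded, not executed
lf <- lazy_frame(x = 1, y = 2)
lazy_query <- lf %>%
  filter(x > 1) %>%
  select(y)

# Step 2: build a backend-agnostic object representing the query
query <- sql_build(lazy_query)

# Step 3: render that object to SQL text
rendered <- sql_render(lazy_query)
rendered
```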

In the short-term, this increased abstraction is likely to lead to some minor performance decreases, but the chance of dplyr generating correct SQL is much much higher. In the long-term, these abstractions will make it possible to write a query optimiser/compiler in dplyr, which would make it possible to generate much more succinct queries.

If you have written a dplyr backend, you'll need to make some minor changes to your package:

There were two other tweaks to the exported API, but these are less likely to affect anyone.

Minor improvements and bug fixes

Single table verbs

Dual table verbs

Vector functions

dplyr 0.4.3

Improved encoding support

Until now, dplyr's support for non-UTF8 encodings has been rather shaky. This release brings a number of improvements to fix these problems: it's probably not perfect, but should be a lot better than the previous version. This includes fixes to arrange() (#1280), bind_rows() (#1265), distinct() (#1179), and joins (#1315). print.tbl_df() also received a fix for strings with invalid encodings (#851).

Other minor improvements and bug fixes

Databases

Hybrid evaluation

dplyr 0.4.2

This is a minor release containing fixes for a number of crashes and issues identified by R CMD check. There is one new "feature": dplyr no longer complains about unrecognised attributes, and instead just copies them over to the output.

dplyr 0.4.1

dplyr 0.4.0

New features

New vignettes

Minor improvements

Bug fixes

dplyr 0.3.0.1

dplyr 0.3

New functions

Programming with dplyr (non-standard evaluation)

Removed and deprecated features

Minor improvements and bug fixes

Minor improvements and bug fixes by backend

Databases

Data frames/tbl_df

Data tables

Cubes

dplyr 0.2

Piping

dplyr now imports %>% from magrittr (#330). I recommend that you use this instead of %.% because it is easier to type (since you can hold down the shift key) and is more flexible. With %>%, you can control which argument on the RHS receives the LHS by using the . pronoun. This makes %>% more useful with base R functions because they don't always take the data frame as the first argument. For example you could pipe mtcars to xtabs() with:

mtcars %>% xtabs( ~ cyl + vs, data = .)

Thanks to @smbache for the excellent magrittr package. dplyr only provides %>% from magrittr, but it contains many other useful functions. To use them, load magrittr explicitly: library(magrittr). For more details, see vignette("magrittr").

%.% will be deprecated in a future version of dplyr, but it won't happen for a while. I've also deprecated chain() to encourage a single style of dplyr usage: please use %>% instead.

Do

do() has been completely overhauled. There are now two ways to use it, either with multiple named arguments or a single unnamed argument. group_by() + do() is equivalent to plyr::dlply, except it always returns a data frame.

If you use named arguments, each argument becomes a list-variable in the output. A list-variable can contain any arbitrary R object so it's particularly well suited for storing models.

library(dplyr)
models <- mtcars %>% group_by(cyl) %>% do(lm = lm(mpg ~ wt, data = .))
models %>% summarise(rsq = summary(lm)$r.squared)

If you use an unnamed argument, the result should be a data frame. This allows you to apply arbitrary functions to each group.

mtcars %>% group_by(cyl) %>% do(head(., 1))

Note the use of the . pronoun to refer to the data in the current group.

do() also has an automatic progress bar. It appears if the computation takes longer than 5 seconds and lets you know (approximately) how much longer the job will take to complete.

New verbs

dplyr 0.2 adds three new verbs:

Minor improvements

Bug fixes

dplyr 0.1.3

Bug fixes

dplyr 0.1.2

New features

Bug fixes

dplyr 0.1.1

Improvements

Bug fixes



hadley/dplyr documentation built on Nov. 6, 2024, 4:48 p.m.