NEWS.md

tidyr (development version)

tidyr 1.3.1

tidyr 1.3.0

New features

Breaking changes

pivot_wider() provides temporary backwards-compatible support for the case of a single unnamed argument that was previously matched positionally to id_cols. This one special case still works, but it throws a warning encouraging you to name the id_cols argument explicitly.

To read more about this pattern, see https://design.tidyverse.org/dots-after-required.html (#1350).
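The sketch below shows the recommended call style with `id_cols` named explicitly. The data frame `df` and its columns are hypothetical, and the code assumes tidyr 1.3.0 or later, where `...` comes before `id_cols` in the signature.

```r
library(tidyr)

# Hypothetical long data: one row per (id, name) pair
df <- tibble(id = c(1, 1, 2, 2), name = c("a", "b", "a", "b"), value = 1:4)

# Name id_cols explicitly. A lone unnamed argument, as in pivot_wider(df, id),
# is still matched to id_cols for now, but it triggers a warning.
wide <- pivot_wider(df, id_cols = id, names_from = name, values_from = value)
wide
```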

Lifecycle changes

Rectangling

Bug fixes and minor improvements

General

Nesting, packing, and chopping

Pivoting

Missing values

tidyr 1.2.1

tidyr 1.2.0

Breaking changes

Pivoting

Nesting

Rectangling

Grids

Missing values

Bug fixes and minor improvements

General

Pivoting

Nesting

Rectangling

Grids

Missing values

tidyr 1.1.4

tidyr 1.1.3

tidyr 1.1.2

tidyr 1.1.1

tidyr 1.1.0

General features

Pivoting improvements

Rectangling

Nesting

Bug fixes and minor improvements

tidyr 1.0.2

tidyr 1.0.1

tidyr 1.0.0

Breaking changes

See vignette("in-packages") for a detailed transition guide.

```r
library(tidyr)
# Rebind nest() and unnest() to their pre-1.0.0 interfaces
nest <- nest_legacy
unnest <- unnest_legacy
```

Pivoting

New pivot_longer() and pivot_wider() provide modern alternatives to spread() and gather(). They have been carefully redesigned to be easier to learn and remember, and include many new features. Learn more in vignette("pivot").

These functions resolve multiple existing issues with spread()/gather(). Both functions now handle multiple value columns (#149/#150), support more vector types (#333), use tidyverse conventions for duplicated column names (#496, #478), and are symmetric (#453). pivot_longer() gracefully handles duplicated column names (#472), and can directly split column names into multiple variables. pivot_wider() can now aggregate (#474), select keys (#572), and has control over generated column names (#208).
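The symmetry between the two functions can be seen in a small round trip. This is a minimal sketch on hypothetical data, where `wide`, `country`, `year`, and `cases` are illustrative names and not part of tidyr. It assumes a current tidyr.

```r
library(tidyr)

# Hypothetical wide table: one row per country, one column per year
wide <- tibble(country = c("A", "B"), `2000` = c(1, 2), `2001` = c(3, 4))

# pivot_longer(): turn the year columns into name/value pairs
long <- pivot_longer(wide, -country, names_to = "year", values_to = "cases")

# pivot_wider(): the inverse operation, recovering the original shape
wide2 <- pivot_wider(long, names_from = year, values_from = cases)
```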

To demonstrate how these functions work in practice, tidyr has gained several new datasets: relig_income, construction, billboard, us_rent_income, fish_encounters and world_bank_pop.
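As an example, relig_income stores one income bracket per column. The call below sketches how pivot_longer() could tidy it into one row per religion and bracket. The `!religion` selection syntax assumes a recent tidyr and tidyselect.

```r
library(tidyr)

# Every column except `religion` holds counts for one income bracket
relig_long <- pivot_longer(
  relig_income,
  !religion,
  names_to = "income",
  values_to = "count"
)
relig_long
```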

Finally, tidyr demos have been removed. They are dated, and have been superseded by vignette("pivot").

Rectangling

tidyr contains four new functions to support rectangling, turning a deeply nested list into a tidy tibble: unnest_longer(), unnest_wider(), unnest_auto(), and hoist(). They are documented in a new vignette: vignette("rectangle").

unnest_longer() and unnest_wider() make it easier to unnest list-columns of vectors into either rows or columns (#418). unnest_auto() automatically picks between _longer() and _wider() using heuristics based on the presence of common names.
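This sketch shows the two directions side by side on a hypothetical list-column of named vectors (`df`, `id`, and `x` are illustrative). With named elements, unnest_longer() also records the names in an extra `x_id` column, while unnest_wider() turns each name into its own column.

```r
library(tidyr)

# Hypothetical list-column: each element is a named numeric vector
df <- tibble(id = 1:2, x = list(c(a = 1, b = 2), c(a = 3, b = 4)))

# Into rows: one row per element, with the element names kept in x_id
long <- unnest_longer(df, x)

# Into columns: one column per element name
wide <- unnest_wider(df, x)
```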

New hoist() provides a convenient way of plucking components of a list-column out into their own top-level columns (#341). This is particularly useful when you are working with deeply nested JSON, because it provides a convenient shortcut for the mutate() + map() pattern:

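```r
library(tidyr)
library(dplyr)
library(purrr)
df <- tibble(metadata = list(list(name = "a"), list(name = "b")))
df %>% hoist(metadata, name = "name")
df %>% mutate(name = map_chr(metadata, "name"))  # the longhand it replaces
```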

Nesting

nest() and unnest() have been updated with new interfaces that are more closely aligned to evolving tidyverse conventions. They use the theory developed in vctrs to more consistently handle mixtures of input types, and their arguments have been overhauled based on the last few years of experience. They are supported by a new vignette("nest"), which outlines some of the main ideas of nested data (it's still very rough, but will get better over time).

The biggest change is to their operation with multiple columns: df %>% unnest(x, y, z) becomes df %>% unnest(c(x, y, z)) and df %>% nest(x, y, z) becomes df %>% nest(data = c(x, y, z)).
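A minimal sketch of the new multi-column syntax on a hypothetical data frame (`df`, `g`, `x`, and `y` are illustrative names):

```r
library(tidyr)

df <- tibble(g = c(1, 1, 2), x = 1:3, y = c("a", "b", "c"))

# nest(): name the new list-column and select its contents with c()
nested <- df %>% nest(data = c(x, y))

# unnest(): the columns to unnest are likewise wrapped in c()
unnested <- nested %>% unnest(c(data))
```

Wrapping the selection in c() leaves `...` free for naming the output column, which the old form `nest(x, y, z)` could not express.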

I have done my best to ensure that common uses of nest() and unnest() will continue to work, generating an informative warning telling you precisely how you need to update your code. Please file an issue if I've missed an important use case.

unnest() has been overhauled:

Packing and chopping

Under the hood, nest() and unnest() are implemented with chop(), pack(), unchop(), and unpack():

Packing and chopping are interesting primarily because they are the atomic operations underlying nesting (and similarly, unchopping and unpacking underlie unnesting), and I don't expect them to be used directly very often.
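To make the two building blocks concrete, here is a small sketch on hypothetical data. Chopping turns columns into list-columns, with one row per combination of the remaining columns. Packing turns columns into a single data frame column.

```r
library(tidyr)

df <- tibble(g = c(1, 1, 2), x = 1:3, y = 4:6)

# chop(): x and y become list-columns, one row per distinct g
chopped <- chop(df, c(x, y))
# unchop(): the inverse, back to one row per element
unchopped <- unchop(chopped, c(x, y))

# pack(): x and y become a single data frame column named xy
packed <- pack(df, xy = c(x, y))
# unpack(): splice the columns of xy back into the top level
unpacked <- unpack(packed, xy)
```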

New features

Bug fixes and minor improvements

tidyr 0.8.3

tidyr 0.8.2

tidyr 0.8.1

tidyr 0.8.0

Breaking changes

New features

Bug fixes and minor improvements

tidyr 0.7.2

tidyr 0.7.1

This is a hotfix release to account for some tidyselect changes in the unit tests.

Note that the upcoming version of tidyselect backtracks on some of the changes announced for 0.7.0. The special evaluation semantics for selection have been changed back to the old behaviour because the new rules were causing too much trouble and confusion. From now on, data expressions (symbols and calls to : and c()) can refer to both registered variables and objects from the context.

However, the semantics for context expressions (any calls other than to : and c()) remain the same. Those expressions are evaluated in the context only and cannot refer to registered variables. If you're writing functions that refer to contextual objects, it is still a good idea to avoid data expressions by following the advice of the 0.7.0 release notes.

tidyr 0.7.0

This release includes important changes to tidyr internals. Tidyr now supports the new tidy evaluation framework for quoting (NSE) functions. It also uses the new tidyselect package as its selecting backend.

Breaking changes

```r
library(tidyr)
x <- 3
df <- tibble(w = 1, x = 2, y = 3)
gather(df, "variable", "value", 1:x)
```

Does it select the first three columns (using the x defined in the global environment), or does it select the first two columns (using the column named x)?

To solve this ambiguity, we now make a strict distinction between data and context expressions. A data expression is either a bare name or an expression like x:y or c(x, y). In a data expression, you can only refer to columns from the data frame. Everything else is a context expression in which you can only refer to objects that you have defined with <-.

In practice this means that you can no longer refer to contextual objects like this:

```r
mtcars %>% gather(var, value, 1:ncol(mtcars))

x <- 3
mtcars %>% gather(var, value, 1:x)
mtcars %>% gather(var, value, -(1:x))
```

You now have to be explicit about where to find objects. To do so, you can use the quasiquotation operator !! which will evaluate its argument early and inline the result:

```r
mtcars %>% gather(var, value, !! 1:ncol(mtcars))
mtcars %>% gather(var, value, !! 1:x)
mtcars %>% gather(var, value, !! -(1:x))
```

An alternative is to turn your data expression into a context expression by using seq() or seq_len() instead of :. See the section on tidyselect for more information about these semantics.
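A sketch of the seq_len() alternative, reusing the small data frame from the ambiguity example. `n` is a hypothetical contextual variable. Because seq_len(n) is a call other than `:` or c(), `n` is looked up in the environment rather than among the columns.

```r
library(tidyr)

df <- tibble(w = 1, x = 2, y = 3)
n <- 2

# seq_len(n) evaluates to 1:2, so w and x are gathered and y is kept
long <- gather(df, "variable", "value", seq_len(n))
long
```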

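Selecting functions now reject numeric column positions that are not whole numbers. If you see an error along the lines of `-0.949999999999999`, `-0.940000000000001`, ... must resolve to integer column positions, not a double vector,

please round the positions before supplying them to tidyr. Double vectors are fine as long as they are rounded.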
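A minimal sketch of the fix, assuming positions computed with floating-point arithmetic (`pos` is a hypothetical variable). Rounding first gives whole-number doubles, which are accepted as column positions.

```r
library(tidyr)

df <- tibble(w = 1, x = 2, y = 3)

# Equal to c(1, 2, 3) only up to floating-point error (0.1 + 0.2 != 0.3)
pos <- seq(0.1, 0.3, by = 0.1) * 10

# round() yields exact whole-number doubles, so all three columns are selected
long <- gather(df, "variable", "value", round(pos))
long
```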

Switch to tidy evaluation

tidyr is now a tidy evaluation grammar. See the programming vignette in dplyr for practical information about tidy evaluation.

The tidyr port is a bit special. While the philosophy of tidy evaluation is that R code should refer to real objects (from the data frame or from the context), we had to make some exceptions to this rule for tidyr. The reason is that several functions accept bare symbols to specify the names of new columns to create (gather() being a prime example). This is not tidy because these symbols do not represent any actual object. Our workaround is to capture these arguments using rlang::quo_name() (so they still support quasiquotation and you can unquote symbols or strings). This type of NSE is now discouraged in the tidyverse: symbols in R code should represent real objects.
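Because the new column names are captured rather than evaluated, a name stored in a variable has to be unquoted with !!. A minimal sketch, where `key_name` is a hypothetical variable and the call assumes a current tidyr:

```r
library(tidyr)

df <- tibble(id = 1, w = 2, x = 3)
key_name <- "variable"

# !! inlines the string held in key_name as the name of the new key column
long <- gather(df, !!key_name, "value", w:x)
long
```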

Following the switch to tidy eval, the underscored variants are soft-deprecated. However, they will remain available for some time, without warnings, for backward compatibility.

Switch to the tidyselect backend

The selecting backend of dplyr has been extracted into a standalone package, tidyselect, which tidyr now uses for selecting variables. It is used for selecting multiple variables (in drop_na()) as well as single variables (the col argument of extract() and separate(), and the key and value arguments of spread()). This implies the following changes:

You can still refer to contextual objects in a data expression by being explicit. One way of being explicit is to unquote a variable from the environment with the tidy eval operator !!:

```r
x <- 2
drop_na(df, 2)     # Works fine
drop_na(df, x)     # Object 'x' not found
drop_na(df, !! x)  # Works as if you had supplied 2
```

On the other hand, select helpers like starts_with() are context expressions. It is therefore easy to refer to objects, and they will never be ambiguous with data columns:

```r
x <- "d"
drop_na(df, starts_with(x))
```

While these special rules are in contrast to most dplyr and tidyr verbs (where both the data and the context are in scope), they make sense for selecting functions and should provide more robust and helpful semantics.

tidyr 0.6.3

tidyr 0.6.2

tidyr 0.6.1

tidyr 0.6.0

API changes

Bug fixes and minor improvements

tidyr 0.5.1

tidyr 0.5.0

New functions

Bug fixes and minor improvements

tidyr 0.4.1

tidyr 0.4.0

Nested data frames

nest() and unnest() have been overhauled to support a useful way of structuring data frames: the nested data frame. In a grouped data frame, you have one row per observation, and additional metadata define the groups. In a nested data frame, you have one row per group, and the individual observations are stored in a column that is a list of data frames. This is a useful structure when you have lists of other objects (like models) with one element per group.
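A sketch of the "one model per group" use case with the built-in mtcars data. It uses the current (1.0.0 and later) nest() interface rather than the 0.4.0 one. The list-column `model` is a hypothetical name.

```r
library(tidyr)

# One row per cylinder count; each element of `data` holds that group's rows
nested <- nest(tibble::as_tibble(mtcars), data = -cyl)

# A list-column can hold any object, such as one fitted model per group
nested$model <- lapply(nested$data, function(d) lm(mpg ~ wt, data = d))
nested
```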

Expanding

Minor bug fixes and improvements

tidyr 0.3.1

tidyr 0.3.0

New features

Bug fixes and minor improvements

tidyr 0.2.0

New functions

Bug fixes and minor improvements



tidyverse/tidyr documentation built on Oct. 30, 2024, 1:53 a.m.