across2: Apply functions to two sets of columns simultaneously in...


View source: R/across2.R

Description

across2() and across2x() are variants of dplyr::across() that iterate over two columns simultaneously. across2() loops each pair of columns in .xcols and .ycols over one or more functions, while across2x() loops every combination between columns in .xcols and .ycols over one or more functions.
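
The difference between the two iteration patterns can be illustrated without the package. The following is a minimal base-R sketch, not dplyover's implementation: the helper apply_pairs() and the objects paired and crossed are hypothetical names introduced only for this illustration.

```r
# Base-R sketch of the two iteration patterns (not dplyover's internals).
xcols <- c("Sepal.Length", "Petal.Length")
ycols <- c("Sepal.Width", "Petal.Width")

# across2()-style: the i-th x column is paired with the i-th y column.
paired <- data.frame(xcol = xcols, ycol = ycols, stringsAsFactors = FALSE)

# across2x()-style: every x column is crossed with every y column.
crossed <- expand.grid(ycol = ycols, xcol = xcols, stringsAsFactors = FALSE)
crossed <- crossed[, c("xcol", "ycol")]

# Hypothetical helper: apply one function to each row of a pairing table.
apply_pairs <- function(data, pairs, fn) {
  out <- Map(function(x, y) fn(data[[x]], data[[y]]), pairs$xcol, pairs$ycol)
  names(out) <- paste(pairs$xcol, pairs$ycol, sep = "_")
  as.data.frame(out)
}

paired_result  <- apply_pairs(iris, paired,  function(x, y) x * y)
crossed_result <- apply_pairs(iris, crossed, function(x, y) x * y)

ncol(paired_result)   # 2 columns: one per pair
ncol(crossed_result)  # 4 columns: one per combination
```

With two columns on each side, pairing yields two result columns, whereas crossing yields four.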

Usage

across2(.xcols, .ycols, .fns, ..., .names = NULL, .names_fn = NULL)

across2x(
  .xcols,
  .ycols,
  .fns,
  ...,
  .names = NULL,
  .names_fn = NULL,
  .comb = "all"
)

Arguments

.xcols, .ycols

<tidy-select> Columns to transform. Note that you cannot select or compute on grouping variables.

.fns

Functions to apply to each column in .xcols and .ycols.

Possible values are:

  • A function

  • A purrr-style lambda

  • A list of functions/lambdas

Note that NULL is not accepted as an argument to .fns.

...

Additional arguments for the function calls in .fns.

.names

A glue specification that describes how to name the output columns. This can use:

  • {xcol} to stand for the selected column name in .xcols,

  • {ycol} to stand for the selected column name in .ycols, and

  • {fn} to stand for the name of the function being applied.

The default (NULL) is equivalent to "{xcol}_{ycol}" for the single function case and "{xcol}_{ycol}_{fn}" for the case where a list is used for .fns.

across2() supports two additional glue specifications: {pre} and {suf}. They extract the common alphanumeric prefix or suffix of each pair of variables.

Instead of a glue specification, .names also accepts a character vector whose length equals the number of columns to be created. In this case, the glue placeholders described above are not supported.

.names_fn

Optionally, a function that is applied after the glue specification in .names has been evaluated. This is helpful, for example, when the resulting names need to be cleaned or trimmed further.
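
To make the default naming scheme concrete, here is a small base-R sketch that builds the names by hand. It only mimics the "{xcol}_{ycol}" and "{xcol}_{ycol}_{fn}" patterns with paste() and is not how the package evaluates glue specifications. The ordering (functions varying fastest within each pair) is an assumption taken from the example output further down this page, and all object names are hypothetical.

```r
xcols    <- c("Sepal.Length", "Petal.Length")
ycols    <- c("Sepal.Width", "Petal.Width")
fn_names <- c("product", "sum")

# Single function: mimics the default "{xcol}_{ycol}".
single_fn_names <- paste(xcols, ycols, sep = "_")

# List of functions: mimics the default "{xcol}_{ycol}_{fn}".
# Assumed ordering: all functions for the first pair, then the second pair.
pair_idx <- rep(seq_along(xcols), each = length(fn_names))
multi_fn_names <- paste(
  xcols[pair_idx],
  ycols[pair_idx],
  rep(fn_names, times = length(xcols)),
  sep = "_"
)

# A .names_fn-style clean-up step applied after the names are built:
# replace dots with underscores and convert to lower case.
cleaned_names <- tolower(gsub(".", "_", multi_fn_names, fixed = TRUE))

single_fn_names
multi_fn_names
cleaned_names
```

Any function that maps a character vector to a character vector of the same length can play the role of the clean-up step.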

.comb

In across2x(), this argument controls which combinations of columns are created. It only matters if the columns specified in .xcols and .ycols overlap to some extent.

  • "all", the default, creates all pairwise combinations between columns in .xcols and .ycols, including all permutations (e.g. foo(column_x, column_y) as well as foo(column_y, column_x)).

  • "unique" creates only unordered combinations (e.g. foo(column_x, column_y) is created, while foo(column_y, column_x) is not).

  • "minimal" is the same as "unique" and additionally skips all self-matches (e.g. foo(column_x, column_x) is not created).
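
The three settings can be compared by enumerating the combinations by hand. The base-R sketch below assumes that .xcols and .ycols select the same four columns. It illustrates the described behaviour rather than the package's own code, and the object names are hypothetical.

```r
cols <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

# "all": every ordered combination, including (a, b), (b, a) and (a, a).
all_comb <- expand.grid(ycol = cols, xcol = cols, stringsAsFactors = FALSE)
all_comb <- all_comb[, c("xcol", "ycol")]

# Position of each column in the selection, used to define an ordering.
i <- match(all_comb$xcol, cols)
j <- match(all_comb$ycol, cols)

# "unique": keep one orientation per unordered pair (self-matches remain).
unique_comb <- all_comb[i <= j, ]

# "minimal": as "unique", but without self-matches.
minimal_comb <- all_comb[i < j, ]

nrow(all_comb)      # 16
nrow(unique_comb)   # 10
nrow(minimal_comb)  # 6

# Default-style names for the "minimal" set.
minimal_names <- paste(minimal_comb$xcol, minimal_comb$ycol, sep = "_")
```

For four overlapping columns this gives 16, 10 and 6 combinations respectively. The six "minimal" pairs correspond to the six correlation columns in the across2x() example in the Examples section.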

Value

across2() returns a tibble with one column for each pair of elements in .xcols and .ycols combined with each function in .fns.

across2x() returns a tibble with one column for each combination between elements in .xcols and .ycols combined with each function in .fns.

Examples

For the basic functionality of across(), please refer to the examples in dplyr::across().

library(dplyr)

# For better printing
iris <- as_tibble(iris)

across2() can be used to transform pairs of variables with one or more functions. In the example below, we calculate the product and the sum of all pairs of 'Length' and 'Width' variables. We use {pre} in the glue specification in .names to extract the common prefix of each pair of variables. We then transform the names further, here converting them to lower case, by passing tolower to the .names_fn argument:

iris %>%
  transmute(across2(ends_with("Length"),
                    ends_with("Width"),
                    .fns = list(product = ~ .x * .y,
                                sum = ~ .x + .y),
                    .names = "{pre}_{fn}",
                    .names_fn = tolower))
#> # A tibble: 150 x 4
#>   sepal_product sepal_sum petal_product petal_sum
#>           <dbl>     <dbl>         <dbl>     <dbl>
#> 1          17.8       8.6          0.28       1.6
#> 2          14.7       7.9          0.28       1.6
#> 3          15.0       7.9          0.26       1.5
#> 4          14.3       7.7          0.3        1.7
#> # ... with 146 more rows

across2x() can be used to perform calculations on each combination of variables. In the example below, we calculate the correlation between all variables in the iris data set for each group. To do this, we group_by 'Species' and pass the tidyselect helper everything() to both .xcols and .ycols. ~ round(cor(.x, .y), 2) gives us the correlation, rounded to two digits, for each pair of variables. We shorten the rather long variable names by replacing "Sepal" with "S" and "Petal" with "P" in the .names_fn argument. Finally, we are not interested in correlations of a column with itself and want to avoid redundant results, so we set the .comb argument to "minimal".

iris %>%
  group_by(Species) %>%
  summarise(across2x(everything(),
                     everything(),
                     ~ round(cor(.x, .y), 2),
                     .names_fn = ~ gsub("Sepal", "S", .x) %>%
                                     gsub("Petal", "P", .),
                     .comb = "minimal"))
#> # A tibble: 3 x 7
#>   Species    S.Length_S.Width S.Length_P.Length S.Length_P.Width S.Width_P.Length
#>   <fct>                 <dbl>             <dbl>            <dbl>            <dbl>
#> 1 setosa                 0.74              0.27             0.28             0.18
#> 2 versicolor             0.53              0.75             0.55             0.56
#> 3 virginica              0.46              0.86             0.28             0.4 
#> # ... with 2 more variables: S.Width_P.Width <dbl>, P.Length_P.Width <dbl>

TimTeaFan/dplyover documentation built on Sept. 27, 2021, 3:14 p.m.