crossover: Apply functions to a set of columns and a vector...


View source: R/crossover.R

Description

crossover() combines the functionality of dplyr::across() with over() by iterating simultaneously over (i) a set of columns (.xcols) and (ii) a vector or list (.y). crossover() always applies the functions in .fns in a nested way to a combination of both inputs. There are, however, two different ways in which the functions in .fns are applied.

When .y is a vector or list, each function in .fns is applied to all pairwise combinations between columns in .xcols and elements in .y (this resembles the behavior of over2x() and across2x()).
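To make the pairwise combination concrete, here is a minimal base-R sketch of the vector/list case. It is not dplyover's implementation. The helper crossover_sketch(), the toy data frame and the fixed "{xcol}_{y}" naming are assumptions for illustration, and the sketch only handles a single function and an unnamed atomic .y.

```r
# Hypothetical base-R sketch (not dplyover's code) of the vector/list case:
# every column in `xcols` is combined with every element in `y`,
# and `fn` is called as fn(column, element).
crossover_sketch <- function(data, xcols, y, fn) {
  out <- list()
  for (xcol in xcols) {
    for (i in seq_along(y)) {
      # name pattern "{xcol}_{y}" for an unnamed atomic vector
      out[[paste0(xcol, "_", y[[i]])]] <- fn(data[[xcol]], y[[i]])
    }
  }
  data.frame(out, check.names = FALSE)
}

df <- data.frame(a = 1:3, b = 4:6)
res <- crossover_sketch(df, c("a", "b"), y = c(10, 100),
                        fn = function(x, y) x * y)
names(res)  # "a_10" "a_100" "b_10" "b_100"
```

Two columns times two elements of .y give four output columns, one per pair.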

crossover() has one trick up its sleeve that sets it apart from the other functions in the <over-across family>: its second input (.y) can be a function. This changes the original behavior slightly. First, the function in .y is applied to each column in .xcols to generate the values that are then used as .y in the function calls in .fns. Each function in .fns is then applied to all pairs of (i) a column in .xcols and (ii) each output element that this column generated through the function supplied to .y. Note that the underlying data must not be grouped if a function is supplied to .y. For an example, see the Examples section below.
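The function-as-.y case can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: crossover_fn_sketch() is a hypothetical helper, unique() stands in for whatever function is supplied to .y, and each column is paired only with the values it generated itself.

```r
# Hypothetical base-R sketch (not dplyover's code) of the case where `.y`
# is a function: `y_fn` is applied to each column on its own, and `fn` is
# combined only with the values that this particular column generated.
crossover_fn_sketch <- function(data, xcols, y_fn, fn) {
  out <- list()
  for (xcol in xcols) {
    col <- data[[xcol]]
    for (val in y_fn(col)) {
      out[[paste0(xcol, "_", val)]] <- fn(col, val)
    }
  }
  data.frame(out, check.names = FALSE)
}

df <- data.frame(type = c("new", "old", "new"),
                 size = c("S", "S", "L"),
                 stringsAsFactors = FALSE)
res <- crossover_fn_sketch(df, c("type", "size"), y_fn = unique,
                           fn = function(x, y) as.integer(x == y))
names(res)  # "type_new" "type_old" "size_S" "size_L"
```

Note that "type" is never paired with "S" or "L": the values come from each column separately.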

Usage

crossover(
  .xcols = dplyr::everything(),
  .y,
  .fns,
  ...,
  .names = NULL,
  .names_fn = NULL
)

Arguments

.xcols

<tidy-select> Columns to transform. Because crossover() is used within functions like summarise() and mutate(), you can't select or compute upon grouping variables.

.y

An atomic vector or list to apply functions to. crossover() also accepts a function as .y argument. In this case, the function is first applied to each column in .xcols, and each column is then combined with every element of the output it generated. Note: the underlying data must not be grouped if a function is supplied to .y.

If a function is supplied, the following values are possible:

  • A bare function name, e.g. unique

  • An anonymous function, e.g. function(x) unique(x)

  • A purrr-style lambda, e.g. ~ unique(.x, fromLast = TRUE)

Note that additional arguments can only be specified with an anonymous function, a purrr-style lambda or a pre-filled custom function.
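The difference between these forms matters once extra arguments are involved. The short base-R sketch below uses unique() with fromLast = TRUE. The name unique_last() is hypothetical and stands for a pre-filled custom function with a default argument; a purrr-style lambda is omitted to keep the sketch free of package dependencies.

```r
x <- c("b", "a", "b", "c")

# 1. Bare function name: there is no way to pass fromLast = TRUE
f_bare <- unique

# 2. Anonymous function: the extra argument is fixed inside the body
f_anon <- function(x) unique(x, fromLast = TRUE)

# 3. Pre-filled custom function (hypothetical name): the extra argument
#    is set as a default, so the function can be passed by name
unique_last <- function(x, fromLast = TRUE) unique(x, fromLast = fromLast)

f_bare(x)       # "b" "a" "c"
f_anon(x)       # "a" "b" "c"
unique_last(x)  # "a" "b" "c"
```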

.fns

Functions to apply to each column in .xcols and element in .y.

Possible values are:

  • A function

  • A purrr-style lambda

  • A list of functions/lambdas

Note that NULL is not accepted as argument to .fns.

...

Additional arguments for the function calls in .fns.

.names

A glue specification that describes how to name the output columns. This can use:

  • {xcol} to stand for the selected column name,

  • {y} to stand for the selected vector element, and

  • {fn} to stand for the name of the function being applied.

The default (NULL) is equivalent to "{xcol}_{y}" for the single function case and "{xcol}_{y}_{fn}" for the case where a list is used for .fns.

Note that, depending on the nature of the underlying object in .y, specifying {y} will yield different results:

  • If .y is an unnamed atomic vector, {y} will represent each value.

  • If .y is a named list or atomic vector, {y} will represent each name.

  • If .y is an unnamed list, {y} will be the index number running from 1 to length(.y).

This standard behavior (the interpretation of {y}) can be overridden by specifying directly:

  • {y_val} for .y's values

  • {y_nm} for its names

  • {y_idx} for its index numbers

Alternatively, a character vector of length equal to the number of columns to be created can be supplied to .names. Note that in this case, the glue specification described above is not supported.
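The rules for interpreting {y} can be summarised in a small base-R helper. resolve_y() is hypothetical and only mirrors the rules listed above; it is not part of dplyover, and it covers neither the {y_val}, {y_nm} and {y_idx} overrides nor the glue evaluation itself.

```r
# Hypothetical helper mirroring the documented rules for {y};
# not part of dplyover.
resolve_y <- function(y) {
  if (!is.null(names(y))) {
    names(y)                    # named list or atomic vector: the names
  } else if (is.atomic(y)) {
    as.character(y)             # unnamed atomic vector: the values
  } else {
    as.character(seq_along(y))  # unnamed list: index 1 to length(y)
  }
}

resolve_y(c(1, 5))                 # "1" "5"
resolve_y(c(short = 1, long = 5))  # "short" "long"
resolve_y(list(1:2, 3:4))          # "1" "2"

# Default name for the single-function case: "{xcol}_{y}"
paste("Sepal.Length", resolve_y(c(short = 1, long = 5)), sep = "_")
# "Sepal.Length_short" "Sepal.Length_long"
```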

.names_fn

Optionally, a function that is applied after the glue specification in .names has been evaluated. This is helpful, for example, when the resulting names need to be cleaned or trimmed further.
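As a sketch of what a .names_fn might do, the hypothetical clean_names() below replaces whitespace with underscores and lowercases the result, the same cleaning used in the dummy-variable example in the Examples section. The input names are made up for illustration.

```r
# Hypothetical cleaning function of the kind that could be passed to .names_fn
clean_names <- function(x) tolower(gsub("\\s", "_", x))

# Made-up names as they might come out of the glue specification
raw_names <- c("csat_very unsatisfied", "csat_Neutral")
clean_names(raw_names)  # "csat_very_unsatisfied" "csat_neutral"
```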

Value

crossover() returns a tibble with one column for each combination of columns in .xcols, elements in .y and functions in .fns.

If a function is supplied as .y argument, crossover() returns a tibble with one column for each combination of a column in .xcols, an output element that this column generated through the function in .y, and a function in .fns.
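The number of output columns follows directly from these rules. The sketch below encodes them as two hypothetical helper functions (not part of dplyover). For the lag example in the Examples section this gives 2 × 5 × 1 = 10 columns, and for the csat example 3 + 3 + 5 = 11 columns.

```r
# Vector/list .y: one column per column x element x function
n_cols_vector <- function(n_xcols, y, n_fns) n_xcols * length(y) * n_fns

n_cols_vector(2, 1:5, 1)  # 10

# Function .y: each column contributes length(y_fn(column)) columns
# for every function in .fns
n_cols_function <- function(data, xcols, y_fn, n_fns) {
  per_col <- vapply(data[xcols], function(col) length(y_fn(col)), integer(1))
  sum(per_col) * n_fns
}

df <- data.frame(type = c("new", "old", "new"),
                 size = c("S", "S", "L"),
                 stringsAsFactors = FALSE)
n_cols_function(df, c("type", "size"), unique, 1)  # 2 + 2 = 4
```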

Examples

For the basic functionality please refer to the examples in over() and dplyr::across().

library(dplyr)

# For better printing
iris <- as_tibble(iris)

Creating many similar variables for multiple columns

If .y is a vector or list, crossover() applies the functions in .fns to every combination of columns in .xcols and elements in .y. This is helpful when we want to create a batch of similar variables that differ only slightly in the arguments of the calling function. Lagged variables are a good example. Below we create five lagged variables for each of 'Sepal.Length' and 'Sepal.Width'. To create nice names, we use a named list as argument in .fns and specify the glue syntax in .names.

 iris %>%
   transmute(
     crossover(starts_with("sepal"),
               1:5,
               list(lag = ~ lag(.x, .y)),
               .names = "{xcol}_{fn}{y}")) %>%
   glimpse
#> Rows: 150
#> Columns: 10
#> $ Sepal.Length_lag1 <dbl> NA, 5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9, 5.4~
#> $ Sepal.Length_lag2 <dbl> NA, NA, 5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9,~
#> $ Sepal.Length_lag3 <dbl> NA, NA, NA, 5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, ~
#> $ Sepal.Length_lag4 <dbl> NA, NA, NA, NA, 5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4~
#> $ Sepal.Length_lag5 <dbl> NA, NA, NA, NA, NA, 5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.~
#> $ Sepal.Width_lag1  <dbl> NA, 3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1, 3.7~
#> $ Sepal.Width_lag2  <dbl> NA, NA, 3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1,~
#> $ Sepal.Width_lag3  <dbl> NA, NA, NA, 3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, ~
#> $ Sepal.Width_lag4  <dbl> NA, NA, NA, NA, 3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2~
#> $ Sepal.Width_lag5  <dbl> NA, NA, NA, NA, NA, 3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.~
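For comparison, the same ten columns can be built without dplyover. This base-R sketch assumes a hypothetical helper lag_base() that mimics dplyr::lag(x, n) for a numeric vector by padding with NA; it only illustrates which combinations crossover() generates here.

```r
# Hypothetical stand-in for dplyr::lag(x, n): shift right by n, pad with NA
lag_base <- function(x, n) c(rep(NA, n), head(x, -n))

# One block of five lags per column, named "{xcol}_lag{y}"
lagged <- do.call(cbind, lapply(c("Sepal.Length", "Sepal.Width"), function(xcol) {
  m <- sapply(1:5, function(n) lag_base(iris[[xcol]], n))
  colnames(m) <- paste0(xcol, "_lag", 1:5)
  m
}))

dim(lagged)                         # 150 10
lagged[1:3, "Sepal.Length_lag1"]    # NA 5.1 4.9
```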

Creating dummy variables for multiple variables (columns)

The .y argument of crossover() can take a function instead of a list or vector. In the example below we select the columns 'type', 'product' and 'csat' in .xcols. We supply the function dist_values() to .y, which is a cleaner variant of base R's unique(). This generates the distinct values of each of the three selected variables. The function in .fns, ~ if_else(.y == .x, 1, 0), is then applied to each pair of a distinct value in .y and the column in .xcols that generated this value. This creates a dummy variable for each value of each variable. Since some of the values contain whitespace characters, we use the .names_fn argument to supply a third function that cleans the output names by replacing spaces with underscores and converting all characters to lower case with tolower().

 csat %>%
   transmute(
     crossover(.xcols = c(type, product, csat),
               .y = dist_values,
               .fns = ~ if_else(.y == .x, 1, 0),
               .names_fn = ~ gsub("\\s", "_", .x) %>% tolower(.)
               )) %>%
   glimpse
#> Rows: 150
#> Columns: 11
#> $ type_new              <dbl> 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0,~
#> $ type_existing         <dbl> 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1,~
#> $ type_reactivate       <dbl> 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0,~
#> $ product_basic         <dbl> 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1,~
#> $ product_advanced      <dbl> 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0,~
#> $ product_premium       <dbl> 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0,~
#> $ csat_very_unsatisfied <dbl> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,~
#> $ csat_unsatisfied      <dbl> 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0,~
#> $ csat_neutral          <dbl> 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1,~
#> $ csat_satisfied        <dbl> 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0,~
#> $ csat_very_satisfied   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,~
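Outside of dplyover, a similar set of indicator columns can be produced one variable at a time with base R's model.matrix(). The sketch below uses a small hypothetical data frame instead of csat. model.matrix() orders levels alphabetically, so the column order can differ from the crossover() output above.

```r
# Hypothetical toy data standing in for the csat dataset
toy <- data.frame(type    = c("new", "existing", "reactivate", "existing"),
                  product = c("basic", "premium", "basic", "advanced"),
                  stringsAsFactors = FALSE)

dummy_cols <- lapply(names(toy), function(xcol) {
  # "- 1" drops the intercept, so every level gets its own 0/1 column
  mm <- model.matrix(~ val - 1, data = data.frame(val = toy[[xcol]]))
  colnames(mm) <- paste0(xcol, "_", sub("^val", "", colnames(mm)))
  mm
})
dummies <- do.call(cbind, dummy_cols)

colnames(dummies)
rowSums(dummies)  # one 1 per original variable, so 2 in every row
```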

See Also

Other members of the <over-across function family>.


TimTeaFan/dplyover documentation built on Sept. 27, 2021, 3:14 p.m.