

These functions are selection helpers. They are intended to be used inside all functions that accept a vector as an argument (that is, over(), crossover(), and all their variants) to extract values of a variable.

dist_values() returns all distinct values (or, in the case of factor variables, levels) of a variable x which are not NA.
seq_range() returns the sequence spanning the range() of a variable x, that is, from its minimum to its maximum.

dist_values(x, .sep = NULL, .sort = c("asc", "desc", "none", "levels"))

seq_range(x, .by)

`x`	An atomic vector or list. For `seq_range()`, `x` must be numeric or a date.
`.sep`	A character vector containing regular expression(s) which are used for splitting the values (works only if x is a character vector).
`.sort`	A character string indicating which sorting scheme is applied to the distinct values: ascending ("asc"), descending ("desc"), "none" or "levels". The default is "asc", unless x is a factor, in which case the default is "levels".
`.by`	A number (or date expression) representing the increment of the sequence.

dist_values() returns a vector of the same type as x, with the exception of factors, which are converted to type "character".

seq_range() returns a vector of type "integer" or "double".

Selection helpers can be used inside dplyover::over(), which in turn must be used inside dplyr::mutate() or dplyr::summarise(). Let's first attach dplyr:

library(dplyr)

# For better printing
iris <- as_tibble(iris)

dist_values() extracts all distinct values of a column variable. This is helpful when creating dummy variables in a loop using over().

iris %>%
  mutate(over(dist_values(Species),
              ~ if_else(Species == .x, 1, 0)
              ),
         .keep = "none")
#> # A tibble: 150 x 3
#>   setosa versicolor virginica
#>    <dbl>      <dbl>     <dbl>
#> 1      1          0         0
#> 2      1          0         0
#> 3      1          0         0
#> 4      1          0         0
#> # ... with 146 more rows
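For comparison, the same dummy columns can be built by hand. The following is a minimal base R sketch, not how over() is implemented; the variable names `lvls` and `dummies` are chosen here for illustration only.

```r
# Base R sketch of the dummy columns; iris ships with R (datasets package)
iris_df <- datasets::iris
lvls <- levels(iris_df$Species)   # one dummy column per level

# For each level, flag the rows that belong to it (1) or not (0)
dummies <- sapply(lvls, function(lv) as.numeric(iris_df$Species == lv))

head(dummies, 4)
```

The result is a plain numeric matrix with one named column per species, which is the part that over() together with dist_values() expresses inside mutate().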

dist_values() is essentially a wrapper around unique(). However, it differs in five ways:

(1) NA values are automatically stripped. Compare:

unique(c(1:3, NA))
#> [1]  1  2  3 NA
dist_values(c(1:3, NA))
#> [1] 1 2 3

(2) Applied on factors, dist_values() returns all distinct levels as character. Compare the following:

fctrs <- factor(c(1:3, NA), levels = c(3:1))

fctrs %>% unique() %>% class()
#> [1] "factor"

fctrs %>% dist_values() %>% class()
#> [1] "character"

(3) By default, the output is sorted in ascending order for non-factors, and in the order of the underlying "levels" for factors. This can be controlled by setting the .sort argument. Compare:

# non-factors
unique(c(3,1,2))
#> [1] 3 1 2

dist_values(c(3,1,2))
#> [1] 1 2 3
dist_values(c(3,1,2), .sort = "desc")
#> [1] 3 2 1
dist_values(c(3,1,2), .sort = "none")
#> [1] 3 1 2

# factors
fctrs <- factor(c(2,1,3, NA), levels = c(3:1))

dist_values(fctrs)
#> [1] "3" "2" "1"
dist_values(fctrs, .sort = "levels")
#> [1] "3" "2" "1"
dist_values(fctrs, .sort = "asc")
#> [1] "1" "2" "3"
dist_values(fctrs, .sort = "desc")
#> [1] "3" "2" "1"
dist_values(fctrs, .sort = "none")
#> [1] "2" "1" "3"
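The sorting schemes for non-factors can be pictured with a small base R helper. This is only a sketch under the behaviour described above, not the package's code; `sort_sketch()` is a hypothetical name and it deliberately leaves out the factor-specific "levels" scheme.

```r
# Hypothetical helper, for illustration only (non-factor input)
sort_sketch <- function(x, .sort = c("asc", "desc", "none")) {
  .sort <- match.arg(.sort)
  vals <- unique(x[!is.na(x)])   # drop NA, keep order of first appearance
  switch(.sort,
         asc  = sort(vals),
         desc = sort(vals, decreasing = TRUE),
         none = vals)
}

sort_sketch(c(3, 1, 2))
sort_sketch(c(3, 1, 2), .sort = "desc")
sort_sketch(c(3, 1, 2), .sort = "none")
```

With "none", the values simply keep the order in which unique() first encounters them.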

(4) When used on a character vector, dist_values() can take a separator .sep to split the elements accordingly:

c("1, 2, 3",
  "2, 4, 5",
  "4, 1, 7") %>%
  dist_values(., .sep = ", ")
#> [1] "1" "2" "3" "4" "5" "7"
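The splitting step corresponds roughly to strsplit() followed by unique() and sort(). A base R sketch of that idea, assuming this is close to what happens (the package internals may differ); `pieces` is an illustrative name:

```r
x <- c("1, 2, 3",
       "2, 4, 5",
       "4, 1, 7")

# Split every element on the separator, then flatten the resulting list
pieces <- unlist(strsplit(x, ", "))

# Deduplicate and sort the individual values
result <- sort(unique(pieces))
result
```

Like .sep, the split argument of strsplit() is interpreted as a regular expression.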

(5) When used on lists, dist_values() automatically simplifies its input into a vector using unlist():

list(a = c(1:4), b = (4:6), c(5:10)) %>%
  dist_values()
#>  [1]  1  2  3  4  5  6  7  8  9 10

seq_range() generates a numeric sequence between the min and max values of its input variable. This is helpful when creating many dummy variables with varying thresholds.

iris %>%
  mutate(over(seq_range(Sepal.Length, 1),
              ~ if_else(Sepal.Length > .x, 1, 0),
              .names = "Sepal.Length.{x}"),
         .keep = "none")
#> # A tibble: 150 x 3
#>   Sepal.Length.5 Sepal.Length.6 Sepal.Length.7
#>            <dbl>          <dbl>          <dbl>
#> 1              1              0              0
#> 2              0              0              0
#> 3              0              0              0
#> 4              0              0              0
#> # ... with 146 more rows

Note that in the example above min and max are wrapped in ceiling() and floor() respectively: Sepal.Length ranges from 4.3 to 7.9, while the sequence runs from 5 to 7. This prevents the creation of variables that contain only 0 or only 1. Compare the output below, where min and max are rounded instead, with the example above:

iris %>%
  mutate(over(seq(round(min(Sepal.Length), 0),
                  round(max(Sepal.Length), 0),
                  1),
              ~ if_else(Sepal.Length > .x, 1, 0),
              .names = "Sepal.Length.{x}"),
         .keep = "none")
#> # A tibble: 150 x 5
#>   Sepal.Length.4 Sepal.Length.5 Sepal.Length.6 Sepal.Length.7 Sepal.Length.8
#>            <dbl>          <dbl>          <dbl>          <dbl>          <dbl>
#> 1              1              1              0              0              0
#> 2              1              0              0              0              0
#> 3              1              0              0              0              0
#> 4              1              0              0              0              0
#> # ... with 146 more rows
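The two sets of thresholds can be compared directly in base R. This sketch only reproduces the arithmetic of the two examples; the names `narrow`, `wide`, and `constant` are illustrative.

```r
x <- datasets::iris$Sepal.Length   # ranges from 4.3 to 7.9

# Thresholds strictly inside the observed range
narrow <- seq(ceiling(min(x)), floor(max(x)), by = 1)

# Thresholds from rounding, which can fall outside the observed range
wide <- seq(round(min(x), 0), round(max(x), 0), by = 1)

# A threshold gives a constant dummy if every value, or no value, exceeds it
constant <- sapply(wide, function(t) length(unique(x > t)) == 1)
wide[constant]
```

The thresholds flagged in `constant` are the ones whose dummy columns carry no information, which is why the rounded version produces two extra columns.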

seq_range() also works on dates:

some_dates <- c(as.Date("2020-01-02"),
                as.Date("2020-05-02"),
                as.Date("2020-03-02"))


some_dates %>%
  seq_range(., "1 month")
#> [1] "2020-01-02" "2020-02-02" "2020-03-02" "2020-04-02" "2020-05-02"
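For dates, the result matches what base R's seq() gives when called on the minimum and maximum with a "1 month" step. A short sketch, assuming seq_range() follows the same convention for its .by argument; `monthly` is an illustrative name:

```r
some_dates <- as.Date(c("2020-01-02", "2020-05-02", "2020-03-02"))

# Step from the earliest to the latest date in monthly increments
monthly <- seq(min(some_dates), max(some_dates), by = "1 month")
monthly
```

Other step strings accepted by seq() on dates, such as "2 weeks" or "day", work the same way in base R.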

TimTeaFan/dplyover documentation built on Sept. 27, 2021, 3:14 p.m.

TimTeaFan/dplyover
Create columns by applying functions to vectors and/or columns in 'dplyr'

select_values: Select values from variables
In TimTeaFan/dplyover: Create columns by applying functions to vectors and/or columns in 'dplyr'