language: Selection language

language    R Documentation

Selection language

Description

Overview of selection features:

tidyselect implements a DSL (domain-specific language) for selecting variables. It provides the following helpers:

  • var1:var10: variables lying between var1 on the left and var10 on the right.

  • starts_with("a"): names that start with "a".

  • ends_with("z"): names that end with "z".

  • contains("b"): names that contain "b".

  • matches("x.y"): names that match regular expression x.y.

  • num_range("x", 1:4): names following the pattern x1, x2, ..., x4.

  • all_of(vars)/any_of(vars): matches names stored in the character vector vars. all_of(vars) will error if any of the variables aren't present; any_of(vars) will match just the variables that exist.

  • everything(): all variables.

  • last_col(): furthest column on the right.

  • where(is.numeric): all variables where is.numeric() returns TRUE.

It also provides operators for combining those selections:

  • !selection: only variables that don't match selection.

  • selection1 & selection2: only variables included in both selection1 and selection2.

  • selection1 | selection2: all variables that match either selection1 or selection2.
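
As a minimal sketch of a few of the helpers listed above, the following runs them on a small made-up data frame. The data frame `df`, its column names, and the `wanted` vector are hypothetical and chosen only for illustration; the sketch assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical data frame; column names chosen only to exercise the helpers
df <- data.frame(
  id = 1:3,
  x1 = c(0.1, 0.2, 0.3),
  x2 = c(1.5, 2.5, 3.5),
  label_a = c("u", "v", "w"),
  label_b = c("p", "q", "r")
)

# Names starting with "label"
by_prefix <- names(select(df, starts_with("label")))

# Names x1 and x2, built from a prefix and a numeric range
by_range <- names(select(df, num_range("x", 1:2)))

# Columns for which the predicate returns TRUE
numeric_cols <- names(select(df, where(is.numeric)))

# any_of() skips names that are absent; all_of() signals an error instead
wanted <- c("id", "missing_col")
lenient <- names(select(df, any_of(wanted)))
strict <- tryCatch(
  names(select(df, all_of(wanted))),
  error = function(e) "error"
)
```

Here `lenient` holds only "id", while `strict` ends up as the fallback string because all_of() refuses the missing name.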

When writing code inside packages, you can substitute the string "var" for the bare name var to avoid R CMD check notes.
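
A minimal sketch of that substitution, assuming a hypothetical package function `keep_sepal()` (not part of tidyselect) that calls dplyr::select() on the built-in iris data:

```r
# Hypothetical package function: the column names are quoted strings, so
# R CMD check has no bare symbols to flag as undefined global variables
keep_sepal <- function(data) {
  dplyr::select(data, "Sepal.Length", "Sepal.Width")
}

sepal_names <- names(keep_sepal(iris))
```

The quoted form selects the same columns as select(data, Sepal.Length, Sepal.Width) would.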

Simple examples

Here we show the usage for the basic selection operators. See the specific help pages to learn about helpers like starts_with().

The selection language can be used in functions like dplyr::select() or tidyr::pivot_longer(). Let's first attach the tidyverse:

library(tidyverse)

# For better printing
iris <- as_tibble(iris)

Select variables by name:

starwars %>% select(height)
#> # A tibble: 87 x 1
#>   height
#>    <int>
#> 1    172
#> 2    167
#> 3     96
#> 4    202
#> # ... with 83 more rows

iris %>% pivot_longer(Sepal.Length)
#> # A tibble: 150 x 6
#>   Sepal.Width Petal.Length Petal.Width Species name         value
#>         <dbl>        <dbl>       <dbl> <fct>   <chr>        <dbl>
#> 1         3.5          1.4         0.2 setosa  Sepal.Length   5.1
#> 2         3            1.4         0.2 setosa  Sepal.Length   4.9
#> 3         3.2          1.3         0.2 setosa  Sepal.Length   4.7
#> 4         3.1          1.5         0.2 setosa  Sepal.Length   4.6
#> # ... with 146 more rows

Select multiple variables by separating them with commas. Note how the order of columns is determined by the order of inputs:

starwars %>% select(homeworld, height, mass)
#> # A tibble: 87 x 3
#>   homeworld height  mass
#>   <chr>      <int> <dbl>
#> 1 Tatooine     172    77
#> 2 Tatooine     167    75
#> 3 Naboo         96    32
#> 4 Tatooine     202   136
#> # ... with 83 more rows

Functions like tidyr::pivot_longer() don't take selections through dots (...); they accept a single selection argument instead. In this case use c() to select multiple variables:

iris %>% pivot_longer(c(Sepal.Length, Petal.Length))
#> # A tibble: 300 x 5
#>   Sepal.Width Petal.Width Species name         value
#>         <dbl>       <dbl> <fct>   <chr>        <dbl>
#> 1         3.5         0.2 setosa  Sepal.Length   5.1
#> 2         3.5         0.2 setosa  Petal.Length   1.4
#> 3         3           0.2 setosa  Sepal.Length   4.9
#> 4         3           0.2 setosa  Petal.Length   1.4
#> # ... with 296 more rows

Operators:

The : operator selects a range of consecutive variables:

starwars %>% select(name:mass)
#> # A tibble: 87 x 3
#>   name           height  mass
#>   <chr>           <int> <dbl>
#> 1 Luke Skywalker    172    77
#> 2 C-3PO             167    75
#> 3 R2-D2              96    32
#> 4 Darth Vader       202   136
#> # ... with 83 more rows

The ! operator negates a selection:

starwars %>% select(!(name:mass))
#> # A tibble: 87 x 11
#>   hair_color skin_c~1 eye_c~2 birth~3 sex   gender homew~4 species films vehic~5
#>   <chr>      <chr>    <chr>     <dbl> <chr> <chr>  <chr>   <chr>   <lis> <list> 
#> 1 blond      fair     blue       19   male  mascu~ Tatooi~ Human   <chr> <chr>  
#> 2 <NA>       gold     yellow    112   none  mascu~ Tatooi~ Droid   <chr> <chr>  
#> 3 <NA>       white, ~ red        33   none  mascu~ Naboo   Droid   <chr> <chr>  
#> 4 none       white    yellow     41.9 male  mascu~ Tatooi~ Human   <chr> <chr>  
#> # ... with 83 more rows, 1 more variable: starships <list>, and abbreviated
#> #   variable names 1: skin_color, 2: eye_color, 3: birth_year, 4: homeworld,
#> #   5: vehicles

iris %>% select(!c(Sepal.Length, Petal.Length))
#> # A tibble: 150 x 3
#>   Sepal.Width Petal.Width Species
#>         <dbl>       <dbl> <fct>  
#> 1         3.5         0.2 setosa 
#> 2         3           0.2 setosa 
#> 3         3.2         0.2 setosa 
#> 4         3.1         0.2 setosa 
#> # ... with 146 more rows

iris %>% select(!ends_with("Width"))
#> # A tibble: 150 x 3
#>   Sepal.Length Petal.Length Species
#>          <dbl>        <dbl> <fct>  
#> 1          5.1          1.4 setosa 
#> 2          4.9          1.4 setosa 
#> 3          4.7          1.3 setosa 
#> 4          4.6          1.5 setosa 
#> # ... with 146 more rows

& and | take the intersection or the union of two selections:

iris %>% select(starts_with("Petal") & ends_with("Width"))
#> # A tibble: 150 x 1
#>   Petal.Width
#>         <dbl>
#> 1         0.2
#> 2         0.2
#> 3         0.2
#> 4         0.2
#> # ... with 146 more rows

iris %>% select(starts_with("Petal") | ends_with("Width"))
#> # A tibble: 150 x 3
#>   Petal.Length Petal.Width Sepal.Width
#>          <dbl>       <dbl>       <dbl>
#> 1          1.4         0.2         3.5
#> 2          1.4         0.2         3  
#> 3          1.3         0.2         3.2
#> 4          1.5         0.2         3.1
#> # ... with 146 more rows

To take the difference between two selections, combine the & and ! operators:

iris %>% select(starts_with("Petal") & !ends_with("Width"))
#> # A tibble: 150 x 1
#>   Petal.Length
#>          <dbl>
#> 1          1.4
#> 2          1.4
#> 3          1.3
#> 4          1.5
#> # ... with 146 more rows

Details

The order of selected columns is determined by the inputs.

  • all_of(c("foo", "bar")) selects "foo" first.

  • c(starts_with("c"), starts_with("d")) selects all columns starting with "c" first, then all columns starting with "d".
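
The two ordering rules above can be checked with a short sketch on the built-in iris data. It assumes dplyr is installed; the `vars` vector is introduced only for illustration.

```r
library(dplyr)

# all_of() keeps the order of the character vector, not the data's order
vars <- c("Species", "Sepal.Length")
from_vector <- names(select(iris, all_of(vars)))

# c() places every match of the first helper before matches of the second
from_helpers <- names(select(iris, c(starts_with("P"), starts_with("S"))))
```

Species comes first in `from_vector` even though it is the last column of iris, and both Petal columns precede the Sepal columns and Species in `from_helpers`.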


tidyselect documentation built on Oct. 11, 2022, 1:07 a.m.