select: Subset columns using their names and types

View source: R/select.R

selectR Documentation

Subset columns using their names and types

Description

Select (and optionally rename) variables in a data.frame, using a concise mini-language that makes it easy to refer to variables based on their name (e.g. a:f selects all columns from a on the left to f on the right). You can also use predicate functions like is.numeric() to select variables based on their properties.

Usage

select(.data, ...)

Arguments

.data

A data.frame.

...

<poor-select> One or more unquoted expressions separated by commas. Variable names can be used as if they were positions in the data frame, so expressions like x:y can be used to select a range of variables.

Details

Overview of selection features

poorman selections implement a dialect of R where operators make it easy to select variables:

  • : for selecting a range of consecutive variables.

  • ! for taking the complement of a set of variables.

  • & and | for selecting the intersection or the union of two sets of variables.

  • c() for combining selections.

In addition, you can use selection helpers. Some helpers select specific columns:

  • everything(): Matches all variables.

  • last_col(): Select last variable, possibly with an offset.

These helpers select variables by matching patterns in their names:

  • starts_with(): Starts with a prefix.

  • ends_with(): Ends with a suffix.

  • contains(): Contains a literal string.

  • matches(): Matches a regular expression.

  • num_range(): Matches a numerical range like x01, x02, x03.

These helpers select variables from a character vector:

  • all_of(): Matches variable names in a character vector. All names must be present, otherwise an out-of-bounds error is thrown.

  • any_of(): Same as all_of(), except that no error is thrown for names that don't exist.

This helper selects variables with a function:

  • where(): Applies a function to all variables and selects those for which the function returns TRUE.

Value

An object of the same type as .data. The output has the following properties:

  • Rows are not affected.

  • Output columns are a subset of input columns, potentially with a different order. Columns will be renamed if new_name = old_name form is used.

  • Data frame attributes are preserved.

  • Groups are maintained; you can't select off grouping variables.

Examples

# Here we show the usage for the basic selection operators. See the
# specific help pages to learn about helpers like [starts_with()].

# Select variables by name:
mtcars %>% select(mpg)

# Select multiple variables by separating them with commas. Note
# how the order of columns is determined by the order of inputs:
mtcars %>% select(disp, gear, am)

# Rename variables:
mtcars %>% select(MilesPerGallon = mpg, everything())

# The `:` operator selects a range of consecutive variables:
select(mtcars, mpg:cyl)

# The `!` operator negates a selection:
mtcars %>% select(!(mpg:qsec))
mtcars %>% select(!ends_with("p"))

# `&` and `|` take the intersection or the union of two selections:
iris %>% select(starts_with("Petal") & ends_with("Width"))
iris %>% select(starts_with("Petal") | ends_with("Width"))

# To take the difference between two selections, combine the `&` and
# `!` operators:
iris %>% select(starts_with("Petal") & !ends_with("Width"))


poorman documentation built on Nov. 2, 2023, 5:27 p.m.