Use filter() to choose rows/cases where conditions are true. Unlike base subsetting with [, rows where the condition evaluates to NA are dropped.
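The difference in NA handling can be seen in a minimal sketch. The data frame `df` and its columns `id` and `x` are hypothetical and made up for illustration only:

```r
library(dplyr)

# Hypothetical data frame with one missing value in `x`
df <- data.frame(id = 1:4, x = c(5, NA, 12, 20))

# Base subsetting with `[` returns a row of NAs where the condition is NA
base_result <- df[df$x > 10, ]

# filter() drops rows where the condition evaluates to NA
dplyr_result <- filter(df, x > 10)

nrow(base_result)   # 3 rows, one of them filled with NA
nrow(dplyr_result)  # 2 rows
```

If rows with missing values should be kept, the condition can say so explicitly, for example `filter(df, x > 10 | is.na(x))`.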
filter_group_size(.data, min = NULL, max = min)
.data | A tbl. All main verbs are S3 generics and provide methods for
min | Minimal group size, use
max | Maximal group size, use
Note that dplyr is not yet smart enough to optimise filtering on grouped datasets that don't need grouped calculations. For this reason, filtering is often considerably faster on ungroup()ed data.
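One way to act on this note is sketched below, assuming the condition needs no per-group calculation. The object names `by_gender` and `heavy` are illustrative, and the sketch does not measure the speed difference itself:

```r
library(dplyr)

by_gender <- starwars %>% group_by(gender)

# `mass > 80` does not depend on the groups, so the grouping is
# removed before filtering and restored afterwards
heavy <- by_gender %>%
  ungroup() %>%
  filter(mass > 80) %>%
  group_by(gender)
```

The final group_by() is only needed when later steps rely on the grouping.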
An object of the same class as .data.
==, >, >= etc
&, |, !, xor()
is.na()
between(), near()
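A short sketch of how some of these helpers might be used inside filter(). The `starwars` columns come from dplyr; the data frame `df` is hypothetical:

```r
library(dplyr)

# between() is a shortcut for x >= left & x <= right
filter(starwars, between(height, 150, 180))

# !is.na() keeps rows where the value is present
filter(starwars, !is.na(mass))

# near() compares floating point numbers with a tolerance
df <- data.frame(x = c(sqrt(2)^2, 2.5))
exact <- filter(df, x == 2)      # no rows: sqrt(2)^2 is not exactly 2
close <- filter(df, near(x, 2))  # one row

# xor() keeps rows where exactly one of the two conditions holds
filter(starwars, xor(hair_color == "none", eye_color == "black"))
```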
Because filtering expressions are computed within groups, they may yield different results on grouped tibbles. This will be the case as soon as an aggregating, lagging, or ranking function is involved. Compare this ungrouped filtering:
starwars %>% filter(mass > mean(mass, na.rm = TRUE))
With the grouped equivalent:
starwars %>% group_by(gender) %>% filter(mass > mean(mass, na.rm = TRUE))
The former keeps rows with mass greater than the global average whereas the latter keeps rows with mass greater than the gender average.
It is valid to use grouping variables in filter expressions.
When applied on a grouped tibble, filter() automatically rearranges the tibble by groups for performance reasons.
When applied to a data frame, row names are silently dropped. To preserve them, convert them to an explicit variable with tibble::rownames_to_column().
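A minimal sketch of that conversion, using the built-in `mtcars` data, which stores car names as row names. The column name `model` and the object name `kept` are arbitrary choices made for this example:

```r
library(dplyr)

# Move the row names into a regular column before filtering,
# so the car names are still available in the result
kept <- mtcars %>%
  tibble::rownames_to_column("model") %>%
  filter(mpg > 30)

kept$model
```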
The three scoped variants (filter_all(), filter_if() and filter_at()) make it easy to apply a filtering condition to a selection of variables.
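The calls below sketch how each variant might be used on the built-in `mtcars` data. The thresholds and the object names `all_big`, `any_big`, `even_d` and `whole_nonzero` are illustrative choices:

```r
library(dplyr)

# filter_all(): the condition is applied to every column;
# all_vars() requires it to hold for all of them, any_vars() for at least one
all_big <- filter_all(mtcars, all_vars(. > 150))
any_big <- filter_all(mtcars, any_vars(. > 150))

# filter_at(): the condition is applied to the columns picked with vars()
even_d <- filter_at(mtcars, vars(starts_with("d")), any_vars((. %% 2) == 0))

# filter_if(): the condition is applied to columns that satisfy a predicate,
# here the columns that hold only whole numbers
whole_nonzero <- filter_if(mtcars, ~ all(floor(.) == .), all_vars(. != 0))
```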
Stack Overflow answer by docendo discimus, https://stackoverflow.com/a/43110620/4575331
filter_all(), filter_if() and filter_at().
Other single table verbs: arrange, mutate, select, slice, summarise
library(dplyr)

filter(starwars, species == "Human")
filter(starwars, mass > 1000)
# Multiple criteria
filter(starwars, hair_color == "none" & eye_color == "black")
filter(starwars, hair_color == "none" | eye_color == "black")
# Multiple arguments are equivalent to and
filter(starwars, hair_color == "none", eye_color == "black")
# The filtering operation may yield different results on grouped
# tibbles because the expressions are computed within groups.
#
# The following filters rows where `mass` is greater than the
# global average:
starwars %>% filter(mass > mean(mass, na.rm = TRUE))
# Whereas this keeps rows with `mass` greater than the gender
# average:
starwars %>% group_by(gender) %>% filter(mass > mean(mass, na.rm = TRUE))
# Refer to column names stored as strings with the `.data` pronoun:
vars <- c("mass", "height")
cond <- c(80, 150)
starwars %>%
  filter(
    .data[[vars[[1]]]] > cond[[1]],
    .data[[vars[[2]]]] > cond[[2]]
  )
# For more complex cases, knowledge of tidy evaluation and the
# unquote operator `!!` is required. See https://tidyeval.tidyverse.org/
#
# One useful and simple tidy eval technique is to use `!!` to bypass
# the data frame and its columns. Here is how to filter the columns
# `mass` and `height` relative to objects of the same names:
mass <- 80
height <- 150
filter(starwars, mass > !!mass, height > !!height)