filter_group_size: Return rows with matching conditions


Description

Use filter() to choose rows/cases where conditions are true. Unlike base subsetting with [, rows where the condition evaluates to NA are dropped.
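The difference from base subsetting can be seen on a small data frame. This is only an illustrative sketch: the data frame `df` and its column `x` are made up for the example.

```r
library(dplyr)

# Hypothetical toy data with one missing value
df <- data.frame(x = c(1, NA, 3))

# Base subsetting: the NA condition produces an extra all-NA row
base_rows <- df[df$x > 1, , drop = FALSE]

# filter(): rows where the condition is NA are dropped
dplyr_rows <- filter(df, x > 1)

nrow(base_rows)
nrow(dplyr_rows)
```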

Usage

1

Arguments

.data

A tbl. All main verbs are S3 generics and provide methods for tbl_df(), dtplyr::tbl_dt() and dbplyr::tbl_dbi().

min

Minimum group size. Use min = NULL to filter on the maximum group size only.

max

Maximum group size. Use max = NULL to filter on the minimum group size only.
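The idea of filtering on group size can be sketched with plain dplyr, assuming that groups are kept when their row count lies between the two bounds. The helper `keep_groups_by_size()` and its parameters `tbl`, `min_size` and `max_size` are hypothetical names for this illustration; this is not the package's own implementation, which may behave differently.

```r
library(dplyr)

# Hypothetical helper (an assumption, not the package's code): keep only the
# groups of a grouped tibble whose row count lies within the given bounds.
# A NULL bound is ignored.
keep_groups_by_size <- function(tbl, min_size = NULL, max_size = NULL) {
  if (!is.null(min_size)) {
    # n() is evaluated per group; !! injects the bound's value
    tbl <- filter(tbl, n() >= !!min_size)
  }
  if (!is.null(max_size)) {
    tbl <- filter(tbl, n() <= !!max_size)
  }
  tbl
}

by_species <- starwars %>% group_by(species)

# Species with at least three characters
at_least_3 <- keep_groups_by_size(by_species, min_size = 3)

# Species represented by a single character
singletons <- keep_groups_by_size(by_species, max_size = 1)
```

Leaving one bound as NULL mirrors the description of min and max above: only the other bound is applied.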

Details

Note that dplyr is not yet smart enough to optimise filtering on grouped datasets that don't need grouped calculations. For this reason, filtering is often considerably faster on ungroup()ed data.
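A minimal sketch of this advice: when the condition involves no per-group calculation, the grouping can be dropped before filtering without changing which rows are kept. The object names below are made up for the example, and any speed difference will depend on the dplyr version and the data.

```r
library(dplyr)

by_species <- starwars %>% group_by(species)

# The condition does not depend on the groups, so it is evaluated per group
# for no benefit here
tall_grouped <- by_species %>%
  filter(height > 180)

# Same rows, but filtered after removing the grouping
tall_ungrouped <- by_species %>%
  ungroup() %>%
  filter(height > 180)
```

If the grouping is still needed afterwards, it can be restored with group_by() once the filter has been applied.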

Value

An object of the same class as .data.

Useful filter functions

Grouped tibbles

Because filtering expressions are computed within groups, they may yield different results on grouped tibbles. This will be the case as soon as an aggregating, lagging, or ranking function is involved. Compare this ungrouped filtering:

starwars %>% filter(mass > mean(mass, na.rm = TRUE))

With the grouped equivalent:

starwars %>% group_by(gender) %>% filter(mass > mean(mass, na.rm = TRUE))

The former keeps rows with mass greater than the global average whereas the latter keeps rows with mass greater than the gender average.

It is valid to use grouping variables in filter expressions.
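For instance, a grouping variable can appear next to a per-group calculation in the same call. This is an illustrative sketch; the chosen species and the object name are arbitrary.

```r
library(dplyr)

# `species` is the grouping variable and is also used in a condition;
# mean(height) is computed within each species
tall_for_species <- starwars %>%
  group_by(species) %>%
  filter(
    species %in% c("Human", "Droid"),
    height > mean(height, na.rm = TRUE)
  )
```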

When applied on a grouped tibble, filter() automatically rearranges the tibble by groups for performance reasons.

Tidy data

When applied to a data frame, row names are silently dropped. To preserve them, convert them to an explicit variable with tibble::rownames_to_column().
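A short sketch of that conversion, using the built-in mtcars data. The column name "model" is an arbitrary choice for the example, and whether row names survive filtering without this step may depend on the installed dplyr version.

```r
library(dplyr)

# Move the row names (car models) into a regular column before filtering,
# so they are kept regardless of how filter() treats row names
economical <- mtcars %>%
  tibble::rownames_to_column("model") %>%
  filter(mpg > 30)

economical$model
```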

Scoped filtering

The three scoped variants (filter_all(), filter_if() and filter_at()) make it easy to apply a filtering condition to a selection of variables.
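The sketch below shows one possible use of each variant together with the all_vars() and any_vars() helpers. The datasets, thresholds and object names are chosen only for illustration.

```r
library(dplyr)

# filter_at(): condition applied to an explicit selection of columns;
# keep rows where both mpg and qsec exceed 20
both_high <- filter_at(mtcars, vars(mpg, qsec), all_vars(. > 20))

# filter_if(): condition applied to columns chosen by a predicate;
# keep rows where at least one numeric column exceeds 300
any_large <- filter_if(mtcars, is.numeric, any_vars(. > 300))

# filter_all(): condition applied to every column;
# keep rows with no missing value in any column
complete <- filter_all(airquality, all_vars(!is.na(.)))
```

all_vars() requires the condition to hold for every selected column, whereas any_vars() requires it for at least one.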

Source

Stack Overflow answer by docendo discimus, https://stackoverflow.com/a/43110620/4575331

See Also

filter_all(), filter_if() and filter_at().

Other single table verbs: arrange, mutate, select, slice, summarise

Examples

filter(starwars, species == "Human")
filter(starwars, mass > 1000)

# Multiple criteria
filter(starwars, hair_color == "none" & eye_color == "black")
filter(starwars, hair_color == "none" | eye_color == "black")

# Multiple arguments are combined with `&` (logical and)
filter(starwars, hair_color == "none", eye_color == "black")


# The filtering operation may yield different results on grouped
# tibbles because the expressions are computed within groups.
#
# The following keeps rows where `mass` is greater than the
# global average:
starwars %>% filter(mass > mean(mass, na.rm = TRUE))

# Whereas this keeps rows with `mass` greater than the gender
# average:
starwars %>% group_by(gender) %>% filter(mass > mean(mass, na.rm = TRUE))


# Refer to column names stored as strings with the `.data` pronoun:
vars <- c("mass", "height")
cond <- c(80, 150)
starwars %>%
  filter(
    .data[[vars[[1]]]] > cond[[1]],
    .data[[vars[[2]]]] > cond[[2]]
  )

# For more complex cases, knowledge of tidy evaluation and the
# unquote operator `!!` is required. See https://tidyeval.tidyverse.org/
#
# One useful and simple tidy eval technique is to use `!!` to bypass
# the data frame and its columns. Here is how to filter the columns
# `mass` and `height` relative to objects of the same names:
mass <- 80
height <- 150
filter(starwars, mass > !!mass, height > !!height)

msberends/certedata documentation built on Nov. 26, 2019, 5:19 a.m.