filter_group_size: Return rows with matching conditions
In msberends/certedata: Tools for Data Analysis at Certe


Use filter() to choose rows/cases where conditions are true. Unlike base subsetting with [, rows where the condition evaluates to NA are dropped.

filter_group_size(.data, min = NULL, max = min)

`.data`	A tbl. All main verbs are S3 generics and provide methods for `tbl_df()`, `dtplyr::tbl_dt()` and `dbplyr::tbl_dbi()`.
`min`	minimal group size, use `min = NULL` to filter on maximal group size only
`max`	maximal group size, use `max = NULL` to filter on minimal group size only
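
The page does not show how `filter_group_size()` is implemented. As a rough sketch, assuming it keeps whole groups whose row count lies between `min` and `max`, the behaviour can be imitated with plain dplyr. `keep_group_size()` and the toy data below are hypothetical stand-ins, not the package's code.

```r
library(dplyr)

# Hypothetical stand-in for filter_group_size(); the real implementation
# in certedata may differ.
keep_group_size <- function(.data, min = NULL, max = min) {
  # A NULL bound means "no limit on that side".
  lower <- if (is.null(min)) 0 else min
  upper <- if (is.null(max)) Inf else max
  # n() is the size of the current group, so whole groups are kept or dropped.
  filter(.data, n() >= !!lower, n() <= !!upper)
}

# Toy data: group "a" has 3 rows, "b" has 2, "c" has 1.
df <- tibble(g = c("a", "a", "a", "b", "b", "c"), x = 1:6)
by_g <- group_by(df, g)

exact_two    <- keep_group_size(by_g, min = 2)               # max defaults to min
at_least_two <- keep_group_size(by_g, min = 2, max = NULL)   # lower bound only
at_most_two  <- keep_group_size(by_g, min = NULL, max = 2)   # upper bound only
```

Because the default `max = min` is evaluated lazily, supplying only `min` sets both bounds to the same value, so only groups of exactly that size remain.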

Note that dplyr is not yet smart enough to optimise filtering on grouped datasets that don't need grouped calculations. For this reason, filtering is often considerably faster on ungroup()ed data.

An object of the same class as .data.

==, >, >= etc
&, |, !, xor()
is.na()
between(), near()

Because filtering expressions are computed within groups, they may yield different results on grouped tibbles. This will be the case as soon as an aggregating, lagging, or ranking function is involved. Compare this ungrouped filtering:

starwars %>% filter(mass > mean(mass, na.rm = TRUE))

With the grouped equivalent:

starwars %>% group_by(gender) %>% filter(mass > mean(mass, na.rm = TRUE))

The former keeps rows with mass greater than the global average whereas the latter keeps rows with mass greater than the gender average.

It is valid to use grouping variables in filter expressions.

When applied on a grouped tibble, filter() automatically rearranges the tibble by groups for performance reasons.

When applied to a data frame, row names are silently dropped. To preserve them, convert them to an explicit variable with tibble::rownames_to_column().
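
A minimal sketch of that conversion, using the built-in `mtcars` data, whose row names hold the car models. The column name `"model"` is an arbitrary choice made for this illustration.

```r
library(dplyr)
library(tibble)

# Move the row names into a regular column before filtering,
# so the car model stays attached to each remaining row.
efficient <- mtcars %>%
  rownames_to_column(var = "model") %>%
  filter(mpg > 30)

efficient[, c("model", "mpg")]
```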

The three scoped variants (filter_all(), filter_if() and filter_at()) make it easy to apply a filtering condition to a selection of variables.
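
As an illustration only, the scoped variants could be used as follows on a small made-up tibble (the data and column names are assumptions for this sketch). `all_vars()` requires the predicate to hold for every selected column, while `any_vars()` requires it for at least one.

```r
library(dplyr)

# Made-up example data.
df <- tibble(a = c(1, 5, 9), b = c(2, 6, 1), label = c("x", "y", "z"))

# filter_at(): apply the condition to an explicit selection of columns.
both_above_one <- filter_at(df, vars(a, b), all_vars(. > 1))

# filter_if(): apply the condition to columns chosen by a predicate.
numeric_above_one <- filter_if(df, is.numeric, all_vars(. > 1))

# filter_all(): apply the condition to every column (here, numeric ones only).
any_above_five <- filter_all(select(df, a, b), any_vars(. > 5))
```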

Stack Overflow answer by docendo discimus, https://stackoverflow.com/a/43110620/4575331

filter_all(), filter_if() and filter_at().

Other single table verbs: arrange, mutate, select, slice, summarise

library(dplyr)  # provides filter(), %>% and the starwars data

filter(starwars, species == "Human")
filter(starwars, mass > 1000)

# Multiple criteria
filter(starwars, hair_color == "none" & eye_color == "black")
filter(starwars, hair_color == "none" | eye_color == "black")

# Multiple arguments are combined with `&` (logical and)
filter(starwars, hair_color == "none", eye_color == "black")


# The filtering operation may yield different results on grouped
# tibbles because the expressions are computed within groups.
#
# The following filters rows where `mass` is greater than the
# global average:
starwars %>% filter(mass > mean(mass, na.rm = TRUE))

# Whereas this keeps rows with `mass` greater than the gender
# average:
starwars %>% group_by(gender) %>% filter(mass > mean(mass, na.rm = TRUE))


# Refer to column names stored as strings with the `.data` pronoun:
vars <- c("mass", "height")
cond <- c(80, 150)
starwars %>%
  filter(
    .data[[vars[[1]]]] > cond[[1]],
    .data[[vars[[2]]]] > cond[[2]]
  )

# For more complex cases, knowledge of tidy evaluation and the
# unquote operator `!!` is required. See https://tidyeval.tidyverse.org/
#
# One useful and simple tidy eval technique is to use `!!` to bypass
# the data frame and its columns. Here is how to filter the columns
# `mass` and `height` relative to objects of the same names:
mass <- 80
height <- 150
filter(starwars, mass > !!mass, height > !!height)