nest-filter-joins: Nested filtering joins

nest-filter-joins    R Documentation

Nested filtering joins

Description

Nested filtering joins filter rows from .nest_data based on the presence or absence of matches in y:

  • nest_semi_join() returns all rows from .nest_data with a match in y.

  • nest_anti_join() returns all rows from .nest_data without a match in y.

Usage

nest_semi_join(.data, .nest_data, y, by = NULL, copy = FALSE, ...)

nest_anti_join(.data, .nest_data, y, by = NULL, copy = FALSE, ...)

Arguments

.data

A data frame, data frame extension (e.g., a tibble), or a lazy data frame (e.g., from dbplyr or dtplyr).

.nest_data

A list-column containing data frames.

y

A data frame, data frame extension (e.g., a tibble), or a lazy data frame (e.g., from dbplyr or dtplyr).

by

A character vector of variables to join by or a join specification created with join_by().

If NULL, the default, nest_*_join() will perform a natural join, using all variables in common across each object in .nest_data and y. A message lists the variables so you can check they're correct; suppress the message by supplying by explicitly.

To join on different variables between the objects in .nest_data and y, use a named vector. For example, by = c("a" = "b") will match .nest_data$a to y$b for each object in .nest_data.

To join by multiple variables, use a vector with length >1. For example, by = c("a", "b") will match .nest_data$a to y$a and .nest_data$b to y$b for each object in .nest_data. Use a named vector to match different variables in .nest_data and y. For example, by = c("a" = "b", "c" = "d") will match .nest_data$a to y$b and .nest_data$c to y$d for each object in .nest_data.

To perform a cross-join, generating all combinations of each object in .nest_data and y, use by = character().
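The named-vector form of by can be sketched on a small example. The toy tables below (orders, vip) and their column names are hypothetical and only serve to illustrate the argument; the sketch assumes dplyr, tidyr and nplyr are installed.

```r
library(dplyr)
library(tidyr)
library(nplyr)

# Hypothetical toy data: orders nested by region
orders <- tibble(
  region  = c("east", "east", "west", "west"),
  cust_id = c(1, 2, 3, 4),
  amount  = c(10, 20, 30, 40)
)
orders_nest <- orders %>% nest(order_data = -region)

# Hypothetical lookup table whose key column has a different name
vip <- tibble(id = c(2, 3))

# Match cust_id in each nested data frame to id in vip
res <- orders_nest %>%
  nest_semi_join(order_data, vip, by = c("cust_id" = "id"))

res$order_data[[1]]  # nested data frame for "east" after filtering
res$order_data[[2]]  # nested data frame for "west" after filtering
```

Supplying by explicitly, as here, also suppresses the natural-join message described above.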

copy

If .nest_data and y are not from the same data source and copy = TRUE, then y will be copied into the same src as .nest_data. (The applicability of this parameter to nplyr still needs to be reviewed in more detail.)

...

One or more unquoted expressions separated by commas. Variable names can be used as if they were positions in the data frame, so expressions like x:y can be used to select a range of variables.

Details

nest_semi_join() and nest_anti_join() are largely wrappers for dplyr::semi_join() and dplyr::anti_join() and maintain the functionality of semi_join() and anti_join() within each nested data frame. For more information on semi_join() or anti_join(), please refer to the documentation in dplyr.
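To make the "wrapper" idea concrete, the following is a rough sketch of what applying dplyr::semi_join() and dplyr::anti_join() within each nested data frame could look like when written by hand with purrr::map(). It is an illustration under that assumption, not the package's actual implementation; the toy data and the names manual_semi and manual_anti are hypothetical.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical toy data nested by group g
nested <- tibble(g = c("a", "a", "b"), k = c(1, 2, 3)) %>%
  nest(data = -g)
y <- tibble(k = c(1, 3))

# Apply the filtering join to every element of the list-column
manual_semi <- nested %>%
  mutate(data = map(data, ~ semi_join(.x, y, by = "k")))
manual_anti <- nested %>%
  mutate(data = map(data, ~ anti_join(.x, y, by = "k")))

manual_semi$data[[1]]  # rows of group "a" with a match in y
manual_anti$data[[1]]  # rows of group "a" without a match in y
```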

Value

An object of the same type as .data. Each object in the column .nest_data will also be of the same type as the input. Each object in .nest_data has the following properties:

  • Rows are a subset of the input, but appear in the same order.

  • Columns are not modified.

  • Data frame attributes are preserved.

  • Groups are taken from .nest_data. The number of groups may be reduced.
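The properties listed above can be checked on a small example. The data below (scores, benched) are hypothetical, and the sketch assumes dplyr, tidyr and nplyr are installed.

```r
library(dplyr)
library(tidyr)
library(nplyr)

# Hypothetical toy data, deliberately not sorted by player
scores <- tibble(
  team   = c("x", "x", "x", "y", "y"),
  player = c("p3", "p1", "p2", "p5", "p4"),
  points = c(7, 9, 4, 6, 8)
)
scores_nest <- scores %>% nest(team_data = -team)
benched <- tibble(player = c("p1", "p5"))

kept <- scores_nest %>% nest_anti_join(team_data, benched, by = "player")

class(kept)                  # compare with class(scores_nest)
names(kept$team_data[[1]])   # columns of the nested data frame
kept$team_data[[1]]$player   # remaining rows, in their input order
```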

See Also

Other joins: nest-mutate-joins, nest_nest_join()

Examples

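library(dplyr)  # provides %>%
library(nplyr)

# Nest gapminder by continent and sample 10 countries to filter against
gm_nest <- gapminder::gapminder %>% tidyr::nest(country_data = -continent)
gm_codes <- gapminder::country_codes %>% dplyr::slice_sample(n = 10)

# Keep rows with (semi) or without (anti) a matching country in gm_codes
gm_nest %>% nest_semi_join(country_data, gm_codes, by = "country")
gm_nest %>% nest_anti_join(country_data, gm_codes, by = "country")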


nplyr documentation built on Feb. 16, 2023, 7:24 p.m.