summarise_sub: Add columns from nested data frames

View source: R/nested_df.R

summarise_sub {zplyr}	R Documentation


Description

When working with data frames that have data frames nested within them (see tibble-package or nest), one often wants to extract summary statistics or other key information from the nested data frames and move it into columns of the top-level data frame. This function applies summary functions to each nested data frame and pulls the results out into columns of the higher-level data frame.
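For comparison, here is a minimal hand-rolled sketch of the same idea. It assumes dplyr (>= 1.0), tidyr (>= 1.0, for the `nest(data = -cyl)` syntax) and purrr are installed. It is not zplyr's implementation: it maps summarise() over each nested data frame and unnests the one-row results into top-level columns.

```r
library(dplyr)
library(tidyr)
library(purrr)

# One row per cylinder count; the remaining columns are nested in the list-column `data`
d <- mtcars %>%
  as_tibble() %>%
  nest(data = -cyl)

# Summarise each nested data frame (one row each), then promote the
# summary columns to the top level
by_hand <- d %>%
  mutate(stats = map(data, ~ summarise(.x, mean_mpg = mean(mpg), n = n()))) %>%
  unnest(stats)

by_hand
```

summarise_sub wraps this map-then-unnest pattern behind a single call.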

Usage

summarise_sub(df, data_col_name, ..., handle_nulls = FALSE, scoped_in = TRUE)

Arguments

df

A data frame

data_col_name

The column name of the nested data frames, bare or as a string.

...

Name-value pairs of summary functions (see summarise for more information).

handle_nulls

A logical value. If FALSE (the default), rows whose nested column is NULL throw an error; if TRUE, those rows get NA in the new columns.
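The behaviour of handle_nulls = TRUE can be approximated with plain dplyr and purrr. In this minimal sketch, summarise_or_na() is a hypothetical helper (not part of zplyr) that returns a one-row tibble of NAs, one per requested summary, when the nested value is NULL. It assumes dplyr, tidyr and purrr (>= 1.0 versions) are installed.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical helper (not part of zplyr): summarise a nested data frame,
# or return a one-row tibble of NAs when the nested value is NULL
summarise_or_na <- function(nested, ...) {
  if (is.null(nested)) {
    # Names of the requested summaries, e.g. "mean_mpg" and "n"
    nms <- names(rlang::enquos(...))
    return(as_tibble(setNames(rep(list(NA), length(nms)), nms)))
  }
  summarise(nested, ...)
}

d <- mtcars %>%
  as_tibble() %>%
  nest(data = -cyl)
d$data[2] <- list(NULL)  # simulate a missing nested data frame in row 2

with_nas <- d %>%
  mutate(stats = map(data, ~ summarise_or_na(.x, mean_mpg = mean(mpg), n = n()))) %>%
  unnest(stats)
```

Because the summary expressions are captured rather than evaluated, `mean(mpg)` is never run against the NULL row.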

scoped_in

A logical value. If TRUE (the default), the summary functions are evaluated within each nested data frame alone; if FALSE, they also have access to the higher-level data frame. Changing this value can change the results substantially: for example, with FALSE, n() counts the rows of the higher-level data frame.
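To make the two scopes concrete, here is a minimal sketch using plain dplyr, tidyr and purrr (assumed installed, >= 1.0). It does not call summarise_sub and is not how zplyr implements scoped_in; it only contrasts n() evaluated inside each nested data frame with n() evaluated against the outer data frame.

```r
library(dplyr)
library(tidyr)
library(purrr)

d <- mtcars %>%
  as_tibble() %>%
  nest(data = -cyl)

# Scoped in: n() runs inside summarise() on each nested data frame,
# so it counts the cars within each cylinder group
inner_n <- d %>%
  mutate(n = map_int(data, ~ summarise(.x, n = n())$n))

# Scoped out: n() inside mutate() on the outer data frame counts
# the rows of the outer data frame (one per cylinder group)
outer_n <- d %>%
  mutate(n = n())
```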

Value

A data frame or tibble: df with one new column for each summary function supplied in ...

Examples

library(dplyr)  # for %>% and as_tibble()
library(zplyr)

d <- mtcars %>%
  dplyr::mutate(Name = row.names(mtcars)) %>%
  as_tibble() %>%
  tidyr::nest(data = -cyl)

d %>%
  summarise_sub(data, mean_mpg = mean(mpg),
                      sd_hp = sd(hp),
                      n = n())

# With `scoped_in = FALSE`, `n()` counts the rows of the higher-level data frame rather than those of each nested data frame.
d %>%
  summarise_sub(data, n=n(), scoped_in = FALSE)

# By default, a NULL value in the nested column throws an error.
# If `handle_nulls` is `TRUE`, rows with NULL values get NAs instead.
d$data[2] <- list(NULL)
## Not run: 
d %>% summarise_sub(data, mean_mpg = mean(mpg), n=n())

## End(Not run)
d %>% summarise_sub(data, mean_mpg = mean(mpg), n=n(), handle_nulls = TRUE)

burchill/zplyr documentation built on Feb. 2, 2023, 11:01 a.m.