group_map: Apply a function to each group

View source: R/group-map.R

group_map    R Documentation

Description

[Experimental]

group_map(), group_modify() and group_walk() are purrr-style functions that can be used to iterate on grouped tibbles.

Usage

group_map(.data, .f, ..., .keep = FALSE)

group_modify(.data, .f, ..., .keep = FALSE)

group_walk(.data, .f, ..., .keep = FALSE)

Arguments

.data

A grouped tibble

.f

A function or formula to apply to each group.

If a function, it is used as is. It should have at least 2 formal arguments.

If a formula, e.g. ~ head(.x), it is converted to a function.

In the formula, you can use

  • . or .x to refer to the subset of rows of .data for the given group

  • .y to refer to the key, a one row tibble with one column per grouping variable that identifies the group
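As an illustration of the two forms of .f described above, the sketch below applies the same operation once as a two-argument function and once as a formula. The helper name first_rows is hypothetical and only used for this example.

```r
library(dplyr)

# A two-argument function: the first argument receives the rows of the group,
# the second receives the one-row key tibble (unused here).
first_rows <- function(rows, key) head(rows, 2L)

by_fun <- mtcars %>%
  group_by(cyl) %>%
  group_map(first_rows)

# The same operation written as a formula, where .x is the group's rows.
by_formula <- mtcars %>%
  group_by(cyl) %>%
  group_map(~ head(.x, 2L))

# Both are lists with one element per value of cyl.
length(by_fun)
length(by_formula)
```

The formula form is shorter for one-liners; a named function may be easier to reuse and test.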

...

Additional arguments passed on to .f
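A minimal sketch of passing an extra argument through ... follows. The helper top_n_rows and its parameter n are hypothetical names chosen for this example.

```r
library(dplyr)

# Hypothetical helper: `n` is not a group_map() argument, so it is
# forwarded to the function through `...`.
top_n_rows <- function(rows, key, n) head(rows, n)

res <- mtcars %>%
  group_by(cyl) %>%
  group_map(top_n_rows, n = 3L)

# Number of rows returned for each group
vapply(res, nrow, integer(1))
```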

.keep

Should the grouping variables be kept in .x? Defaults to FALSE.
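The effect of .keep can be checked by inspecting the column names each group receives. This is a small sketch; the variable names dropped and kept are only illustrative.

```r
library(dplyr)

# Default (.keep = FALSE): the grouping column is removed from .x
dropped <- mtcars %>%
  group_by(cyl) %>%
  group_map(~ names(.x))

# .keep = TRUE: the grouping column stays in .x
kept <- mtcars %>%
  group_by(cyl) %>%
  group_map(~ names(.x), .keep = TRUE)

"cyl" %in% dropped[[1]]
"cyl" %in% kept[[1]]
```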

Details

Use group_modify() when summarize() is too limited, in terms of what you need to do and return for each group. group_modify() is good for "data frame in, data frame out". If that is too limited, you need to use a nested or split workflow. group_modify() is an evolution of do(), if you have used that before.

Each conceptual group of the data frame is exposed to the function .f with two pieces of information:

  • The subset of the data for the group, exposed as .x.

  • The key, a tibble with exactly one row and columns for each grouping variable, exposed as .y.
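To see what the key looks like, the sketch below returns .y unchanged for each group of a data frame grouped by two variables.

```r
library(dplyr)

# Return the key itself for every group
keys <- mtcars %>%
  group_by(cyl, am) %>%
  group_map(~ .y)

# Each element is a one-row tibble with one column per grouping variable
keys[[1]]
names(keys[[1]])
```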

For completeness, group_modify(), group_map() and group_walk() also work on ungrouped data frames. In that case the function is applied to the entire data frame (exposed as .x), and .y is a one-row tibble with no columns, consistent with group_keys().
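The ungrouped case can be inspected with a short sketch that reports the size of .x and the shape of .y. The list element names used here are only illustrative.

```r
library(dplyr)

# mtcars is not grouped, so .f is called once on the whole data frame
res <- mtcars %>%
  group_map(~ list(rows = nrow(.x), key_rows = nrow(.y), key_cols = ncol(.y)))

length(res)
res[[1]]
```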

Value

  • group_modify() returns a grouped tibble, so .f must return a data frame.

  • group_map() returns a list of results from calling .f on each group.

  • group_walk() calls .f for side effects and returns the input .data, invisibly.
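The three return types listed above can be compared side by side. This is a minimal sketch on mtcars; the variable names are only illustrative.

```r
library(dplyr)

by_cyl <- mtcars %>% group_by(cyl)

# group_map(): a plain list with one element per group
mapped <- by_cyl %>% group_map(~ nrow(.x))

# group_modify(): a grouped tibble built from the data frames returned by .f
modified <- by_cyl %>% group_modify(~ head(.x, 1L))

# group_walk(): .f is called for its side effects, the input is returned
walked <- by_cyl %>% group_walk(~ invisible(NULL))

is.list(mapped)
is_grouped_df(modified)
identical(walked, by_cyl)
```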

See Also

Other grouping functions: group_by(), group_nest(), group_split(), group_trim()

Examples


# return a list
mtcars %>%
  group_by(cyl) %>%
  group_map(~ head(.x, 2L))

# return a tibble grouped by `cyl` with 2 rows per group
# the grouping data is recalculated
mtcars %>%
  group_by(cyl) %>%
  group_modify(~ head(.x, 2L))


# a list of tibbles
iris %>%
  group_by(Species) %>%
  group_map(~ broom::tidy(lm(Petal.Length ~ Sepal.Length, data = .x)))

# a restructured grouped tibble
iris %>%
  group_by(Species) %>%
  group_modify(~ broom::tidy(lm(Petal.Length ~ Sepal.Length, data = .x)))


# a list of vectors
iris %>%
  group_by(Species) %>%
  group_map(~ quantile(.x$Petal.Length, probs = c(0.25, 0.5, 0.75)))

# to use group_modify() the lambda must return a data frame
iris %>%
  group_by(Species) %>%
  group_modify(~ {
    quantile(.x$Petal.Length, probs = c(0.25, 0.5, 0.75)) %>%
      tibble::enframe(name = "prob", value = "quantile")
  })

# five-number summary of every numeric column, one row per statistic
iris %>%
  group_by(Species) %>%
  group_modify(~ {
    .x %>%
      purrr::map_dfc(fivenum) %>%
      mutate(nms = c("min", "Q1", "median", "Q3", "max"))
  })

# group_walk() is for side effects
dir.create(temp <- tempfile())
iris %>%
  group_by(Species) %>%
  group_walk(~ write.csv(.x, file = file.path(temp, paste0(.y$Species, ".csv"))))
list.files(temp, pattern = "csv$")
unlink(temp, recursive = TRUE)

# group_modify() and ungrouped data frames
mtcars %>%
  group_modify(~ head(.x, 2L))


hadley/dplyr documentation built on Feb. 16, 2024, 8:27 p.m.