summarise_all: Summarise multiple columns

View source: R/colwise-mutate.R

summarise_all (R Documentation)

Description

[Superseded]

Scoped verbs (_if, _at, _all) have been superseded by the use of pick() or across() in an existing verb. See vignette("colwise") for details.

The scoped variants of summarise() make it easy to apply the same transformation to multiple variables. There are three variants.

  • summarise_all() affects every variable

  • summarise_at() affects variables selected with a character vector or vars()

  • summarise_if() affects variables selected with a predicate function

Usage

summarise_all(.tbl, .funs, ...)

summarise_if(.tbl, .predicate, .funs, ...)

summarise_at(.tbl, .vars, .funs, ..., .cols = NULL)

summarize_all(.tbl, .funs, ...)

summarize_if(.tbl, .predicate, .funs, ...)

summarize_at(.tbl, .vars, .funs, ..., .cols = NULL)

Arguments

.tbl

A tbl object.

.funs

A function fun, a quosure style lambda ~ fun(.) or a list of either form.

...

Additional arguments for the function calls in .funs. These are evaluated only once, with tidy dots support.

.predicate

A predicate function to be applied to the columns or a logical vector. The variables for which .predicate is or returns TRUE are selected. This argument is passed to rlang::as_function() and thus supports quosure-style lambda functions and strings representing function names.

.vars

A list of columns generated by vars(), a character vector of column names, a numeric vector of column positions, or NULL.

.cols

This argument has been renamed to .vars to fit dplyr's terminology and is deprecated.
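
The forms accepted by .funs, and the way extra arguments travel through ..., can be sketched as follows. This is a minimal illustration rather than part of the original examples; the data frames df and mixed and all result names are made up for the purpose.

```r
library(dplyr)

# Hypothetical toy data with one missing value per column
df <- data.frame(x = c(1, NA, 3), y = c(4, 5, NA))

# A bare function; na.rm = TRUE is forwarded to mean() through `...`
by_function <- df %>% summarise_all(mean, na.rm = TRUE)

# The same computation written as a quosure-style lambda
by_lambda <- df %>% summarise_all(~ mean(., na.rm = TRUE))

# A named list mixing a lambda and a bare function
by_list <- df %>%
  summarise_all(list(avg = ~ mean(., na.rm = TRUE), n = length))

# .predicate given as a function: only the numeric column is summarised
mixed <- data.frame(id = c("a", "b"), value = c(10, 20))
numeric_only <- mixed %>% summarise_if(is.numeric, mean)
```

The first two calls should give the same result, since the lambda simply spells out the argument that the first call passes through ....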

Value

A data frame. By default, the newly created columns have the shortest names needed to uniquely identify the output. To force inclusion of a name, even when not needed, name the input, for example by passing a named list such as list(mean = mean) as .funs (see examples for details).

Grouping variables

When applied to a grouped tibble, these operations are not applied to the grouping variables. The behaviour depends on whether the selection is implicit (the _all and _if variants) or explicit (the _at variant).

  • Grouping variables covered by explicit selections in summarise_at() are always an error. Add -group_cols() to the vars() selection to avoid this:

    data %>%
      summarise_at(vars(-group_cols(), ...), myoperation)
    

    Or remove group_vars() from the character vector of column names:

    nms <- setdiff(nms, group_vars(data))
    data %>% summarise_at(nms, myoperation)
    
  • Grouping variables covered by implicit selections are silently ignored by summarise_all() and summarise_if().
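
The rules above can be tried on a small grouped data frame. This is a sketch under stated assumptions: df, by_g, nms and the result names are hypothetical and stand in for data and myoperation in the snippets above.

```r
library(dplyr)

# Hypothetical toy data, grouped by g
df <- data.frame(g = c("a", "a", "b"), x = c(1, 2, 3), y = c(4, 5, 6))
by_g <- df %>% group_by(g)

# Implicit selections: the grouping variable g is skipped silently
res_all <- by_g %>% summarise_all(mean)
res_if <- by_g %>% summarise_if(is.numeric, mean)

# Explicit selection: drop the grouping columns inside vars()
res_at <- by_g %>% summarise_at(vars(-group_cols()), mean)

# Explicit selection by name: remove the group variables first
nms <- setdiff(names(df), group_vars(by_g))
res_nms <- by_g %>% summarise_at(nms, mean)
```

All four calls are expected to return one row per group with the columns g, x and y.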

Naming

The names of the new columns are derived from the names of the input variables and the names of the functions.

  • if there is only one unnamed function (i.e. if .funs is an unnamed list of length one), the names of the input variables are used to name the new columns;

  • for _at functions, if there is only one unnamed variable (i.e., if .vars is of the form vars(a_single_column)) and .funs has length greater than one, the names of the functions are used to name the new columns;

  • otherwise, the new names are created by concatenating the names of the input variables and the names of the functions, separated with an underscore "_".

The .funs argument can be a named or unnamed list. If a function is unnamed and the name cannot be derived automatically, a name of the form "fn#" is used. Similarly, vars() accepts named and unnamed arguments. If a variable in .vars is named, a new column by that name will be created.

Name collisions in the new columns are disambiguated using a unique suffix.
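
A short sketch of the naming rules, using a made-up data frame; df, the function labels (avg, lo, hi) and the result names are illustrative assumptions, not part of the original examples.

```r
library(dplyr)

# Hypothetical toy data
df <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6))

# One unnamed function: the input names are reused (x, y)
one_fun <- df %>% summarise_at(c("x", "y"), mean)

# One named function: the function name is appended (x_avg, y_avg)
named_fun <- df %>% summarise_at(c("x", "y"), list(avg = mean))

# One unnamed variable, several functions: function names only (lo, hi)
single_var <- df %>% summarise_at(vars(x), list(lo = min, hi = max))

# Several variables and several functions: <variable>_<function>
crossed <- df %>% summarise_at(vars(x, y), list(lo = min, hi = max))

names(one_fun)
names(named_fun)
names(single_var)
names(crossed)
```

Naming the single function in named_fun is what forces the function name into the output columns, which would otherwise keep the plain input names.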

See Also

The other scoped verbs, vars()

Examples

# The _at() variants directly support strings:
starwars %>%
  summarise_at(c("height", "mass"), mean, na.rm = TRUE)
# ->
starwars %>% summarise(across(c("height", "mass"), ~ mean(.x, na.rm = TRUE)))

# You can also supply selection helpers to _at() functions but you have
# to quote them with vars():
starwars %>%
  summarise_at(vars(height:mass), mean, na.rm = TRUE)
# ->
starwars %>%
  summarise(across(height:mass, ~ mean(.x, na.rm = TRUE)))

# The _if() variants apply a predicate function (a function that
# returns TRUE or FALSE) to determine the relevant subset of
# columns. Here we apply mean() to the numeric columns:
starwars %>%
  summarise_if(is.numeric, mean, na.rm = TRUE)
# ->
starwars %>%
  summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)))

by_species <- iris %>%
  group_by(Species)

# If you want to apply multiple transformations, pass a list of
# functions. When there are multiple functions, they create new
# variables instead of modifying the variables in place:
by_species %>%
  summarise_all(list(min, max))
# ->
by_species %>%
  summarise(across(everything(), list(min = min, max = max)))

hadley/dplyr documentation built on Nov. 6, 2024, 4:48 p.m.