summarise: Summarise each group to fewer rows

Description

summarise() creates a new data frame. It will have one (or more) rows for each combination of grouping variables; if there are no grouping variables, the output will have a single row summarising all observations in the input. It will contain one column for each grouping variable and one column for each of the summary statistics that you have specified.

summarise() and summarize() are synonyms.

Usage

summarise(.data, ..., .groups = NULL)

summarize(.data, ..., .groups = NULL)

Arguments

.data

A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). See Methods, below, for more details.

...

<data-masking> Name-value pairs of summary functions. The name will be the name of the variable in the result.

The value can be:

  • A vector of length 1, e.g. min(x), n(), or sum(is.na(y)).

  • A vector of length n, e.g. quantile().

  • A data frame, to add multiple columns from a single expression.
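As a minimal sketch of the value types listed above, the following combines length-1 vectors with a data frame in one call. The helper disp_range() is hypothetical (it is not part of dplyr) and the column names are chosen for illustration only.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): returns a one-row data frame,
# so a single expression contributes two columns to the result.
disp_range <- function(x) {
  tibble(disp_min = min(x), disp_max = max(x))
}

res <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    n = n(),                  # length-1 vector
    n_na = sum(is.na(disp)),  # length-1 vector
    disp_range(disp),         # unnamed data frame: its columns are unpacked
    .groups = "drop"
  )

names(res)
```

Because the data frame expression is left unnamed, its columns appear in the result alongside the named summaries.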

.groups (experimental)

Grouping structure of the result.

  • "drop_last": The last level of grouping is dropped. This was the only supported option before version 1.0.0.

  • "drop": All levels of grouping are dropped.

  • "keep": Same grouping structure as .data.

  • "rowwise": Each row is its own group.

When .groups is not specified, it is chosen based on the number of rows of the results:

  • If all the results have 1 row, you get "drop_last".

  • If the number of rows varies, you get "keep".

In addition, a message informs you of that choice, unless the result is ungrouped, the option "dplyr.summarise.inform" is set to FALSE, or summarise() is called from a function in a package.
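The effect of the .groups options described above can be checked with group_vars(). This is a sketch only: groups_after() is a helper defined here for illustration, not a dplyr function, and "rowwise" is left out for brevity.

```r
library(dplyr)

by_cyl_vs <- group_by(mtcars, cyl, vs)

# Helper for this sketch only (not part of dplyr): summarise with the given
# .groups value and report which grouping variables remain.
groups_after <- function(groups) {
  by_cyl_vs %>%
    summarise(n = n(), .groups = groups) %>%
    group_vars()
}

drop_last <- groups_after("drop_last")  # "cyl"
dropped   <- groups_after("drop")       # character(0)
kept      <- groups_after("keep")       # "cyl" "vs"
```

Passing .groups explicitly also avoids the informational message; setting options(dplyr.summarise.inform = FALSE) is the other way to silence it.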

Value

An object usually of the same type as .data.

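Useful functions

  • Center: mean(), median()

  • Spread: sd(), IQR(), mad()

  • Range: min(), max(), quantile()

  • Position: first(), last(), nth()

  • Count: n(), n_distinct()

  • Logical: any(), all()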

Backend variations

The data frame backend supports creating a variable and using it in the same summary. This means that previously created summary variables can be further transformed or combined within the summary, as in mutate(). However, it also means that summary variables with the same names as previous variables overwrite them, making those variables unavailable to later summary variables.

This behaviour may not be supported in other backends. To avoid unexpected results, consider using new names for your summary variables, especially when creating multiple summaries.
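A short sketch of the advice above, using the data frame backend: each summary gets a new name, so the input column disp stays available and earlier summaries can be reused. The column names disp_mean, disp_sd and disp_cv are illustrative choices.

```r
library(dplyr)

res <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    disp_mean = mean(disp),         # new name: `disp` still refers to the input column
    disp_sd = sd(disp),             # computed from the input column, not from a summary
    disp_cv = disp_sd / disp_mean,  # reuses the two summary variables created above
    .groups = "drop"
  )

res
```

Had the first summary been named disp, the later sd(disp) would have been computed on the single summarised value in each group rather than on the original column.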

Methods

This function is a generic, which means that packages can provide implementations (methods) for other classes. See the documentation of individual methods for extra arguments and differences in behaviour.

Which methods are currently available depends on the packages that are loaded.

See Also

Other single table verbs: arrange(), filter(), mutate(), rename(), select(), slice()

Examples

library(dplyr)

# A summary applied to an ungrouped tbl returns a single row
mtcars %>%
  summarise(mean = mean(disp), n = n())

# Usually, you'll want to group first
mtcars %>%
  group_by(cyl) %>%
  summarise(mean = mean(disp), n = n())

# As of dplyr 1.0.0, a summary can return more than one value per group:
mtcars %>%
  group_by(cyl) %>%
  summarise(qs = quantile(disp, c(0.25, 0.75)), prob = c(0.25, 0.75))

# Return a data frame to create multiple columns from one expression;
# this lets you wrap the computation in a function:
my_quantile <- function(x, probs) {
  tibble(x = quantile(x, probs), probs = probs)
}
mtcars %>%
  group_by(cyl) %>%
  summarise(my_quantile(disp, c(0.25, 0.75)))

# Each summary call removes one grouping level (since that group
# is now just a single row)
mtcars %>%
  group_by(cyl, vs) %>%
  summarise(cyl_n = n()) %>%
  group_vars()

# BEWARE: reusing variables may lead to unexpected results
mtcars %>%
  group_by(cyl) %>%
  summarise(disp = mean(disp), sd = sd(disp))

# Refer to column names stored as strings with the `.data` pronoun:
var <- "mass"
summarise(starwars, avg = mean(.data[[var]], na.rm = TRUE))
# Learn more in ?dplyr_data_masking

Example output

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

      mean  n
1 230.7219 32
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 3 x 3
    cyl  mean     n
  <dbl> <dbl> <int>
1     4  105.    11
2     6  183.     7
3     8  353.    14
`summarise()` regrouping output by 'cyl' (override with `.groups` argument)
# A tibble: 6 x 3
# Groups:   cyl [3]
    cyl    qs  prob
  <dbl> <dbl> <dbl>
1     4  78.8  0.25
2     4 121.   0.75
3     6 160    0.25
4     6 196.   0.75
5     8 302.   0.25
6     8 390    0.75
`summarise()` regrouping output by 'cyl' (override with `.groups` argument)
# A tibble: 6 x 3
# Groups:   cyl [3]
    cyl     x probs
  <dbl> <dbl> <dbl>
1     4  78.8  0.25
2     4 121.   0.75
3     6 160    0.25
4     6 196.   0.75
5     8 302.   0.25
6     8 390    0.75
`summarise()` regrouping output by 'cyl' (override with `.groups` argument)
[1] "cyl"
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 3 x 3
    cyl  disp    sd
  <dbl> <dbl> <dbl>
1     4  105.    NA
2     6  183.    NA
3     8  353.    NA
# A tibble: 1 x 1
    avg
  <dbl>
1  97.3

dplyr documentation built on June 19, 2021, 1:07 a.m.