summarise: Reduce multiple values down to a single value

Description Usage Arguments Details Value Useful functions Backend variations Tidy data See Also Examples

Description

Create one or more scalar variables summarizing the variables of an existing tbl. Tbls with groups created by group_by() will result in one row in the output for each group. Tbls with no groups will result in one row.

Usage

1
2
3

Arguments

.data

A tbl. All main verbs are S3 generics and provide methods for tbl_df(), dtplyr::tbl_dt() and dbplyr::tbl_dbi().

...

Name-value pairs of summary functions. The name will be the name of the variable in the result. The value should be an expression that returns a single value like min(x), n(), or sum(is.na(y)).

The arguments in ... are automatically quoted and evaluated in the context of the data frame. They support unquoting and splicing. See vignette("programming") for an introduction to these concepts.

Details

summarise() and summarize() are synonyms.

Value

An object of the same class as .data. One grouping level will be dropped.

Useful functions

Backend variations

The data frame backend supports creating a variable and using it in the same summary. This means that previously created summary variables can be further transformed or combined within the summary, as in mutate(). However, it also means that summary variables with the same names as previous variables overwrite them, making those variables unavailable to later summary variables.

This behaviour may not be supported in other backends. To avoid unexpected results, consider using new names for your summary variables, especially when creating multiple summaries.

Tidy data

When applied to a data frame, row names are silently dropped. To preserve, convert to an explicit variable with tibble::rownames_to_column().

See Also

Other single table verbs: arrange, filter, mutate, select, slice

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
# A summary applied to ungrouped tbl returns a single row
mtcars %>%
  summarise(mean = mean(disp), n = n())

# Usually, you'll want to group first
mtcars %>%
  group_by(cyl) %>%
  summarise(mean = mean(disp), n = n())

# Each summary call removes one grouping level (since that group
# is now just a single row)
mtcars %>%
  group_by(cyl, vs) %>%
  summarise(cyl_n = n()) %>%
  group_vars()


# Reusing variable names when summarising may lead to unexpected results
mtcars %>%
  group_by(cyl) %>%
  summarise(disp = mean(disp), sd = sd(disp), double_disp = disp * 2)


# Refer to column names stored as strings with the `.data` pronoun:
var <- "mass"
summarise(starwars, avg = mean(.data[[var]], na.rm = TRUE))

# For more complex cases, knowledge of tidy evaluation and the
# unquote operator `!!` is required. See https://tidyeval.tidyverse.org/
#
# One useful and simple tidy eval technique is to use `!!` to
# bypass the data frame and its columns. Here is how to divide the
# column `mass` by an object of the same name:
mass <- 100
summarise(starwars, avg = mean(mass / !!mass, na.rm = TRUE))

Example output

Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

      mean  n
1 230.7219 32
# A tibble: 3 x 3
    cyl  mean     n
  <dbl> <dbl> <int>
1     4  105.    11
2     6  183.     7
3     8  353.    14
[1] "cyl"
# A tibble: 3 x 4
    cyl  disp    sd double_disp
  <dbl> <dbl> <dbl>       <dbl>
1     4  105.    NA        210.
2     6  183.    NA        367.
3     8  353.    NA        706.
# A tibble: 1 x 1
    avg
  <dbl>
1  97.3
# A tibble: 1 x 1
    avg
  <dbl>
1 0.973

dplyr documentation built on July 4, 2019, 5:08 p.m.