summarise_all: Summarise multiple columns


Description

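The scoped variants of summarise() make it easy to apply the same transformation to multiple variables. There are three variants: summarise_all() affects every variable, summarise_at() affects variables selected with a character vector or vars(), and summarise_if() affects variables selected with a predicate function.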

Usage

summarise_all(.tbl, .funs, ...)

summarise_if(.tbl, .predicate, .funs, ...)

summarise_at(.tbl, .vars, .funs, ..., .cols = NULL)

summarize_all(.tbl, .funs, ...)

summarize_if(.tbl, .predicate, .funs, ...)

summarize_at(.tbl, .vars, .funs, ..., .cols = NULL)

Arguments

.tbl

A tbl object.

.funs

A function fun, a quosure-style lambda ~ fun(.), or a list of either form.

...

Additional arguments for the function calls in .funs. These are evaluated only once, with tidy dots support.

.predicate

A predicate function to be applied to the columns or a logical vector. The variables for which .predicate is or returns TRUE are selected. This argument is passed to rlang::as_function() and thus supports quosure-style lambda functions and strings representing function names.

.vars

A list of columns generated by vars(), a character vector of column names, a numeric vector of column positions, or NULL.

.cols

This argument has been renamed to .vars to fit dplyr's terminology and is deprecated.
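
The argument forms described above can be sketched as follows. This is an illustrative sketch only: it assumes dplyr is attached and uses the built-in mtcars data set, which is not part of the original examples.

```r
library(dplyr)

# .vars as a character vector of column names
by_name <- summarise_at(mtcars, c("mpg", "hp"), mean)

# .vars as a numeric vector of column positions
# (in mtcars, mpg is column 1 and hp is column 4)
by_position <- summarise_at(mtcars, c(1, 4), mean)

# .predicate and .funs both given as quosure-style lambdas
by_predicate <- summarise_if(mtcars, ~ is.numeric(.), ~ mean(., na.rm = TRUE))

# Additional arguments in ... are passed on to the function in .funs
trimmed <- summarise_all(mtcars, mean, trim = 0.1)
```

Selecting by name or by position should yield the same summary here, since both forms resolve to the same two columns.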

Value

A data frame. By default, the newly created columns have the shortest names needed to uniquely identify the output. To force inclusion of a name, even when not needed, name the input (see examples for details).
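
A minimal sketch of the "shortest names" rule, assuming dplyr is attached and using mtcars purely for illustration. The function labels lo and hi are hypothetical names chosen for this example.

```r
library(dplyr)

# Several variables, one unnamed function:
# the variable names alone identify the output.
one_fun <- summarise_at(mtcars, c("mpg", "hp"), mean)
names(one_fun)

# One variable, several named functions:
# the function names alone identify the output.
one_var <- summarise_at(mtcars, "mpg", list(lo = min, hi = max))
names(one_var)
```

In the first call the columns keep the input names (mpg, hp); in the second they take only the function names (lo, hi).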

Grouping variables

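If applied on a grouped tibble, these operations are not applied to the grouping variables. The behaviour depends on whether the selection is implicit (all and if selections) or explicit (at selections): grouping variables covered by an implicit selection are silently ignored by summarise_all() and summarise_if(), whereas grouping variables covered by an explicit selection in summarise_at() raise an error.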
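
The treatment of grouping variables can be illustrated with a small sketch. It assumes dplyr is attached and uses mtcars grouped by cyl, a choice made only for this example.

```r
library(dplyr)

by_cyl <- group_by(mtcars, cyl)

# Implicit selection: cyl is the grouping variable, so it is not summarised.
# It appears once as the group key, followed by the summarised columns.
all_means <- summarise_all(by_cyl, mean)
names(all_means)

# Explicit selection: list only non-grouping columns.
some_means <- summarise_at(by_cyl, vars(mpg, hp), mean)
names(some_means)
```

Both results have one row per value of cyl, with cyl as the first column.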

Naming

The names of the created columns are derived from the names of the input variables and the names of the functions.

Here, "the names of the functions" means the names given in the supplied list of functions. When a name is needed but not supplied, the function is named with the prefix "fn" followed by its index among the unnamed functions in the list. Finally, all names are made unique.
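
As a sketch of how variable names and function names combine, assuming dplyr is attached and using mtcars for illustration. The labels lo, hi and avg are hypothetical names chosen for this example; explicit names are used throughout because automatically generated names can differ between dplyr versions.

```r
library(dplyr)

# Several variables and several named functions:
# each output name joins a variable name and a function name.
multi <- summarise_at(mtcars, vars(mpg, hp), list(lo = min, hi = max))
names(multi)

# A single named function still adds its name as a suffix.
single_named <- summarise_at(mtcars, vars(mpg, hp), list(avg = mean))
names(single_named)
```

The first call produces mpg_lo, hp_lo, mpg_hi, hp_hi (all variables for the first function, then all for the second); the second produces mpg_avg and hp_avg.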

See Also

The other scoped verbs, vars()

Examples

library(dplyr)

# Group iris by species; by_species is reused in the grouped examples below
by_species <- iris %>%
  group_by(Species)

# The _at() variants directly support strings:
starwars %>%
  summarise_at(c("height", "mass"), mean, na.rm = TRUE)

# You can also supply selection helpers to _at() functions but you have
# to quote them with vars():
starwars %>%
  summarise_at(vars(height:mass), mean, na.rm = TRUE)

# The _if() variants apply a predicate function (a function that
# returns TRUE or FALSE) to determine the relevant subset of
# columns. Here we apply mean() to the numeric columns:
starwars %>%
  summarise_if(is.numeric, mean, na.rm = TRUE)

# If you want to apply multiple transformations, pass a list of
# functions. When there are multiple functions, they create new
# variables instead of modifying the variables in place:
by_species %>%
  summarise_all(list(min, max))

# Note how the new variables include the function name, in order to
# keep things distinct. Passing purrr-style lambdas often creates
# better default names:
by_species %>%
  summarise_all(list(~min(.), ~max(.)))

# When that's not good enough, you can also supply the names explicitly:
by_species %>%
  summarise_all(list(min = min, max = max))

# When there's only one function in the list, it modifies existing
# variables in place. Give it a name to create new variables instead:
by_species %>% summarise_all(list(med = median))
by_species %>% summarise_all(list(Q3 = quantile), probs = 0.75)

Example output

Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

# A tibble: 1 x 2
  height  mass
   <dbl> <dbl>
1   174.  97.3
# A tibble: 1 x 2
  height  mass
   <dbl> <dbl>
1   174.  97.3
# A tibble: 1 x 3
  height  mass birth_year
   <dbl> <dbl>      <dbl>
1   174.  97.3       87.6
# A tibble: 3 x 9
  Species `Sepal.Length_.~ `Sepal.Width_.P~ `Petal.Length_.~ `Petal.Width_.P~
  <fct>              <dbl>            <dbl>            <dbl>            <dbl>
1 setosa               4.3              2.3              1                0.1
2 versic~              4.9              2                3                1  
3 virgin~              4.9              2.2              4.5              1.4
# ... with 4 more variables: `Sepal.Length_.Primitive("max")` <dbl>,
#   `Sepal.Width_.Primitive("max")` <dbl>,
#   `Petal.Length_.Primitive("max")` <dbl>,
#   `Petal.Width_.Primitive("max")` <dbl>
# A tibble: 3 x 9
  Species Sepal.Length_min Sepal.Width_min Petal.Length_min Petal.Width_min
  <fct>              <dbl>           <dbl>            <dbl>           <dbl>
1 setosa               4.3             2.3              1               0.1
2 versic~              4.9             2                3               1  
3 virgin~              4.9             2.2              4.5             1.4
# ... with 4 more variables: Sepal.Length_max <dbl>, Sepal.Width_max <dbl>,
#   Petal.Length_max <dbl>, Petal.Width_max <dbl>
# A tibble: 3 x 9
  Species Sepal.Length_min Sepal.Width_min Petal.Length_min Petal.Width_min
  <fct>              <dbl>           <dbl>            <dbl>           <dbl>
1 setosa               4.3             2.3              1               0.1
2 versic~              4.9             2                3               1  
3 virgin~              4.9             2.2              4.5             1.4
# ... with 4 more variables: Sepal.Length_max <dbl>, Sepal.Width_max <dbl>,
#   Petal.Length_max <dbl>, Petal.Width_max <dbl>
# A tibble: 3 x 5
  Species    Sepal.Length_med Sepal.Width_med Petal.Length_med Petal.Width_med
  <fct>                 <dbl>           <dbl>            <dbl>           <dbl>
1 setosa                  5               3.4             1.5              0.2
2 versicolor              5.9             2.8             4.35             1.3
3 virginica               6.5             3               5.55             2  
# A tibble: 3 x 5
  Species    Sepal.Length_Q3 Sepal.Width_Q3 Petal.Length_Q3 Petal.Width_Q3
  <fct>                <dbl>          <dbl>           <dbl>          <dbl>
1 setosa                 5.2           3.68            1.58            0.3
2 versicolor             6.3           3               4.6             1.5
3 virginica              6.9           3.18            5.88            2.3

dplyr documentation built on July 4, 2019, 5:08 p.m.