gather: Gather columns into key-value pairs.

View source: R/gather.R

Description

Gather takes multiple columns and collapses them into key-value pairs, duplicating all other columns as needed. Use gather() when you notice that you have columns that are not variables.

Usage

gather(data, key = "key", value = "value", ..., na.rm = FALSE,
  convert = FALSE, factor_key = FALSE)

Arguments

data

A data frame.

key, value

Names of new key and value columns, as strings or symbols.

This argument is passed by expression and supports quasiquotation (you can unquote strings and symbols). The name is captured from the expression with rlang::ensym(). Note that this kind of interface, where symbols do not represent actual objects, is now discouraged in the tidyverse; it is supported here for backward compatibility.
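
As a rough illustration of the three ways to name the new columns (symbol, string, unquoted variable), the sketch below uses a made-up data frame; df, key_col and the column names are assumptions for the example only and do not come from the package.

```r
library(tidyr)

# Hypothetical toy data: one id column and two measurement columns.
df <- data.frame(id = 1:2, a = c(10, 20), b = c(30, 40))

# Key and value names given as bare symbols ...
by_symbol <- gather(df, key = name, value = val, a, b)

# ... or as strings; both calls produce the same column names.
by_string <- gather(df, key = "name", value = "val", a, b)

# A name stored in a variable can be unquoted with !!.
key_col <- "name"
by_unquote <- gather(df, key = !!key_col, value = "val", a, b)

names(by_unquote)
```

All three calls should give a data frame with the columns id, name and val.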

...

A selection of columns. If empty, all variables are selected. You can supply bare variable names, select all variables between x and z with x:z, or exclude y with -y. For more options, see the dplyr::select() documentation. See also the section on selection rules below.
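
The selection styles described above can be compared side by side. This is a minimal sketch on an invented data frame (df and its columns id, x, y, z are assumptions for illustration).

```r
library(tidyr)

# Hypothetical toy data with an identifier and three measurement columns.
df <- data.frame(id = 1:2, x = c(1, 2), y = c(3, 4), z = c(5, 6))

# Range selection: gather x, y and z; id is kept as a regular column.
g_range <- gather(df, key = "name", value = "val", x:z)

# Exclusion: gather everything except id (the same columns as above).
g_minus <- gather(df, key = "name", value = "val", -id)

# Empty selection: every column, including id, is gathered.
g_all <- gather(df, key = "name", value = "val")

dim(g_range)
dim(g_all)
```

Note that with an empty selection the identifier column is gathered too, which is rarely what you want.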

na.rm

If TRUE, will remove rows from output where the value column is NA.
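
A small sketch of the effect of na.rm, using an invented data frame with missing values (df and its column names are assumptions for the example):

```r
library(tidyr)

# Hypothetical toy data with one missing value in each measurement column.
df <- data.frame(id = 1:2, a = c(1, NA), b = c(NA, 4))

# Default: rows whose value is NA are kept.
kept <- gather(df, key = "name", value = "val", a, b)

# na.rm = TRUE: rows whose value is NA are dropped from the output.
dropped <- gather(df, key = "name", value = "val", a, b, na.rm = TRUE)

nrow(kept)
nrow(dropped)
```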

convert

If TRUE, will automatically run type.convert() on the key column. This is useful if the column types are actually numeric, integer, or logical.
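
One case where convert can help is when the gathered column names are really numbers, such as years. The sketch below assumes a made-up data frame whose columns are named "2019" and "2020"; the names df, year and n are illustrative only.

```r
library(tidyr)

# Hypothetical toy data: the column names are years stored as text.
df <- data.frame(id = 1:2, `2019` = c(1, 2), `2020` = c(3, 4),
                 check.names = FALSE)

# Default: the key column holds the old column names as character strings.
as_text <- gather(df, key = "year", value = "n", -id)

# convert = TRUE: type.convert() is run on the key column.
as_converted <- gather(df, key = "year", value = "n", -id, convert = TRUE)

class(as_text$year)
class(as_converted$year)
```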

factor_key

If FALSE, the default, the key values will be stored as a character vector. If TRUE, they will be stored as a factor, which preserves the original ordering of the columns.
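
To see the ordering that factor_key preserves, the following sketch gathers two columns whose order in the data differs from alphabetical order. The data frame df and its columns are assumptions made for this example.

```r
library(tidyr)

# Hypothetical toy data: column z comes before column a.
df <- data.frame(id = 1:2, z = c(1, 2), a = c(3, 4))

# Default: the key column is a character vector.
as_chr <- gather(df, key = "name", value = "val", z, a)

# factor_key = TRUE: the key column is a factor whose levels follow
# the column order in the data (z, then a), not alphabetical order.
as_fct <- gather(df, key = "name", value = "val", z, a, factor_key = TRUE)

levels(as_fct$name)
```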

Rules for selection

Arguments for selecting columns are passed to tidyselect::vars_select() and are treated specially. Unlike other verbs, selecting functions make a strict distinction between data expressions and context expressions.

For instance, col1:col3 is a data expression that refers to data columns, while seq(start, end) is a context expression that refers to objects from the context.

If you really need to refer to contextual objects from a data expression, you can unquote them with the tidy eval operator !!. This operator evaluates its argument in the context and inlines the result in the surrounding function call. For instance, c(x, !! x) selects the x column within the data frame and the column referred to by the object x defined in the context (which can contain either a column name as string or a column position).
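
The c(x, !! x) case can be tried on a small invented data frame. In this sketch, df and the context object x (holding the column name "y") are assumptions for illustration.

```r
library(tidyr)

# Hypothetical toy data with columns named x and y.
df <- data.frame(id = 1:2, x = c(1, 2), y = c(3, 4))

# A context object that happens to share its name with a column.
x <- "y"

# Inside c(): the bare x refers to the data column x, while !!x is
# evaluated in the context and inlines the string "y", selecting column y.
g <- gather(df, key = "name", value = "val", c(x, !!x))

unique(g$name)
```

Both the x and y columns should end up gathered, while id is left in place.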

Examples

library(tidyr)  # provides gather()
library(dplyr)  # provides tibble(), %>%, group_by() and slice()
# From http://stackoverflow.com/questions/1181060
stocks <- tibble(
  time = as.Date('2009-01-01') + 0:9,
  X = rnorm(10, 0, 1),
  Y = rnorm(10, 0, 2),
  Z = rnorm(10, 0, 4)
)

gather(stocks, stock, price, -time)
stocks %>% gather(stock, price, -time)

# get first observation for each Species in iris data -- base R
mini_iris <- iris[c(1, 51, 101), ]
# gather Sepal.Length, Sepal.Width, Petal.Length, Petal.Width
gather(mini_iris, key = flower_att, value = measurement,
       Sepal.Length, Sepal.Width, Petal.Length, Petal.Width)
# same result but less verbose
gather(mini_iris, key = flower_att, value = measurement, -Species)

# repeat iris example using dplyr and the pipe operator
library(dplyr)
mini_iris <-
  iris %>%
  group_by(Species) %>%
  slice(1)
mini_iris %>% gather(key = flower_att, value = measurement, -Species)

Example output

Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

# A tibble: 30 x 3
         time stock       price
       <date> <chr>       <dbl>
 1 2009-01-01     X  0.56619297
 2 2009-01-02     X -0.48609203
 3 2009-01-03     X -2.70085160
 4 2009-01-04     X -0.90873269
 5 2009-01-05     X -0.36907848
 6 2009-01-06     X  0.67994428
 7 2009-01-07     X -2.10728309
 8 2009-01-08     X -1.87210772
 9 2009-01-09     X  0.85427200
10 2009-01-10     X  0.02745882
# ... with 20 more rows
# A tibble: 30 x 3
         time stock       price
       <date> <chr>       <dbl>
 1 2009-01-01     X  0.56619297
 2 2009-01-02     X -0.48609203
 3 2009-01-03     X -2.70085160
 4 2009-01-04     X -0.90873269
 5 2009-01-05     X -0.36907848
 6 2009-01-06     X  0.67994428
 7 2009-01-07     X -2.10728309
 8 2009-01-08     X -1.87210772
 9 2009-01-09     X  0.85427200
10 2009-01-10     X  0.02745882
# ... with 20 more rows
      Species   flower_att measurement
1      setosa Sepal.Length         5.1
2  versicolor Sepal.Length         7.0
3   virginica Sepal.Length         6.3
4      setosa  Sepal.Width         3.5
5  versicolor  Sepal.Width         3.2
6   virginica  Sepal.Width         3.3
7      setosa Petal.Length         1.4
8  versicolor Petal.Length         4.7
9   virginica Petal.Length         6.0
10     setosa  Petal.Width         0.2
11 versicolor  Petal.Width         1.4
12  virginica  Petal.Width         2.5
      Species   flower_att measurement
1      setosa Sepal.Length         5.1
2  versicolor Sepal.Length         7.0
3   virginica Sepal.Length         6.3
4      setosa  Sepal.Width         3.5
5  versicolor  Sepal.Width         3.2
6   virginica  Sepal.Width         3.3
7      setosa Petal.Length         1.4
8  versicolor Petal.Length         4.7
9   virginica Petal.Length         6.0
10     setosa  Petal.Width         0.2
11 versicolor  Petal.Width         1.4
12  virginica  Petal.Width         2.5
# A tibble: 12 x 3
# Groups:   Species [3]
      Species   flower_att measurement
       <fctr>        <chr>       <dbl>
 1     setosa Sepal.Length         5.1
 2 versicolor Sepal.Length         7.0
 3  virginica Sepal.Length         6.3
 4     setosa  Sepal.Width         3.5
 5 versicolor  Sepal.Width         3.2
 6  virginica  Sepal.Width         3.3
 7     setosa Petal.Length         1.4
 8 versicolor Petal.Length         4.7
 9  virginica Petal.Length         6.0
10     setosa  Petal.Width         0.2
11 versicolor  Petal.Width         1.4
12  virginica  Petal.Width         2.5

tidyr documentation built on Oct. 29, 2018, 1:04 a.m.