pack: Pack and unpack
In tidyr: Tidy Messy Data

pack	R Documentation

Pack and unpack

Description

Packing and unpacking preserve the number of rows in a data frame and change only its width. pack() makes a data frame narrower by collapsing a set of columns into a single df-column (a column that is itself a data frame). unpack() makes a data frame wider by expanding df-columns back out into individual columns.
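As a minimal sketch of the description above, the following compares dimensions before and after packing. It assumes tidyr is installed; the object names `packed` and `restored` are illustrative only.

```r
library(tidyr)

df <- tibble(x1 = 1:3, x2 = 4:6, x3 = 7:9, y = 1:3)

# Collapse the three x* columns into one df-column called x
packed <- pack(df, x = starts_with("x"))

# Expand the df-column back into individual columns
restored <- unpack(packed, x)

# The row count stays at 3 throughout; only the column count changes
dim(df)       # 3 rows, 4 columns
dim(packed)   # 3 rows, 2 columns (y and the df-column x)
dim(restored) # 3 rows, 4 columns
```

Note that the column order after a round trip may differ from the original, because the columns left unpacked are placed before the packed ones.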

Usage

pack(.data, ..., .names_sep = NULL, .error_call = current_env())

unpack(
  data,
  cols,
  ...,
  names_sep = NULL,
  names_repair = "check_unique",
  error_call = current_env()
)

Arguments

`...`	For `pack()`, <`tidy-select`> columns to pack, specified using name-variable pairs of the form `new_col = c(col1, col2, col3)`. The right hand side can be any valid tidy select expression. For `unpack()`, these dots are for future extensions and must be empty.
`data`, `.data`	A data frame.
`cols`	<`tidy-select`> Columns to unpack.
`names_sep`, `.names_sep`	If `NULL`, the default, the names will be left as is. In `pack()`, inner names will come from the former outer names; in `unpack()`, the new outer names will come from the inner names. If a string, the inner and outer names will be used together. In `unpack()`, the names of the new outer columns will be formed by pasting together the outer and the inner column names, separated by `names_sep`. In `pack()`, the new inner names will have the outer names + `names_sep` automatically stripped. This makes `names_sep` roughly symmetric between packing and unpacking.
`names_repair`	Used to check that the output data frame has valid names. Must be one of the following options: `"minimal"`: no name repair or checks, beyond basic existence; `"unique"`: make sure names are unique and not empty; `"check_unique"` (the default): no name repair, but check that they are unique; `"universal"`: make the names unique and syntactic; a function: apply custom name repair; `"tidyr_legacy"`: use the name repair from tidyr 0.8; a formula: a purrr-style anonymous function (see `rlang::as_function()`). See `vctrs::vec_as_names()` for more details on these terms and the strategies used to enforce them.
`error_call`, `.error_call`	The execution environment of a currently running function, e.g. `caller_env()`. The function will be mentioned in error messages as the source of the error. See the `call` argument of `abort()` for more information.
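The interaction of `names_sep` and `names_repair` may be easier to see in a small example. The sketch below assumes tidyr is installed; the data frames `flowers` and `clash` are made up for illustration.

```r
library(tidyr)

flowers <- tibble(
  Sepal.Length = c(5.1, 4.9),
  Sepal.Width = c(3.5, 3.0),
  Species = c("setosa", "setosa")
)

# .names_sep strips the "Sepal." prefix from the inner names when packing
packed <- pack(flowers, Sepal = starts_with("Sepal"), .names_sep = ".")
names(packed$Sepal)

# names_sep pastes the outer and inner names back together when unpacking
restored <- unpack(packed, Sepal, names_sep = ".")
names(restored)

# Two df-columns that share an inner name produce duplicate outer names
clash <- tibble(y = tibble(a = 1:2), z = tibble(a = 3:4))

# With the default names_repair = "check_unique", unpacking fails
result <- tryCatch(unpack(clash, c(y, z)), error = function(e) e)
inherits(result, "error")

# names_repair = "unique" renames the duplicates instead of failing
repaired <- suppressMessages(unpack(clash, c(y, z), names_repair = "unique"))
names(repaired)

# Supplying names_sep avoids the clash altogether
names(unpack(clash, c(y, z), names_sep = "_"))
```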

Details

Generally, unpacking is more useful than packing because it simplifies a complex data structure. Currently, few functions work with df-cols, and they are mostly a curiosity, but seem worth exploring further because they mimic the nested column headers that are so popular in Excel.
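To make the notion of a df-col concrete, here is a short sketch (assuming tidyr is installed) that inspects a packed column. The df-column behaves like a nested header: it is a data frame in its own right, and its inner columns are reached with a second `$`.

```r
library(tidyr)

df <- tibble(x1 = 1:3, x2 = 4:6, y = 1:3)
packed <- pack(df, x = c(x1, x2))

# The packed column is itself a data frame with the same number of rows
is.data.frame(packed$x)
names(packed$x)

# Inner columns sit one level down, like a sub-header under x
packed$x$x1
```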

Examples

library(tidyr)

# Packing -------------------------------------------------------------------
# It's not currently clear why you would ever want to pack columns
# since few functions work with this sort of data.
df <- tibble(x1 = 1:3, x2 = 4:6, x3 = 7:9, y = 1:3)
df
df %>% pack(x = starts_with("x"))
df %>% pack(x = c(x1, x2, x3), y = y)

# .names_sep allows you to strip off common prefixes; this
# acts as a natural inverse to names_sep in unpack()
iris %>%
  as_tibble() %>%
  pack(
    Sepal = starts_with("Sepal"),
    Petal = starts_with("Petal"),
    .names_sep = "."
  )

# Unpacking -----------------------------------------------------------------
df <- tibble(
  x = 1:3,
  y = tibble(a = 1:3, b = 3:1),
  z = tibble(X = c("a", "b", "c"), Y = runif(3), Z = c(TRUE, FALSE, NA))
)
df
df %>% unpack(y)
df %>% unpack(c(y, z))
df %>% unpack(c(y, z), names_sep = "_")

tidyr documentation built on June 24, 2024, 5:14 p.m.