tidyr-methods: unnest

Description

Given a regular expression with capturing groups, 'extract()' turns each group into a new column. If the groups don't match, or the input is NA, the output will be NA.

Lifecycle: maturing.

'pivot_longer()' "lengthens" data, increasing the number of rows and decreasing the number of columns. The inverse transformation is [pivot_wider()].

Learn more in 'vignette("pivot")'.

Convenience function to paste together multiple columns into one.

Given either a regular expression or a vector of character positions, 'separate()' turns a single character column into multiple columns.

Arguments

names_sep

If 'NULL', the default, the names will be left as is. In 'nest()', inner names will come from the former outer names; in 'unnest()', the new outer names will come from the inner names.

If a string, the inner and outer names will be used together. In 'unnest()', the names of the new outer columns will be formed by pasting together the outer and the inner column names, separated by 'names_sep'. In 'nest()', the new inner names will have the outer names (+ 'names_sep') automatically stripped. This makes 'names_sep' roughly symmetric between nesting and unnesting.
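As a rough illustration of 'names_sep' in 'unnest()', the sketch below uses plain tidyr on a small made-up tibble rather than a tidySummarizedExperiment object. The data and the column names 'g', 'data', and 'x' are hypothetical.

```r
library(tidyr)

# Hypothetical toy data: a list-column of tibbles, each with one column "x".
df <- tibble(
  g = c("a", "b"),
  data = list(tibble(x = 1:2), tibble(x = 3L))
)

# names_sep = NULL (the default): new outer names come from the inner names.
plain <- unnest(df, data)

# names_sep = "_": outer and inner names are pasted together.
prefixed <- unnest(df, data, names_sep = "_")

names(plain)
names(prefixed)
```

Comparing the two 'names()' results shows whether the outer name 'data' is carried into the unnested column names.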

keep_empty

See tidyr::unnest

ptype

See tidyr::unnest

.drop

See tidyr::unnest

.id

See tidyr::unnest

.preserve

See tidyr::unnest

.data

A tbl. (See tidyr)

.names_sep

See ?tidyr::nest

into

Names of new variables to create, as a character vector. Use 'NA' to omit the variable in the output.

regex

A regular expression used to extract the desired values. There should be one group (defined by '()') for each element of 'into'.

convert

If 'TRUE', will run [type.convert()] with 'as.is=TRUE' on new columns. This is useful if the component columns are integer, numeric or logical.

NB: this will cause string '"NA"'s to be converted to 'NA's.
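The interplay of 'into', 'regex', and 'convert' can be sketched on a plain tibble. This is only an illustration with made-up labels, not data from the package. The call is written as 'tidyr::extract()' to avoid clashes with other functions named 'extract'.

```r
library(tidyr)

# Hypothetical sample labels; the last one is missing.
df <- tibble(label = c("treated_12", "untreated_7", NA))

# Two capturing groups, so 'into' needs two names.
# convert = TRUE lets type.convert() turn the digit strings into integers.
out <- tidyr::extract(
  df, label,
  into = c("condition", "count"),
  regex = "([a-z]+)_([0-9]+)",
  convert = TRUE
)

out
```

The row whose input is 'NA' yields 'NA' in both new columns.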

cols

<['tidy-select'][tidyr_tidy_select]> Columns to pivot into longer format.

names_to

A string specifying the name of the column to create from the data stored in the column names of 'data'.

Can be a character vector, creating multiple columns, if 'names_sep' or 'names_pattern' is provided. In this case, there are two special values you can take advantage of:

* 'NA' will discard that component of the name.
* '.value' indicates that component of the name defines the name of the column containing the cell values, overriding 'values_to'.

names_prefix

A regular expression used to remove matching text from the start of each variable name.

names_sep, names_pattern

If 'names_to' contains multiple values, these arguments control how the column name is broken up.

'names_sep' takes the same specification as [separate()], and can either be a numeric vector (specifying positions to break on), or a single string (specifying a regular expression to split on).

'names_pattern' takes the same specification as [extract()], a regular expression containing matching groups ('()').

If these arguments do not give you enough control, use 'pivot_longer_spec()' to create a spec object and process manually as needed.
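A minimal sketch of splitting column names with 'names_sep' or 'names_pattern', including the '.value' sentinel, on a hypothetical plain tibble (the columns 'x_1', 'y_2', and so on are invented for the example):

```r
library(tidyr)

# Hypothetical wide data: names encode a variable (x, y) and a time point (1, 2).
wide <- tibble(
  id = 1:2,
  x_1 = c(1, 2), x_2 = c(3, 4),
  y_1 = c(5, 6), y_2 = c(7, 8)
)

# Split names on "_": the first piece names the value column (.value),
# the second piece goes into a new "time" column.
long_sep <- pivot_longer(
  wide, cols = -id,
  names_to = c(".value", "time"),
  names_sep = "_"
)

# The same split expressed as a regular expression with two groups.
long_pat <- pivot_longer(
  wide, cols = -id,
  names_to = c(".value", "time"),
  names_pattern = "([a-z])_([0-9])"
)

long_sep
```

Both calls should describe the same reshaping; 'names_pattern' is the more flexible choice when the pieces are not separated by a single delimiter.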

names_repair

What happens if the output has invalid column names? The default, '"check_unique"', is to error if the columns are duplicated. Use '"minimal"' to allow duplicates in the output, or '"unique"' to de-duplicate by adding numeric suffixes. See [vctrs::vec_as_names()] for more options.

values_to

A string specifying the name of the column to create from the data stored in cell values. If 'names_to' is a character containing the special '.value' sentinel, this value will be ignored, and the name of the value column will be derived from part of the existing column names.

values_drop_na

If 'TRUE', will drop rows that contain only 'NA's in the 'values_to' column. This effectively converts explicit missing values to implicit missing values, and should generally be used only when missing values in 'data' were created by its structure.

names_transform, values_transform

A list of column name-function pairs. Use these arguments if you need to change the type of specific columns. For example, 'names_transform=list(week=as.integer)' would convert a character week variable to an integer.
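The following sketch combines 'names_prefix', 'names_transform', and 'values_drop_na' on a made-up plain tibble. It assumes a tidyr version that provides 'names_transform' (1.1.0 or later); the column names 'track', 'wk1', and 'wk2' are hypothetical.

```r
library(tidyr)

# Hypothetical weekly ranks; track "b" has no rank in week 2.
ranks <- tibble(track = c("a", "b"), wk1 = c(10, 20), wk2 = c(8, NA))

long <- pivot_longer(
  ranks,
  cols = starts_with("wk"),
  names_to = "week",
  names_prefix = "wk",                        # strip the leading "wk"
  names_transform = list(week = as.integer),  # "1", "2" become 1L, 2L
  values_to = "rank",
  values_drop_na = TRUE                       # drop the missing week-2 row
)

long
```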

names_ptypes, values_ptypes

A list of column name-prototype pairs. A prototype (or ptype for short) is a zero-length vector (like 'integer()' or 'numeric()') that defines the type, class, and attributes of a vector. Use these arguments to confirm that the created columns are the types that you expect.

If not specified, the type of the columns generated from 'names_to' will be character, and the type of the variables generated from 'values_to' will be the common type of the input columns used to generate them.

data

A data frame.

col

The name of the new column, as a string or symbol.

This argument is passed by expression and supports [quasiquotation][rlang::quasiquotation] (you can unquote strings and symbols). The name is captured from the expression with [rlang::ensym()] (note that this kind of interface where symbols do not represent actual objects is now discouraged in the tidyverse; we support it here for backward compatibility).

...

<['tidy-select'][tidyr_tidy_select]> Columns to unite

na.rm

If 'TRUE', missing values will be removed prior to uniting each value.

remove

If 'TRUE', remove input columns from output data frame.
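To see how 'na.rm' and 'remove' affect 'unite()', here is a small sketch on a plain tibble. The data are hypothetical and the default separator '"_"' is assumed.

```r
library(tidyr)

# Hypothetical data with one missing condition.
df <- tibble(
  condition = c("treated", NA),
  type = c("paired", "single")
)

# na.rm = TRUE drops missing pieces before pasting;
# remove = FALSE keeps the input columns alongside the new one.
united <- unite(df, "group", condition, type, na.rm = TRUE, remove = FALSE)

united
```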

sep

Separator between columns.

If character, 'sep' is interpreted as a regular expression. The default value is a regular expression that matches any sequence of non-alphanumeric values.

If numeric, 'sep' is interpreted as character positions to split at. Positive values start at 1 at the far-left of the string; negative values start at -1 at the far-right of the string. The length of 'sep' should be one less than 'into'.
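The two interpretations of 'sep' in 'separate()' can be sketched as follows, using a hypothetical plain tibble with one column split by position and another split by the default regular expression:

```r
library(tidyr)

# Hypothetical data: "code" has a fixed-width prefix, "pair" uses mixed delimiters.
df <- tibble(code = c("AB123", "CD456"), pair = c("a.b", "c-d"))

# Numeric sep: split after the second character.
by_pos <- separate(df, code, into = c("prefix", "number"), sep = 2)

# Default sep: a regular expression matching runs of non-alphanumeric characters.
by_regex <- separate(df, pair, into = c("left", "right"))

by_pos
by_regex
```

Without 'convert = TRUE', the new columns stay character, including 'number'.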

extra

If 'sep' is a character vector, this controls what happens when there are too many pieces. There are three valid options:

* "warn" (the default): emit a warning and drop extra values.
* "drop": drop any extra values without a warning.
* "merge": only splits at most 'length(into)' times.

fill

If 'sep' is a character vector, this controls what happens when there are not enough pieces. There are three valid options:

* "warn" (the default): emit a warning and fill from the right.
* "right": fill with missing values on the right.
* "left": fill with missing values on the left.
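A short sketch of how 'extra' and 'fill' handle rows with too many or too few pieces, on a hypothetical plain tibble:

```r
library(tidyr)

# Hypothetical strings with three, two, and one piece respectively.
df <- tibble(x = c("a-b-c", "d-e", "f"))

# extra = "merge": keep the surplus text in the last column.
# fill = "right": pad short rows with NA on the right.
merged <- separate(df, x, into = c("one", "two"), sep = "-",
                   extra = "merge", fill = "right")

# extra = "drop": silently discard surplus pieces.
# fill = "left": pad short rows with NA on the left.
dropped <- separate(df, x, into = c("one", "two"), sep = "-",
                    extra = "drop", fill = "left")

merged
dropped
```

Setting both arguments explicitly avoids the warnings that the default '"warn"' would emit for the first and last rows.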

Details

'pivot_longer()' is an updated approach to [gather()], designed to be both simpler to use and to handle more use cases. We recommend you use 'pivot_longer()' for new code; 'gather()' isn't going away but is no longer under active development.

Value

A tidySummarizedExperiment object or a tibble, depending on the input.

See Also

[separate()] to split up by a separator.

[separate()], the complement.

[unite()], the complement; [extract()], which uses regular expression capturing groups.

Examples

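# Attach the package that provides tidy() and the tidyr methods documented here
library(tidySummarizedExperiment)

# Nest every column except `condition`, then unnest to recover the rows
tidySummarizedExperiment::pasilla %>%
    tidy() %>%
    nest(data=-condition) %>%
    unnest(data)

# Nest only: one row per condition, remaining columns stored in `data`
tidySummarizedExperiment::pasilla %>%
    tidy() %>%
    nest(data=-condition)

# Capture the lowercase text preceding "_end" in `type` as `sequencing`
tidySummarizedExperiment::pasilla %>%
    tidy() %>%
    extract(type, into="sequencing", regex="([a-z]*)_end", convert=TRUE)

# Lengthen `condition` and `type` into name/value pairs
# See vignette("pivot") for examples and explanation
library(dplyr)
tidySummarizedExperiment::pasilla %>%
    tidy() %>%
    pivot_longer(c(condition, type), names_to="name", values_to="value")

# Paste `condition` and `type` together into a single `group` column
tidySummarizedExperiment::pasilla %>%
    tidy() %>%
    unite("group", c(condition, type))

# Split `group` back into `condition` and `type`
un <- tidySummarizedExperiment::pasilla %>%
    tidy() %>%
    unite("group", c(condition, type))
un %>% separate(col=group, into=c("condition", "type"))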

tidySummarizedExperiment documentation built on Nov. 8, 2020, 8:22 p.m.