data_to_long | R Documentation |
This function "lengthens" data, increasing the number of rows and decreasing
the number of columns. This is a dependency-free base-R equivalent of
tidyr::pivot_longer().
data_to_long(
data,
select = "all",
names_to = "name",
names_prefix = NULL,
names_sep = NULL,
names_pattern = NULL,
values_to = "value",
values_drop_na = FALSE,
rows_to = NULL,
ignore_case = FALSE,
regex = FALSE,
...,
cols
)
reshape_longer(
data,
select = "all",
names_to = "name",
names_prefix = NULL,
names_sep = NULL,
names_pattern = NULL,
values_to = "value",
values_drop_na = FALSE,
rows_to = NULL,
ignore_case = FALSE,
regex = FALSE,
...,
cols
)
data |
A data frame to convert to long format, so that it has more rows and fewer columns after the operation. |
select |
Variables that will be included when performing the required tasks. Can be either
If |
names_to |
The name of the new column (variable) that will contain the
names from columns in |
names_prefix |
A regular expression used to remove matching text from the start of each variable name. |
names_sep , names_pattern |
If |
values_to |
The name of the new column that will contain the values of
the columns in |
values_drop_na |
If |
rows_to |
The name of the column that will contain the row names or row
numbers from the original data. If |
ignore_case |
Logical, if |
regex |
Logical, if |
... |
Currently not used. |
cols |
Identical to |
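As a rough illustration of two of the arguments described above, the sketch below uses `names_prefix` to strip a leading "Time" from the column names and `rows_to` to record the original row. The data frame `wide` and the column names `timepoint`, `score` and `row` are made up for this example, and it assumes the datawizard package is installed.

```r
library(datawizard)

# Hypothetical wide data: one row per subject, one column per time point
wide <- data.frame(
  id = c("a", "b"),
  Time1 = c(1.5, 2.5),
  Time2 = c(3.5, 4.5)
)

long <- data_to_long(
  wide,
  select = c("Time1", "Time2"),
  names_to = "timepoint",
  names_prefix = "Time", # regular expression removed from the start of each name
  values_to = "score",
  rows_to = "row" # new column holding the original row names or numbers
)
long
```

With this input, `timepoint` should hold the parts of the names that remain after the prefix is removed, and `row` should identify which row of `wide` each value came from.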
Reshaping data into long format usually means that the input data frame is
in wide format, where multiple measurements taken on the same subject are
stored in multiple columns (variables). The long format stores the same
information in a single column, with each measurement per subject stored in
a separate row. The values of all variables that are not in select will be
repeated.
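To make the idea concrete, here is a minimal by-hand sketch in base R of what "lengthening" does. It is not how data_to_long() is implemented, only an illustration of the concept; the data frame `wide` and its columns are invented for this example.

```r
# Hypothetical wide data: two measurements (t1, t2) per subject
wide <- data.frame(
  id = 1:3,
  t1 = c(10, 20, 30),
  t2 = c(11, 21, 31)
)
measure_cols <- c("t1", "t2")

# Build one block of rows per measurement column; `id` is repeated in each block
long <- do.call(rbind, lapply(measure_cols, function(col) {
  data.frame(
    id = wide$id,
    name = col,
    value = wide[[col]]
  )
}))

# Sort by subject so that each subject's measurements sit next to each other
long <- long[order(long$id), ]
rownames(long) <- NULL
long
```

The three rows of `wide` become six rows in `long`: the `id` column, which was not gathered, is repeated once per measurement, while the former column names end up in `name` and the measurements in `value`.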
The necessary information for data_to_long() is:
The columns that contain the repeated measurements (select).
The name of the newly created column that will contain the names of the columns in select (names_to), to identify the source of the values. names_to can also be a character vector with more than one column name, in which case names_sep or names_pattern must be provided to specify which parts of the column names go into the newly created columns.
The name of the newly created column that contains the values of the columns in select (values_to).
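The three pieces of information listed above can be sketched as follows, here with `names_sep` instead of `names_pattern` to split the column names. The data frame `wide` is invented for this example, the code assumes the datawizard package is installed, and the choice of "_" as the separator is specific to these made-up column names.

```r
library(datawizard)

# Hypothetical wide data: two measures (score, speed) at two time points
wide <- data.frame(
  id = 1:2,
  score_t1 = c(30, 35),
  score_t2 = c(33, 34),
  speed_t1 = c(2, 3),
  speed_t2 = c(3, 4)
)

long <- data_to_long(
  wide,
  select = c("score_t1", "score_t2", "speed_t1", "speed_t2"), # repeated measurements
  names_to = c("type", "time"), # two new columns built from the column names
  names_sep = "_", # split each column name at the underscore
  values_to = "count" # new column holding the gathered values
)
long
```

Each original column name is split at the underscore, so the part before it should land in `type` and the part after it in `time`, while `id` is repeated for every gathered value.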
In other words: repeated measurements that are spread across several columns
will be gathered into a single column (values_to), with the original column
names, that identify the source of the gathered values, stored in one or more
new columns (names_to).
If a tibble was provided as input, reshape_longer() also returns a
tibble. Otherwise, it returns a data frame.
Functions to rename stuff: data_rename(), data_rename_rows(), data_addprefix(), data_addsuffix()
Functions to reorder or remove columns: data_reorder(), data_relocate(), data_remove()
Functions to reshape, pivot or rotate data frames: data_to_long(), data_to_wide(), data_rotate()
Functions to recode data: rescale(), reverse(), categorize(), recode_values(), slide()
Functions to standardize, normalize, rank-transform: center(), standardize(), normalize(), ranktransform(), winsorize()
Split and merge data frames: data_partition(), data_merge()
Functions to find or select columns: data_select(), extract_column_names()
Functions to filter rows: data_match(), data_filter()
wide_data <- setNames(
data.frame(replicate(2, rnorm(8))),
c("Time1", "Time2")
)
wide_data$ID <- 1:8
wide_data
# Default behaviour (equivalent to tidyr::pivot_longer(wide_data, cols = 1:3))
# probably doesn't make much sense to mix "time" and "id"
data_to_long(wide_data)
# Customizing the names
data_to_long(
wide_data,
select = c("Time1", "Time2"),
names_to = "Timepoint",
values_to = "Score"
)
# Reshape multiple columns into long format.
mydat <- data.frame(
age = c(20, 30, 40),
sex = c("Female", "Male", "Male"),
score_t1 = c(30, 35, 32),
score_t2 = c(33, 34, 37),
score_t3 = c(36, 35, 38),
speed_t1 = c(2, 3, 1),
speed_t2 = c(3, 4, 5),
speed_t3 = c(1, 8, 6)
)
# The column names are split into two columns: "type" and "time". The
# pattern for splitting column names is provided in `names_pattern`. Values
# of all "score_*" and "speed_*" columns are gathered into a single column
# named "count".
data_to_long(
mydat,
select = 3:8,
names_to = c("type", "time"),
names_pattern = "(score|speed)_t(\\d+)",
values_to = "count"
)
# Full example
# ------------------
data <- psych::bfi # Wide format with one row per participant's personality test
# Pivot to long format
very_long_data <- data_to_long(data,
select = regex("\\d"), # Select all columns that contain a digit
names_to = "Item",
values_to = "Score",
rows_to = "Participant"
)
head(very_long_data)
even_longer_data <- data_to_long(
tidyr::who,
select = new_sp_m014:newrel_f65,
names_to = c("diagnosis", "gender", "age"),
names_pattern = "new_?(.*)_(.)(.*)",
values_to = "count"
)
head(even_longer_data)