df.long: Converting Data Frames Between 'Wide' and 'Long' Format

df.long    R Documentation

Converting Data Frames Between 'Wide' and 'Long' Format

Description

The function df.long converts a data frame from the 'wide' data format (with repeated measurements in separate columns of the same row) to the 'long' data format (with repeated measurements in separate rows), while the function df.wide converts a data frame from the 'long' data format to the 'wide' data format.

Usage

df.long(data, ..., var = NULL, var.name = "value",
        time = c("num", "chr", "fac", "ord"), time.name = "time", idvar = "idvar",
        sort = TRUE, decreasing = FALSE, na.rm = FALSE, check = TRUE)

df.wide(data, ..., var, var.name = var, time = "time", idvar = "idvar",
        sep = "", check = TRUE)

Arguments

data

a data frame in 'wide' or 'long' format.

...

an expression indicating the names of the time-invariant variables in data that should be kept after converting data to the 'long' or 'wide' format. Note that the operators +, -, ~, :, ::, and ! can also be used to select variables, see 'Details' in the df.subset function. Leave ... unspecified to keep all variables in the converted data frame.

var

when using the df.long function, a character vector (one set of variable names) or a list of character vectors (multiple sets of variable names) indicating the sets of time-varying variables in the wide format that correspond to single variables in the long format. Note that all variables except those specified in the argument ... are used when var = NULL (default), see Example 7. When using the df.wide function, a character vector indicating the variable name(s) in the long format that are split into separate variables in the wide format.

var.name

a character vector specifying the variable names in the long format that correspond to the sets of time-varying variables in the wide data format when using the df.long function, or a character vector specifying the prefix of the variable names in the wide format that correspond to the time-varying variables in the long format when using the df.wide function.

time

when using the df.long function, a character string indicating the data type of the newly created variable in the long format, i.e., "num" for numeric consecutive integers starting from 0 (e.g., 0, 1, 2, 3 for a set of four variables in the wide data format), "chr" for a character vector, "fac" for a factor, and "ord" for an ordered factor. Note that the variable names of the set of variables in the wide data format are used when specifying "chr", "fac", or "ord" if only one set of variables is specified in the var argument. Otherwise, consecutive integers starting from 1 are used as character, factor, or ordered factor. When using the df.wide function, a character string indicating the variable name in the long data format that differentiates multiple records from the same group or individual.

time.name

a character string indicating the name of the newly created variable in the long format when using the df.long function. By default, the variable is named "time". Note that the variable names in the long format can also be specified using the var argument when multiple sets of time-varying variables are specified in a named list, e.g., var = list(dep = c("ad", "bd"), anx = c("aa", "ba")) (see the alternative specification in Example 5).

idvar

when using the df.long function, a character string indicating the name of the identification variable in the wide data format that is used to sort the data after converting a data frame from wide to long format when specifying sort = TRUE. Note that the function will create an identification variable with consecutive integers starting from 1 if the variable specified in idvar is not found in data. When using the df.wide function, a character string indicating the name of the identification variable in the long data format.

sort

logical: if TRUE (default), the data frame in the long format is sorted according to the identification variable specified in idvar when using the df.long function.

decreasing

logical: if TRUE, the sort is decreasing when specifying sort = TRUE.

na.rm

logical: if TRUE, rows with NA values for all variables in the long format that correspond to the sets of time-varying variables in the wide data format will be removed from the data when using the df.long function.

check

logical: if TRUE (default), argument specification is checked.

sep

a character string indicating a separating character in the variable names after converting data from the long format to the wide format when using the df.wide function. For example, the variable value in the long format will be split into the variables value0, value1, and value2 when specifying sep = "" (default), but will be split into the variables value_0, value_1, and value_2 when specifying sep = "_".
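The naming rule described for sep can be illustrated with a minimal base R sketch. It only mimics the resulting variable names with paste and is not the code used inside df.wide; the object names time.values, names.default, and names.underscore are hypothetical and chosen for illustration.

```r
# Illustration only: mimics the wide-format variable names described for 'sep',
# not the internals of df.wide()

# Time values as described for time = "num": consecutive integers starting from 0
time.values <- 0:2

# Prefix "value" combined with each time value using sep = "" (default)
names.default <- paste("value", time.values, sep = "")

# Same prefix combined with each time value using sep = "_"
names.underscore <- paste("value", time.values, sep = "_")

print(names.default)      # "value0" "value1" "value2"
print(names.underscore)   # "value_0" "value_1" "value_2"
```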

Value

Data frame that is converted to the 'long' or 'wide' format.

Note

The function df.long uses the function melt and the function df.wide uses the function dcast provided in the R package data.table by Barrett et al. (2025).
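For orientation, the two data.table functions named in this note can be called directly as in the following sketch. It assumes the data.table package is installed and shows only the general melt/dcast pattern for one set of time-varying variables; it is not the code used inside df.long or df.wide, and the object names dt.w, dt.l, and dt.w2 are hypothetical.

```r
# Sketch only: direct calls to data.table::melt() and data.table::dcast(),
# not the internals of df.long() / df.wide()
if (requireNamespace("data.table", quietly = TRUE)) {

  # Small data set in the wide format: one row per id
  dt.w <- data.table::data.table(id = c(23, 55, 71),
                                 adep = c(3, 6, NA),
                                 bdep = c(5, 5, 6),
                                 cdep = c(4, NA, 5))

  # Wide to long: one row per id and measurement occasion
  dt.l <- data.table::melt(dt.w, id.vars = "id",
                           measure.vars = c("adep", "bdep", "cdep"),
                           variable.name = "time", value.name = "dep")

  # Long to wide: one column per level of 'time'
  dt.w2 <- data.table::dcast(dt.l, id ~ time, value.var = "dep")

  print(dt.l)
  print(dt.w2)
}
```

Unlike df.long, this sketch keeps the original variable names as the levels of the time variable and does not sort by the identification variable or remove rows with missing values.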

Author(s)

Takuya Yanagida

References

Barrett, T., Dowle, M., Srinivasan, A., Gorecki, J., Chirico, M., Hocking, T., & Schwendinger, B. (2025). data.table: Extension of 'data.frame'. R package version 1.17.8. https://CRAN.R-project.org/package=data.table

See Also

df.check, df.duplicated, df.unique, df.head, df.tail, df.merge, df.move, df.rbind, df.rename, df.sort, df.subset

Examples

dat.w <- data.frame(id = c(23, 55, 71),
                    gend = c("male", "female", "male"), age = c(22, 19, 26),
                    adep = c(3, 6, NA), bdep = c(5, 5, 6), cdep = c(4, NA, 5),
                    aanx = c(5, 3, 6), banx = c(NA, 7, 2), canx = c(6, NA, 8))

#----------------------------------------------------------------------------
# Convert from 'wide' data format to the 'long' data format

# Example 1: One set of time-varying variables combined into "dep"
df.long(dat.w, var = c("adep", "bdep", "cdep"), var.name = "dep", idvar = "id")

# Example 2: Select time-invariant variables 'gend' and 'age'
df.long(dat.w, gend, age, var = c("adep", "bdep", "cdep"), var.name = "dep",
        idvar = "id")

# Example 3: Newly created variable "type" as character vector
df.long(dat.w, age, var = c("adep", "bdep", "cdep"), var.name = "dep",
        idvar = "id", time = "chr", time.name = "type")

# Example 4: User-defined variable "type"
df.long(dat.w, age, var = c("adep", "bdep", "cdep"), var.name = "dep",
        idvar = "id", time = c("pre", "post", "follow-up"), time.name = "type")

# Example 5: Two sets of time-varying variables combined into "dep" and "anx"
df.long(dat.w, age,
        var = list(c("adep", "bdep", "cdep"), c("aanx", "banx", "canx")),
        var.name = c("dep", "anx"), idvar = "id")

# Alternative specification using named lists for the argument 'var'
df.long(dat.w, age,
        var = list(dep = c("adep", "bdep", "cdep"), anx = c("aanx", "banx", "canx")),
        idvar = "id")

# Example 6: Remove rows with only NA values
df.long(dat.w, age, var = list(c("adep", "bdep", "cdep"), c("aanx", "banx", "canx")),
        idvar = "id", sort = FALSE, na.rm = TRUE)

# Example 7: Convert all variables except "age" and "gend"
df.long(dat.w, age, gend, idvar = "id")

#----------------------------------------------------------------------------
# Convert from 'long' data format to the 'wide' data format

dat.l <- df.long(dat.w,
                 var = list(c("adep", "bdep", "cdep"), c("aanx", "banx", "canx")),
                 var.name = c("dep", "anx"), idvar = "id")

# Example 8: Time-varying variables "dep" and "anx" expanded into multiple variables
df.wide(dat.l, var = c("dep", "anx"), idvar = "id", time = "time")

# Example 9: Select time-invariant variable 'age'
df.wide(dat.l, age, var = c("dep", "anx"), idvar = "id", time = "time")

# Example 10: Variable name prefixes "depre" and "anxie" for the expanded variables
#             with separating character "."
df.wide(dat.l, var = c("dep", "anx"), var.name = c("depre", "anxie"),
        idvar = "id", time = "time", sep = ".")

misty documentation built on Aug. 18, 2025, 5:16 p.m.