tolong | R Documentation |
Uses a minimal number of arguments to create a long file using stats::reshape. Produces output even when long variable names and time values are not fully crossed.
tolong(
data,
sep = "_",
timevar = "time",
idvar = "id",
ids = 1:nrow(data),
valuepattern = if (numericalpattern) "[0-9]+" else ".*",
numericalpattern = FALSE,
expand = TRUE,
safe_sep = "@3-2861-2579@",
reverse = F,
...
)
tolong_old(
data,
sep = "_",
timevar = "time",
idvar = "id",
ids = 1:nrow(data),
expand = TRUE,
safe_sep = "#%@!",
reverse = F,
...
)
long(
data,
sep = "_",
timevar = "time",
idvar = "id",
ids = 1:nrow(data),
valuepattern = if (numericalpattern) "[0-9]+" else ".*",
numericalpattern = FALSE,
expand = TRUE,
safe_sep = "@3-2861-2579@",
reverse = F,
...
)
data |
wide data frame |
sep |
(default '_') single or multiple character separator between long names and 'time' value. Variable names with this separator are transformed to long variables. If the string occurs multiple times in a variable name, only the last occurrence is treated as a separator. |
timevar |
(default 'time') name of the variable in the output long file that identifies occasions. Its values are taken from the suffix following the 'sep' string in each time-varying variable. |
idvar |
(default: 'id') the variable name used in the output long file to identify the provenance row in the input wide file. It may exist in the input wide file and must, in that case, have a unique value in each row. If it does not exist, it will be created with values equal to the row numbers of the input wide file. |
ids |
(default |
valuepattern |
a regular expression to match the form of time values at the end of variable names immediately following the separator. Specifying this pattern can avoid misinterpreting separators in variable names that are not intended to be turned into long variables. |
numericalpattern |
(default FALSE) if TRUE,
|
expand |
(default TRUE): if 'time' values are inconsistent, fill in missing 'time's with NAs. |
safe_sep |
temporary safe? separator |
reverse |
(default FALSE) if TRUE, the 'time' value precedes the variable name |
... |
additional parameters are passed to
|
tolong is intended for the simple case in which 'wide' variables in the input data frame are identified by a separator string that separates the name of the variable in the long file from the value of the 'time' variable that identifies the corresponding row in the long file, e.g. x_1, x_2, x_3 or brain.volume_left, brain.volume_right. Since the separator ('_' by default) may occur in other variables, tolong offers two mechanisms to avoid misinterpreting those occurrences as separators. First, if there are multiple occurrences of the separator string in a variable name, only the last occurrence is interpreted as a separator. Second, the valuepattern parameter can specify a regular expression to identify allowed 'time' value strings at the end of variable names. The common case where the 'time' values are numerical can be specified with the numericalpattern parameter.
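The two mechanisms described above can be illustrated with a small base R sketch. The helper `split_last()` is hypothetical and is not part of spida2; it only mimics the documented behaviour (split at the last literal occurrence of `sep`, and accept the suffix as a 'time' value only if it matches `valuepattern`). It makes no claim about how tolong is implemented internally.

```r
# Hypothetical helper, NOT part of spida2: split each name at the LAST literal
# occurrence of `sep`, and keep the split only if the suffix matches
# `valuepattern` in full.
split_last <- function(nms, sep = "_", valuepattern = ".*") {
  hits <- gregexpr(sep, nms, fixed = TRUE)               # literal matches
  last <- vapply(hits, function(h) max(h), integer(1))   # -1 if sep is absent
  stem <- substr(nms, 1, last - 1)
  time <- ifelse(last > 0,
                 substr(nms, last + nchar(sep), nchar(nms)),
                 NA_character_)
  # A suffix counts as a 'time' value only if it matches the whole pattern
  ok <- !is.na(time) & grepl(paste0("^(", valuepattern, ")$"), time)
  data.frame(name = nms,
             stem = ifelse(ok, stem, nms),
             time = ifelse(ok, time, NA_character_),
             stringsAsFactors = FALSE)
}

# Only the last '_' in v_a_L is treated as a separator
split_last(c("id", "v_a_L", "z_L", "x_1"))

# With a numerical pattern, v_a_L is no longer treated as time-varying
split_last(c("v_a_L", "x_1"), valuepattern = "[0-9]+")
```

Names for which `time` is NA would be left alone as constants; the others would be grouped by `stem` into long variables.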
reshape does not work if long variable names and time values are not fully crossed, e.g. x_1, x_2, x_3, y_1, y_2. By default, long creates additional variables filled with NA so that the set of variables given to reshape is fully crossed, e.g. adding a variable y_3 <- NA.
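The padding idea can be sketched in base R as follows. This is an illustration under assumptions, not the code used by tolong: the data frame `wide`, the vectors `stems` and `times`, and the result name `long_df` are made up for the example, and the stems and time values are listed by hand rather than parsed from the variable names.

```r
# Illustrative data: y_3 is missing, so names and times are not fully crossed
wide <- data.frame(id = 1:2,
                   x_1 = c(1, 2), x_2 = c(3, 4), x_3 = c(5, 6),
                   y_1 = c(7, 8), y_2 = c(9, 10))

stems <- c("x", "y")        # long variable names (assumed known here)
times <- c("1", "2", "3")   # 'time' values (assumed known here)

# Every stem/time combination that a fully crossed layout requires
needed <- as.vector(outer(stems, times, paste, sep = "_"))

# Add each missing column (here only y_3), filled with NA
for (nm in setdiff(needed, names(wide))) wide[[nm]] <- NA_real_

# The padded data frame can now be passed to stats::reshape
long_df <- reshape(wide, direction = "long",
                   idvar = "id", timevar = "time",
                   varying = lapply(stems, function(s) paste(s, times, sep = "_")),
                   v.names = stems, times = times)
long_df
```

In the result, the rows for time 3 carry NA for y, which is the effect that expand = TRUE is described as producing.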
Compare the functionality of tolong with that of tidyr::gather and of tidyr::pivot_longer. tolong depends on the format of variable names to identify the variables whose values become new variables in the long form of the data, and the labels that are used as values of the indexing variable, whose default name is 'time' and which can be changed with the timevar argument. tolong can handle many 'time-varying' variables. gather can only handle one. pivot_longer can handle many and might be considered a replacement for tolong, which has the disadvantage of frequently requiring the renaming of variables, an easier task for those who have mastered the use of regular expressions, but potentially challenging otherwise.
'long' file with each wide row repeated as many times as there are distinct values for the 'timevar' variable. The rownames show the provenance of each row by combining the value of id with the value of time, separated by a period.
towide for many examples using both 'towide' and 'tolong'.
z <- data.frame(id =letters[1:10], id2= 11:20, v_L = 1:10, v_R = 11:20)
z
tolong(z)
tolong(z, timevar = 'Side', idvar = 'idn', ids = LETTERS[1:10])
tolong(z, timevar = 'Side', idvar = 'idn', ids = z$id2)
# unbalanced times
z <- data.frame(id =letters[1:10], id2= 11:20, v_L = 1:10, v_R = 11:20, z_L = 21:30)
z
tolong(z)
# a separator with multiple occurrences:
z <- data.frame(id =letters[1:10], id2= 11:20, v_a_L = 1:10, v_a_R = 11:20, z_L = 21:30)
z
# The previous version of tolong() would have produced an error
# due to multiple occurrences of the default separator '_'
# but the new version matches only the last occurrence in
# each variable name. The sublast() function helps by
# matching only the last occurrence to facilitate
# replacing it with a new unique separator, but it is
# no longer necessary to do this:
zz <- z
names(zz) <- sublast('_', '__', names(zz))
tolong(zz, sep = '__')
# or, now, with the same result:
tolong(z)
#
# - sep can use more than one character
# - the character string is interpreted literally,
#   i.e. any special regular expression characters in sep
#   are matched literally.
z <- data.frame(id =letters[1:10], id2= 11:20, HPC_head_R = 1:10, HPC_tail_R = 11:20, HPC_head_L = 21:30, HPC_tail_L = 31:40)
z
names(z) <- sub("(_[LR]$)","_\\1", names(z))
names(z)
(zz <- tolong(z, sep = "__", timevar = "Side"))
zz$id3 <- rownames(zz)
tolong(zz, idvar = 'id3' ,timevar = 'Part')
dd <- data.frame( y.a = 1:3, y.b = 1:3, x.a= 1:3, time = 1:3,
x.b = 11:13, x.c = 21:23, id = c('a','a','b'))
dd
tolong(dd, sep = '.')
dl <- tolong(dd, sep = '.', timevar = "type", idvar = 'patient')
dl
towide(dl, idvar = 'patient', timevar = 'type')
# Long file with additional constants
dl <- data.frame(name = rep(c('A','B','C'), c(3,3,2)),
site = c('head','neck','jaw','chest')[
c(1,2,3,1,2,3,1,4)],
sex = rep(c('male','female','male'), c(3,3,2)),
var1 = 1:8,
var2 = 11:18,
invar = rep(1:3, c(3,3,2)))
towide(dl, c('name'), 'site')
#
# Two indexing variables: e.g. hippocampal volume 2 sides x 3 sites
#
dl <- data.frame(name = rep(LETTERS[1:3], each = 6),
side = rep(c('left','right'), 9),
site = rep(rep(c('head','body','tail'),each = 2),3),
volume = 1:18,
sex = rep(c('female','male','female'), each = 6),
age = rep(c(25, 43, 69), each = 6))
dl
(dlsite <- towide(dl, c('name','side'), 'site'))
(dlsite.side <- towide(dlsite, c('name'), 'side'))
#
# Flipping a data frame
#
z <- data.frame(vname =
rep(c('v1','v2','v3'), each = 4),
country = rep(c('Angola','Benin','Chad','Denmark'), 3),
code = rep(c('ANG','BEN','CHA','DEN'),3),
val__2011 = 2011 + seq(.01,.12,.01),
val__2012 = 2012 + seq(.01,.12,.01),
val__2013 = 2013 + seq(.01,.12,.01),
val__2014 = 2014 + seq(.01,.12,.01),
val__2015 = 2015 + seq(.01,.12,.01)
)
z
library(magrittr)  # provides %>% in case it is not already attached
z %>%
tolong(sep= '__')
z %>%
tolong(sep= '__', timevar = 'year') %>%
.[!grepl('^id$',names(.))] %>%
towide(timevar = 'vname', idvar = c('code','year'))
dd <- data.frame(var__tag_1 = 1:10, var__tag_2 = 1:10)
dd <- data.frame(var.1 = 1:10, var.2 = 1:10)
tolong(dd, sep = '__')
tolong(dd, sep = '.')
## Not run:
# Extracting chains from a stanfit object in the 'rstan' package
# If 'mod' is a stanfit model
library(rstan)
library(spida2)
df <- as.data.frame(extract(mod, permuted = FALSE))
dl <- tolong(df, sep = ':', reverse = TRUE)
## End(Not run)