tolong | R Documentation |
Uses a minimal number of arguments to create a long file using stats::reshape. Produces output even when long variable names and time values are not fully crossed.
tolong(
data,
sep = "_",
timevar = "time",
idvar = "id",
ids = 1:nrow(data),
expand = TRUE,
safe_sep = "#%@!",
reverse = F,
...
)
long(
data,
sep = "_",
timevar = "time",
idvar = "id",
ids = 1:nrow(data),
expand = TRUE,
safe_sep = "#%@!",
reverse = F,
...
)
data |
wide data frame |
sep |
(default '_') single character separator between long names and 'time' value. Variable names with this separator are transformed to long variables. |
timevar |
(default 'time') name of the variable in the output long file that identifies occasions. Its values are taken from the suffix following the 'sep' character in each time-varying variable. |
idvar |
(default: 'id') the variable name used in the output long file to identify rows in the input wide file. It may exist in the input wide file and must, in that case, have a unique value in each row. If it does not exist, it will be created with values equal to the row numbers of the input wide file. |
ids |
(default |
expand |
(default TRUE): if 'time' values are inconsistent, fill in missing 'time's with NAs. |
safe_sep |
temporary safe? separator |
reverse |
(default FALSE) if TRUE, the 'time' value precedes the variable name |
... |
additional parameters are passed to |
tolong is intended for the simple case in which 'wide' variables in the input data frame are identified by the fact that they contain a separator character that separates the name of the variable in the long file from the value of the 'time' variable that identifies the corresponding row in the long file, e.g. x_1, x_2, x_3 or brain.volume_left, brain.volume_right. If the separator ('_' by default) occurs in other variables, it must be temporarily substituted.
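One way to deal with a separator that occurs more than once in a name is to rewrite only its last occurrence into a longer separator before reshaping. The following is a minimal base R sketch under that assumption; the column names are hypothetical, and the regular expression is an illustrative stand-in rather than the package's own helper.

```r
# Hypothetical column names: 'v_a_L' and 'v_a_R' contain '_' twice.
nms <- c("id", "v_a_L", "v_a_R", "z_L")

# Replace only the LAST '_' in each name with '__'.
# "_([^_]*)$" matches an underscore followed by non-underscore
# characters up to the end of the string; names without '_' are unchanged.
new_nms <- sub("_([^_]*)$", "__\\1", nms)
new_nms
```

The renamed columns could then be passed to tolong with sep = '__', so that the remaining single underscores are treated as part of the variable name.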
reshape does not work if long variable names and time values are not fully crossed, e.g. x_1, x_2, x_3, y_1, y_2. By default, long creates additional variables with "NAs" so the set of variables given to reshape is fully crossed, e.g. adding a variable y_3 <- NA.
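The padding idea can be illustrated by hand with base R. This is only a sketch of the general approach, not the function's actual implementation; the data frame `wide` and its values are made up for illustration.

```r
# Hypothetical wide data: x has times 1-3, y has only times 1-2.
wide <- data.frame(id = 1:2,
                   x_1 = c(1, 2), x_2 = c(3, 4), x_3 = c(5, 6),
                   y_1 = c(7, 8), y_2 = c(9, 10))

# Pad the missing combination so names and times are fully crossed.
wide$y_3 <- NA_real_

# With a fully crossed set of names, stats::reshape can split each
# name at the separator into a variable name and a time value.
long <- stats::reshape(
  wide,
  direction = "long",
  varying = c("x_1", "x_2", "x_3", "y_1", "y_2", "y_3"),
  sep = "_",
  idvar = "id",
  timevar = "time"
)
long
```

Each input row appears once per time value, and y is NA at the time that had no corresponding wide column.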
Compare the functionality of tolong with that of tidyr::gather and of tidyr::pivot_longer. 'tolong' relies on the format of variable names to identify which variables become new variables in the long form of the data and which labels become the values of the indexing variable. The indexing variable is named 'time' by default; another name can be set with the "timevar" argument. "tolong" can handle many 'time-varying' variables. "gather" can only handle one. "pivot_longer" can handle many and might be considered a replacement for "tolong", which has the disadvantage of frequently requiring the renaming of variables, an easier task for those who have mastered the use of regular expressions, but potentially challenging otherwise.
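For comparison, here is a minimal sketch of how tidyr::pivot_longer might handle several time-varying variables at once. It assumes the tidyr package is installed; the data frame `z` is a made-up example with the suffix following '_' used as the indexing value.

```r
library(tidyr)

# Hypothetical wide data with two time-varying variables, v and w.
z <- data.frame(id = letters[1:3],
                v_L = 1:3, v_R = 11:13,
                w_L = 21:23, w_R = 31:33)

# ".value" tells pivot_longer that the part of each name before the
# separator names an output column; the part after it fills 'Side'.
long <- pivot_longer(
  z,
  cols = -id,
  names_to = c(".value", "Side"),
  names_sep = "_"
)
long
```

As with tolong, names containing the separator more than once would need to be renamed first, or matched with a regular expression through the names_pattern argument instead of names_sep.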
'long' file with each wide row repeated as many times as there are distinct values for the 'timevar' variable.
See towide for many examples using both 'towide' and 'tolong'.
z <- data.frame(id =letters[1:10], id2= 11:20, v_L = 1:10, v_R = 11:20)
tolong(z)
tolong(z, timevar = 'Side', idvar = 'idn', ids = LETTERS[1:10])
tolong(z, timevar = 'Side', idvar = 'idn', ids = z$id2)
# unbalanced times
z <- data.frame(id =letters[1:10], id2= 11:20, v_L = 1:10, v_R = 11:20, z_L = 21:30)
tolong(z)
# a separator with multiple occurrences:
z <- data.frame(id =letters[1:10], id2= 11:20, v_a_L = 1:10, v_a_R = 11:20, z_L = 21:30)
# tolong(z) would produce an error due to multiple occurrences of the default separator '_'
names(z) <- sublast('_', '__', names(z))
tolong(z, sep = '__')
# multi-character sep
z <- data.frame(id =letters[1:10], id2= 11:20, HPC_head_R = 1:10, HPC_tail_R = 11:20, HPC_head_L = 21:30, HPC_tail_L = 31:40)
names(z) <- sub("(_[LR]$)","_\\1", names(z))
names(z)
(zz <- tolong(z, sep = "__", timevar = "Side"))
zz$id3 <- rownames(zz)
tolong(zz, idvar = 'id3' ,timevar = 'Part')
dd <- data.frame( y.a = 1:3, y.b = 1:3, x.a= 1:3, time = 1:3,
x.b = 11:13, x.c = 21:23, id = c('a','a','b'))
tolong(dd, sep = '.')
dl <- tolong(dd, sep = '.', timevar = "type", idvar = 'patient')
towide(dl, idvar = 'patient', timevar = 'type')
# Long file with additional constants
dl <- data.frame(name = rep(c('A','B','C'), c(3,3,2)),
site = c('head','neck','jaw','chest')[
c(1,2,3,1,2,3,1,4)],
sex = rep(c('male','female','male'), c(3,3,2)),
var1 = 1:8,
var2 = 11:18,
invar = rep(1:3, c(3,3,2)))
towide(dl, c('name'), 'site')
#
# Two indexing variables: e.g. hippocampal volume 2 sides x 3 sites
#
dl <- data.frame(name = rep(LETTERS[1:3], each = 6),
side = rep(c('left','right'), 9),
site = rep(rep(c('head','body','tail'),each = 2),3),
volume = 1:18,
sex = rep(c('female','male','female'), each = 6),
age = rep(c(25, 43, 69), each = 6))
dl
(dlsite <- towide(dl, c('name','side'), 'site'))
(dlsite.side <- towide(dlsite, c('name'), 'side'))
#
# Flipping a data frame
#
z <- data.frame(vname =
rep(c('v1','v2','v3'), each = 4),
country = rep(c('Angola','Benin','Chad','Denmark'), 3),
code = rep(c('ANG','BEN','CHA','DEN'),3),
val__2011 = 2011 + seq(.01,.12,.01),
val__2012 = 2012 + seq(.01,.12,.01),
val__2013 = 2013 + seq(.01,.12,.01),
val__2014 = 2014 + seq(.01,.12,.01),
val__2015 = 2015 + seq(.01,.12,.01)
)
z
library(magrittr)  # provides %>%, in case it is not already attached
z %>%
tolong(sep= '__')
z %>%
tolong(sep= '__', timevar = 'year') %>%
.[!grepl('^id$',names(.))] %>%
towide(timevar = 'vname', idvar = c('code','year'))
## Not run:
# Extracting chains from a stanfit object in the 'rstan' package
# If 'mod' is a stanfit model
library(rstan)
library(spida2)
# permuted = FALSE keeps the draws as an iterations x chains x parameters array
df <- as.data.frame(extract(mod, permuted = FALSE))
dl <- tolong(df, sep = ':', reverse = TRUE)
## End(Not run)