long: Create a long file from a wide file


Description

Uses a minimal number of arguments to create a long file using stats::reshape. Produces output even when long variable names and time values are not fully crossed.

Usage

long(data, sep = "_", timevar = "time", idvar = "id",
  ids = 1:nrow(data), expand = TRUE, safe_sep = "#%@!", ...)

tolong(data, sep = "_", timevar = "time", idvar = "id",
  ids = 1:nrow(data), expand = TRUE, safe_sep = "#%@!", ...)

Arguments

data

wide data frame

sep

(default '_') separator between long names and 'time' value, typically a single character. Variable names containing this separator are transformed to long variables.

timevar

(default 'time') name of the variable in the output long file that identifies occasions. Its values are taken from the suffix following the 'sep' character in each time-varying variable.

idvar

(default: 'id') the variable name used in the output long file to identify rows in the input wide file. It may exist in the input wide file and must, in that case, have a unique value in each row. If it does not exist, it will be created with values equal to the row numbers of the input wide file.

ids

(default 1:nrow(data)) values for idvar in the long file if the variable idvar does not exist in the input wide file. Ignored if idvar exists in data.

expand

(default TRUE) if 'time' values are inconsistent across variables, fill in the missing name and 'time' combinations with NAs.

safe_sep

(default '#%@!') temporary separator, assumed to be safe in the sense that it does not occur in any variable name.

...

additional parameters are passed to reshape.
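
The handling of idvar and ids described above can be sketched with base R alone. This is only an illustration under stated assumptions, not the package's implementation: the data frame wide and the identifiers in ids are hypothetical, and stats::reshape is called directly instead of long.

```r
# Hypothetical wide data with no identifier column
wide <- data.frame(v_L = 1:3, v_R = 4:6)

idvar <- "id"
ids <- LETTERS[1:3]  # hypothetical identifiers, one per wide row

if (idvar %in% names(wide)) {
  # An existing identifier must have a unique value in each row
  stopifnot(!anyDuplicated(wide[[idvar]]))
} else {
  # Otherwise create it from the supplied ids
  wide[[idvar]] <- ids
}

# reshape splits "v_L" and "v_R" at the separator into name "v" and times "L", "R"
long_df <- reshape(wide, direction = "long", idvar = idvar,
                   varying = c("v_L", "v_R"), sep = "_", timevar = "Side")
long_df
```

Each identifier then appears once per value of Side in the long result.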

Details

long is intended for the simple case in which 'wide' variables in the input data frame are identified by a separator character. The separator splits each name into the name of the variable in the long file and the value of the 'time' variable that identifies the corresponding row in the long file, e.g. x_1, x_2, x_3 or brain.volume_left, brain.volume_right. If the separator ('_' by default) occurs in other variables, it must be temporarily substituted.
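
As a rough illustration of this naming convention, the following base R sketch splits a set of hypothetical variable names at the separator. The vector nms and the column labels of the result are assumptions made for the example; the code does not reproduce what long does internally.

```r
# Hypothetical variable names from a wide file
nms <- c("id", "x_1", "x_2", "x_3", "brain.volume_left", "brain.volume_right")
sep <- "_"

# Names containing the separator are treated as time-varying
is_wide <- grepl(sep, nms, fixed = TRUE)
parts <- strsplit(nms[is_wide], sep, fixed = TRUE)

# Prefix = variable name in the long file, suffix = value of 'time'
name_map <- data.frame(
  wide = nms[is_wide],
  long_name = vapply(parts, `[`, character(1), 1),
  time = vapply(parts, `[`, character(1), 2)
)
name_map
```

A name such as id, which does not contain the separator, is left out of the mapping and would be carried along as a constant.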

reshape does not work if long variable names and time values are not fully crossed, e.g. x_1, x_2, x_3, y_1, y_2. By default, long creates additional variables filled with NA so that the set of variables given to reshape is fully crossed, e.g. adding a variable y_3 <- NA.
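
The padding idea can be sketched in base R as follows. This is a minimal illustration assuming a hypothetical data frame wide in which y has no value at time 3; it shows one way to complete the crossing before calling stats::reshape and is not taken from the package source.

```r
# Hypothetical wide data: x is observed at times 1-3, y only at times 1-2
wide <- data.frame(id = 1:2, x_1 = c(1, 2), x_2 = c(3, 4), x_3 = c(5, 6),
                   y_1 = c(7, 8), y_2 = c(9, 10))

# Split the time-varying names into long name and time value
tv <- grep("_", names(wide), value = TRUE, fixed = TRUE)
parts <- do.call(rbind, strsplit(tv, "_", fixed = TRUE))
stems <- unique(parts[, 1])
times <- unique(parts[, 2])

# Add an NA column for every missing name/time combination (here y_3)
needed <- as.vector(outer(stems, times, paste, sep = "_"))
for (nm in setdiff(needed, names(wide))) wide[[nm]] <- NA

# With fully crossed names, reshape can split them at the separator
long_df <- reshape(wide, direction = "long", idvar = "id",
                   varying = needed, sep = "_", timevar = "time")
long_df
```

The rows for time 3 then carry NA in y, which mirrors the behaviour described for expand = TRUE.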

tolong is a synonym for compatibility with the 'spida' package.

Value

A 'long' file in which each wide row is repeated as many times as there are distinct values of the 'timevar' variable.

Examples

z <- data.frame(id = letters[1:10], id2 = 11:20, v_L = 1:10, v_R = 11:20)
long(z)
long(z, timevar = 'Side', idvar = 'idn', ids = LETTERS[1:10])
long(z, timevar = 'Side', idvar = 'idn', ids = z$id2)

# unbalanced times
z <- data.frame(id = letters[1:10], id2 = 11:20, v_L = 1:10, v_R = 11:20, z_L = 21:30)
long(z)

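# multi-character sep
z <- data.frame(id = letters[1:10], id2 = 11:20,
                HPC_head_R = 1:10, HPC_tail_R = 11:20,
                HPC_head_L = 21:30, HPC_tail_L = 31:40)
# double the underscore before the L/R suffix so that "__" separates name and side
names(z) <- sub("(_[LR]$)", "_\\1", names(z))
names(z)
(zz <- long(z, sep = "__", timevar = "Side"))
# use the row names of the first long file as ids for a second reshape on "_"
zz$id3 <- rownames(zz)
long(zz, idvar = 'id3', timevar = 'Part')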

# '.' as separator; 'time' and 'id' already exist in the wide file
dd <- data.frame(y.a = 1:3, y.b = 1:3, x.a = 1:3, time = 1:3,
    x.b = 11:13, x.c = 21:23, id = c('a','a','b'))
tolong(dd, sep = '.')
dl <- tolong(dd, sep = '.', timevar = "type", idvar = 'patient')
# back to wide form
towide(dl, idvar = 'patient', timevar = 'type')

# Long file with additional constants

dl <- data.frame(name = rep(c('A','B','C'), c(3,3,2)),
                 site = c('head','neck','jaw','chest')[
                   c(1,2,3,1,2,3,1,4)],
                 sex = rep(c('male','female','male'), c(3,3,2)),
                 var1 = 1:8,
                 var2 = 11:18,
                 invar = rep(1:3, c(3,3,2)))
towide(dl, c('name'), 'site')

gmonette/yscs documentation built on May 17, 2019, 7:28 a.m.