Stack and Reshape Datasets After Splitting Concatenated Values
Package: splitstackshape
Type: Package
Version: 1.4.8
Date: 2019-04-21
License: GPL-3
Online data collection tools like Google Forms often export multiple-response
questions with the responses concatenated in a single cell. The concat.split()
family of functions splits such data into separate cells. The package also
includes functions to stack groups of columns and to reshape wide data, even
when the data are "unbalanced", something which stats::reshape() does not
handle, and which reshape2::melt() and reshape2::dcast() do not easily handle.
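To make the idea of "splitting concatenated values" concrete, here is a minimal base-R sketch. It is an illustration only and not the package's implementation; the data frame `responses` and the helper `split_wide()` are hypothetical names invented for this example.

```r
# Hypothetical survey export: one multiple-response column, comma-separated.
responses <- data.frame(
  id = 1:3,
  Likes = c("1,2,4", "3", "2,5"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of splitstackshape): split one concatenated
# column into several columns, padding shorter rows with NA.
split_wide <- function(df, col, sep = ",") {
  parts <- strsplit(df[[col]], sep, fixed = TRUE)
  width <- max(lengths(parts))
  # One row per observation, one column per split position.
  padded <- do.call(rbind, lapply(parts, function(p) {
    c(p, rep(NA_character_, width - length(p)))
  }))
  colnames(padded) <- paste(col, seq_len(width), sep = "_")
  cbind(df[setdiff(names(df), col)],
        as.data.frame(padded, stringsAsFactors = FALSE))
}

wide <- split_wide(responses, "Likes")
wide
```

The sketch keeps the split values as character strings and leaves any type conversion to the caller. In the package itself, the cSplit() call shown in the Examples plays this role.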
Ananda Mahto
Maintainer: Ananda Mahto <mrdwab@gmail.com>
library(splitstackshape)

## concat.split
head(cSplit(concat.test, "Likes", drop = TRUE))

## Reshape
set.seed(1)
## Unbalanced wide data: varA has times 1-3, varB has 2-3, varC has only 3
mydf <- data.frame(id_1 = 1:6, id_2 = c("A", "B"), varA.1 = sample(letters, 6),
                   varA.2 = sample(letters, 6), varA.3 = sample(letters, 6),
                   varB.2 = sample(10, 6), varB.3 = sample(10, 6),
                   varC.3 = rnorm(6))
mydf
Reshape(mydf, id.vars = c("id_1", "id_2"),
        var.stubs = c("varA", "varB", "varC"))

## Stacked
Stacked(data = mydf, id.vars = c("id_1", "id_2"),
        var.stubs = c("varA", "varB", "varC"),
        sep = ".")

## Not run:
## Processing times
set.seed(1)
Nrow <- 1000000
Ncol <- 10
mybigdf <- cbind(id = 1:Nrow, as.data.frame(matrix(rnorm(Nrow * Ncol),
                                                   nrow = Nrow)))
head(mybigdf)
dim(mybigdf)
tail(mybigdf)
A <- names(mybigdf)
## Rename the 10 value columns into three unbalanced groups (3 + 4 + 3)
names(mybigdf) <- c("id", paste("varA", 1:3, sep = "_"),
                    paste("varB", 1:4, sep = "_"),
                    paste("varC", 1:3, sep = "_"))

system.time({
  O1 <- Reshape(mybigdf, id.vars = "id",
                var.stubs = c("varA", "varB", "varC"), sep = "_")
  O1 <- O1[order(O1$id, O1$time), ]
})

system.time({
  O2 <- merged.stack(mybigdf, id.vars = "id",
                     var.stubs = c("varA", "varB", "varC"), sep = "_")
})

system.time({
  O3 <- Stacked(mybigdf, id.vars = "id",
                var.stubs = c("varA", "varB", "varC"), sep = "_")
})

## Namespace-qualified so the call works even if data.table is not attached
DT <- data.table::data.table(mybigdf)
system.time({
  O4 <- merged.stack(DT, id.vars = "id",
                     var.stubs = c("varA", "varB", "varC"), sep = "_")
})
## End(Not run)