split | R Documentation |
Split method for data.table: faster and more flexible than split.data.frame. Be aware that processing a list of data.tables is generally much slower than manipulating a single data.table by group with the by argument; read more in ?data.table.
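A minimal sketch of the two routes this paragraph contrasts, using a small hypothetical table DT with a grouping column g and a value column v. Both routes give the same per-group totals, but the second one groups inside a single data.table instead of building and looping over a list of data.tables. No timings are claimed here.

```r
library(data.table)

# hypothetical toy data: group label g, value v
DT = data.table(g = c("b", "a", "b", "c"), v = 1:4)

# list-of-data.tables route: split first, then summarise each piece
via_split = sapply(split(DT, by = "g"), function(d) sum(d$v))

# single-data.table route: grouped aggregation with `by`
via_by = DT[, .(total = sum(v)), by = g]

via_split  # named integer vector, groups in order of first appearance
via_by     # data.table with columns g and total
```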
## S3 method for class 'data.table'
split(x, f, drop = FALSE,
by, sorted = FALSE, keep.by = TRUE, flatten = TRUE,
..., verbose = getOption("datatable.verbose"))
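Argument | Description |
x | A data.table. |
f | A factor or list of factors, same as in split.data.frame. Prefer the by argument; f exists only for consistency with the data.frame method. |
drop | logical. The default FALSE keeps empty list elements produced by factor levels that do not occur in the data. Works with both f and by. |
by | character vector. Column names on which the split should be made. When length(by) > 1L and flatten is FALSE, the result is a nested list with data.tables as leaves. |
sorted | With the default FALSE, the list keeps the order in which the groups appear in the data. With TRUE, sorted list(s) are returned. Has no effect when f is used. |
keep.by | logical, default TRUE. Keep the columns given in by in each resulting data.table. |
flatten | logical, default TRUE. Unlist nested lists of data.tables. When f is used, the result is always a flat list of data.tables. |
... | When using f, passed on to split, as in split.data.frame. |
verbose | logical, default getOption("datatable.verbose"). When TRUE, prints to the console the time consumed by the different parts of processing. |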
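A short sketch of keep.by, again on a hypothetical toy table DT with columns g and v. With the default keep.by = TRUE each piece still carries the grouping column; with keep.by = FALSE it is dropped and only the remaining columns are kept.

```r
library(data.table)

# hypothetical toy data
DT = data.table(g = c("b", "a", "b"), v = 1:3)

kept    = split(DT, by = "g")                   # default keep.by = TRUE
dropped = split(DT, by = "g", keep.by = FALSE)  # grouping column removed

names(kept$b)     # "g" "v"
names(dropped$b)  # "v"
dropped$b$v       # rows of group "b", in original order
```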
Argument f exists only for consistency with the data.frame method. Using the by argument instead is recommended: it is faster, more flexible, and by default preserves the order in which the groups appear in the data.
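A sketch of the ordering difference described above, on a hypothetical toy table DT. With by, groups come out in order of first appearance unless sorted = TRUE; with f, a character vector is turned into a factor (as split.default does), so the pieces follow the sorted factor levels.

```r
library(data.table)

# hypothetical toy data: "b" appears before "a"
DT = data.table(g = c("b", "a", "b", "c"), v = 1:4)

by_order     = names(split(DT, by = "g"))                 # order of first appearance
by_sorted    = names(split(DT, by = "g", sorted = TRUE))  # sorted group names
f_order      = names(split(DT, DT$g))                     # follows factor levels

by_order   # "b" "a" "c"
by_sorted  # "a" "b" "c"
f_order    # "a" "b" "c"
```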
Value: A list of data.tables. If flatten is FALSE and length(by) > 1L, the result is instead a recursively nested list with data.tables as the leaves, nested according to the columns in by.
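A sketch contrasting the flat and nested return shapes, on a hypothetical two-column grouping. With the default flatten = TRUE the list names join the group values with "."; with flatten = FALSE the outer list is split on the first by column and each element is a list split on the next one.

```r
library(data.table)

# hypothetical toy data with two grouping columns
DT = data.table(a = c("x", "x", "y"), b = c("p", "q", "p"), v = 1:3)

flat   = split(DT, by = c("a", "b"))                   # one level of list
nested = split(DT, by = c("a", "b"), flatten = FALSE)  # list of lists

names(flat)       # "x.p" "x.q" "y.p"
names(nested)     # "x" "y"
names(nested$x)   # "p" "q"
nested$x$q        # leaf data.table for a == "x", b == "q"
```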
See also: data.table, rbindlist.
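A sketch of how rbindlist relates to split, using a hypothetical toy table DT: binding the pieces back together returns all rows, regrouped in the order of the split (group by group, original order within each group), not necessarily in the original row order.

```r
library(data.table)

# hypothetical toy data
DT = data.table(g = c("b", "a", "b", "c"), v = 1:4)

pieces = split(DT, by = "g")  # list: b, a, c
back   = rbindlist(pieces)    # stack the pieces again

back$v  # rows of "b" first, then "a", then "c"
```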
library(data.table)
set.seed(123)
DT = data.table(x1 = rep(letters[1:2], 6),
x2 = rep(letters[3:5], 4),
x3 = rep(letters[5:8], 3),
y = rnorm(12))
DT = DT[sample(.N)]
DF = as.data.frame(DT)
# split consistency with data.frame: `x, f, drop`
all.equal(
split(DT, list(DT$x1, DT$x2)),
lapply(split(DF, list(DF$x1, DF$x2)), setDT)
)
# nested list using the `flatten` argument
split(DT, by=c("x1", "x2"))
split(DT, by=c("x1", "x2"), flatten=FALSE)
# dealing with factors
fdt = DT[, c(lapply(.SD, as.factor), list(y=y)), .SDcols=x1:x3]
fdf = as.data.frame(fdt)
sdf = split(fdf, list(fdf$x1, fdf$x2))
all.equal(
split(fdt, by=c("x1", "x2"), sorted=TRUE),
lapply(sdf[sort(names(sdf))], setDT)
)
# factors with unused levels: drop = FALSE, then drop = TRUE
fdt = DT[, .(x1 = as.factor(c(as.character(x1), "c"))[-13L],
x2 = as.factor(c("a", as.character(x2)))[-1L],
x3 = as.factor(c("a", as.character(x3), "z"))[c(-1L,-14L)],
y = y)]
fdf = as.data.frame(fdt)
sdf = split(fdf, list(fdf$x1, fdf$x2))
all.equal(
split(fdt, by=c("x1", "x2"), sorted=TRUE),
lapply(sdf[sort(names(sdf))], setDT)
)
sdf = split(fdf, list(fdf$x1, fdf$x2), drop=TRUE)
all.equal(
split(fdt, by=c("x1", "x2"), sorted=TRUE, drop=TRUE),
lapply(sdf[sort(names(sdf))], setDT)
)