rsplit: Fast (Recursive) Splitting

View source: R/rsplit.R

rsplitR Documentation

Fast (Recursive) Splitting

Description

rsplit (recursively) splits a vector, matrix or data frame into subsets according to combinations of (multiple) vectors / factors and returns a (nested) list. If flatten = TRUE, the list is flattened yielding the same result as split. rsplit is implemented as a wrapper around gsplit, and significantly faster than split.

Usage

rsplit(x, ...)

## Default S3 method:
rsplit(x, fl, drop = TRUE, flatten = FALSE, use.names = TRUE, ...)

## S3 method for class 'matrix'
rsplit(x, fl, drop = TRUE, flatten = FALSE, use.names = TRUE,
       drop.dim = FALSE, ...)

## S3 method for class 'data.frame'
rsplit(x, by, drop = TRUE, flatten = FALSE, cols = NULL,
       keep.by = FALSE, simplify = TRUE, use.names = TRUE, ...)

Arguments

x

a vector, matrix, data.frame or list like object.

fl

a GRP object, or a (list of) vector(s) / factor(s) (internally converted to a GRP object(s)) used to split x.

by

data.frame method: Same as fl, but also allows one- or two-sided formulas i.e. ~ group1 or var1 + var2 ~ group1 + group2. See Examples.

drop

logical. TRUE removes unused levels or combinations of levels from factors before splitting; FALSE retains those combinations yielding empty list elements in the output.

flatten

logical. If fl is a list of vectors / factors, TRUE calls GRP on the list, creating a single grouping used for splitting; FALSE yields recursive splitting.

use.names

logical. TRUE returns a named list (like split); FALSE returns a plain list.

drop.dim

logical. TRUE returns atomic vectors for matrix-splits consisting of one row.

cols

data.frame method: Select columns to split using a function, column names, indices or a logical vector. Note: cols is ignored if a two-sided formula is passed to by.

keep.by

logical. If a formula is passed to by, then TRUE preserves the splitting (right-hand-side) variables in the data frame.

simplify

data.frame method: Logical. TRUE calls rsplit.default if a single column is split e.g. rsplit(data, col1 ~ group1) becomes the same as rsplit(data$col1, data$group1).

...

further arguments passed to GRP. Sensible choices would be sort = FALSE, decreasing = TRUE or na.last = FALSE. Note that these options only apply if fl is not already a (list of) factor(s).

Value

a (nested) list containing the subsets of x.

See Also

gsplit, rapply2d, unlist2d, List Processing, Collapse Overview

Examples

rsplit(mtcars$mpg, mtcars$cyl)
rsplit(mtcars, mtcars$cyl)

rsplit(mtcars, mtcars[.c(cyl, vs, am)])
rsplit(mtcars, ~ cyl + vs + am, keep.by = TRUE)  # Same thing
rsplit(mtcars, ~ cyl + vs + am)

rsplit(mtcars, ~ cyl + vs + am, flatten = TRUE)

rsplit(mtcars, mpg ~ cyl)
rsplit(mtcars, mpg ~ cyl, simplify = FALSE)
rsplit(mtcars, mpg + hp ~ cyl + vs + am)
rsplit(mtcars, mpg + hp ~ cyl + vs + am, keep.by = TRUE)

# Split this sectoral data, first by Variable (Emloyment and Value Added), then by Country
GGDCspl <- rsplit(GGDC10S, ~ Variable + Country, cols = 6:16)
str(GGDCspl)

# The nested list can be reassembled using unlist2d()
head(unlist2d(GGDCspl, idcols = .c(Variable, Country)))
rm(GGDCspl)

# Another example with mtcars (not as clean because of row.names)
nl <- rsplit(mtcars, mpg + hp ~ cyl + vs + am)
str(nl)
unlist2d(nl, idcols = .c(cyl, vs, am), row.names = "car")
rm(nl)

collapse documentation built on Nov. 3, 2024, 9:08 a.m.