split: Divide into Groups and Reassemble

Description

`split` divides the data in the vector `x` into the groups defined by `f`. The replacement forms replace values corresponding to such a division. `unsplit` reverses the effect of `split`.

Usage

```r
split(x, f, drop = FALSE, ...)

## Default S3 method:
split(x, f, drop = FALSE, sep = ".", lex.order = FALSE, ...)

split(x, f, drop = FALSE, ...) <- value

unsplit(value, f, drop = FALSE)
```

Arguments

`x`: vector or data frame containing values to be divided into groups.

`f`: a ‘factor’ in the sense that `as.factor(f)` defines the grouping, or a list of such factors in which case their interaction is used for the grouping.

`drop`: logical indicating if levels that do not occur should be dropped (if `f` is a `factor` or a list).

`value`: a list of vectors or data frames compatible with a splitting of `x`. Recycling applies if the lengths do not match.

`sep`: character string, passed to `interaction` in the case where `f` is a `list`.

`lex.order`: logical, passed to `interaction` when `f` is a list.

`...`: further potential arguments passed to methods.

Details

`split` and `split<-` are generic functions with default and `data.frame` methods. The data frame method can also be used to split a matrix into a list of matrices, and the replacement form likewise, provided they are invoked explicitly.
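As a minimal sketch of the explicit invocation mentioned above, the following calls `split.data.frame` directly on a matrix to split it by rows. The matrix `m` and the grouping vector `grp` are made-up values for illustration only.

```r
# Illustrative data (not from the help page): a 4 x 3 integer matrix
# and a grouping of its four rows
m <- matrix(1:12, nrow = 4, dimnames = list(NULL, c("a", "b", "c")))
grp <- c("u", "v", "u", "v")

# Calling the data frame method explicitly splits the matrix by rows;
# each component of the result is itself a matrix
parts <- split.data.frame(m, grp)

names(parts)    # one component per group
dim(parts$u)    # rows 1 and 3 of m, all three columns
```

Calling plain `split(m, grp)` would instead dispatch to the default method and treat `m` as a vector, which is why the method has to be named explicitly here.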

`unsplit` works with lists of vectors or data frames (assumed to have compatible structure, as if created by `split`). It puts elements or rows back in the positions given by `f`. In the data frame case, row names are obtained by unsplitting the row name vectors from the elements of `value`.

`f` is recycled as necessary; if the length of `x` is not a multiple of the length of `f`, a warning is printed.

Any missing values in `f` are dropped together with the corresponding values of `x`.
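The recycling rule and the handling of missing values can be illustrated with a short sketch. The input vectors are arbitrary illustration values, and `suppressWarnings` is used only to keep the example quiet.

```r
# f = 1:2 is recycled to 1, 2, 1, 2, 1, 2
recycled <- split(1:6, 1:2)

# Length 5 is not a multiple of 2: f is still recycled (1, 2, 1, 2, 1),
# but split() issues a warning, suppressed here
uneven <- suppressWarnings(split(1:5, 1:2))

# The element of x paired with an NA in f is dropped from the result
f_na <- c("a", NA, "b", "a")
with_na <- split(c(10, 20, 30, 40), f_na)

recycled
uneven
with_na
```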

The default method calls `interaction` when `f` is a `list`. If the levels of the factors contain `.`, the factors may not be split as expected, unless `sep` is set to a string not present in the factor `levels`.
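A small sketch of the effect of `sep`, using made-up grouping vectors `f1` and `f2` in which one level already contains a dot. With the default separator the combined labels become ambiguous; choosing a separator that does not occur in the levels keeps the two parts distinguishable.

```r
# Illustrative grouping vectors; the level "a.b" contains the default separator
f1 <- c("a.b", "a.b", "c", "c")
f2 <- c("x", "x", "y", "y")

# Default sep = ".": the label "a.b.x" does not show where f1 ends and f2 begins
default_names <- names(split(1:4, list(f1, f2), drop = TRUE))

# A separator absent from the levels gives unambiguous labels
safe_names <- names(split(1:4, list(f1, f2), drop = TRUE, sep = "_"))

default_names
safe_names
```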

Value

The value returned from `split` is a list of vectors containing the values for the groups. The components of the list are named by the levels of `f` (after converting to a factor, or if already a factor and `drop = TRUE`, dropping unused levels).
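The naming of the components, and the effect of `drop`, can be seen in a brief sketch with a hypothetical factor that has one unused level:

```r
# Illustrative factor: level "c" is declared but never occurs
f <- factor(c("a", "b", "a"), levels = c("a", "b", "c"))

# By default every level yields a component, including the empty one
kept <- split(1:3, f)

# drop = TRUE removes components for levels that do not occur
dropped <- split(1:3, f, drop = TRUE)

names(kept)
length(kept$c)
names(dropped)
```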

The replacement forms return their right-hand side. `unsplit` returns a vector or data frame for which `split(x, f)` equals `value`.
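The round-trip property of `unsplit` can be checked with a short sketch; the vectors `x` and `f` below are arbitrary illustration values.

```r
# Illustrative data: six values in three interleaved groups
x <- c(5, 3, 8, 1, 9, 2)
f <- c("b", "a", "b", "c", "a", "c")

value <- split(x, f)       # list with components a, b, c
back <- unsplit(value, f)  # elements return to the positions given by f

identical(back, x)                # original order is restored
identical(split(back, f), value)  # splitting again reproduces value
```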

References

Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.

See Also

`cut` to categorize numeric values.

`strsplit` to split strings.

Examples

```r
require(stats); require(graphics)
n <- 10; nn <- 100
g <- factor(round(n * runif(n * nn)))
x <- rnorm(n * nn) + sqrt(as.numeric(g))
xg <- split(x, g)
boxplot(xg, col = "lavender", notch = TRUE, varwidth = TRUE)
sapply(xg, length)
sapply(xg, mean)

### Calculate 'z-scores' by group (standardize to mean zero, variance one)
z <- unsplit(lapply(split(x, g), scale), g)

# or

zz <- x
split(zz, g) <- lapply(split(x, g), scale)

# and check that the within-group std dev is indeed one
tapply(z, g, sd)
tapply(zz, g, sd)


### data frame variation

## Notice that assignment form is not used since a variable is being added

g <- airquality$Month
l <- split(airquality, g)
l <- lapply(l, transform, Oz.Z = scale(Ozone))
aq2 <- unsplit(l, g)
head(aq2)
with(aq2, tapply(Oz.Z, Month, sd, na.rm = TRUE))


### Split a matrix into a list by columns
ma <- cbind(x = 1:10, y = (-4:5)^2)
split(ma, col(ma))

split(1:10, 1:2)
```