splt: Split data by a range of methods
In LudvigOlsen/R-splitters: Creating Groups from Data

View source: R/splt.R

splt	R Documentation

Split data by a range of methods

Description

Lifecycle: stable

Divides data into groups using one of a wide range of methods, then splits the data by those groups.

Wraps group() with split().
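As a rough illustration of what wrapping a grouping step with `split()` amounts to, the sketch below builds a grouping factor by hand in base R and passes it to `split()`. The helper `make_groups()` is hypothetical and only stands in for `group()`; it is an assumption made for illustration, not the package's implementation.

```r
# Hypothetical stand-in for the grouping step: assign each row to one of
# `n` groups, spreading rows as evenly as possible (an assumption made
# for illustration, not the package's actual algorithm).
make_groups <- function(n_rows, n) {
  factor(ceiling(seq_len(n_rows) * n / n_rows))
}

# Toy data (made up for this sketch)
df <- data.frame(x = 1:12, y = letters[1:12])

# Step 1: create the grouping factor
groups <- make_groups(nrow(df), n = 3)

# Step 2: split the data by that factor (base R)
df_list <- split(df, groups)

length(df_list)        # number of groups
sapply(df_list, nrow)  # rows per group
```

The result is a plain named list of data frames, which is the shape described under Value.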

Usage

splt(
  data,
  n,
  method = "n_dist",
  starts_col = NULL,
  force_equal = FALSE,
  allow_zero = FALSE,
  descending = FALSE,
  randomize = FALSE,
  remove_missing_starts = FALSE
)

Arguments

`data`	`data.frame` or `vector`. When `data` is a grouped `data.frame`, the function is applied group-wise.
`n`	Depends on `method`. Number of groups (default), group size, list of group sizes, list of group starts, number of data points between group members, step size or prime number to start at. See `method`. Passed as whole number(s) and/or percentage(s) (`0` < `n` < `1`) and/or character. Method `"l_starts"` allows `'auto'`.
`method`	`"greedy"`, `"n_dist"`, `"n_fill"`, `"n_last"`, `"n_rand"`, `"l_sizes"`, `"l_starts"`, `"every"`, `"staircase"`, or `"primes"`.

Note: examples are sizes of the generated groups based on a vector with `57` elements.

greedy

Divides up the data greedily given a specified group size (e.g. `10, 10, 10, 10, 10, 7`). `n` is group size.

n_dist (default)

Divides the data into a specified number of groups and distributes excess data points across groups (e.g. `11, 11, 12, 11, 12`). `n` is number of groups.

n_fill

Divides the data into a specified number of groups and fills up groups with excess data points from the beginning (e.g. `12, 12, 11, 11, 11`). `n` is number of groups.

n_last

Divides the data into a specified number of groups. It finds the most equal group sizes possible, using all data points. Only the last group is able to differ in size (e.g. `11, 11, 11, 11, 13`). `n` is number of groups.

n_rand

Divides the data into a specified number of groups. Excess data points are placed randomly in groups (max. 1 per group) (e.g. `12, 11, 11, 11, 12`). `n` is number of groups.

l_sizes

Divides up the data by a `list` of group sizes. Excess data points are placed in an extra group at the end. E.g. `n = list(0.2, 0.3)` outputs groups with sizes `(11, 17, 29)`. `n` is a `list` of group sizes.

l_starts

Starts new groups at specified values in the `starts_col` vector. `n` is a `list` of starting positions. Skip values by `c(value, skip_to_number)` where `skip_to_number` is the nth appearance of the value in the vector after the previous group start. The first data point is automatically a starting position. E.g. `n = c(1, 3, 7, 25, 50)` outputs groups with sizes `(2, 4, 18, 25, 8)`. To skip: given vector `c("a", "e", "o", "a", "e", "o")`, `n = list("a", "e", c("o", 2))` outputs groups with sizes `(1, 4, 1)`.

If passing `n = 'auto'` the starting positions are automatically found such that a group is started whenever a value differs from the previous value (see `find_starts()`). Note that all `NA`s are first replaced by a single unique value, meaning that they will also cause group starts. See `differs_from_previous()` to set a threshold for what is considered "different". E.g. `n = "auto"` for `c(10, 10, 7, 8, 8, 9)` would start groups at the first `10`, `7`, `8` and `9`, and give `c(1, 1, 2, 3, 3, 4)`.

every

Combines every `n`th data point into a group (e.g. `12, 12, 11, 11, 11` with `n = 5`). `n` is the number of data points between group members ("every n").

staircase

Uses step size to divide up the data. Group size increases with 1 step for every group, until there is no more data (e.g. `5, 10, 15, 20, 7`). `n` is step size.

primes

Uses prime numbers as group sizes. Group size increases to the next prime number until there is no more data (e.g. `5, 7, 11, 13, 17, 4`). `n` is the prime number to start at.
`starts_col`	Name of column with values to match in method `"l_starts"` when `data` is a `data.frame`. Pass `'index'` to use row names. (Character)
`force_equal`	Create equal groups by discarding excess data points. Implementation varies between methods. (Logical)
`allow_zero`	Whether `n` can be passed as `0`. Can be useful when programmatically finding `n`. (Logical)
`descending`	Change the direction of the method. (Not fully implemented) (Logical)
`randomize`	Randomize the grouping factor. (Logical)
`remove_missing_starts`	Recursively remove elements from the list of starts that are not found. For method `"l_starts"` only. (Logical)
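The group sizes quoted for several methods can be reproduced with a few lines of base R arithmetic. The sketch below does this for `"greedy"`, `"n_fill"`, `"n_last"` and `"staircase"` on a vector of `57` elements, and mimics the `n = 'auto'` behaviour of `"l_starts"` for a vector without `NA`s. All helper names are hypothetical, and the formulas are assumptions chosen to match the documented examples; they are not taken from the package source.

```r
# greedy: full groups of `size`, with any remainder in a last group
greedy_sizes <- function(len, size) {
  rest <- len %% size
  c(rep(size, len %/% size), if (rest > 0) rest)
}

# n_fill: `n` groups, excess data points handed out from the beginning
n_fill_sizes <- function(len, n) {
  len %/% n + as.integer(seq_len(n) <= len %% n)
}

# n_last: `n` groups of equal size, the last group takes the excess
# (assumes the base size is the floor of len / n)
n_last_sizes <- function(len, n) {
  base <- len %/% n
  c(rep(base, n - 1), len - base * (n - 1))
}

# staircase: sizes step, 2 * step, 3 * step, ... until the data runs out
staircase_sizes <- function(len, step) {
  sizes <- integer(0)
  k <- 1
  while (sum(sizes) + k * step <= len) {
    sizes <- c(sizes, k * step)
    k <- k + 1
  }
  rest <- len - sum(sizes)
  c(sizes, if (rest > 0) rest)
}

# l_starts with n = "auto": start a new group whenever a value differs
# from the previous one (NA handling is deliberately left out here)
auto_start_groups <- function(v) {
  cumsum(c(TRUE, v[-1] != v[-length(v)]))
}

greedy_sizes(57, 10)                      # 10 10 10 10 10 7
n_fill_sizes(57, 5)                       # 12 12 11 11 11
n_last_sizes(57, 5)                       # 11 11 11 11 13
staircase_sizes(57, 5)                    # 5 10 15 20 7
auto_start_groups(c(10, 10, 7, 8, 8, 9))  # 1 1 2 3 3 4
```

These helpers only compute sizes or group numbers; turning sizes into a grouping factor is a matter of `rep(seq_along(sizes), sizes)`.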

Value

A `list` of the split `data`.

N.B. If `data` is a grouped `data.frame`, the splits are nested in an outer list with one element per group. The names are based on the group indices (see `dplyr::group_indices()`).
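The nesting described for grouped input can be pictured with base R alone: an outer split by group, then an inner split within each group. This is only a sketch of the described structure. The integer group index is taken from the factor level order here, and the inner grouping rule is a hypothetical stand-in for `method`; both are assumptions, and the actual output of `splt()` may differ in detail.

```r
# Toy data with a grouping column (made up for illustration)
df <- data.frame(
  x = 1:12,
  species = factor(rep(c("cat", "pig", "human"), 4))
)

# Outer level: one element per group, named by an integer group index
# (assumed here to follow the factor level order)
outer_idx <- as.integer(df$species)
by_group <- split(df, outer_idx)

# Inner level: split each group's rows into 2 roughly equal parts
# (hypothetical stand-in for the grouping method)
nested <- lapply(by_group, function(d) {
  inner <- ceiling(seq_len(nrow(d)) * 2 / nrow(d))
  split(d, inner)
})

names(nested)    # outer names, one per group index
lengths(nested)  # number of splits within each group
```

Individual pieces are then reached with two levels of indexing, for example `nested[["1"]][["2"]]`.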

Author(s)

Ludvig Renbo Olsen, r-pkgs@ludvigolsen.dk

Examples

# Attach packages
library(groupdata2)
library(dplyr)

# Create data frame
df <- data.frame(
  "x" = c(1:12),
  "species" = factor(rep(c("cat", "pig", "human"), 4)),
  "age" = sample(c(1:100), 12)
)

# Using splt()
df_list <- splt(df, 5, method = "n_dist")

LudvigOlsen/R-splitters documentation built on Dec. 21, 2024, 1:19 a.m.
