group: Create groups from your data
In LudvigOlsen/groupdata2: Creating Groups from Data

group

R Documentation

Create groups from your data

Description

Lifecycle: stable

Divides data into groups by a wide range of methods. Creates a grouping factor with 1s for group 1, 2s for group 2, etc. Returns a data.frame grouped by the grouping factor for easy use in magrittr `%>%` pipelines.

By default, the data points in a group are connected sequentially (e.g. c(1, 1, 2, 2, 3, 3)) and splitting is done from top to bottom. The "every" method is the exception.

There are five types of grouping methods:

The "n_*" methods split the data into a given number of groups. They differ in how they handle excess data points.
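As a rough illustration of how the "n_*" methods differ only in where the excess data points go, here is a minimal base R sketch. The helpers `n_method_sizes()` and `sizes_to_factor()` are hypothetical and not part of groupdata2. The "n_dist" rule (cutting at `floor(k * N / n)`) is an assumption chosen because it reproduces the documented 57-element example; "n_last" is left out.

```r
# Hypothetical helper (not groupdata2's code): group sizes for some "n_*" methods.
n_method_sizes <- function(n_points, n_groups, method = c("n_dist", "n_fill", "n_rand")) {
  method <- match.arg(method)
  base <- n_points %/% n_groups    # size every group gets at minimum
  excess <- n_points %% n_groups   # data points left over
  switch(method,
    # Spread excess across groups: cut points at floor(k * n_points / n_groups)
    n_dist = diff((0:n_groups * n_points) %/% n_groups),
    # Give one excess point to each of the first `excess` groups
    n_fill = base + (seq_len(n_groups) <= excess),
    # Give one excess point to `excess` randomly chosen groups (max. 1 per group)
    n_rand = base + (seq_len(n_groups) %in% sample(n_groups, excess))
  )
}

# Hypothetical helper: turn group sizes into a sequential grouping factor
sizes_to_factor <- function(sizes) {
  factor(rep(seq_along(sizes), times = sizes))
}
```

For example, `table(sizes_to_factor(n_method_sizes(57, 5, "n_fill")))` gives group sizes 12, 12, 11, 11, 11, matching the "n_fill" example under `method`.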

The "greedy" method uses a group size to split the data into groups, greedily grabbing `n` data points from the top. The last group may thus differ in size (e.g. c(1, 1, 2, 2, 3)).
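The greedy rule can be sketched in one line of base R. `greedy_factor()` is a hypothetical helper written for illustration, not groupdata2's implementation: row `i` goes to group `ceiling(i / size)`, so each group takes the next `size` rows from the top and the last group keeps whatever is left.

```r
# Hypothetical helper (not groupdata2's code): greedy grouping with group size `size`.
greedy_factor <- function(n_points, size) {
  # Row i belongs to group ceiling(i / size)
  factor(ceiling(seq_len(n_points) / size))
}

greedy_factor(5, 2)          # groups 1, 1, 2, 2, 3
table(greedy_factor(57, 10)) # group sizes 10, 10, 10, 10, 10, 7
```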

The "l_*" methods use a list of either starting points ("l_starts") or group sizes ("l_sizes"). The "l_starts" method can also auto-detect group starts (when a value differs from the previous value).
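A minimal base R sketch of "l_sizes" and of "l_starts" with `n = 'auto'` follows. Both helpers are hypothetical, not groupdata2's code. Flooring proportions to counts is an assumption that reproduces the documented `list(0.2, 0.3)` example, and the auto-start sketch uses exact inequality only (no threshold as in `differs_from_previous()`).

```r
# Hypothetical helper (not groupdata2's code): "l_sizes" grouping.
# Proportions (0 < size < 1) become counts by flooring (an assumption);
# leftover data points form an extra group at the end.
l_sizes_factor <- function(n_points, sizes) {
  sizes <- unlist(sizes)
  sizes <- ifelse(sizes > 0 & sizes < 1, floor(sizes * n_points), sizes)
  rest <- n_points - sum(sizes)
  if (rest > 0) sizes <- c(sizes, rest)
  factor(rep(seq_along(sizes), times = sizes))
}

# Hypothetical helper: "l_starts" with n = 'auto'.
# A new group starts whenever a value differs from the previous value.
auto_starts_factor <- function(x) {
  x <- as.character(x)
  # NAs share one placeholder, so they also trigger group starts
  # (assumption: the placeholder string does not otherwise occur in x)
  x[is.na(x)] <- "NA_placeholder"
  starts <- c(TRUE, x[-1] != x[-length(x)])
  factor(cumsum(starts))
}

table(l_sizes_factor(57, list(0.2, 0.3)))  # group sizes 11, 17, 29
auto_starts_factor(c(10, 10, 7, 8, 8, 9))  # groups 1, 1, 2, 3, 3, 4
```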

The "every" method puts every `n`th data point into the same group (e.g. c(1, 2, 3, 1, 2, 3)).
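Unlike the other methods, "every" does not keep group members next to each other. A hypothetical base R helper, `every_factor()` (not groupdata2's code), shows the idea: row `i` goes to group `((i - 1) %% n) + 1`, so rows `n` positions apart share a group.

```r
# Hypothetical helper (not groupdata2's code): "every" grouping.
every_factor <- function(n_points, n) {
  # Rows n positions apart get the same group label
  factor((seq_len(n_points) - 1) %% n + 1)
}

every_factor(6, 3)          # groups 1, 2, 3, 1, 2, 3
table(every_factor(57, 5))  # group sizes 12, 12, 11, 11, 11
```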

The step methods "staircase" and "primes" increase the group size for each new group: "staircase" adds a fixed step, while "primes" moves to the next prime number.
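Both step methods can be sketched as "grow the group size after each group, and let the last group take the remainder". The helpers below (`grow_sizes()`, `staircase_sizes()`, `primes_sizes()`, `is_prime()`, `next_prime()`) are hypothetical illustrations, not groupdata2's implementation. They only compute group sizes, which can be turned into a sequential grouping factor with `rep(seq_along(sizes), times = sizes)`.

```r
# Hypothetical helpers (not groupdata2's code) for the step methods.
is_prime <- function(k) {
  if (k < 2) return(FALSE)
  divisors <- seq_len(floor(sqrt(k)))[-1]  # 2, ..., floor(sqrt(k)); empty for k < 4
  all(k %% divisors != 0)
}

next_prime <- function(k) {
  k <- k + 1
  while (!is_prime(k)) k <- k + 1
  k
}

# Generic growth loop: `next_size` maps the current group size to the next one
grow_sizes <- function(n_points, first, next_size) {
  stopifnot(first >= 1)  # avoid an endless loop
  sizes <- numeric(0)
  size <- first
  while (sum(sizes) < n_points) {
    # The last group only gets the data points that are left
    sizes <- c(sizes, min(size, n_points - sum(sizes)))
    size <- next_size(size)
  }
  sizes
}

staircase_sizes <- function(n_points, step) grow_sizes(n_points, step, function(s) s + step)
primes_sizes <- function(n_points, start) grow_sizes(n_points, start, next_prime)

staircase_sizes(57, 5)  # 5, 10, 15, 20, 7
primes_sizes(57, 5)     # 5, 7, 11, 13, 17, 4
```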

Note: To create groups balanced by a categorical and/or numerical variable, see the fold() and partition() functions.

Usage

group(
  data,
  n,
  method = "n_dist",
  starts_col = NULL,
  force_equal = FALSE,
  allow_zero = FALSE,
  return_factor = FALSE,
  descending = FALSE,
  randomize = FALSE,
  col_name = ".groups",
  remove_missing_starts = FALSE
)

Arguments

`data`	`data.frame` or `vector`. When `data` is a grouped `data.frame`, the function is applied group-wise.
`n`	Depends on `method`. Number of groups (default), group size, list of group sizes, list of group starts, number of data points between group members, step size or prime number to start at. See `method`. Passed as whole number(s) and/or percentage(s) (`0` < `n` < `1`) and/or character. Method `"l_starts"` allows `'auto'`.
`method`	`"greedy"`, `"n_dist"`, `"n_fill"`, `"n_last"`, `"n_rand"`, `"l_sizes"`, `"l_starts"`, `"every"`, `"staircase"`, or `"primes"`. Note: the examples are the sizes of the generated groups for a vector with `57` elements.
  greedy: Divides up the data greedily given a specified group size `(e.g. 10, 10, 10, 10, 10, 7)`. `n` is the group size.
  n_dist (default): Divides the data into a specified number of groups and distributes excess data points across groups `(e.g. 11, 11, 12, 11, 12)`. `n` is the number of groups.
  n_fill: Divides the data into a specified number of groups and fills up groups with excess data points from the beginning `(e.g. 12, 12, 11, 11, 11)`. `n` is the number of groups.
  n_last: Divides the data into a specified number of groups. It finds the most equal group sizes possible, using all data points. Only the last group is able to differ in size `(e.g. 11, 11, 11, 11, 13)`. `n` is the number of groups.
  n_rand: Divides the data into a specified number of groups. Excess data points are placed randomly in groups (max. 1 per group) `(e.g. 12, 11, 11, 11, 12)`. `n` is the number of groups.
  l_sizes: Divides up the data by a `list` of group sizes. Excess data points are placed in an extra group at the end. `E.g. n = list(0.2, 0.3) outputs groups with sizes (11, 17, 29)`. `n` is a `list` of group sizes.
  l_starts: Starts new groups at specified values in the `starts_col` vector. `n` is a `list` of starting positions. Skip values by `c(value, skip_to_number)`, where `skip_to_number` is the nth appearance of the value in the vector after the previous group start. The first data point is automatically a starting position. `E.g. n = c(1, 3, 7, 25, 50) outputs groups with sizes (2, 4, 18, 25, 8)`. To skip: `given vector c("a", "e", "o", "a", "e", "o"), n = list("a", "e", c("o", 2)) outputs groups with sizes (1, 4, 1)`. If passing `n = 'auto'`, the starting positions are automatically found such that a group is started whenever a value differs from the previous value (see `find_starts()`). Note that all `NA`s are first replaced by a single unique value, meaning that they will also cause group starts. See `differs_from_previous()` to set a threshold for what is considered "different". `E.g. n = "auto" for c(10, 10, 7, 8, 8, 9) would start groups at the first 10, 7, 8 and 9, and give c(1, 1, 2, 3, 3, 4).`
  every: Combines every `n`th data point into a group `(e.g. 12, 12, 11, 11, 11 with n = 5)`. `n` is the distance between group members ("every n").
  staircase: Uses a step size to divide up the data. Group size increases by one step for every group, until there is no more data `(e.g. 5, 10, 15, 20, 7)`. `n` is the step size.
  primes: Uses prime numbers as group sizes. Group size increases to the next prime number until there is no more data `(e.g. 5, 7, 11, 13, 17, 4)`. `n` is the prime number to start at.
`starts_col`	Name of column with values to match in method `"l_starts"` when `data` is a `data.frame`. Pass `'index'` to use row names. (Character)
`force_equal`	Create equal groups by discarding excess data points. Implementation varies between methods. (Logical)
`allow_zero`	Whether `n` can be passed as `0`. Can be useful when programmatically finding `n`. (Logical)
`return_factor`	Only return the grouping factor. (Logical)
`descending`	Change the direction of the method. (Not fully implemented) (Logical)
`randomize`	Randomize the grouping factor. (Logical)
`col_name`	Name of the added grouping factor.
`remove_missing_starts`	Recursively remove elements from the list of starts that are not found. For method `"l_starts"` only. (Logical)
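The skip syntax of "l_starts" can be sketched in base R. `l_starts_factor()` is a hypothetical helper, not groupdata2's implementation. It assumes each search begins just after the previous group start, and it stops with an error when a start cannot be found instead of mimicking `remove_missing_starts`. It reproduces the two documented "l_starts" examples and the `species` example below.

```r
# Hypothetical helper (not groupdata2's code): explicit "l_starts" grouping.
# `starts` is a vector or list of start values; an element c(value, nth)
# starts the group at the nth appearance of `value` after the previous start.
l_starts_factor <- function(x, starts) {
  start_idx <- integer(0)
  from <- 1L  # position where the search for the next start begins
  for (s in starts) {
    value <- s[1]
    nth <- if (length(s) > 1) as.integer(s[2]) else 1L
    if (from > length(x)) stop("no data left to search for start value: ", value)
    hits <- which(x[from:length(x)] == value) + from - 1L
    if (length(hits) < nth) stop("start value not found: ", value)
    start_idx <- c(start_idx, hits[nth])
    from <- hits[nth] + 1L
  }
  # The first data point always starts a group
  is_start <- seq_along(x) %in% c(1L, start_idx)
  factor(cumsum(is_start))
}

table(l_starts_factor(1:57, c(1, 3, 7, 25, 50)))   # sizes 2, 4, 18, 25, 8
l_starts_factor(c("a", "e", "o", "a", "e", "o"),
                list("a", "e", c("o", 2)))         # groups 1, 2, 2, 2, 2, 3
```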

Value

A data.frame grouped by existing grouping variables and the new grouping factor, or only the grouping factor when `return_factor = TRUE`.

Author(s)

Ludvig Renbo Olsen, r-pkgs@ludvigolsen.dk

Examples

# Attach packages
library(groupdata2)
library(dplyr)

# Create data frame
df <- data.frame(
  "x" = c(1:12),
  "species" = factor(rep(c("cat", "pig", "human"), 4)),
  "age" = sample(c(1:100), 12)
)

# Using group()
df_grouped <- group(df, n = 5, method = "n_dist")

# Using group() in pipeline to get mean age
df_means <- df %>%
  group(n = 5, method = "n_dist") %>%
  dplyr::summarise(mean_age = mean(age))

# Using group() with `l_sizes`
df_grouped <- group(
  data = df,
  n = list(0.2, 0.3),
  method = "l_sizes"
)

# Using group() with `l_starts`
# `c('pig', 2)` skips to the second appearance of
# 'pig' after the first appearance of 'cat'
df_grouped <- group(
  data = df,
  n = list("cat", c("pig", 2), "human"),
  method = "l_starts",
  starts_col = "species"
)

LudvigOlsen/groupdata2 documentation built on Dec. 20, 2024, 7:12 p.m.
