Divides data into groups by a wide range of methods.
Creates a grouping factor with 1s for group 1, 2s for group 2, etc.
Returns a data.frame grouped by the grouping factor for easy use in magrittr `%>%` pipelines.
By default*, the data points in a group are connected sequentially (e.g. c(1, 1, 2, 2, 3, 3)) and splitting is done from top to bottom. *Except in the "every" method.
There are five types of grouping methods:
The "n_*" methods split the data into a given number of groups. They differ in how they handle excess data points.
The "greedy" method uses a group size to split the data into groups, greedily grabbing `n` data points from the top. The last group may thus differ in size (e.g. c(1, 1, 2, 2, 3)).
The "l_*" methods use a list of either starting points ("l_starts") or group sizes ("l_sizes"). The "l_starts" method can also autodetect group starts (when a value differs from the previous value).
The "every" method puts every `n`th data point into the same group (e.g. c(1, 2, 3, 1, 2, 3)).
The step methods "staircase" and "primes" increase the group size by a step for each group.
Note: To create groups balanced by a categorical and/or numerical variable, see the fold() and partition() functions.
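As a rough illustration of the grouping factors described above, the following base R sketch builds the three example vectors by hand. It does not call groupdata2 and is not how the package computes its groups; the variable names are made up for the illustration.

```r
# Sequential grouping: neighbouring data points share a group
sequential <- rep(1:3, each = 2)
sequential
# 1 1 2 2 3 3

# "every"-style grouping with n = 3: every 3rd data point shares a group
every_third <- (seq_len(6) - 1) %% 3 + 1
every_third
# 1 2 3 1 2 3

# "greedy"-style grouping with group size 2 on 5 data points:
# the last group ends up smaller than the others
greedy <- ceiling(seq_len(5) / 2)
greedy
# 1 1 2 2 3
```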
group(
  data,
  n,
  method = "n_dist",
  starts_col = NULL,
  force_equal = FALSE,
  allow_zero = FALSE,
  return_factor = FALSE,
  descending = FALSE,
  randomize = FALSE,
  col_name = ".groups",
  remove_missing_starts = FALSE
)
data
data.frame or vector.
n
Depends on `method`. Number of groups (default), group size, list of group sizes, list of group starts, number of data points between group members, step size or prime number to start at. See `method`. Passed as whole number(s) and/or percentage(s) (e.g. 0.2).
method
"greedy", "n_dist", "n_fill", "n_last", "n_rand", "l_sizes", "l_starts", "every", "staircase", or "primes".
greedy: Divides up the data greedily given a specified group size.
n_dist (default): Divides the data into a specified number of groups and distributes excess data points across groups.
n_fill: Divides the data into a specified number of groups and fills up groups with excess data points from the beginning.
n_last: Divides the data into a specified number of groups. It finds the most equal group sizes possible, using all data points. Only the last group is able to differ in size.
n_rand: Divides the data into a specified number of groups. Excess data points are placed randomly in groups (max. 1 per group).
l_sizes: Divides up the data by a list of group sizes.
l_starts: Starts new groups at specified values in the `starts_col` column. To skip to a later appearance of a value, pass c(value, skip_to_number), e.g. c("pig", 2) for the second appearance of "pig" after the previous group start. If passing n = "auto", group starts are detected automatically (whenever a value differs from the previous value).
every: Combines every `n`th data point into a group.
staircase: Uses step size to divide up the data. Group size increases with 1 step for every group, until there is no more data.
primes: Uses prime numbers as group sizes. Group size increases to the next prime number until there is no more data.
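The step methods can be easier to picture from their group sizes. The helpers below (`staircase_sizes()`, `primes_sizes()` and `is_prime()`) are hypothetical and not part of groupdata2. They are a minimal sketch that assumes the size rules exactly as worded above, and `primes_sizes()` assumes the starting number it receives is itself prime.

```r
# Hypothetical helpers: they only compute group sizes, not a grouping factor.

# Staircase: sizes are step, 2 * step, 3 * step, ... until the data runs out.
staircase_sizes <- function(n_points, step) {
  sizes <- numeric(0)
  k <- 1
  while (sum(sizes) < n_points) {
    # The last group takes whatever data is left
    sizes <- c(sizes, min(k * step, n_points - sum(sizes)))
    k <- k + 1
  }
  sizes
}

# Trial division; sufficient for the small numbers used here
is_prime <- function(x) {
  x >= 2 && all(x %% seq_len(floor(sqrt(x)))[-1] != 0)
}

# Primes: sizes are consecutive primes, beginning at `start` (assumed prime).
primes_sizes <- function(n_points, start) {
  sizes <- numeric(0)
  p <- start
  while (sum(sizes) < n_points) {
    sizes <- c(sizes, min(p, n_points - sum(sizes)))
    # Advance to the next prime number
    p <- p + 1
    while (!is_prime(p)) p <- p + 1
  }
  sizes
}

staircase_sizes(20, step = 2)
# 2 4 6 8
staircase_sizes(13, step = 2)
# 2 4 6 1
primes_sizes(20, start = 2)
# 2 3 5 7 3
```

In both cases the final group simply receives the remaining data points, which is why it can be smaller than the step pattern would suggest.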

starts_col
Name of column with values to match in method "l_starts".
force_equal 
Create equal groups by discarding excess data points. Implementation varies between methods. (Logical) 
allow_zero
Whether `n` can be passed as 0. (Logical)
return_factor 
Only return the grouping factor. (Logical) 
descending 
Change the direction of the method. (Not fully implemented) (Logical) 
randomize 
Randomize the grouping factor. (Logical) 
col_name 
Name of the added grouping factor. 
remove_missing_starts 
Recursively remove elements from the list of starts that are not found. For method "l_starts" only. (Logical)
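A short sketch of two of the arguments above, `return_factor` and `col_name`, may help. It assumes groupdata2 is installed; the data frame and the column name "my_groups" are made up for the illustration.

```r
library(groupdata2)

# Made-up data: 12 rows divide evenly into 3 groups
df <- data.frame(x = 1:12)

# Return only the grouping factor instead of the grouped data.frame
groups <- group(df, n = 3, method = "n_dist", return_factor = TRUE)
groups

# Use a custom name for the added grouping factor column
df_named <- group(df, n = 3, method = "n_dist", col_name = "my_groups")
colnames(df_named)
```

Returning only the factor can be convenient when the grouping should be stored or inspected separately from the data.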
data.frame
grouped by existing grouping variables and the new grouping factor.
Ludvig Renbo Olsen, rpkgs@ludvigolsen.dk
# Attach packages
library(groupdata2)
library(dplyr)

# Create data frame
df <- data.frame(
  "x" = c(1:12),
  "species" = factor(rep(c("cat", "pig", "human"), 4)),
  "age" = sample(c(1:100), 12)
)

# Using group()
df_grouped <- group(df, n = 5, method = "n_dist")

# Using group() in pipeline to get mean age
df_means <- df %>%
  group(n = 5, method = "n_dist") %>%
  dplyr::summarise(mean_age = mean(age))

# Using group() with `l_sizes`
df_grouped <- group(
  data = df,
  n = list(0.2, 0.3),
  method = "l_sizes"
)

# Using group() with `l_starts`
# `c('pig', 2)` skips to the second appearance of
# 'pig' after the first appearance of 'cat'
df_grouped <- group(
  data = df,
  n = list("cat", c("pig", 2), "human"),
  method = "l_starts",
  starts_col = "species"
)
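To compare how the "n_*" methods handle excess data points, one option is to tabulate the group sizes each method produces. The sketch below assumes groupdata2 is installed and uses its group_factor() function on a made-up vector of 14 elements. No expected sizes are shown, since the point is to inspect them yourself.

```r
library(groupdata2)

# Made-up data: 14 points do not divide evenly into 4 groups
x <- 1:14

methods <- c("n_dist", "n_fill", "n_last")

# Tabulate the group sizes produced by each method
sizes <- lapply(methods, function(m) {
  table(group_factor(x, n = 4, method = m))
})
names(sizes) <- methods
sizes
```

Each table should contain four groups whose sizes sum to 14; only the placement of the two excess data points differs between methods.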
