group_factor | R Documentation |
Divides data into groups by a wide range of methods.
Creates and returns a grouping factor with 1s for group 1, 2s for group 2, etc.
By default*, the data points in a group are connected sequentially (e.g. c(1, 1, 2, 2, 3, 3)) and splitting is done from top to bottom. *Except in the "every" method.
There are five types of grouping methods:
The "n_*" methods split the data into a given number of groups. They differ in how they handle excess data points.
The "greedy" method uses a group size to split the data into groups, greedily grabbing `n` data points from the top. The last group may thus differ in size (e.g. c(1, 1, 2, 2, 3)).
The "l_*" methods use a list of either starting points ("l_starts") or group sizes ("l_sizes"). The "l_starts" method can also auto-detect group starts (when a value differs from the previous value).
The "every" method puts every `n`th data point into the same group (e.g. c(1, 2, 3, 1, 2, 3)).
The step methods "staircase" and "primes" increase the group size by a step for each group.
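The sequential, "every" and auto-detected-start behaviours described above can be illustrated in base R. This is a minimal sketch, not the package's implementation; the helper names greedy_factor(), every_factor() and starts_factor() are hypothetical and exist only for this illustration.

```r
# Base R sketches of three grouping ideas (hypothetical helpers,
# not part of groupdata2).

# "greedy": grab `size` data points at a time from the top.
# The last group is smaller when the data does not divide evenly.
greedy_factor <- function(n_points, size) {
  factor(ceiling(seq_len(n_points) / size))
}

# "every": every `n`th data point goes into the same group.
every_factor <- function(n_points, n) {
  factor((seq_len(n_points) - 1) %% n + 1)
}

# Auto-detected starts: a new group begins whenever a value
# differs from the previous value.
starts_factor <- function(x) {
  is_start <- c(TRUE, x[-1] != x[-length(x)])
  factor(cumsum(is_start))
}

greedy_factor(5, 2)                        # groups: 1 1 2 2 3
every_factor(6, 3)                         # groups: 1 2 3 1 2 3
starts_factor(c("a", "a", "b", "b", "a"))  # groups: 1 1 2 2 3
```

The first two calls reproduce the example vectors given in the method descriptions.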
Note: To create groups balanced by a categorical and/or numerical variable, see the fold() and partition() functions.
group_factor(
  data,
  n,
  method = "n_dist",
  starts_col = NULL,
  force_equal = FALSE,
  allow_zero = FALSE,
  descending = FALSE,
  randomize = FALSE,
  remove_missing_starts = FALSE
)
data |
|
n |
Depends on `method`. Number of groups (default), group size, list of group sizes, list of group starts, number of data points between group members, step size or prime number to start at. See `method`. Passed as whole number(s) and/or percentage(s). |
method |
Note: examples are sizes of the generated groups based on a vector with …
greedy: Divides up the data greedily given a specified group size.
n_dist (default): Divides the data into a specified number of groups and distributes excess data points across groups.
n_fill: Divides the data into a specified number of groups and fills up groups with excess data points from the beginning.
n_last: Divides the data into a specified number of groups. It finds the most equal group sizes possible, using all data points. Only the last group is able to differ in size.
n_rand: Divides the data into a specified number of groups. Excess data points are placed randomly in groups (max. 1 per group).
l_sizes: Divides up the data by a list of group sizes.
l_starts: Starts new groups at specified values in the … To skip: If passing …
every: Combines every `n`th data point into a group.
staircase: Uses step size to divide up the data. Group size increases with 1 step for every group, until there is no more data.
primes: Uses prime numbers as group sizes. Group size increases to the next prime number until there is no more data.
|
starts_col |
Name of column with values to match in method "l_starts". |
force_equal |
Create equal groups by discarding excess data points. Implementation varies between methods. (Logical) |
allow_zero |
Whether |
descending |
Change the direction of the method. (Not fully implemented) (Logical) |
randomize |
Randomize the grouping factor. (Logical) |
remove_missing_starts |
Recursively remove elements from the list of starts that are not found. For method "l_starts". |
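To make the size-based method descriptions concrete, the following base R sketch computes group sizes for "n_fill" and "staircase" as they are described in the `method` argument, and turns a vector of sizes into a grouping factor. It is an illustration under that reading of the descriptions, not the package's code; n_fill_sizes(), staircase_sizes() and sizes_to_factor() are hypothetical names.

```r
# Hypothetical helpers illustrating the described behaviour;
# they are not part of groupdata2.

# "n_fill": excess data points fill up groups from the beginning.
n_fill_sizes <- function(n_points, n_groups) {
  base <- n_points %/% n_groups
  excess <- n_points %% n_groups
  base + as.integer(seq_len(n_groups) <= excess)
}

# "staircase": group i asks for i * step data points;
# the last group takes whatever data is left.
staircase_sizes <- function(n_points, step) {
  sizes <- numeric(0)
  remaining <- n_points
  i <- 1
  while (remaining > 0) {
    size <- min(i * step, remaining)
    sizes <- c(sizes, size)
    remaining <- remaining - size
    i <- i + 1
  }
  sizes
}

# Turn a vector of group sizes into a grouping factor.
sizes_to_factor <- function(sizes) {
  factor(rep(seq_along(sizes), times = sizes))
}

n_fill_sizes(12, 5)                   # sizes: 3 3 2 2 2
staircase_sizes(12, 5)                # sizes: 5 7
sizes_to_factor(n_fill_sizes(12, 5))  # groups: 1 1 1 2 2 2 3 3 4 4 5 5
```

The "primes" method follows the same pattern, with successive prime numbers as the requested sizes instead of multiples of a step.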
Grouping factor with 1s for group 1, 2s for group 2, etc.
N.B. If `data` is a grouped data.frame, the output is a data.frame with the existing groupings and the generated grouping factor. The row order from `data` is maintained.
Ludvig Renbo Olsen, r-pkgs@ludvigolsen.dk
Other grouping functions: all_groups_identical(), collapse_groups_by, collapse_groups(), fold(), group(), partition(), splt()
Other staircase tools: %primes%(), %staircase%(), group()
Other l_starts tools: differs_from_previous(), find_missing_starts(), find_starts(), group()
# Attach packages
library(groupdata2)
library(dplyr)

# Create a data frame
df <- data.frame(
  "x" = c(1:12),
  "species" = factor(rep(c("cat", "pig", "human"), 4)),
  "age" = sample(c(1:100), 12)
)

# Using group_factor() with n_dist
groups <- group_factor(df, 5, method = "n_dist")
df$groups <- groups

# Using group_factor() with greedy
groups <- group_factor(df, 5, method = "greedy")
df$groups <- groups

# Using group_factor() with l_sizes
groups <- group_factor(df, list(0.2, 0.3), method = "l_sizes")
df$groups <- groups

# Using group_factor() with l_starts
groups <- group_factor(df, list("cat", c("pig", 2), "human"),
  method = "l_starts", starts_col = "species"
)
df$groups <- groups