upsample | R Documentation
Uses random upsampling to bring every group up to the size of the largest group in the data frame. Wraps `balance()`.
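As a rough illustration of the idea (not groupdata2's actual source), the base-R sketch below tops up each category with rows drawn with replacement from that same category until it matches the largest category, keeping the original rows unchanged. The function name `upsample_sketch()` and the toy `score` column are hypothetical.

```r
# Hypothetical sketch, not groupdata2's source: random upsampling of every
# category to the size of the largest category.
upsample_sketch <- function(data, cat_col) {
  groups <- split(data, data[[cat_col]], drop = TRUE)
  target <- max(vapply(groups, nrow, integer(1)))  # size of the largest group
  topped_up <- lapply(groups, function(g) {
    n_missing <- target - nrow(g)
    # Draw the missing rows with replacement from the same category;
    # sample.int() avoids sample()'s special case for a single number
    extra <- g[sample.int(nrow(g), n_missing, replace = TRUE), , drop = FALSE]
    rbind(g, extra)  # original rows are kept as they are
  })
  out <- do.call(rbind, topped_up)
  rownames(out) <- NULL
  out
}

df <- data.frame(
  diagnosis = factor(c(0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0)),
  score = 1:13
)
res <- upsample_sketch(df, "diagnosis")
table(res$diagnosis)  # both categories now have 10 rows
```

Sampling with replacement is what allows a small category to grow past its own size; the real function additionally supports IDs and grouped data, which this sketch ignores.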
upsample(
data,
cat_col,
id_col = NULL,
id_method = "n_ids",
mark_new_rows = FALSE,
new_rows_col_name = ".new_row"
)
data | Data frame to upsample. (data.frame)
cat_col | Name of the categorical variable to balance by. (Character)
id_col | Name of the factor with IDs. (Character) IDs are considered entities, e.g. allowing us to add or remove all rows for an ID. How this is used is up to `id_method`. For example, if we have measured a participant multiple times and want to make sure that we keep all these measurements, we would either add/remove all measurements for the participant or leave in all measurements for the participant. N.B. When …
id_method | Method for balancing the IDs. (Character)
n_ids (default): Balances on ID level only. It makes sure there are the same number of IDs for each category. This might lead to a different number of rows between categories.
n_rows_c: Attempts to level the number of rows per category, while only removing/adding entire IDs. This is done in 2 steps: …
distributed: Distributes the lacking/excess rows equally between the IDs. If the number to distribute cannot be divided equally, some IDs will have 1 row more/less than the others.
nested: Calls … I.e. if size is …
mark_new_rows | Whether to add a column marking the added rows. (Logical)
new_rows_col_name | Name of the column marking new rows. Defaults to `.new_row`.
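A minimal sketch of what marking new rows could look like, assuming the marker holds 1 for added rows and 0 for original rows (an assumption; the coding is not stated above). The helper `add_new_row_marker()` is hypothetical and only reuses the documented default column name `.new_row`.

```r
# Hypothetical helper: flag appended rows in a marker column.
# Assumption: 0 = original row, 1 = row added by upsampling.
add_new_row_marker <- function(original, added, col_name = ".new_row") {
  original[[col_name]] <- rep(0, nrow(original))  # rows already in the data
  added[[col_name]] <- rep(1, nrow(added))        # rows created by upsampling
  rbind(original, added)
}

df <- data.frame(diagnosis = factor(c(0, 0, 1)), score = c(10, 20, 30))

# Upsample category "1" from 1 to 2 rows by drawing one row with replacement
idx <- which(df$diagnosis == "1")
extra <- df[idx[sample.int(length(idx), 1, replace = TRUE)], , drop = FALSE]

marked <- add_new_row_marker(df, extra)
marked
```

Using `rep()` with `nrow()` keeps the helper safe when no rows need to be added, since assigning a single value to a zero-row data frame column would fail.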
Without `id_col`: Upsampling is done with replacement for added rows, while the original data remains intact.
With `id_col`: See the `id_method` description.
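With `id_col`, IDs are treated as entities. The hypothetical helpers below sketch two of the `id_method` ideas under stated assumptions: `upsample_ids_sketch()` equalizes the number of IDs per category by copying whole IDs (roughly the `n_ids` idea), and `distribute_rows()` splits lacking rows as evenly as possible across IDs (the arithmetic behind `distributed`). Copied IDs keep their original labels and leftover rows go to randomly chosen IDs; both choices are assumptions, not documented behavior.

```r
# Hypothetical sketches of two id_method ideas (not groupdata2's code).

# n_ids-like idea: bring every category up to the largest number of IDs by
# copying whole IDs (all of their rows), drawn with replacement.
# Assumption: copied IDs keep their original label.
upsample_ids_sketch <- function(data, cat_col, id_col) {
  groups <- split(data, data[[cat_col]], drop = TRUE)
  n_ids <- vapply(groups, function(g) length(unique(g[[id_col]])), integer(1))
  target <- max(n_ids)
  topped_up <- lapply(groups, function(g) {
    ids <- unique(as.character(g[[id_col]]))
    picked <- ids[sample.int(length(ids), target - length(ids), replace = TRUE)]
    # Copy every row of each drawn ID so that IDs are never split
    extra <- lapply(picked, function(id) g[g[[id_col]] == id, , drop = FALSE])
    do.call(rbind, c(list(g), extra))
  })
  out <- do.call(rbind, topped_up)
  rownames(out) <- NULL
  out
}

# distributed-like arithmetic: split n_rows lacking rows over n_ids IDs;
# when the split is uneven, randomly chosen IDs get one extra row.
distribute_rows <- function(n_rows, n_ids) {
  shares <- rep(n_rows %/% n_ids, n_ids)
  leftover <- n_rows %% n_ids
  if (leftover > 0) {
    lucky <- sample.int(n_ids, leftover)
    shares[lucky] <- shares[lucky] + 1
  }
  shares
}

df <- data.frame(
  participant = factor(c(1, 1, 2, 3, 3, 3, 3, 4, 4, 5, 5, 5, 5)),
  diagnosis = factor(c(0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0))
)
res <- upsample_ids_sketch(df, "diagnosis", "participant")

# Diagnosis 1 has 3 rows vs. 10 for diagnosis 0, so it lacks 7 rows,
# spread over its 2 participants: one gets 4 rows, the other 3
distribute_rows(7, 2)
```

In the toy data, diagnosis 0 has three participants and diagnosis 1 has two, so the ID-level sketch copies one whole participant (1 or 2 rows) into diagnosis 1; the row counts therefore still differ, which matches the caveat in the `n_ids` description.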
data.frame with added rows. Ordered by potential grouping variables, `cat_col`, and (potentially) `id_col`.
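The ordering of the returned data frame can be approximated in base R with `order()`. This sketch sorts by the category column and then the ID column and leaves out grouping variables; it only illustrates the described ordering and is not taken from the package.

```r
# Hypothetical illustration of the described ordering: by cat_col, then by
# id_col (grouping variables are left out of this sketch).
df <- data.frame(
  participant = factor(c(4, 1, 2, 1, 4)),
  diagnosis = factor(c(1, 0, 1, 0, 1)),
  trial = c(2, 1, 1, 2, 1)
)

# order() sorts factors by their level order and keeps ties in input order
sorted_df <- df[order(df$diagnosis, df$participant), ]
sorted_df
```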
Ludvig Renbo Olsen, r-pkgs@ludvigolsen.dk
Other sampling functions: balance(), downsample()
# Attach packages
library(groupdata2)
# Create data frame
df <- data.frame(
"participant" = factor(c(1, 1, 2, 3, 3, 3, 3, 4, 4, 5, 5, 5, 5)),
"diagnosis" = factor(c(0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0)),
"trial" = c(1, 2, 1, 1, 2, 3, 4, 1, 2, 1, 2, 3, 4),
"score" = sample(c(1:100), 13)
)
# Using upsample()
upsample(df, cat_col = "diagnosis")
# Using upsample() with id_method "n_ids"
# With column specifying added rows
upsample(df,
cat_col = "diagnosis",
id_col = "participant",
id_method = "n_ids",
mark_new_rows = TRUE
)
# Using upsample() with id_method "n_rows_c"
# With column specifying added rows
upsample(df,
cat_col = "diagnosis",
id_col = "participant",
id_method = "n_rows_c",
mark_new_rows = TRUE
)
# Using upsample() with id_method "distributed"
# With column specifying added rows
upsample(df,
cat_col = "diagnosis",
id_col = "participant",
id_method = "distributed",
mark_new_rows = TRUE
)
# Using upsample() with id_method "nested"
# With column specifying added rows
upsample(df,
cat_col = "diagnosis",
id_col = "participant",
id_method = "nested",
mark_new_rows = TRUE
)