Welcome to jumble! This program was written as a companion to academic work performed by researchers at Brown University to conduct re-randomization for cluster-randomized nursing home trials.
library(tidyverse) library(ggplot2) library(jumble)
In this vignette we demonstrate some functions used for random assignment, following best practices outlined by other R packages like randomizeR, randomizr etc.
1) Randomization must be transparent 2) Randomization must be reproducibile 3) Randomization must be random!
The example dataset used in this package is freely available and comes from a clinical HIV therapy trial conducted in the 1990s.[@HammerSM1996] See: ?jumble::ACTG175
df <- jumble::ACTG175
The random sampling method underlying jumble functions is the base R function sample()
, rbinom()
. While computational speed may be problematic in some cases, the base sample function is reliable, stable, and well-documented and tested.
The sample()
functions calls the default Random number generator for R, the Mershene-Twister method, which can be queried using RNGKind()
.@Mersenne1998
Current system build:
t <- sessionInfo() cat(t$R.version$version.string)
cat('Windows platform :', t$platform)
Assigning treatment is easy if you want equal probability of assignment to one or more groups, without concern to final sample size in each arm, covariate balance etc.
assign <- rnd_assign(df$pidnum, 2, as.integer(as.Date('2019-06-26'))) head(assign)
Truly random assignment!
The seed is stored as a attribute
seed_iter <- attr(assign, 'seed')
paste0('Randomization performed: ', Sys.Date(), ' Seed: ', seed_iter)
Don't lose that information! As long as you know the seed you can reproduce the randomization:
assign <- rnd_assign(df$pidnum, 2, seed_iter) head(assign)
Perform 20,000 randomizations
pidnum <- df$pidnum n=20000L df_assign <- lapply(1:n, function(x) rnd_assign(pidnum, 2, 18073L+x)) assigns <- bind_cols(df_assign[[1]][, 1], lapply(df_assign, function(x) x[, 2])) df_assign_2 <- gather(assigns, key = 'trial', value = 'group', starts_with('group')) %>% group_by(id) %>% summarize(`No. of trials` = n, `No. group 'a'` = sum(group == 'a'), `No. group 'b'` = sum(group == 'b'), `Prob. group = 'a'` = `No. group 'a'` / n(), `Prob. group = 'b'` = `No. group 'b'` / n()) head(df_assign_2)
First glance shows ~ equal likelihood of group assignment in 20,000 trials.
ggplot(df_assign_2, aes(x = `Prob. group = 'a'`, y = `Prob. group = 'b'`)) + geom_violin( colour = 'blue') + scale_x_continuous(limits = c(0.47, 0.53)) + scale_y_continuous(limits = c(0.47, 0.53)) + labs(title="Probability of treatment assignment", subtitle="K=2; completely random assignment; 20,000 trials", x="Pr | Group A", y="Pr | Group B")
For any given individual, across 20,000 trials no assignment exceeds an imbalance of ~2-3%.
cat("Probability of assignment to group 'a' in 20,000 trials \n") summary(df_assign_2$`Prob. group = 'a'`) cat("Probability of assignment to group 'b' in 20,000 trials \n") summary(df_assign_2$`Prob. group = 'b'`)
However, in each trial the number of individuals in group A vs. B was random and could be imbalanced.
df_trials <- gather(assigns, key = 'trial', value = 'group', starts_with('group')) %>% group_by(trial) %>% summarize(`No. group 'a'` = sum(group == 'a'), `No. group 'b'` = sum(group == 'b'))
cat("No. assigned to group 'a' in 20,000 trials \n") summary(df_trials$`No. group 'a'`) cat("No assigned to group 'b' in 20,000 trials \n") summary(df_trials$`No. group 'b'`)
See how the number assigned varies widely?
what if you want to guarantee an equal number of individuals are in each group?
We will repeat the exercise, dropping one person to make an odd-numbered cohort.
pidnum <- df$pidnum pidnum <- pidnum[2:length(pidnum)] # drop 1 n=10000 seeds <- sample(-100000:100000, n) df_assign <- lapply(1:n, function(x) { set.seed(seeds[x]) rnd_allot(pidnum) }) assigns <- bind_cols(df_assign[[1]][, 1], lapply(df_assign, function(x) x[, 2])) df_assign_2 <- gather(assigns, key = 'trial', value = 'group', starts_with('group')) %>% group_by(id) %>% summarize(`No. of trials` = n, `No. group 'a'` = sum(group == 'a'), `No. group 'b'` = sum(group == 'b'), `Prob. group = 'a'` = `No. group 'a'` / n(), `Prob. group = 'b'` = `No. group 'b'` / n()) head(df_assign_2)
ggplot(df_assign_2, aes(x = `Prob. group = 'a'`, y = `Prob. group = 'b'`)) + geom_violin( colour = 'blue') + scale_x_continuous(limits = c(0.47, 0.53)) + scale_y_continuous(limits = c(0.47, 0.53)) + labs(title="Probability of treatment assignment", subtitle="K=2; completely random assignment; 20,000 trials", x="Pr | Group A", y="Pr | Group B")
Random assignment looks well-balanced between groups.
cat("Probability of assignment to group 'a' in 20,000 trials \n") summary(df_assign_2$`Prob. group = 'a'`) cat("Probability of assignment to group 'b' in 20,000 trials \n") summary(df_assign_2$`Prob. group = 'b'`)
df_trials <- gather(assigns, key = 'trial', value = 'group', starts_with('group')) %>% group_by(trial) %>% summarize(`No. group 'a'` = sum(group == 'a'), `No. group 'b'` = sum(group == 'b'))
What about the size of the groups?
cat("No. assigned to group 'a' in 20,000 trials \n") summary(df_trials$`No. group 'a'`) cat("No assigned to group 'b' in 20,000 trials \n") summary(df_trials$`No. group 'b'`)
Equal size! The number now randomly goes between 526 or 527 for either the 'a' or 'b' group. The imbalance is selected with equal probability.
The primary author of the package was Kevin W. McConeghy. See here
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.