Welcome to jumble! This program was written as a companion to academic work performed by researchers at Brown University to conduct re-randomization for cluster-randomized nursing home trials.

library(tidyverse)
library(ggplot2)
library(jumble)

Introduction

In this vignette we demonstrate some functions used for random assignment, following best practices outlined by other R packages like randomizeR, randomizr etc.

1) Randomization must be transparent 2) Randomization must be reproducibile 3) Randomization must be random!

Example dataset

The example dataset used in this package is freely available and comes from a clinical HIV therapy trial conducted in the 1990s.[@HammerSM1996] See: ?jumble::ACTG175

Load dataset

df <- jumble::ACTG175

A note on random number generation

The random sampling method underlying jumble functions is the base R function sample(), rbinom(). While computational speed may be problematic in some cases, the base sample function is reliable, stable, and well-documented and tested.

The sample() functions calls the default Random number generator for R, the Mershene-Twister method, which can be queried using RNGKind().@Mersenne1998

Current system build:

t <- sessionInfo()
cat(t$R.version$version.string)
cat('Windows platform :', t$platform)

Random assignment to K-groups

Assigning treatment is easy if you want equal probability of assignment to one or more groups, without concern to final sample size in each arm, covariate balance etc.

assign <- rnd_assign(df$pidnum, 2, as.integer(as.Date('2019-06-26')))

head(assign)

Truly random assignment!

The seed is stored as a attribute

seed_iter <- attr(assign, 'seed')
paste0('Randomization performed: ', Sys.Date(), ' Seed: ', seed_iter)

Don't lose that information! As long as you know the seed you can reproduce the randomization:

assign <- rnd_assign(df$pidnum, 2, seed_iter)

head(assign)

Simulation to demonstrate equal probability

Perform 20,000 randomizations

pidnum <- df$pidnum  

n=20000L

df_assign <- lapply(1:n, function(x) rnd_assign(pidnum, 2, 18073L+x))

assigns <- bind_cols(df_assign[[1]][, 1], 
                     lapply(df_assign, function(x) x[, 2]))

df_assign_2 <- gather(assigns, key = 'trial', value = 'group', 
                      starts_with('group')) %>%
  group_by(id) %>%
  summarize(`No. of trials` = n,
            `No. group 'a'` = sum(group == 'a'),
            `No. group 'b'` = sum(group == 'b'),
            `Prob. group = 'a'` = `No. group 'a'` / n(),
            `Prob. group = 'b'` = `No. group 'b'` / n())

head(df_assign_2)

First glance shows ~ equal likelihood of group assignment in 20,000 trials.

Assignment tests

ggplot(df_assign_2, aes(x = `Prob. group = 'a'`,
                        y = `Prob. group = 'b'`)) +
  geom_violin( colour = 'blue') +
  scale_x_continuous(limits = c(0.47, 0.53)) +
  scale_y_continuous(limits = c(0.47, 0.53)) +
  labs(title="Probability of treatment assignment", 
       subtitle="K=2; completely random assignment; 20,000 trials",
       x="Pr | Group A",
       y="Pr | Group B")

For any given individual, across 20,000 trials no assignment exceeds an imbalance of ~2-3%.

cat("Probability of assignment to group 'a' in 20,000 trials \n")
summary(df_assign_2$`Prob. group = 'a'`)
cat("Probability of assignment to group 'b' in 20,000 trials \n")
summary(df_assign_2$`Prob. group = 'b'`)

However, in each trial the number of individuals in group A vs. B was random and could be imbalanced.

df_trials <- gather(assigns, key = 'trial', value = 'group', 
                      starts_with('group')) %>%
  group_by(trial) %>%
  summarize(`No. group 'a'` = sum(group == 'a'),
            `No. group 'b'` = sum(group == 'b'))
cat("No. assigned to group 'a' in 20,000 trials \n")
summary(df_trials$`No. group 'a'`)
cat("No assigned to group 'b' in 20,000 trials \n")
summary(df_trials$`No. group 'b'`)

See how the number assigned varies widely?

what if you want to guarantee an equal number of individuals are in each group?

Random assignment to K-groups, equal sample size

We will repeat the exercise, dropping one person to make an odd-numbered cohort.

pidnum <- df$pidnum 
pidnum <- pidnum[2:length(pidnum)] # drop 1 

n=10000

seeds <- sample(-100000:100000, n)
df_assign <- lapply(1:n, function(x) {
  set.seed(seeds[x])
  rnd_allot(pidnum)
  })

assigns <- bind_cols(df_assign[[1]][, 1], 
                     lapply(df_assign, function(x) x[, 2]))

df_assign_2 <- gather(assigns, key = 'trial', value = 'group', 
                      starts_with('group')) %>%
  group_by(id) %>%
  summarize(`No. of trials` = n,
            `No. group 'a'` = sum(group == 'a'),
            `No. group 'b'` = sum(group == 'b'),
            `Prob. group = 'a'` = `No. group 'a'` / n(),
            `Prob. group = 'b'` = `No. group 'b'` / n())

head(df_assign_2)

Assignment tests

ggplot(df_assign_2, aes(x = `Prob. group = 'a'`,
                        y = `Prob. group = 'b'`)) +
  geom_violin( colour = 'blue') +
  scale_x_continuous(limits = c(0.47, 0.53)) +
  scale_y_continuous(limits = c(0.47, 0.53)) +
  labs(title="Probability of treatment assignment", 
       subtitle="K=2; completely random assignment; 20,000 trials",
       x="Pr | Group A",
       y="Pr | Group B")

Random assignment looks well-balanced between groups.

cat("Probability of assignment to group 'a' in 20,000 trials \n")
summary(df_assign_2$`Prob. group = 'a'`)
cat("Probability of assignment to group 'b' in 20,000 trials \n")
summary(df_assign_2$`Prob. group = 'b'`)
df_trials <- gather(assigns, key = 'trial', value = 'group', 
                      starts_with('group')) %>%
  group_by(trial) %>%
  summarize(`No. group 'a'` = sum(group == 'a'),
            `No. group 'b'` = sum(group == 'b'))

What about the size of the groups?

cat("No. assigned to group 'a' in 20,000 trials \n")
summary(df_trials$`No. group 'a'`)
cat("No assigned to group 'b' in 20,000 trials \n")
summary(df_trials$`No. group 'b'`)

Equal size! The number now randomly goes between 526 or 527 for either the 'a' or 'b' group. The imbalance is selected with equal probability.

Contacting authors

The primary author of the package was Kevin W. McConeghy. See here

References



kmcconeghy/jumble documentation built on Jan. 8, 2020, 8:52 a.m.