SRS is a Subject Randomization System based on the paper by Pocock and Simon [-@ps:1975]. It follows the development in the paper rather closely. In this vignette we show how one might use the system in designing and implementing randomizations for clinical trials.
Note: Although everything here should work, this is somewhat dated
code that I put on R-forge and forgot for a good bit of time. I'd do
this much differently now; in fact, I have a version using R6
classes that I hope to release at some point.
This vignette has two parts to it. The first part goes into detail discussing some of the innards of the package. This is most meaningful to those in our Biostatistics core who may recommend this software for use in trials. The second part is more information on how one might use it in conducting a trial.
This package is written using S4
classes. No deep knowledge of S4
classes is assumed in what follows.
There are two main classes that most users of the package will use:
ClinicalExperiment
and PocockSimonRandomizer
. The class
ClinicalExperiment
, as the name implies, encapsulates the
characteristics of a clinical experiment. An instance of this class is
used to create an instance of the other class PocockSimonRandomizer
so that the randomizer remains associated with a particular clinical
experiment.
Let us create a simple clinical experiment object after invoking the
requisite package. The function ClinicalExperiment
(as distinct from
the ClinicalExperiment
class) is available for us.
library(SRS) expt0 <- ClinicalExperiment(number.of.factors = 3, number.of.factor.levels = c(2, 2, 3), number.of.treatments = 3)
This create an experiment with three factors and three treatments. The
first factor has 2 levels, the second 2, and the third 3. If none of
the arguments are specified, the default is to create a two-factor,
two-treatment experiment with each factor having two levels. One can
name the factors with the argument factor.names
but default names
such as F_1
, F_2
, etc., are provided. The levels are currently
indicated by the suffixes -1
, -2
, etc., that are attached to the
factor names; a flexible naming scheme for this might be introduced
later.
It is useful to print the object to see what it contains.
print(expt0)
Of course, in anything other than a toy setting, one actually provides some names for the factor and factor levels. We'll use this in what follows.
expt <- ClinicalExperiment(number.of.factors = 3, factor.names = c("Sex", "Race", "Stage"), number.of.factor.levels = c(2, 2, 3), factor.level.names = list(c("Female", "Male"), c("Caucasian", "Non-caucasian"), c("I", "II", "III")), number.of.treatments = 3, treatment.names <- c("Placebo", "Arm1", "Arm2")) print(expt)
Now let's create a randomizer that will work for this experiment.
randomizerObject <- new("PocockSimonRandomizer", expt, 12345L) print(randomizerObject)
Note that we don't have a helper constructor function (for no
particular reason) and so we had to use the new
function to create
the object. (Indeed, that is what the ClinicalExperiment
function
does behind the scenes.)
The output of the print above indicates that there are some default
settings for the randomizer. For example, the treatment ratios are all
1
's indicating equal treatment preference; others such as 1 2 1
could have been specified. Note the stateTable
slot which will
summarize the margins of the factor distributions by treatment. Since
no randomization has been done, the slot tr.assignments
is empty.
Of interest are the slots named d.func
, g.func
and p.func
. The
d.func
computes imbalance due to assigning each of the treatments,
g.func
computes the overall imbalance, and the p.func
computes the
probabilities of assigning each treatment based on the overall
imbalance. All of these can be changed by the user. Default values for
these functions are the ones described in [@ps:1975].
\subsection{Using the Randomizer} \label{sec:using}
Now that we have defined the experiment and the randomizer, we can randomize several subjects using these classes. First some helper functions that are useful in simulations for generating random identifiers and factors.
### ### Generate a random Id for a subject (max 10000000)! ### generateId <- function(i) { if (i < 0 || i > 10000) { stop("generateId: Arg expected to be between 1 and 9999") } zero.count <- 5 - trunc(log10(i)) - 1 prefix <- substring(10^zero.count, 2) paste("ID.", prefix, i, sep="") } ### ### Generate random factors; if n is the number of factors, limits is a list ### of length n with each element being a vector of possible factor levels ### generateRandomFactors <- function(factor.levels) { unlist(lapply(factor.levels, function(x) sample(x, 1))) }
Now, we will run a 10 randomizations and print the results.
for (i in seq(10)) randomizeSubject(randomizerObject) <- list(subject.id = generateId(i), factor.values = generateRandomFactors(expt@factor.level.names)) knitr::kable(tr.assignments(randomizerObject))
Just in case we are only interested in the last assigned treatment:
lastRandomization(randomizerObject)
We can also look at the marginal distributions thus:
knitr::kable(stateTable(randomizerObject))
The functions for computing imbalance, overall imbalance and probabilities can all be customized. These are best illustrated by additional examples.
Let's move away from the default range function to say the standard
deviation (sd
) function.
randomizerObject.2 <- new("PocockSimonRandomizer", expt, as.integer(12345), d.func = sd) print(randomizerObject.2@d.func)
Now let's run that simulation again.
for (i in seq(10)) randomizeSubject(randomizerObject.2) <- list(subject.id = generateId(i), factor.values = generateRandomFactors(expt@factor.level.names))
Now print the summaries.
tbl <- table(tr.assignments(randomizerObject.2)$Treatment) knitr::kable(do.call(data.frame, as.list(tbl)))
knitr::kable(stateTable(randomizerObject.2))
Now let's weight imbalance on factor 1 more than the others by a
factor of 5. We do this by modifying the g.func
.
## Note: imbalances is a number of factors by number of treatments matrix g.func <- function(imbalances) { factor.weights <- c (20, 1, 1) imbalances %*% factor.weights } randomizerObject.3 <- new("PocockSimonRandomizer", expt, 12345L, d.func = sd, g.func = g.func) print(randomizerObject.3@g.func)
Now the simulation.
for (i in seq(1000)) randomizeSubject(randomizerObject.3) <- list(subject.id = generateId(i), factor.values = generateRandomFactors(expt@factor.level.names))
Let's look at the distribution of treatments and the marginal distribution of factors.
tbl <- table(tr.assignments(randomizerObject.3)$Treatment) knitr::kable(do.call(data.frame, as.list(tbl)))
knitr::kable(stateTable(randomizerObject.3))
Next, we try a simulation where we require 5:2:1 randomization. This merely requires setting the treatment ratios.
randomizerObject.4 <- new("PocockSimonRandomizer", expt, 32871L, tr.ratios = c(5, 2, 1)) for (i in seq(1000)) randomizeSubject(randomizerObject.4) <- list(subject.id = generateId(i), factor.values = generateRandomFactors(expt@factor.level.names))
tbl <- table(tr.assignments(randomizerObject.4)$Treatment) knitr::kable(do.call(data.frame, as.list(tbl)))
knitr::kable(stateTable(randomizerObject.4))
To full see the effect that the treatment ratios force, we need to change the function that computes probabilities for picking each treatment based on the randomization. A greedy one like the one below will grease the squeaky wheel:
p.func.greedy <- function(overallImbalance) { number.of.treatments <- length(overallImbalance) k <- which(overallImbalance == min(overallImbalance)) ## Note there could be ties here... p.vec <- rep(0, number.of.treatments) p.vec[k] <- 1 p.vec / sum(p.vec) ## will pick ties randomly }
Now, a new randomizer.
randomizerObject.4a <- new("PocockSimonRandomizer", expt, 2212L, tr.ratios = c(5, 2, 1), p.func = p.func.greedy)
A simulation.
for (i in seq(1000)) randomizeSubject(randomizerObject.4a) <- list(subject.id = generateId(i), factor.values = generateRandomFactors(expt@factor.level.names))
tbl <- table(tr.assignments(randomizerObject.4a)$Treatment) knitr::kable(do.call(data.frame, as.list(tbl)))
knitr::kable(stateTable(randomizerObject.4a))
The drawback of using the greedy function in the previous example is that there is some predictability as to what the randomizer will assign based on the current state. To throw in a bit of uncertainty, we can define another function that favors the appropriate treatment heavily, but not deterministically.
p.func.not.so.greedy <- function(overallImbalance) { FAVORED.PROB <- 0.75 number.of.treatments <- length(overallImbalance) k <- which(overallImbalance == min(overallImbalance)) if (length(k) > 1) { k <- sample(k, 1) } p.vec <- rep((1 - FAVORED.PROB) / (number.of.treatments - 1), number.of.treatments) p.vec[k] <- FAVORED.PROB p.vec }
randomizerObject.5 <- new("PocockSimonRandomizer", expt, 28923L, tr.ratios = c(5, 2, 1), p.func = p.func.not.so.greedy)
Now, a simulation.
for (i in seq(1000)) randomizeSubject(randomizerObject.5) <- list(subject.id = generateId(i), factor.values = generateRandomFactors(expt@factor.level.names))
tbl <- table(tr.assignments(randomizerObject.5)$Treatment) knitr::kable(do.call(data.frame, as.list(tbl)))
knitr::kable(stateTable(randomizerObject.5))
Another possibility for the probability function might be based on the actual imbalances.
p.func.imbalance <- function(overallImbalance) { p.vec <- overallImbalance / sum(overallImbalance) p.vec }
Of course, this assumes that the imbalances calculated are non-negative, which would be the case with range or standard deviation. But some care must be taken to ensure this is the case for arbitrary situations.
I've often has situation where someone has come in the middle of a study and asked to do this sort of randomization when they have already randomized some subjects by hand or some other means. So, in this section, we plan to ensure that the restart process is doing the right thing.
We will first generate 25 randomizations.
rObj <- new("PocockSimonRandomizer", expt, 2213L) for (i in seq(25)) randomizeSubject(rObj) <- list(subject.id = generateId(i), factor.values = generateRandomFactors(expt@factor.level.names))
We now save the state of the randomizer, the assignments, and the random number generate state and continue for a few more randomizations.
savedAssignments <- tr.assignments(rObj) savedState <- stateTable(rObj) savedSeed <- .Random.seed for (i in seq(50)) randomizeSubject(rObj) <- list(subject.id = generateId(i+50), factor.values = generateRandomFactors(expt@factor.level.names))
Once again, we save the assignments and state.
assignments <- tr.assignments(rObj) state <- stateTable(rObj)
If we now restart the randomizer at the state that existed after the 25 randomizations, and run it forward again, it should produce the same set of randomizations for the next 50.
tr.assignments(rObj) <- savedAssignments assign(x = ".Random.seed", value = savedSeed, envir = .GlobalEnv) for (i in seq(50)) randomizeSubject(rObj) <- list(subject.id = generateId(i+50), factor.values = generateRandomFactors(expt@factor.level.names))
Let's check that the answers are the same.
identical(stateTable(rObj), state) identical(tr.assignments(rObj), assignments)
The current package can be used without recourse to a database for persistence. This would require the initial definition of the clinical experiment as in the example(s) above along with the randomizer. This is done once for a study on a designated computer running R to which the person assigned to do the randomization will have primary access.
Thereafter, every time a subject is to be randomized (after all the usual procedures for registration in the study) the randomization process will require merely an id for the subject and the levels of the prognostic factors of interest. The randomization is performed simply by running the code snippet
randomizeSubject(randomizerObject) <- list(subject.id = id, factor.values = c(fac1, fac2, fac3)) lastRandomization(randomizerObject)
where randomizerObject
is a randomizer created as above, and id
,
fac1
, fac2
, fac3
, are the study id and the associated factor
levels of the subject to be randomized.
After each assignment, the person can save the R workspace so that the state is preserved. If R is invoked from the same directory again, the state is restored for subsequent randomizations. Of course, this means all the usual responsibilities for saving the workspace apply for this mode of operation.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.