Nothing
#' Simulation-based power estimation for binary outcome multi-arm
#' cluster-randomized trials.
#'
#' This function uses iterative simulations to determine
#' approximate power for multi-arm cluster-randomized controlled trials with
#' binary outcomes of interest. Users can modify a variety of parameters to
#' suit the simulations to their desired experimental situation.
#' This function validates the user's input and passes the necessary
#' arguments to an internal function, which performs the simulations.
#' This function returns the summary power values for each treatment arm.
#'
#' Users must specify the desired number of simulations, number of subjects per
#' cluster, number of clusters per treatment arm, group probabilities, and the
#' between-cluster variance. Significance level, analytic method, progress
#' updates, poor/singular fit override, and whether or not to return the
#' simulated data may also be specified. The internal function can be called
#' directly by the user to return the fitted models rather than the power
#' summaries (see \code{?cps.ma.normal.internal} for details).
#'
#' Because the model for binary outcomes may be slower to fit than those for
#' other distributions, this function may be slower than its normal or
#' count-distributed counterparts. Users can spread the simulated data
#' generation and model fitting tasks across multiple cores using the
#' \code{cores} argument. Users should expect that parallel computing may make
#' model fitting faster than using a single core for more complicated models.
#' For simpler models, users may prefer to use single thread computing
#' (\code{cores}=1), as the processes involved in allocating memory and
#' copying data across cores also may take some time. For time-savings,
#' this function stops execution early if estimated power < 0.5 or more
#' than 25\% of models produce a singular fit or non-convergence warning
#' message, unless \code{poorFitOverride = TRUE}.
#'
#' Non-convergent models are not included in the calculation of exact confidence
#' intervals.
#'
#' @section Testing details:
#' This function has been verified, where possible, against reference values from the NIH's GRT
#' Sample Size Calculator, PASS11, \code{CRTsize::n4prop}, \code{clusterPower::cps.binary}, and
#' \code{clusterPower::cpa.binary}.
#'
#' @param nsim Number of datasets to simulate; accepts integer (required).
#' @param nsubjects Number of subjects per cluster (required); accepts an
#' integer if all are equal and \code{narms} and \code{nclusters} are provided.
#' Alternately, the user can supply a list with one entry per arm if the
#' cluster sizes are the same within the arm, or, if they are not the same
#' within the arms, the user can supply a list of vectors where each vector
#' represents an arm and each entry in the vector is the number of subjects
#' per cluster.
#' @param narms Integer value representing the number of trial arms.
#' @param nclusters An integer or vector of integers representing the number
#' of clusters in each arm.
#' @param probs Expected absolute treatment effect probabilities for each arm;
#' accepts a scalar or a vector of length \code{narms} (required).
#' @param sigma_b_sq Between-cluster variance; accepts a vector of length
#' \code{narms} (required).
#' @param alpha Significance level; default = 0.05.
#' @param allSimData Option to output list of all simulated datasets;
#' default = FALSE.
#' @param method Analytical method, either Generalized Linear Mixed Effects
#' Model (GLMM) or Generalized Estimating Equation (GEE). Accepts c('glmm',
#' 'gee') (required); default = 'glmm'.
#' @param multi_p_method A string indicating the method to use for adjusting
#' p-values for multiple comparisons. Choose one of "holm", "hochberg",
#' "hommel", "bonferroni", "BH", "BY", "fdr", "none". The default is
#' "bonferroni". See \code{?p.adjust} for additional details.
#' @param quiet When set to FALSE, displays simulation progress and estimated completion time; default is FALSE.
#' @param seed Option to set.seed. Default is NULL.
#' @param poorFitOverride Option to override \code{stop()} if more than 25\% of fits fail to converge or
#' power<0.5 after 50 iterations; default = FALSE.
#' @param lowPowerOverride Option to override \code{stop()} if the power
#' is less than 0.5 after the first 50 simulations and every ten simulations
#' thereafter. On function execution stop, the actual power is printed in the
#' stop message. Default = FALSE. When TRUE, this check is ignored and the
#' calculated power is returned regardless of value.
#' @param timelimitOverride Logical. When FALSE, stops execution if the estimated
#' completion time is more than 2 minutes. Defaults to TRUE.
#' @param cores String ("all"), NA, or scalar value indicating the number of cores
#' to be used for parallel computing. Default = NA (no parallel computing).
#' @param tdist Logical value indicating whether simulated data should be
#' drawn from a t-distribution rather than the normal distribution.
#' Default = FALSE.
#' @param nofit Option to skip model fitting and analysis and return the
#' simulated data.
#' Defaults to \code{FALSE}.
#' @param optmethod User-specified optimizer method. Accepts \code{bobyqa},
#' \code{Nelder_Mead} (default), and optimizers wrapped in the \code{optimx} package.
#' @param return.all.models Logical; Returns all of the fitted models and the
#' simulated data.
#' Defaults to FALSE.
#' @return A list with the following components:
#' \describe{
#' \item{power}{Data frame with columns "power" (Estimated statistical power),
#' "lower.95.ci" (Lower 95\% confidence interval bound),
#' "upper.95.ci" (Upper 95\% confidence interval bound).}
#' \item{model.estimates}{Data frame with columns corresponding
#' to each arm with descriptive suffixes as follows:
#' ".Estimate" (Estimate of treatment effect for a given
#' simulation),
#' "Std.Err" (Standard error for treatment effect estimate),
#' ".zval" (for GLMM) | ".wald" (for GEE), and
#' ".pval" (the p-value estimate).}
#' \item{overall.power}{Table of F-test (when method="glmm") or chi^{2}
#' (when method="gee") significance test results.}
#' \item{overall.power.summary}{Summary overall power of treatment model
#' compared to the null model.}
#' \item{sim.data}{Produced when allSimData==TRUE. List of \code{nsim}
#' data frames, each containing:
#' "y" (simulated response value),
#' "trt" (indicator for treatment group or arm), and
#' "clust" (indicator for cluster).}
#' \item{model.fit.warning.percent}{Character string containing the percent
#' of \code{nsim} in which the glmm fit was singular or failed to converge,
#' produced only when method = "glmm" & allSimData = FALSE.
#' }
#' \item{model.fit.warning.incidence}{Vector of length \code{nsim} denoting
#' whether or not a simulation glmm fit triggered a "singular fit" or
#' "non-convergence" error, produced only when method = "glmm" &
#' allSimData=TRUE.
#' }
#' }
#' If \code{nofit = T}, a data frame of the simulated data sets, containing:
#' \itemize{
#' \item "arm" (Indicator for treatment arm)
#' \item "cluster" (Indicator for cluster)
#' \item "y1" ... "yn" (Simulated response value for each of the \code{nsim} data sets).
#' }
#'
#' @examples
#' \dontrun{
#' bin.ma.rct.unbal <- cps.ma.binary(nsim = 12,
#' nsubjects = list(rep(20, times=15),
#' rep(15, times=15),
#' rep(17, times=15)),
#' narms = 3,
#' nclusters = 15,
#' probs = c(0.35, 0.43, 0.50),
#' sigma_b_sq = c(0.1, 0.1, 0.1),
#' alpha = 0.05, allSimData = TRUE,
#' seed = 123, cores="all")
#'
#' bin.ma.rct.bal <- cps.ma.binary(nsim = 100, nsubjects = 50, narms=3,
#' nclusters=8,
#' probs = c(0.35, 0.4, 0.5),
#' sigma_b_sq = 0.1, alpha = 0.05,
#' quiet = FALSE, method = 'glmm',
#' allSimData = FALSE,
#' multi_p_method="none",
#' seed = 123, cores="all",
#' poorFitOverride = FALSE)
#'}
#' @author Alexandria C. Sakrejda (\email{acbro0@@umass.edu}), Alexander R. Bogdan, and Ken Kleinman (\email{ken.kleinman@@gmail.com})
#' @export
cps.ma.binary <- function(nsim = 1000,
nsubjects = NULL,
narms = NULL,
nclusters = NULL,
probs = NULL,
sigma_b_sq = NULL,
alpha = 0.05,
quiet = FALSE,
method = 'glmm',
multi_p_method = "bonferroni",
allSimData = FALSE,
seed = NULL,
cores = NA,
tdist = FALSE,
poorFitOverride = FALSE,
lowPowerOverride = FALSE,
timelimitOverride = TRUE,
nofit = FALSE,
optmethod = "Nelder-Mead",
return.all.models = FALSE) {
converge <- NULL
# use this later to determine total elapsed time
start.time <- Sys.time()
if (is.null(narms)) {
stop("ERROR: narms is required.")
}
if (narms < 3) {
stop("ERROR: LRT significance not calculable when narms < 3. Use cps.binary() instead.")
}
# create nclusters if not provided directly by user
if (isTRUE(is.list(nsubjects))) {
if (is.null(nclusters)) {
nclusters <- vapply(nsubjects, length, 0)
}
}
if (length(nclusters) == 1 & !isTRUE(is.list(nsubjects))) {
nclusters <- rep(nclusters, narms)
}
if (length(nclusters) > 1 & length(nsubjects) == 1) {
narms <- length(nclusters)
}
# input validation steps
if (!is.wholenumber(nsim) || nsim < 1 || length(nsim) > 1) {
stop("nsim must be a positive integer of length 1.")
}
if (isTRUE(is.null(nsubjects))) {
stop("nsubjects must be specified. See ?cps.ma.binary for help.")
}
if (length(nsubjects) == 1 & !isTRUE(is.numeric(nclusters))) {
stop("When nsubjects is scalar, user must supply nclusters (clusters per arm)")
}
if (length(nsubjects) == 1 & length(nclusters) == 1 &
!isTRUE(is.numeric(narms))) {
stop("User must provide narms when nsubjects and nclusters are both scalar.")
}
# nclusters must be whole numbers
if (sum(is.wholenumber(nclusters) == FALSE) != 0 ||
nclusters < 1) {
stop("nclusters must be postive integer values.")
}
# nsubjects must be whole numbers
if (sum(is.wholenumber(unlist(nsubjects)) == FALSE) != 0 ||
unlist(nsubjects) < 1) {
stop("nsubjects must be positive integer values.")
}
# Generate nclusters vector when a scalar is provided but nsubjects is a vector
if (length(nclusters) == 1 & length(nsubjects) > 1) {
nclusters <- rep(nclusters, length(nsubjects))
}
# Create nsubjects structure from narms and nclusters
if (mode(nsubjects) == "list") {
str.nsubjects <- nsubjects
} else {
if (length(nsubjects) == 1) {
str.nsubjects <- lapply(nclusters, function(x)
rep(nsubjects, x))
} else {
if (length(nsubjects) != narms) {
stop("nsubjects must be length 1 or length narms if not provided in a list.")
}
str.nsubjects <- list()
for (i in 1:length(nsubjects)) {
str.nsubjects[[i]] <- rep(nsubjects[i], times = nclusters[i])
}
}
}
# allows for probs, sigma_b_sq to be entered as scalar
if (length(sigma_b_sq) == 1) {
sigma_b_sq <- rep(sigma_b_sq, narms)
}
if (length(probs) == 1) {
probs <- rep(probs, narms)
}
if (length(probs) != narms) {
stop(
"Length of probs must equal narms, or be provided as a scalar if probs for all arms are equal."
)
}
if (length(sigma_b_sq) != narms) {
stop(
"Length of variance parameters sigma_b_sq must equal narms, or be provided as a scalar
if sigma_b_sq for all arms are equal."
)
}
validateVariance(
dist = "bin",
alpha = alpha,
ICC = NA,
sigma_sq = NA,
sigma_b_sq = sigma_b_sq,
ICC2 = NA,
sigma_sq2 = NA,
sigma_b_sq2 = NA,
method = method,
quiet = quiet,
all.sim.data = allSimData,
poor.fit.override = poorFitOverride,
cores = cores,
probs = probs
)
# run the simulations
binary.ma.rct <- cps.ma.binary.internal(
nsim = nsim,
str.nsubjects = str.nsubjects,
probs = probs,
sigma_b_sq = sigma_b_sq,
alpha = alpha,
quiet = quiet,
method = method,
all.sim.data = allSimData,
seed = seed,
poor.fit.override = poorFitOverride,
low.power.override = lowPowerOverride,
timelimitOverride = timelimitOverride,
tdist = tdist,
cores = cores,
nofit = nofit,
optmethod = optmethod,
return.all.models = return.all.models
)
#option to return simulated data only
if (nofit == TRUE) {
return(binary.ma.rct)
}
models <- binary.ma.rct[[1]]
# Create object containing summary statement
summary.message = paste0(
"Monte Carlo Power Estimation based on ",
nsim,
" Simulations: Parallel Design, Binary Outcome, ",
narms,
" Arms."
)
# Create method object
long.method = switch(method, glmm = 'Generalized Linear Mixed Model',
gee = 'Generalized Estimating Equation')
# Create object containing group-specific variance parameters
armnames <- vector(mode = "character", length = narms)
for (i in 1:narms) {
armnames[i] <- paste0("Arm.", i)
}
var.parms = data.frame('sigma_b_sq' = sigma_b_sq,
"probs" = probs)
rownames(var.parms) <- armnames
# Create object containing group-specific cluster sizes
names(str.nsubjects) <- armnames
cluster.sizes <- str.nsubjects
#Organize output for GLMM
if (method == "glmm") {
Estimates = matrix(NA, nrow = nsim, ncol = narms)
std.error = matrix(NA, nrow = nsim, ncol = narms)
z.val = matrix(NA, nrow = nsim, ncol = narms)
p.val = matrix(NA, nrow = nsim, ncol = narms)
for (i in 1:nsim) {
Estimates[i,] <- models[[i]][[10]][, 1]
std.error[i,] <- models[[i]][[10]][, 2]
z.val[i,] <- models[[i]][[10]][, 3]
p.val[i,] <-
p.adjust(models[[i]][[10]][, 4], method = multi_p_method)
}
# Organize the row/col names for the model estimates output
keep.names <- rownames(models[[1]][[10]])
names.Est <- rep(NA, narms)
names.st.err <- rep(NA, narms)
names.zval <- rep(NA, narms)
names.pval <- rep(NA, narms)
names.power <- rep(NA, narms)
for (i in 1:length(keep.names)) {
names.Est[i] <- paste(keep.names[i], ".Estimate", sep = "")
names.st.err[i] <- paste(keep.names[i], ".Std.Err", sep = "")
names.zval[i] <- paste(keep.names[i], ".zval", sep = "")
names.pval[i] <- paste(keep.names[i], ".pval", sep = "")
names.power[i] <- paste(keep.names[i], ".power", sep = "")
}
colnames(Estimates) <- names.Est
colnames(std.error) <- names.st.err
colnames(z.val) <- names.zval
colnames(p.val) <- names.pval
# convergence
cps.model.temp <-
data.frame(unlist(binary.ma.rct[[3]]), p.val)
colnames(cps.model.temp)[1] <- "converge"
cps.model.temp2 <-
dplyr::filter(cps.model.temp, converge == TRUE)
if (nrow(cps.model.temp2) < (.25 * nsim)) {
warning(paste0(
nrow(cps.model.temp2),
" models converged. Check model parameters."
),
immediate. = TRUE)
}
# Organize the LRT output
LRT.holder <-
matrix(
unlist(binary.ma.rct[[2]]),
ncol = 3,
nrow = nsim,
byrow = TRUE,
dimnames = list(seq(1:nsim),
colnames(binary.ma.rct[[2]][[1]]))
)
LRT.holder <- cbind(LRT.holder, cps.model.temp["converge"])
LRT.holder <- LRT.holder[LRT.holder[, "converge"] == TRUE, ]
sig.LRT <- ifelse(LRT.holder[, 3] < alpha, 1, 0)
# Calculate and store power estimate & confidence intervals
power.parms <- cbind(confintCalc(
multi = TRUE,
alpha = alpha,
p.val = as.vector(cps.model.temp2[, 3:length(cps.model.temp2)])
),
multi_p_method)
# Store simulation output in data frame
ma.model.est <-
data.frame(Estimates, std.error, z.val, p.val, binary.ma.rct[[3]])
ma.model.est <-
ma.model.est[, -grep('.*ntercept.*', names(ma.model.est))]
# performance messages
total.est <-
as.numeric(difftime(Sys.time(), start.time, units = 'secs'))
hr.est <- total.est %/% 3600
min.est <- total.est %/% 60
sec.est <- round(total.est %% 60, 0)
message(
paste0(
"Simulations Complete! Time Completed: ",
Sys.time(),
"\nTotal Runtime: ",
hr.est,
'Hr:',
min.est,
'Min:',
sec.est,
'Sec'
)
)
## Output objects for GLMM
# Create list containing all output (class 'crtpwr.ma') and return
if (allSimData == TRUE && return.all.models == FALSE) {
complete.output = structure(
list(
"overview" = summary.message,
"nsim" = nsim,
"power" =
prop_H0_rejection(
alpha = alpha,
nsim = nsim,
sig.LRT = sig.LRT
),
"beta" = power.parms['Beta'],
"overall.power2" = power.parms,
"overall.power" = LRT.holder,
"method" = long.method,
"alpha" = alpha,
"cluster.sizes" = cluster.sizes,
"n.clusters" = nclusters,
"variance.parms" = var.parms,
"probs" = probs,
"model.estimates" = ma.model.est,
"convergence" = binary.ma.rct[[3]],
"sim.data" = binary.ma.rct[["sim.data"]]
),
class = 'crtpwr.ma'
)
}
if (return.all.models == TRUE) {
complete.output = structure(
list(
"overview" = summary.message,
"nsim" = nsim,
"power" =
prop_H0_rejection(
alpha = alpha,
nsim = nsim,
sig.LRT = sig.LRT
),
"beta" = power.parms['Beta'],
"overall.power2" = power.parms,
"overall.power" = LRT.holder,
"method" = long.method,
"alpha" = alpha,
"cluster.sizes" = cluster.sizes,
"n.clusters" = nclusters,
"variance.parms" = var.parms,
"probs" = probs,
"model.estimates" = ma.model.est,
"convergence" = binary.ma.rct[[3]],
"sim.data" = binary.ma.rct[["sim.data"]],
"all.models" <- binary.ma.rct
),
class = 'crtpwr.ma'
)
}
if (return.all.models == FALSE && allSimData == FALSE) {
complete.output = structure(
list(
"overview" = summary.message,
"nsim" = nsim,
"power" =
prop_H0_rejection(
alpha = alpha,
nsim = nsim,
sig.LRT = sig.LRT
),
"beta" = power.parms['Beta'],
"overall.power2" = power.parms,
"overall.power" = LRT.holder,
"method" = long.method,
"alpha" = alpha,
"cluster.sizes" = cluster.sizes,
"n.clusters" = nclusters,
"variance.parms" = var.parms,
"probs" = probs,
"model.estimates" = ma.model.est,
"convergence" = binary.ma.rct[[3]]
),
class = 'crtpwr.ma'
)
}
} # end of GLMM options
#Organize output for GEE method
if (method == "gee") {
# Organize the output
Estimates = matrix(NA, nrow = nsim, ncol = narms)
std.error = matrix(NA, nrow = nsim, ncol = narms)
Wald = matrix(NA, nrow = nsim, ncol = narms)
Pr = matrix(NA, nrow = nsim, ncol = narms)
for (i in 1:nsim) {
Estimates[i,] <- models[[i]]$coefficients[, 1]
std.error[i,] <- models[[i]]$coefficients[, 2]
Wald[i,] <- models[[i]]$coefficients[, 3]
Pr[i,] <- models[[i]]$coefficients[, 4]
}
# Organize the row/col names for the output
keep.names <- rownames(models[[1]]$coefficients)
names.Est <- rep(NA, length(narms))
names.st.err <- rep(NA, length(narms))
names.wald <- rep(NA, length(narms))
names.pval <- rep(NA, length(narms))
names.power <- rep(NA, length(narms))
for (i in 1:length(keep.names)) {
names.Est[i] <- paste(keep.names[i], ".Estimate", sep = "")
names.st.err[i] <- paste(keep.names[i], ".Std.Err", sep = "")
names.wald[i] <- paste(keep.names[i], ".wald", sep = "")
names.pval[i] <- paste(keep.names[i], ".pval", sep = "")
names.power[i] <- paste(keep.names[i], ".power", sep = "")
}
colnames(Estimates) <- names.Est
colnames(std.error) <- names.st.err
colnames(Wald) <- names.wald
colnames(Pr) <- names.pval
# Organize the LRT output
LRT.holder <-
matrix(
unlist(binary.ma.rct[[2]]),
ncol = 3,
nrow = nsim,
byrow = TRUE,
dimnames = list(seq(1:nsim),
c("Df", "X2", "P(>|Chi|)"))
)
# Proportion of times P(>F)
converge <- binary.ma.rct[["converged"]]
LRT.holder <- cbind(LRT.holder, converge)
LRT.holder <- LRT.holder[LRT.holder[, "converge"] == TRUE, ]
sig.LRT <- ifelse(LRT.holder[, 3] < alpha, 1, 0)
# Calculate and store power estimate & confidence intervals
sig.val <- ifelse(Pr < alpha, 1, 0)
pval.power <-
apply(
sig.val,
2,
FUN = function(x) {
sum(x, na.rm = TRUE) / nsim
}
)
power.parms <- cbind(confintCalc(
multi = TRUE,
alpha = alpha,
p.val = Pr[, 2:narms]
),
multi_p_method)
# Store GEE simulation output in data frame
ma.model.est <- data.frame(Estimates, std.error, Wald, Pr)
ma.model.est <-
ma.model.est[, -grep('.*ntercept.*', names(ma.model.est))]
# performance messages
total.est <-
as.numeric(difftime(Sys.time(), start.time, units = 'secs'))
hr.est <- total.est %/% 3600
min.est <- total.est %/% 60
sec.est <- round(total.est %% 60, 0)
message(
paste0(
"Simulations Complete! Time Completed: ",
Sys.time(),
"\nTotal Runtime: ",
hr.est,
'Hr:',
min.est,
'Min:',
sec.est,
'Sec'
)
)
# Create list containing all output (class 'crtpwr.ma') and return
if (allSimData == TRUE & return.all.models == FALSE) {
complete.output = structure(
list(
"overview" = summary.message,
"nsim" = nsim,
"power" =
prop_H0_rejection(
alpha = alpha,
nsim = nsim,
sig.LRT = sig.LRT
),
"beta" = power.parms['Beta'],
"overall.power2" = power.parms,
"overall.power" = LRT.holder,
"method" = long.method,
"alpha" = alpha,
"cluster.sizes" = cluster.sizes,
"n.clusters" = nclusters,
"variance.parms" = var.parms,
"probs" = probs,
"model.estimates" = ma.model.est,
"sim.data" = binary.ma.rct[["sim.data"]]
),
class = 'crtpwr.ma'
)
}
if (return.all.models == TRUE) {
complete.output = structure(
list(
"overview" = summary.message,
"nsim" = nsim,
"power" =
prop_H0_rejection(
alpha = alpha,
nsim = nsim,
sig.LRT = sig.LRT
),
"beta" = power.parms['Beta'],
"overall.power2" = power.parms,
"overall.power" = LRT.holder,
"method" = long.method,
"alpha" = alpha,
"cluster.sizes" = cluster.sizes,
"n.clusters" = nclusters,
"variance.parms" = var.parms,
"probs" = probs,
"model.estimates" = ma.model.est,
"all.models" <- binary.ma.rct
),
class = 'crtpwr.ma'
)
}
if (return.all.models == FALSE && allSimData == FALSE) {
complete.output = structure(
list(
"overview" = summary.message,
"nsim" = nsim,
"power" =
prop_H0_rejection(
alpha = alpha,
nsim = nsim,
sig.LRT = sig.LRT
),
"beta" = power.parms['Beta'],
"overall.power2" = power.parms,
"overall.power" = LRT.holder,
"method" = long.method,
"alpha" = alpha,
"cluster.sizes" = cluster.sizes,
"n.clusters" = nclusters,
"variance.parms" = var.parms,
"probs" = probs,
"model.estimates" = ma.model.est
),
class = 'crtpwr.ma'
)
}
}# end of GEE options
return(complete.output)
}# end of fxn
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.