#' Validating fitted subgroup identification models
#'
#' @description Validates subgroup treatment effects for fitted
#' subgroup identification model class of Chen, et al (2017)
#'
#' @param model fitted model object returned by \code{fit.subgroup()} function
#' @param method validation method. \code{"boot_bias_correction"} for the bootstrap
#' bias correction method of Harrell, et al (1996) or \code{"training_test_replication"}
#' for repeated training and test splitting of the data (\code{train.fraction} should be specified
#' for this option)
#' @param B integer. number of bootstrap replications or refitting replications.
#' @param train.fraction fraction (between 0 and 1) of samples to be used for training in
#' training/test replication. Only used for \code{method = "training_test_replication"}
#' @param benefit.score.quantiles a vector of quantiles (between 0 and 1) of the benefit score values
#' for which to return bootstrapped information about the subgroups. ie if one of the quantile values is 0.5, the
#' median value of the benefit scores will be used as a cutoff to determine subgroups and summary statistics
#' will be returned about these subgroups
#' @param parallel Should the loop over replications be parallelized? If \code{FALSE}, then no, if \code{TRUE}, then yes.
#' If user sets \code{parallel = TRUE} and the fitted \code{fit.subgroup()} object uses the parallel version of
#' an internal model, say for \code{cv.glmnet()}, then the internal parallelization will be overridden so as
#' not to create a conflict of parallelism.
#' @seealso \code{\link[personalized]{fit.subgroup}} for function which fits subgroup identification models,
#' \code{\link[personalized]{plot.subgroup_validated}} for plotting of validation results, and
#' \code{\link[personalized]{print.subgroup_validated}} for arguments for printing options for \code{validate.subgroup()}.
#' @references Chen, S., Tian, L., Cai, T. and Yu, M. (2017), A general statistical framework for subgroup identification
#' and comparative treatment scoring. Biometrics. doi:10.1111/biom.12676
#'
#' Harrell, F. E., Lee, K. L., and Mark, D. B. (1996). Tutorial in biostatistics multivariable prognostic models: issues in developing models,
#' evaluating assumptions and adequacy, and measuring and reducing errors. Statistics in medicine, 15, 361-387.
#' doi:10.1002/(SICI)1097-0258(19960229)15:4<361::AID-SIM168>3.0.CO;2-4
#' @details Estimates of various quantities conditional on subgroups and treatment statuses are provided and displayed
#' via the \code{\link[personalized]{print.subgroup_validated}} function:
#' \enumerate{
#' \item{"Conditional expected outcomes"}{ The first results shown when printing
#' a \code{subgroup_validated} object are estimates of the expected outcomes conditional on
#' the estimated subgroups (i.e. which subgroup is 'recommended' by the model) and conditional
#' on treatment/intervention status. If there are two total treatment options, this results in a 2x2 table
#' of expected conditional outcomes. }
#' \item{"Treatment effects conditional on subgroups"}{ The second results shown when printing
#' a \code{subgroup_validated} object are estimates of the expected outcomes conditional on
#' the estimated subgroups. If the treatment takes levels \eqn{j \in \{1, \dots, K\}}, a total of \eqn{K}
#' conditional treatment effects will be shown. For example, of the outcome is continuous, the
#' \eqn{j}th conditional treatment effect is defined as \eqn{E(Y|Trt = j, Subgroup=j) - E(Y|Trt = j, Subgroup =/= j)}},
#' where \eqn{Subgroup=j} if treatment \eqn{j} is recommended, i.e. treatment \eqn{j} results in the largest/best
#' expected potential outcomes given the fitted model.
#' \item{"Overall treatment effect conditional on subgroups "}{ The third quantity displayed shows the overall improvement
#' in outcomes resulting from all treatment recommendations. This is essentially an average over all of the
#' conditional treatment effects weighted by the proportion of the population recommended each respective
#' treatment level.}
#' }
#' @return An object of class \code{"subgroup_validated"}
#' \item{avg.results}{Estimates of average conditional treatment effects when
#' subgroups are determined based on the provided cutoff value for the benefit score. For example,
#' if \code{cutoff = 0} and there is a treatment and control only, then the treatment is
#' recommended if the benefit score is greater than 0.}
#' \item{se.results}{Standard errors of the estimates from \code{avg.estimates}}
#' \item{boot.results}{Contains the individual results for each replication. \code{avg.results} is comprised
#' of averages of the values from \code{boot.results}}
#' \item{avg.quantile.results}{Estimates of average conditional treatment effects when
#' subgroups are determined based on different quntile cutoff values for the benefit score. For example,
#' if \code{benefit.score.quantiles = 0.75} and there is a treatment and control only, then the treatment is
#' recommended if the benefit score is greater than the 75th upper quantile of all benefit scores. If multiple quantile
#' values are provided, e.g. \code{benefit.score.quantiles = c(0.15, 0.5, 0.85)}, then results will be provided
#' for all quantile levels.}
#' \item{se.quantile.results}{Standard errors corresponding to \code{avg.quantile.results}}
#' \item{boot.results.quantiles}{Contains the individual results for each replication. \code{avg.quantile.results} is comprised
#' of averages of the values from \code{boot.results.quantiles}}
#' \item{family}{Family of the outcome. For example, \code{"gaussian"} for continuous outcomes}
#' \item{method}{Method used for subgroup identification model. Weighting or A-learning}
#' \item{n.trts}{The number of treatment levels}
#' \item{comparison.trts}{All treatment levels other than the reference level}
#' \item{reference.trt}{The reference level for the treatment. This should usually be the control group/level}
#' \item{larger.outcome.better}{If larger outcomes are preferred for this model}
#' \item{cutpoint}{Benefit score cutoff value used for determining subgroups}
#' \item{val.method}{Method used for validation}
#' \item{iterations}{Number of replications used in the validation process}
#' \item{nobs}{Number of observations in \code{x} provided to \code{\link[personalized]{fit.subgroup}}}
#' \item{nvars}{Number of variables in \code{x} provided to \code{\link[personalized]{fit.subgroup}}}
#' @references Huling. J.D. and Yu, M. (2021), Subgroup Identification Using the personalized Package.
#' Journal of Statistical Software 98(5), 1-60. doi:10.18637/jss.v098.i05
#' @importFrom stats predict sd
#' @import foreach
#' @examples
#' library(personalized)
#'
#' set.seed(123)
#' n.obs <- 500
#' n.vars <- 20
#' x <- matrix(rnorm(n.obs * n.vars, sd = 3), n.obs, n.vars)
#'
#'
#' # simulate non-randomized treatment
#' xbetat <- 0.5 + 0.5 * x[,11] - 0.5 * x[,13]
#' trt.prob <- exp(xbetat) / (1 + exp(xbetat))
#' trt01 <- rbinom(n.obs, 1, prob = trt.prob)
#'
#' trt <- 2 * trt01 - 1
#'
#' # simulate response
#' delta <- 2 * (0.5 + x[,2] - x[,3] - x[,11] + x[,1] * x[,12])
#' xbeta <- x[,1] + x[,11] - 2 * x[,12]^2 + x[,13]
#' xbeta <- xbeta + delta * trt
#'
#' # continuous outcomes
#' y <- drop(xbeta) + rnorm(n.obs, sd = 2)
#'
#' # create function for fitting propensity score model
#' prop.func <- function(x, trt)
#' {
#' # fit propensity score model
#' propens.model <- cv.glmnet(y = trt,
#' x = x, family = "binomial")
#' pi.x <- predict(propens.model, s = "lambda.min",
#' newx = x, type = "response")[,1]
#' pi.x
#' }
#'
#' subgrp.model <- fit.subgroup(x = x, y = y,
#' trt = trt01,
#' propensity.func = prop.func,
#' loss = "sq_loss_lasso",
#' # option for cv.glmnet,
#' # better to use 'nfolds=10'
#' nfolds = 3)
#'
#'
#' x.test <- matrix(rnorm(10 * n.obs * n.vars, sd = 3), 10 * n.obs, n.vars)
#'
#'
#' # simulate non-randomized treatment
#' xbetat.test <- 0.5 + 0.5 * x.test[,11] - 0.5 * x.test[,13]
#' trt.prob.test <- exp(xbetat.test) / (1 + exp(xbetat.test))
#' trt01.test <- rbinom(10 * n.obs, 1, prob = trt.prob.test)
#'
#' trt.test <- 2 * trt01.test - 1
#'
#' # simulate response
#' delta.test <- 2 * (0.5 + x.test[,2] - x.test[,3] - x.test[,11] + x.test[,1] * x.test[,12])
#' xbeta.test <- x.test[,1] + x.test[,11] - 2 * x.test[,12]^2 + x.test[,13]
#' xbeta.test <- xbeta.test + delta.test * trt.test
#'
#' y.test <- drop(xbeta.test) + rnorm(10 * n.obs, sd = 2)
#'
#' valmod <- validate.subgroup(subgrp.model, B = 2,
#' method = "training_test",
#' train.fraction = 0.75)
#' valmod
#'
#' print(valmod, which.quant = c(4, 5))
#'
#' @export
validate.subgroup <- function(model,
B = 50L,
method = c("training_test_replication",
"boot_bias_correction"),
train.fraction = 0.75,
benefit.score.quantiles = c(0.1666667, 0.3333333, 0.5000000, 0.6666667, 0.8333333),
parallel = FALSE)
{
method <- match.arg(method)
if (method == "boot_bias_correction")
{
if (!is.null(model$nobs) & !is.null(model$vars))
{
if (model$nobs < model$vars)
{
warning("method='boot_bias_correction' not recommended for high-dimensional data.
Use method='training_test_replication' instead.")
}
}
}
if (class(model)[1] != "subgroup_fitted")
{
stop("model should be a fitted object returned by the 'fit.subgroup' function")
}
B <- as.integer(B[1])
if (B <= 1) stop("B must be a strictly positive integer")
train.fraction <- as.numeric(train.fraction[1])
if (train.fraction >= 1 | train.fraction <= 0)
{
stop("train.fraction must be between 0 and 1")
}
if (is.null(model$call)) stop("retcall argument must be set to TRUE for fitted model
to use validate.subgroup()")
# no need to store passed arguments for the bootstrap or training/testing replications
model$call$retcall <- FALSE
# save data objects because they
# will be written over by resampled versions later
x <- model$call$x
trt <- model$call$trt
y <- model$call$y
match.id <- model$call$match.id
propensity.func <- model$call$propensity.func
if (is.null(model$pi.x)) stop("Please re-fit model and run again")
pi.x <- model$pi.x
## override any internal parallelization
## if there is a conflict of parallelization
if (parallel & ("parallel" %in% names(model$call)) )
{
model$call$parallel <- FALSE
}
if (!all(benefit.score.quantiles < 1) || !all(benefit.score.quantiles > 0) )
{
stop("all quantile values must be strictly between 0 and 1")
}
if (is.null(benefit.score.quantiles))
{
benefit.score.quantiles <- seq(0, 1, length.out = 7)[-c(1,7)]
}
if (!is.vector(benefit.score.quantiles)) stop("benefit.score.quantiles must be a vector")
n.quantiles <- length(benefit.score.quantiles)
if (n.quantiles < 1)
{
benefit.score.quantiles <- seq(0, 1, length.out = 7)[-c(1,7)]
n.quantiles <- length(benefit.score.quantiles)
}
n.obs <- NROW(x)
# create objects to store results
boot.list <- vector(mode = "list", length = length(model$subgroup.trt.effects))
boot.list[[1]] <- array(NA, dim = c(B, length(model$subgroup.trt.effects[[1]])))
boot.list[[2]] <- boot.list[[3]] <- array(NA, dim = c(B, dim(model$subgroup.trt.effects[[2]])))
# Add extra element to boot.list to collect coefficient information on each run
boot.list[[4]] <- matrix(rep(NA,B*length(model$coefficients)),
ncol=B,
dimnames = list(row.names(model$coefficients),paste0("B",1:B)))
boot.list[[5]] <- numeric(B)
dimnames(boot.list[[2]]) <- dimnames(boot.list[[3]]) <-
c(list(NULL), dimnames(model$subgroup.trt.effects$avg.outcomes))
boot.list.quantiles <- rep(list(boot.list), n.quantiles)
sbgrp.trt.eff.mod.orig <- vector(mode = "list", length = n.quantiles)
for (q in 1:n.quantiles)
{
cutpt <- round(100 * benefit.score.quantiles[q])
cutpt.txt <- paste0("quant", cutpt)
# estimate subgroup treatment effects on original data with original model
sbgrp.trt.eff.mod.orig[[q]] <- subgroup.effects(model$benefit.scores,
y,
trt,
model$pi.x,
cutpt.txt,
model$larger.outcome.better,
model$reference.trt)
}
if (parallel)
{
## no longer needed:
#comb <- function(x, ...)
#{
# lapply(seq_along(x),
# function(i) c(x[[i]], lapply(list(...), function(y) y[[i]])))
#}
outlist <- foreach(i = seq(B) ) %dopar% #, .combine = "comb", .multicombine = TRUE, .init=list(list(), list(), list(), list(), list()))
{
if (method == "training_test_replication")
{
# randomly split/partition data into training and testing sets
if (is.null(match.id))
{
train.samp.size <- floor(n.obs * train.fraction)
samp.idx <- sample.int(n.obs, train.samp.size, replace = FALSE)
} else
{
# Draw at the cluster level (train.fraction interpreted as fraction of clusters)
n.levels <- length(levels(match.id))
samp.levels <- sample.int(n.levels, n.levels * train.fraction, replace = FALSE)
samp.idx <- which(match.id %in% levels(match.id)[samp.levels])
model$call$match.id <- match.id[samp.idx]
}
model$call$x <- x[samp.idx,]
model$call$trt <- trt[samp.idx]
x.test <- x[-samp.idx,]
# need to handle differently if outcome is a matrix
if (is.matrix(y))
{
model$call$y <- y[samp.idx,]
y.test <- y[-samp.idx,]
} else
{
model$call$y <- y[samp.idx]
y.test <- y[-samp.idx]
}
trt.test <- trt[-samp.idx]
pi.x.test <- pi.x[-samp.idx]
# fit subgroup model on training data
mod.b <- do.call(fit.subgroup, model$call)
# compute benefit scores on testing data
benefit.scores.test <- mod.b$predict(x.test)
# estimate subgroup treatment effects on test data
sbgrp.trt.eff.test <- subgroup.effects(benefit.scores.test,
y.test, trt.test,
pi.x.test,
model$call$cutpoint,
model$larger.outcome.better,
model$reference.trt)
# save results
res <- rep(list(list(sbgrp.trt.eff.test[[1]],
sbgrp.trt.eff.test[[2]],
sbgrp.trt.eff.test[[3]],
as.vector(mod.b$coefficients),
sbgrp.trt.eff.test[[4]])), 1 + n.quantiles)
for (q in 1:n.quantiles)
{
cutpt <- round(100 * benefit.score.quantiles[q])
cutpt.txt <- paste0("quant", cutpt)
# estimate subgroup treatment effects on test data
sbgrp.trt.eff.test.q <- subgroup.effects(benefit.scores.test,
y.test, trt.test,
pi.x.test,
cutpt.txt,
model$larger.outcome.better,
model$reference.trt)
# save results
res[[1 + q]][[1]] <- sbgrp.trt.eff.test.q[[1]]
res[[1 + q]][[2]] <- sbgrp.trt.eff.test.q[[2]]
res[[1 + q]][[3]] <- sbgrp.trt.eff.test.q[[3]]
res[[1 + q]][[4]] <- as.vector(mod.b$coefficients)
res[[1 + q]][[5]] <- sbgrp.trt.eff.test.q[[4]]
}
} else if (method == "boot_bias_correction")
{ # bootstrap bias correction
# take a bootstrap sample with replacement
if (is.null(match.id))
{
samp.idx <- sample.int(n.obs, n.obs, replace = TRUE)
} else
{
# Draw at the cluster level
samp.levels <- sample(levels(match.id), replace = TRUE)
samp.lookup <- lapply(samp.levels, function(z) {which(match.id == z)})
samp.idx <- unlist(samp.lookup)
# Remap matching IDs so that each cluster draw is assigned a unique matching ID
samp.lengths <- lapply(samp.lookup,length)
model$call$match.id <- unlist(lapply(1:length(samp.lengths),function(z) {rep(z,samp.lengths[[z]])}))
}
model$call$x <- x[samp.idx,]
if (is.matrix(y))
{
model$call$y <- y[samp.idx,]
} else
{
model$call$y <- y[samp.idx]
}
model$call$trt <- trt[samp.idx]
# fit subgroup model on resampled data
mod.b <- do.call(fit.subgroup, model$call)
pi.x.b <- mod.b$pi.x
# calculate benefit scores and resulting
# subgroup treatment effects on the original data
benefit.scores.orig <- mod.b$predict(x)
sbgrp.trt.eff.orig <- subgroup.effects(benefit.scores.orig,
y, trt, model$pi.x,
model$call$cutpoint,
model$larger.outcome.better,
model$reference.trt)
# subtract estimated bias for current bootstrap iteration
res <- rep(list(list(model$subgroup.trt.effects[[1]] -
(mod.b$subgroup.trt.effects[[1]] - sbgrp.trt.eff.orig[[1]]), # bias estimate portion
model$subgroup.trt.effects[[2]] -
(mod.b$subgroup.trt.effects[[2]] - sbgrp.trt.eff.orig[[2]]), # bias estimate portion
model$subgroup.trt.effects[[3]] -
(mod.b$subgroup.trt.effects[[3]] - sbgrp.trt.eff.orig[[3]]), # bias estimate portion
as.vector(mod.b$coefficients),
model$subgroup.trt.effects[[4]] -
(mod.b$subgroup.trt.effects[[4]] - sbgrp.trt.eff.orig[[4]]))), 1 + n.quantiles)
# bias estimate portion
for (q in 1:n.quantiles)
{
cutpt <- round(100 * benefit.score.quantiles[q])
cutpt.txt <- paste0("quant", cutpt)
# estimate subgroup treatment effects on original data with resampled model
sbgrp.trt.eff.orig.q <- subgroup.effects(benefit.scores.orig,
y,
trt,
model$pi.x,
cutpt.txt,
model$larger.outcome.better,
model$reference.trt)
# estimate subgroup treatment effects on resampled data with resampled model
sbgrp.trt.eff.mod <- subgroup.effects(mod.b$benefit.scores,
y[samp.idx],
trt[samp.idx],
pi.x.b,
cutpt.txt,
model$larger.outcome.better,
model$reference.trt)
# save results
# subtract estimated bias for current bootstrap iteration
res[[1 + q]][[1]] <- sbgrp.trt.eff.mod.orig[[q]][[1]] -
(sbgrp.trt.eff.mod[[1]] - sbgrp.trt.eff.orig.q[[1]]) # bias estimate portion
res[[1 + q]][[2]] <- sbgrp.trt.eff.mod.orig[[q]][[2]] -
(sbgrp.trt.eff.mod[[2]] - sbgrp.trt.eff.orig.q[[2]]) # bias estimate portion
res[[1 + q]][[3]] <- sbgrp.trt.eff.mod.orig[[q]][[3]] -
(sbgrp.trt.eff.mod[[3]] - sbgrp.trt.eff.orig.q[[3]]) # bias estimate portion
res[[1 + q]][[4]] <- as.vector(mod.b$coefficients)
res[[1 + q]][[5]] <- sbgrp.trt.eff.mod.orig[[q]][[4]] -
(sbgrp.trt.eff.mod[[4]] - sbgrp.trt.eff.orig.q[[4]]) # bias estimate portion
}
}
res
} # end parallel foreach loop
## collect results as they are collected for non-parallel loop
#boot.list[[1]] <- unname(Reduce('rbind', outlist[[1]]))
for (b in 1:B)
{
boot.list[[1]][b,] <- outlist[[b]][[1]][[1]]
boot.list[[2]][b,,] <- outlist[[b]][[1]][[2]]
boot.list[[3]][b,,] <- outlist[[b]][[1]][[3]]
boot.list[[4]][,b] <- outlist[[b]][[1]][[4]]
boot.list[[5]][b] <- outlist[[b]][[1]][[5]]
for (q in 1:n.quantiles)
{
boot.list.quantiles[[q]][[1]][b,] <- outlist[[b]][[1 + q]][[1]]
boot.list.quantiles[[q]][[2]][b,,] <- outlist[[b]][[1 + q]][[2]]
boot.list.quantiles[[q]][[3]][b,,] <- outlist[[b]][[1 + q]][[3]]
boot.list.quantiles[[q]][[4]][,b] <- outlist[[b]][[1 + q]][[4]]
boot.list.quantiles[[q]][[5]][b] <- outlist[[b]][[1 + q]][[5]]
}
}
#boot.list[[4]] <- unname(Reduce('cbind', outlist[[4]]))
#boot.list[[5]] <- unname(Reduce('c', outlist[[5]]))
rm(outlist)
} else
{
for (b in 1:B)
{
if (method == "training_test_replication")
{
# randomly split/partition data into training and testing sets
if (is.null(match.id))
{
train.samp.size <- floor(n.obs * train.fraction)
samp.idx <- sample.int(n.obs, train.samp.size, replace = FALSE)
} else
{
# Draw at the cluster level (train.fraction interpreted as fraction of clusters)
n.levels <- length(levels(match.id))
samp.levels <- sample.int(n.levels, n.levels * train.fraction, replace = FALSE)
samp.idx <- which(match.id %in% levels(match.id)[samp.levels])
model$call$match.id <- match.id[samp.idx]
}
model$call$x <- x[samp.idx,]
model$call$trt <- trt[samp.idx]
x.test <- x[-samp.idx,]
# need to handle differently if outcome is a matrix
if (is.matrix(y))
{
model$call$y <- y[samp.idx,]
y.test <- y[-samp.idx,]
} else
{
model$call$y <- y[samp.idx]
y.test <- y[-samp.idx]
}
trt.test <- trt[-samp.idx]
pi.x.test <- pi.x[-samp.idx]
# fit subgroup model on training data
mod.b <- do.call(fit.subgroup, model$call)
# compute benefit scores on testing data
benefit.scores.test <- mod.b$predict(x.test)
# estimate subgroup treatment effects on test data
sbgrp.trt.eff.test <- subgroup.effects(benefit.scores.test,
y.test, trt.test,
pi.x.test,
model$call$cutpoint,
model$larger.outcome.better,
model$reference.trt)
# save results
boot.list[[1]][b,] <- sbgrp.trt.eff.test[[1]]
boot.list[[2]][b,,] <- sbgrp.trt.eff.test[[2]]
boot.list[[3]][b,,] <- sbgrp.trt.eff.test[[3]]
boot.list[[4]][,b] <- as.vector(mod.b$coefficients)
boot.list[[5]][b] <- sbgrp.trt.eff.test[[4]]
for (q in 1:n.quantiles)
{
cutpt <- round(100 * benefit.score.quantiles[q])
cutpt.txt <- paste0("quant", cutpt)
# estimate subgroup treatment effects on test data
sbgrp.trt.eff.test.q <- subgroup.effects(benefit.scores.test,
y.test, trt.test,
pi.x.test,
cutpt.txt,
model$larger.outcome.better,
model$reference.trt)
# save results
boot.list.quantiles[[q]][[1]][b,] <- sbgrp.trt.eff.test.q[[1]]
boot.list.quantiles[[q]][[2]][b,,] <- sbgrp.trt.eff.test.q[[2]]
boot.list.quantiles[[q]][[3]][b,,] <- sbgrp.trt.eff.test.q[[3]]
boot.list.quantiles[[q]][[4]][,b] <- as.vector(mod.b$coefficients)
boot.list.quantiles[[q]][[5]][b] <- sbgrp.trt.eff.test.q[[4]]
}
} else if (method == "boot_bias_correction")
{ # bootstrap bias correction
# take a bootstrap sample with replacement
if (is.null(match.id))
{
samp.idx <- sample.int(n.obs, n.obs, replace = TRUE)
} else
{
# Draw at the cluster level
samp.levels <- sample(levels(match.id), replace = TRUE)
samp.lookup <- lapply(samp.levels, function(z) {which(match.id == z)})
samp.idx <- unlist(samp.lookup)
# Remap matching IDs so that each cluster draw is assigned a unique matching ID
samp.lengths <- lapply(samp.lookup,length)
model$call$match.id <- unlist(lapply(1:length(samp.lengths),function(z){rep(z,samp.lengths[[z]])}))
}
model$call$x <- x[samp.idx,]
if (is.matrix(y))
{
model$call$y <- y[samp.idx,]
} else
{
model$call$y <- y[samp.idx]
}
model$call$trt <- trt[samp.idx]
# fit subgroup model on resampled data
mod.b <- do.call(fit.subgroup, model$call)
pi.x.b <- mod.b$pi.x
# calculate benefit scores and resulting
# subgroup treatment effects on the original data
benefit.scores.orig <- mod.b$predict(x)
sbgrp.trt.eff.orig <- subgroup.effects(benefit.scores.orig,
y, trt, model$pi.x,
model$call$cutpoint,
model$larger.outcome.better,
model$reference.trt)
# subtract estimated bias for current bootstrap iteration
boot.list[[1]][b,] <- model$subgroup.trt.effects[[1]] -
(mod.b$subgroup.trt.effects[[1]] - sbgrp.trt.eff.orig[[1]]) # bias estimate portion
boot.list[[2]][b,,] <- model$subgroup.trt.effects[[2]] -
(mod.b$subgroup.trt.effects[[2]] - sbgrp.trt.eff.orig[[2]]) # bias estimate portion
boot.list[[3]][b,,] <- model$subgroup.trt.effects[[3]] -
(mod.b$subgroup.trt.effects[[3]] - sbgrp.trt.eff.orig[[3]]) # bias estimate portion
boot.list[[4]][,b] <- as.vector(mod.b$coefficients)
boot.list[[5]][b] <- model$subgroup.trt.effects[[4]] -
(mod.b$subgroup.trt.effects[[4]] - sbgrp.trt.eff.orig[[4]]) # bias estimate portion
for (q in 1:n.quantiles)
{
cutpt <- round(100 * benefit.score.quantiles[q])
cutpt.txt <- paste0("quant", cutpt)
# estimate subgroup treatment effects on original data with resampled model
sbgrp.trt.eff.orig <- subgroup.effects(benefit.scores.orig,
y, trt, model$pi.x,
cutpt.txt,
model$larger.outcome.better,
model$reference.trt)
# estimate subgroup treatment effects on resampled data with resampled model
sbgrp.trt.eff.mod <- subgroup.effects(mod.b$benefit.scores,
y[samp.idx],
trt[samp.idx],
pi.x.b,
cutpt.txt,
model$larger.outcome.better,
model$reference.trt)
# save results
# subtract estimated bias for current bootstrap iteration
boot.list.quantiles[[q]][[1]][b,] <- sbgrp.trt.eff.mod.orig[[q]][[1]] -
(sbgrp.trt.eff.mod[[1]] - sbgrp.trt.eff.orig[[1]]) # bias estimate portion
boot.list.quantiles[[q]][[2]][b,,] <- sbgrp.trt.eff.mod.orig[[q]][[2]] -
(sbgrp.trt.eff.mod[[2]] - sbgrp.trt.eff.orig[[2]]) # bias estimate portion
boot.list.quantiles[[q]][[3]][b,,] <- sbgrp.trt.eff.mod.orig[[q]][[3]] -
(sbgrp.trt.eff.mod[[3]] - sbgrp.trt.eff.orig[[3]]) # bias estimate portion
boot.list.quantiles[[q]][[4]][,b] <- as.vector(mod.b$coefficients)
boot.list.quantiles[[q]][[5]][b] <- sbgrp.trt.eff.mod.orig[[q]][[4]] -
(sbgrp.trt.eff.mod[[4]] - sbgrp.trt.eff.orig[[4]]) # bias estimate portion
}
}
} ## end resampling loop
}
# ugly way to handle cases where
# some subgroups have no members
#for (l in 1:length(boot.list))
#{
# boot.list[[l]][is.nan(boot.list[[l]])] <- 0
#}
# compute averages and standard
# deviations across iterations
summary.stats <- list(colMeans(boot.list[[1]], na.rm = TRUE),
apply(boot.list[[2]], c(2, 3), function(x) mean(x, na.rm = TRUE)),
apply(boot.list[[3]], c(2, 3), function(x) mean(x, na.rm = TRUE)),
mean(boot.list[[5]], na.rm = TRUE))
summary.stats.se <- list(apply(boot.list[[1]], 2, function(x) sd(x, na.rm = TRUE)),
apply(boot.list[[2]], c(2, 3), function(x) sd(x, na.rm = TRUE)),
apply(boot.list[[3]], c(2, 3), function(x) sd(x, na.rm = TRUE)),
sd(boot.list[[5]], na.rm = TRUE))
names(summary.stats) <- names(model$subgroup.trt.effects)
names(boot.list) <- 1:length(names(boot.list))
names(boot.list)[c(1:3, 5)] <- names(summary.stats)
names(boot.list)[4] <- "coefficients"
colnames(boot.list[[1]]) <- names(model$subgroup.trt.effects$subgroup.effects)
summary.stats.quantile <- summary.stats.quantile.se <- vector(mode = "list", length = n.quantiles)
for (q in 1:n.quantiles)
{
summary.stats.quantile[[q]] <- list(colMeans(boot.list.quantiles[[q]][[1]], na.rm = TRUE),
apply(boot.list.quantiles[[q]][[2]], c(2, 3), function(x) mean(x, na.rm = TRUE)),
apply(boot.list.quantiles[[q]][[3]], c(2, 3), function(x) mean(x, na.rm = TRUE)),
mean(boot.list.quantiles[[q]][[5]], na.rm = TRUE))
summary.stats.quantile.se[[q]] <- list(apply(boot.list.quantiles[[q]][[1]], 2, function(x) sd(x, na.rm = TRUE)),
apply(boot.list.quantiles[[q]][[2]], c(2, 3), function(x) sd(x, na.rm = TRUE)),
apply(boot.list.quantiles[[q]][[3]], c(2, 3), function(x) sd(x, na.rm = TRUE)),
sd(boot.list.quantiles[[q]][[5]], na.rm = TRUE))
names(summary.stats.quantile[[q]]) <- names(model$subgroup.trt.effects)
names(summary.stats.quantile[[q]]$subgroup.effects) <- names(model$subgroup.trt.effects$subgroup.effects)
dimnames(summary.stats.quantile[[q]]$sample.sizes) <- dimnames(model$subgroup.trt.effects$sample.sizes)
names(summary.stats.quantile.se[[q]][[1]]) <- names(model$subgroup.trt.effects$subgroup.effects)
dimnames(summary.stats.quantile.se[[q]][[3]]) <- dimnames(model$subgroup.trt.effects$sample.sizes)
names(summary.stats.quantile.se[[q]]) <- paste("SE", names(summary.stats), sep = ".")
names(boot.list.quantiles[[q]]) <- 1:length(names(boot.list))
names(boot.list.quantiles[[q]])[c(1:3, 5)] <- names(summary.stats)
names(boot.list.quantiles[[q]])[4] <- "coefficients"
colnames(boot.list.quantiles[[q]][[1]]) <- names(model$subgroup.trt.effects$subgroup.effects)
}
names(summary.stats.quantile) <- names(summary.stats.quantile.se) <-
names(boot.list.quantiles) <- paste0("Quant_", round(100 * benefit.score.quantiles))
names(summary.stats$subgroup.effects) <- names(model$subgroup.trt.effects$subgroup.effects)
dimnames(summary.stats$sample.sizes) <- dimnames(model$subgroup.trt.effects$sample.sizes)
names(summary.stats.se[[1]]) <- names(model$subgroup.trt.effects$subgroup.effects)
dimnames(summary.stats.se[[3]]) <- dimnames(model$subgroup.trt.effects$sample.sizes)
names(summary.stats.se) <- paste("SE", names(summary.stats), sep = ".")
names(summary.stats.quantile) <- names(summary.stats.quantile.se) <- paste0("Quantile_", round(100 * benefit.score.quantiles))
ret <- list(avg.results = summary.stats, # means
se.results = summary.stats.se, # std errors
boot.results = boot.list, # this is a list of results for each iter
avg.quantile.results = summary.stats.quantile,
se.quantile.results = summary.stats.quantile.se,
boot.results.quantiles = boot.list.quantiles,
family = model$family, # model family
loss = model$loss, # model loss
method = model$method, # subgroup method (weighting vs a-learning)
n.trts = model$n.trts,
comparison.trts = model$comparison.trts,
reference.trt = model$reference.trt,
trts = model$trts,
larger.outcome.better = model$larger.outcome.better,
cutpoint = model$cutpoint,
val.method = method,
iterations = B,
nobs = n.obs,
nvars = NCOL(x))
class(ret) <- "subgroup_validated"
ret
}
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.