#' Create a delta `data.frame` template
#'
#' @description
#' Creates a `data.frame` in the format required by [analyse()] for the use
#' of applying a delta adjustment.
#'
#' @details
#' To apply a delta adjustment the [analyse()] function expects
#' a delta `data.frame` with 3 variables: `vars$subjid`, `vars$visit` and `delta`
#' (where `vars` is the object supplied in the original call to [draws()]
#' as created by the [set_vars()] function).
#'
#' This function will return a `data.frame` with the aforementioned variables with one
#' row per subject per visit. If the `delta` argument to this function is `NULL`
#' then the `delta` column in the returned `data.frame` will be 0 for all observations.
#' If the `delta` argument is not `NULL` then `delta` will be calculated separately
#' for each subject as the accumulative sum of `delta` multiplied by the scaling
#' coefficient `dlag` based upon how many visits after the subject's intercurrent
#' event (ICE) the visit in question is.
#' This is best illustrated with an example:
#'
#' Let `delta = c(5,6,7,8)` and `dlag=c(1,2,3,4)` (i.e. assuming there are 4 visits)
#' and lets say that the subject had an ICE on visit 2. The calculation would then be
#' as follows:
#'
#' ```
#' v1 v2 v3 v4
#' --------------
#' 5 6 7 8 # delta assigned to each visit
#' 0 1 2 3 # lagged scaling starting from the first visit after the subjects ICE
#' --------------
#' 0 6 14 24 # delta * lagged scaling
#' --------------
#' 0 6 20 44 # accumulative sum of delta to be applied to each visit
#' ```
#'
#' That is to say the subject would have a delta offset of 0 applied for visit-1, 6
#' for visit-2, 20 for visit-3 and 44 for visit-4. As a comparison, lets say that the
#' subject instead had their ICE on visit 3, the calculation would then be as follows:
#'
#' ```
#' v1 v2 v3 v4
#' --------------
#' 5 6 7 8 # delta assigned to each visit
#' 0 0 1 2 # lagged scaling starting from the first visit after the subjects ICE
#' --------------
#' 0 0 7 16 # delta * lagged scaling
#' --------------
#' 0 0 7 23 # accumulative sum of delta to be applied to each visit
#' ```
#'
#' In terms of practical usage, lets say that you wanted a delta of 5 to be used for all post ICE visits
#' regardless of their proximity to the ICE visit. This can be achieved by setting
#' `delta = c(5,5,5,5)` and `dlag = c(1,0,0,0)`. For example lets say a subject had their
#' ICE on visit-1, then the calculation would be as follows:
#'
#' ```
#' v1 v2 v3 v4
#' --------------
#' 5 5 5 5 # delta assigned to each visit
#' 1 0 0 0 # lagged scaling starting from the first visit after the subjects ICE
#' --------------
#' 5 0 0 0 # delta * lagged scaling
#' --------------
#' 5 5 5 5 # accumulative sum of delta to be applied to each visit
#' ```
#'
#' Another way of using these arguments
#' is to set `delta` to be the difference in time between visits and `dlag` to be the
#' amount of delta per unit of time. For example lets say that we have a visit on weeks
#' 1, 5, 6 & 9 and that we want a delta of 3 to be applied for each week after an ICE. This
#' can be achieved by setting `delta = c(0,4,1,3)` (the difference in weeks between each visit)
#' and `dlag = c(3, 3, 3, 3)`. For example lets say we have a subject who had their ICE on week-5
#' (i.e. visit-2) then the calculation would be:
#'
#' ```
#' v1 v2 v3 v4
#' --------------
#' 0 4 1 3 # delta assigned to each visit
#' 0 0 3 3 # lagged scaling starting from the first visit after the subjects ICE
#' --------------
#' 0 0 3 9 # delta * lagged scaling
#' --------------
#' 0 0 3 12 # accumulative sum of delta to be applied to each visit
#' ```
#'
#' i.e. on week-6 (1 week after the ICE) they have a delta of 3 and on week-9 (4 weeks after the ICE)
#' they have a delta of 12.
#'
#'
#' Please note that this function also returns several utility variables so that
#' the user can create their own custom logic for defining what `delta`
#' should be set to. These additional variables include:
#'
#' - `is_mar` - If the observation was missing would it be regarded as MAR? This variable
#' is set to `FALSE` for observations that occurred after a non-MAR ICE, otherwise it is set to `TRUE`.
#' - `is_missing` - Is the outcome variable for this observation missing.
#' - `is_post_ice` - Does the observation occur after the patient's ICE as defined by the
#' `data_ice` dataset supplied to [draws()].
#' - `strategy` - What imputation strategy was assigned to for this subject.
#'
#'
#' The design and implementation of this function is largely based upon the same functionality
#' as implemented in the so called "five marcos" by James Roger. See Roger (2021).
#'
#' @references
#' Roger, James. Reference-based mi via multivariate normal rm (the “five macros” and miwithd), 2021. URL
#' https://www.lshtm.ac.uk/research/centres-projects-groups/missing-data#dia-missing-data.
#'
#'
#' @param imputations an `imputation` object as created by [impute()].
#'
#' @param delta `NULL` or a numeric vector. Determines the baseline amount of delta
#' to be applied to each visit. See details. If a numeric vector it must have
#' the same length as the number of unique visits in the original dataset.
#'
#' @param dlag `NULL` or a numeric vector. Determines the scaling to be applied
#' to `delta` based upon which visit the ICE occurred on. See details. If a
#' numeric vector it must have the same length as the number of unique visits in
#' the original dataset.
#'
#' @param missing_only Logical, if `TRUE` then non-missing post-ICE data will have a delta value
#' of 0 assigned. Note that the calculation (as described in the details section) is performed
#' first and then overwritten with 0's at the end (i.e. the delta values for missing
#' post-ICE visits will stay the same regardless of this option).
#'
#' @examples
#' \dontrun{
#' delta_template(imputeObj)
#' delta_template(imputeObj, delta = c(5,6,7,8), dlag = c(1,2,3,4))
#' }
#' @seealso [analyse()]
#' @export
delta_template <- function(
imputations,
delta = NULL,
dlag = NULL,
missing_only = TRUE
) {
dat <- get_delta_template(imputations)
if (is.null(delta)) return(dat)
ld <- imputations$data
assert_that(
is.numeric(delta),
is.numeric(dlag),
length(delta) == length(dlag),
length(delta) == length(ld$visits),
msg = sprintf(
"`delta` and `dlag` must both be a length %s numeric vector",
length(ld$visits)
)
)
assert_that(
is.logical(missing_only),
length(missing_only) == 1,
msg = "`missing_only` must be TRUE or FALSE"
)
delta_list <- lapply(
ld$ids,
function(id) {
d_lagscale(
delta = delta,
dlag = dlag,
is_post_ice = ld$is_post_ice[[id]]
)
}
)
delta <- unlist(delta_list)
assert_that(length(delta) == length(dat[["delta"]]))
dat["delta"] <- delta
if (missing_only) {
## Correction to remove any delta assigned to none missing values
dat$delta[!dat[["is_missing"]] & dat[["is_post_ice"]]] <- 0
}
return(dat)
}
#' Get delta utility variables
#'
#' This function creates the default delta template (1 row per subject per visit)
#' and extracts all the utility information that users need to define their own logic
#' for defining delta. See [delta_template()] for full details.
#'
#' @param imputations an imputations object created by [impute()].
get_delta_template <- function(imputations) {
ld <- imputations$data
x <- lapply(
ld$ids,
function(id) {
strat <- ifelse(ld$is_mar[[id]], "MAR", ld$strategies[[id]])
strat[!ld$is_missing[[id]]] <- NA_character_
df <- data.frame(
"is_mar" = ld$is_mar[[id]],
"is_missing" = ld$is_missing[[id]],
"is_post_ice" = ld$is_post_ice[[id]],
"strategy" = strat,
"delta" = 0,
stringsAsFactors = FALSE
)
df[[ld$vars$subjid]] <- factor(id, levels = ld$ids)
df[[ld$vars$visit]] <- factor(ld$visits, levels = ld$visits)
df[[ld$vars$group]] <- ld$group[[id]]
vars <- c(
ld$vars$subjid,
ld$vars$visit,
ld$vars$group,
"is_mar",
"is_missing",
"is_post_ice",
"strategy",
"delta"
)
return(df[, vars])
}
)
Reduce(rbind, x)
}
#' Calculate delta from a lagged scale coefficient
#'
#' @description
#' Calculates a delta value based upon a baseline delta value and a
#' post ICE scaling coefficient.
#'
#' @param delta a numeric vector. Determines the baseline amount of delta
#' to be applied to each visit.
#'
#' @param dlag a numeric vector. Determines the scaling to be applied
#' to `delta` based upon with visit the ICE occurred on. Must be the same
#' length as delta.
#'
#' @param is_post_ice logical vector. Indicates whether a visit is "post-ICE" or
#' not.
#'
#'
#' @details
#' See [delta_template()] for full details on how this calculation is performed.
d_lagscale <- function(delta, dlag, is_post_ice) {
assert_that(
is.numeric(dlag),
is.numeric(delta),
is.logical(is_post_ice),
msg = "`dlag` and `delta` must be numeric, `is_post_ice` must be logical"
)
assert_that(
length(delta) == length(dlag),
length(delta) == length(is_post_ice),
msg = "`delta`, `dlag` and `is_post_ice` must all be the same length"
)
result <- rep(0, length(dlag))
if (all(!is_post_ice)) {
return(result)
}
is_post_ice_scale <- delta[is_post_ice]
is_post_ice_lag <- dlag[seq_len(length(is_post_ice_scale))]
vals <- cumsum(is_post_ice_lag * is_post_ice_scale)
result[is_post_ice] <- vals
return(result)
}
#' Applies delta adjustment
#'
#' @description
#' Takes a delta dataset and adjusts the outcome variable by adding the
#' corresponding delta.
#'
#' @param data `data.frame` which will have its `outcome` column adjusted.
#' @param delta `data.frame` (must contain a column called `delta`).
#' @param group character vector of variables in both `data` and `delta` that will be used
#' to merge the 2 data.frames together by.
#' @param outcome character, name of the outcome variable in `data`.
apply_delta <- function(data, delta = NULL, group = NULL, outcome = NULL) {
assert_that(
is.character(group),
length(group) >= 1,
is.character(outcome),
length(outcome) == 1,
msg = "`group` and `outcome` must be character vectors"
)
assert_that(
is.data.frame(data),
is.data.frame(delta) | is.null(delta),
msg = "`dat` and `delta` must be data.frames"
)
assert_that(
!"delta" %in% names(data),
msg = " `delta` is a reserved variable name should not be already defined in `data`"
)
if (is.null(delta)) {
return(data)
}
if (nrow(delta) == 0) {
return(data)
}
for (var in c(group, outcome)) {
assert_that(
var %in% names(data),
msg = sprintf("Variable `%s` is not in `data`", var)
)
}
for (var in c(group, "delta")) {
assert_that(
var %in% names(delta),
msg = sprintf("Variable `%s` is not in `delta`", var)
)
}
delta_min <- delta[, c(group, "delta")]
# We insert a variable in order to recover the original sort order of our `data.frame`
# We use a obfuscated name in order to prevent variable overwriting
sort_var <- "sort_variable_awdji394thgmngrq2rqb2"
data[[sort_var]] <- seq_len(nrow(data))
data2 <- merge(
data,
delta_min,
all.x = TRUE,
by = group
)
# Restore sort order and remove variable
data2 <- data2[order(data2[[sort_var]]), ]
data[[sort_var]] <- NULL # As we rely on the names of data later to reduce data2
data2[is.na(data2[["delta"]]), "delta"] <- 0
data2[[outcome]] <- data2[[outcome]] + data2[["delta"]]
data3 <- data2[, names(data)]
assert_that(
ncol(data3) == ncol(data),
nrow(data3) == nrow(data),
all(names(data3) == names(data)),
msg = paste0(
"Data structure has been altered whilst applying delta. ",
"This is most likely caused by having duplicate rows per id within the delta dataset"
)
)
class(data3) <- class(data)
return(data3)
}
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.