R/impute.R


#' Create imputed datasets
#'
#' `impute()` creates imputed datasets based upon the data and options specified in
#' the call to [draws()]. One imputed dataset is created for each "sample" created by
#' [draws()].
#'
#' @name impute
#'
#' @param draws A `draws` object created by [draws()].
#'
#' @param references A named vector. Identifies the references to be used for reference-based
#' imputation methods. Should be of the form `c("Group1" = "Reference1", "Group2" = "Reference2")`.
#' If `NULL` (default), the references are assumed to be of the form
#' `c("Group1" = "Group1", "Group2" = "Group2")`. This argument cannot be `NULL` if
#' an imputation strategy (as defined by `data_ice[[vars$strategy]]` in the call to [draws()]) other than `MAR` is set.
#'
#' @param update_strategy An optional `data.frame`. Updates the imputation method that was
#' originally set via the `data_ice` option in [draws()]. See the details section for more
#' information.
#'
#' @param strategies A named list of functions. Defines the imputation functions to be used.
#' The names of the list should mirror the values specified in the `strategy` column of `data_ice`.
#' Default = [getStrategies()]. See [getStrategies()] for more details.
#'
#' @details
#'
#' `impute()` uses the imputation model parameter estimates, as generated by [draws()], to
#' first calculate the marginal (multivariate normal) distribution of a subject's longitudinal
#' outcome variable
#' depending on their covariate values.
#' For subjects with intercurrent events (ICEs) handled using non-MAR methods, this marginal distribution
#' is then updated depending on the time of the first visit affected by the ICE,
#' the chosen imputation strategy and the chosen reference group as described in Carpenter, Roger, and Kenward (2013).
#' The subject's imputation distribution used for imputing missing values is then defined as
#' their marginal distribution conditional on their observed outcome values.
#' One imputed dataset is generated per set of parameter estimates provided by [draws()].
#'
#' The exact manner in which missing values are imputed from this conditional imputation distribution depends
#' on the `method` object that was provided to [draws()], in particular:
#'
#' - Bayes & Approximate Bayes: each imputed dataset contains 1 row per subject & visit
#' from the original dataset with missing values imputed by taking a single random sample
#' from the conditional imputation distribution.
#'
#' - Conditional Mean: each imputed dataset contains 1 row per subject & visit from the
#' bootstrapped or jackknife dataset that was used to generate the corresponding parameter
#' estimates in [draws()]. Missing values are imputed by using the mean of the conditional
#' imputation distribution. Please note that the first imputed dataset refers to the conditional
#' mean imputation on the original dataset whereas all subsequent imputed datasets refer to
#' conditional mean imputations for bootstrap or jackknife samples, respectively, of the original data.
#'
#' - Bootstrapped Maximum Likelihood MI (BMLMI): `D` random imputations are performed on each bootstrapped
#' dataset that was used to generate the corresponding parameter estimates in [draws()]. A total of
#' `B*D` imputed datasets is provided, where `B` is the number of bootstrapped datasets. Missing values
#' are imputed by taking a random sample from the conditional imputation distribution.
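#'
#' As a rough sketch following the BMLMI case above (assuming `drawobj` was created using
#' `method_bmlmi(B = 20, D = 2)`; object names are illustrative only):
#'
#' ```
#' imputeObj <- impute(
#'     draws = drawobj,
#'     references = c("Trt" = "Placebo", "Placebo" = "Placebo")
#' )
#' length(imputeObj$imputations)   # B * D = 40 imputed datasets
#' ```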
#'
#' The `update_strategy` argument can be used to update the imputation strategy that was
#' originally set via the `data_ice` option in [draws()]. This avoids having to re-run the [draws()]
#' function when changing the imputation strategy in certain circumstances (as detailed below).
#' The `data.frame` provided to the `update_strategy` argument must contain two columns,
#' one for the subject ID and another for the imputation strategy, whose names are the same as
#' those defined in the `vars` argument as specified in the call to [draws()]. Please note that this
#' argument only allows you to update the imputation strategy and not other arguments such as the
#' time of the first visit affected by the ICE.
#' A key limitation of this functionality is
#' that one can only switch between a MAR and a non-MAR strategy (or vice versa) for subjects without
#' observed post-ICE data. The reason for this is that such a change would affect whether the post-ICE data is included
#' in the base imputation model or not (as explained in the help to [draws()]).
#' As an example, if a subject had their ICE on "Visit 2"
#' but had observed/known values for "Visit 3" then the function will throw an error
#' if one tries to switch the strategy from MAR to a non-MAR strategy. In contrast, switching from
#' a non-MAR to a MAR strategy, whilst valid, will raise a warning as not all usable data
#' will have been utilised in the imputation model.
#'
#' @references
#'
#' James R Carpenter, James H Roger, and Michael G Kenward. Analysis of longitudinal trials with protocol deviation:
#' a framework for relevant,
#' accessible assumptions, and inference via multiple imputation. Journal of Biopharmaceutical Statistics,
#' 23(6):1352–1371, 2013. \[Section 4.2 and 4.3\]
#'
#' @examples
#' \dontrun{
#'
#' impute(
#'     draws = drawobj,
#'     references = c("Trt" = "Placebo", "Placebo" = "Placebo")
#' )
#'
#' new_strategy <- data.frame(
#'   subjid = c("Pt1", "Pt2"),
#'   strategy = c("MAR", "JR")
#' )
#'
#' impute(
#'     draws = drawobj,
#'     references = c("Trt" = "Placebo", "Placebo" = "Placebo"),
#'     update_strategy = new_strategy
#' )
#' }
#'
#' @export
impute <- function(draws, references = NULL, update_strategy = NULL, strategies = getStrategies()) {
    UseMethod("impute")
}



#' @rdname impute
#' @export
impute.random <- function(draws, references = NULL, update_strategy = NULL, strategies = getStrategies()) {
    result <- impute_internal(
        draws = draws,
        update_strategy = update_strategy,
        references = references,
        strategies = strategies,
        condmean = FALSE
    )
    return(as_class(result, "imputation"))
}




#' @rdname impute
#' @export
impute.condmean <- function(draws, references = NULL, update_strategy = NULL, strategies = getStrategies()) {
    result <- impute_internal(
        draws = draws,
        update_strategy = update_strategy,
        references = references,
        strategies = strategies,
        condmean = TRUE
    )
    return(as_class(result, "imputation"))
}



#' Create imputed datasets
#'
#' This is the workhorse function that implements most of the functionality of `impute`.
#' See the user level function [impute()] for further details.
#'
#' @inheritParams impute
#'
#' @param condmean Logical. If `TRUE`, missing values will be imputed using the conditional mean values; if `FALSE`,
#' they will be imputed by taking a random draw from the multivariate normal distribution.
impute_internal <- function(draws, references = NULL, update_strategy, strategies, condmean) {
    validate(draws)

    data <- draws$data$clone(deep = TRUE)

    validate_strategies(strategies, data$strategies)
    if (!is.null(update_strategy)) {
        data$update_strategies(update_strategy)
    }

    if (is.null(references)) {
        assert_that(
            all(data$strategies == "MAR"),
            msg = "You have set a non-MAR imputation strategy. Please specify the references using the argument `references`"
        )
        references <- levels(data$data[[data$vars$group]])
        names(references) <- references
    }
    references <- add_class(references, "references")
    validate(references, data$data[[data$vars$group]])


    validate(draws$samples)
    samples_grouped <- transpose_samples(draws$samples)

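    # D > 1 is only used by method_bmlmi(); all other methods perform a single
    # imputation per set of parameter estimates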
    n_imputations <- ifelse(is.null(draws$method$D), 1, draws$method$D)

    list_of_imputations_pt <- mapply(
        impute_data_individual,
        names(samples_grouped$index),
        samples_grouped$index,
        MoreArgs = list(
            beta = samples_grouped$beta,
            sigma = samples_grouped$sigma,
            data = data,
            references = references,
            strategies = strategies,
            condmean = condmean,
            n_imputations = n_imputations
        ),
        SIMPLIFY = FALSE
    )

    list_of_imputation_dfs <- convert_to_imputation_list_df(
        list_of_imputations_pt,
        lapply(draws$samples, `[[`, "ids")
    )

    x <- as_imputation(
        imputations = list_of_imputation_dfs,
        data = data,
        method = draws$method,
        references = references
    )
    validate(x)

    return(x)
}




#' Convert list of [imputation_list_single()] objects to an [imputation_list_df()] object
#' (i.e. a list of [imputation_df()] objects)
#'
#' @param imputes a list of [imputation_list_single()] objects
#' @param sample_ids A list with 1 element per required `imputation_df`. Each element
#' must contain a vector of IDs which correspond to the [imputation_single()] IDs
#' that are required for that dataset. The total number of IDs must be equal to the
#' total number of rows within all of `imputes$imputations`.
#'
#' To accommodate `method_bmlmi()` the [impute_data_individual()] function returns
#' a list of [imputation_list_single()] objects with 1 object per subject.
#'
#' [imputation_list_single()] stores the subject's imputations as a matrix where the columns
#' of the matrix correspond to the D of [method_bmlmi()]. Note that all other methods
#' (i.e. `methods_*()`) are a special case of this with D = 1. The number of rows in the
#' matrix varies for each subject and is equal to the number of times the patient was selected
#' for imputation (for non-conditional mean methods this should be 1 per subject per imputed
#' dataset).
#'
#' This function is best illustrated by an example:
#'
#' ```
#' imputes = list(
#'     imputation_list_single(
#'         id = "Tom",
#'         imputations = matrix(
#'              imputation_single_t_1_1,  imputation_single_t_1_2,
#'              imputation_single_t_2_1,  imputation_single_t_2_2,
#'              imputation_single_t_3_1,  imputation_single_t_3_2
#'         )
#'     ),
#'     imputation_list_single(
#'         id = "Harry",
#'         imputations = matrix(
#'              imputation_single_h_1_1,  imputation_single_h_1_2
#'         )
#'     )
#' )
#'
#' sample_ids <- list(
#'     c("Tom", "Harry", "Tom"),
#'     c("Tom")
#' )
#' ```
#'
#' Then `convert_to_imputation_list_df(imputes, sample_ids)` would result in:
#'
#' ```
#' imputation_list_df(
#'     imputation_df(
#'         imputation_single_t_1_1,
#'         imputation_single_h_1_1,
#'         imputation_single_t_2_1
#'     ),
#'     imputation_df(
#'         imputation_single_t_1_2,
#'         imputation_single_h_1_2,
#'         imputation_single_t_2_2
#'     ),
#'     imputation_df(
#'         imputation_single_t_3_1
#'     ),
#'     imputation_df(
#'         imputation_single_t_3_2
#'     )
#' )
#' ```
#'
#' Note that the different repetitions (i.e. the value set for D) are grouped together
#' sequentially.
#'
convert_to_imputation_list_df <- function(imputes, sample_ids) {
    D_by_id <- vapply(imputes, function(x) ncol(x$imputations), numeric(1))
    n_imputations_by_id <- vapply(imputes, function(x) nrow(x$imputations), numeric(1))

    D <- unique(D_by_id)
    B <- length(sample_ids)

    assert_that(
        length(D) == 1,
        msg = "D could not be uniquely determined"
    )
    assert_that(
        sum(n_imputations_by_id) == length(unlist(sample_ids)),
        msg = "Number of samples available does not equal the number required"
    )

    impute_dfs_by_d <- lapply(
        seq_len(D),
        function(d) {
            imputations_d <- lapply(imputes, function(x) x$imputations[, d])
            list_of_singles <- unlist(imputations_d, recursive = FALSE, use.names = FALSE)
            split_imputations(list_of_singles, sample_ids)
        }
    )

    impute_dfs_flat_by_d <- unlist(impute_dfs_by_d, recursive = FALSE, use.names = FALSE)

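    # impute_dfs_flat_by_d is ordered with all B datasets for d = 1 first, then all for d = 2, etc.
    # Re-index so that the D repetitions of each bootstrap sample b sit next to each other
    # (b1-d1, b1-d2, ..., b2-d1, b2-d2, ...)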
    index_d2b <- c()
    for (b in seq_len(B)) index_d2b <- c(index_d2b, b + (seq_len(D) - 1) * B)

    impute_dfs_flat_by_b <- do.call(imputation_list_df, impute_dfs_flat_by_d[index_d2b])
    return(impute_dfs_flat_by_b)
}



#' Split a flat list of [imputation_single()] into multiple [imputation_df()]'s by ID
#'
#' @param list_of_singles A list of [imputation_single()]'s
#'
#' @param split_ids A list with 1 element per required split. Each element
#' must contain a vector of IDs which correspond to the [imputation_single()] IDs
#' that are required within that sample. The total number of IDs must be equal to the
#' length of `list_of_singles`.
#'
#' @importFrom utils relist
#'
#' @details This function converts a list of imputations from being structured per patient
#' to being structured per sample, i.e. it converts
#'
#' ```
#' obj <- list(
#'     imputation_single("Ben", numeric(0)),
#'     imputation_single("Ben", numeric(0)),
#'     imputation_single("Ben", numeric(0)),
#'     imputation_single("Harry", c(1, 2)),
#'     imputation_single("Phil", c(3, 4)),
#'     imputation_single("Phil", c(5, 6)),
#'     imputation_single("Tom", c(7, 8, 9))
#' )
#'
#' index <- list(
#'     c("Ben", "Harry", "Phil", "Tom"),
#'     c("Ben", "Ben", "Phil")
#' )
#' ```
#'
#' Into:
#'
#' ```
#' output <- list(
#'     imputation_df(
#'         imputation_single(id = "Ben", values = numeric(0)),
#'         imputation_single(id = "Harry", values = c(1, 2)),
#'         imputation_single(id = "Phil", values = c(3, 4)),
#'         imputation_single(id = "Tom", values = c(7, 8, 9))
#'     ),
#'     imputation_df(
#'         imputation_single(id = "Ben", values = numeric(0)),
#'         imputation_single(id = "Ben", values = numeric(0)),
#'         imputation_single(id = "Phil", values = c(5, 6))
#'     )
#' )
#' ```
split_imputations <- function(list_of_singles, split_ids) {
    ids_flat <- vapply(list_of_singles, `[[`, character(1), "id")
    ids_order_index <- order(ids_flat)
    index_flat <- unlist(split_ids)
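    # order(order(x)) returns the rank of each element of x; this is used below to map the
    # imputations (once sorted by subject ID) back onto the ordering requested by split_ids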
    reindex <- order(order(index_flat))
    assert_that(
        length(reindex) == length(ids_flat),
        identical(
            tapply(index_flat, index_flat, length),
            tapply(ids_flat, ids_flat, length)
        ),
        msg = "index is not compatible with the object"
    )
    output <- list_of_singles[ids_order_index][reindex]
    output_list <- relist(output, split_ids)
    lapply(output_list, imputation_df)
}





#' Transpose samples
#'
#' Transposes samples generated by [draws()] so that they are grouped
#' by `subjid` instead of by sample number.
#'
#' @param samples A list of samples generated by [draws()].
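#'
#' @details
#' A rough sketch of the transformation (structure only; the `beta_*` / `sigma_*` objects
#' are purely illustrative placeholders):
#'
#' ```
#' samples <- list(
#'     list(beta = beta_1, sigma = list("A" = sigma_1A, "B" = sigma_1B), ids = c("Tom", "Harry")),
#'     list(beta = beta_2, sigma = list("A" = sigma_2A, "B" = sigma_2B), ids = c("Tom", "Tom"))
#' )
#'
#' transpose_samples(samples)
#' ## list(
#' ##     beta = list(beta_1, beta_2),
#' ##     sigma = list("A" = list(sigma_1A, sigma_2A), "B" = list(sigma_1B, sigma_2B)),
#' ##     index = list("Tom" = c(1, 2, 2), "Harry" = 1)
#' ## )
#' ```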
transpose_samples <- function(samples) {

    beta <- list()
    sigma <- list()

    grp_names <- names(samples[[1]]$sigma)
    for (grp in grp_names) {
        sigma[[grp]] <-  vector(mode = "list", length = length(samples))
    }

    for (i in seq_along(samples)) {
        sample <- samples[[i]]
        beta[[i]] <- sample$beta
        for (grp in grp_names) sigma[[grp]][[i]] <-  sample$sigma[[grp]]
    }

    index <- invert_indexes(lapply(samples, function(x) x$ids))

    x <- list(
        beta = beta,
        sigma = sigma,
        index = index
    )
    return(x)
}




#' Invert and derive indexes
#'
#' Takes a list of elements and creates a new list
#' containing 1 entry per unique element value containing
#' the indexes of which original elements it occurred in.
#'
#' @details
#' This function's purpose is best illustrated by an example:
#'
#' input:
#'
#' ```
#' list( c("A", "B", "C"), c("A", "A", "B"))
#' ```
#' becomes:
#'
#' ```
#' list( "A" = c(1,2,2), "B" = c(1,2), "C" = 1 )
#' ```
#'
#' @param x list of elements to invert and calculate index from (see details).
invert_indexes <- function(x) {
    lens <- vapply(x, function(x) length(x), numeric(1))
    grp <- rep(seq_along(x), lens)
    vals <- unlist(x, use.names = FALSE)
    uvals <- unique(vals)
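    # split() returns its groups in sorted order, so re-subset by uvals to
    # preserve the first-appearance order of the values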
    index <- split(grp, vals)[uvals]
    return(index)
}


#' Impute data for a single subject
#'
#' This function performs the imputation for a single subject at a time implementing the
#' process as detailed in [impute()].
#'
#' Note that this function performs all of the required imputations for a subject at the
#' same time. I.e. if a subject is included in samples 1, 3, 5 & 9 then all imputations (using
#' sample-dependent imputation model parameters) are performed in one step in order to avoid
#' having to look up a subject's covariates and expand them to a design matrix multiple times
#' (which would be more computationally expensive).
#' The function also supports a subject belonging to the same sample multiple times,
#' i.e. 1, 1, 2, 3, 5, 5, as will typically occur for bootstrapped datasets.
#'
#' @param id Character string identifying the subject.
#'
#' @param index The sample indexes to which the subject belongs, e.g. `c(1, 1, 1, 2, 2, 4)`.
#'
#' @param beta A list of beta coefficients for each sample, i.e. `beta[[1]]` is the set of
#' beta coefficients for the first sample.
#'
#' @param sigma A list of the sigma coefficients for each sample split by group i.e.
#' `sigma[[1]][["A"]]` would give the sigma coefficients for group A for the first sample.
#'
#' @param data A `longdata` object created by [longDataConstructor()]
#'
#' @param references A named vector. Identifies the references to be used when generating the
#' imputed values. Should be of the form `c("Group" = "Reference", "Group" = "Reference")`.
#'
#' @param strategies A named list of functions. Defines the imputation functions to be used.
#' The names of the list should mirror the values specified in the `strategy` column of `data_ice`.
#' Default = `getStrategies()`. See [getStrategies()] for more details.
#'
#' @param condmean Logical. If `TRUE` will impute using the conditional mean values, if `FALSE`
#' will impute by taking a random draw from the multivariate normal distribution.
#'
#' @param n_imputations When `condmean = FALSE`, the number of random imputations to be performed for each sample.
#' Default is `1` (one random imputation per sample).
impute_data_individual <- function(
    id,
    index,
    beta,
    sigma,
    data,
    references,
    strategies,
    condmean,
    n_imputations = 1
) {

    # Define default return value if nothing needs to be imputed
    results <- imputation_list_single(
        D = n_imputations,
        imputations = replicate(
            n = n_imputations * length(index),
            simplify = FALSE,
            expr = imputation_single(id = id, values = matrix(numeric(0)))
        )
    )

    id_data <- data$extract_by_id(id)

    if (sum(id_data$is_missing) == 0) return(results)

    vars <- data$vars
    group_pt <- as.character(id_data$group)
    group_ref <- as.character(references[[group_pt]])


    dat_pt <- id_data$data
    dat_pt[, vars$outcome] <- 1  # Dummy outcome value to stop rows being dropped by model.matrix
    dat_ref <- dat_pt
    dat_ref[, vars$group] <- factor(group_ref, levels = levels(id_data$group))

    dat_pt_mod <- as_model_df(dat_pt, data$formula)
    dat_ref_mod <- as_model_df(dat_ref, data$formula)

    parameters_group <- get_visit_distribution_parameters(
        dat = dat_pt_mod[-1],  # -1 as first col from as_model_df is the outcome variable
        beta = beta[index],
        sigma = sigma[[group_pt]][index]
    )

    parameters_reference <- get_visit_distribution_parameters(
        dat = dat_ref_mod[-1], # -1 as first col from as_model_df is the outcome variable
        beta = beta[index],
        sigma = sigma[[group_ref]][index]
    )

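    # Apply the subject's imputation strategy (e.g. MAR, JR, CIR) to combine the subject's own
    # group distribution with the reference distribution, given which visits are regarded as MAR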
    pars <- mapply(
        strategies[[id_data$strategy]],
        parameters_group,
        parameters_reference,
        MoreArgs = list(index_mar = id_data$is_mar),
        SIMPLIFY = FALSE
    )

    conditional_parameters <- lapply(
        pars,
        get_conditional_parameters,
        values = id_data$outcome
    )

    imputed_outcome <- unlist(lapply(
        conditional_parameters,
        impute_outcome,
        n_imputations = n_imputations,
        condmean = condmean
    ), recursive = FALSE)

    results <- imputation_list_single(
        D = n_imputations,
        imputations = lapply(imputed_outcome, function(x) {
            imputation_single(id = id, values = x)
        })
    )

    validate(results)
    return(results)
}


#' Derive visit distribution parameters
#'
#' Takes patient level data and beta coefficients and expands them
#' to get a patient specific estimate for the visit distribution parameters
#' `mu` and `sigma`. Returns the values in a specific format
#' which is expected by downstream functions in the imputation process
#' (namely  `list(list(mu = ..., sigma = ...), list(mu = ..., sigma = ...))`).
#'
#' @param dat Patient level dataset, must be 1 row per visit. Columns must be in the same
#' order as `beta` and the number of columns must match the length of each element of `beta`.
#' @param beta List of model beta coefficients. There should be 1 element for each sample
#' e.g. if there were 3 samples and the models each had 4 beta coefficients then this argument
#' should be of the form `list(c(1, 2, 3, 4), c(5, 6, 7, 8), c(9, 10, 11, 12))`.
#' All elements of `beta` must be the same length and in the same order as the columns of `dat`.
#' @param sigma List of sigma. Must have the same number of entries as `beta`.
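#'
#' @details
#' A small illustrative sketch (made-up numbers; 2 visits and 2 samples, with a design
#' matrix consisting of an intercept and one covariate):
#'
#' ```
#' dat <- matrix(c(1, 1, 5, 5), nrow = 2)    # 2 visits x 2 design columns
#' beta <- list(c(10, 1), c(20, 2))          # 1 element per sample
#' sigma <- list(diag(2), 2 * diag(2))       # 1 element per sample
#'
#' get_visit_distribution_parameters(dat, beta, sigma)
#' ## conceptually:
#' ## list(
#' ##     list(mu = c(15, 15), sigma = diag(2)),
#' ##     list(mu = c(30, 30), sigma = 2 * diag(2))
#' ## )
#' ```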
get_visit_distribution_parameters <- function(dat, beta, sigma) {
    assert_that(
        length(unique(vapply(beta, length, numeric(1)))) == 1,
        msg = "All elements of beta must be the same length"
    )

    beta_mat <- matrix(
        unlist(beta, use.names = FALSE),
        nrow = length(beta[[1]]),
        ncol = length(beta),
        byrow = FALSE
    )
    mu <- as.matrix(dat) %*% beta_mat
    parameters <- list()
    for (i in seq_along(beta)) {
        parameters[[i]] <- list(
            mu = mu[, i],
            sigma = sigma[[i]]
        )
    }
    return(parameters)
}


#' Sample outcome value
#'
#' Draws a random sample from a multivariate normal distribution.
#'
#' @param conditional_parameters a list with elements `mu` and `sigma` which
#' contain the mean vector and covariance matrix to sample from.
#'
#' @param n_imputations numeric representing the number of random samples from the multivariate
#' normal distribution to be performed. Default is `1`.
#'
#' @param condmean Logical. Should conditional mean imputation be performed (as opposed to random
#' sampling)?
#'
#' @importFrom stats rnorm
impute_outcome <- function(conditional_parameters, n_imputations = 1, condmean = FALSE) {

    if (condmean) {
        expr <- quote(
            as.vector(conditional_parameters$mu)
        )
    } else {
        expr <- quote(
            sample_mvnorm(
                conditional_parameters$mu,
                conditional_parameters$sigma
            )
        )
    }

    assert_that(
        all(!is.na(conditional_parameters$mu)),
        all(!is.na(conditional_parameters$sigma)),
        msg = "Sigma or Mu contain missing values"
    )

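    # The quoted expression is evaluated once per imputation so that, when sampling, each
    # replicate produces a fresh random draw (for condmean the same mean vector is returned)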
    results <- replicate(
        n = n_imputations,
        simplify = FALSE,
        expr = eval(expr)
    )
    return(results)
}



#' Derive conditional multivariate normal parameters
#'
#' Takes parameters for a multivariate normal distribution and observed values
#' to calculate the conditional distribution for the unobserved values.
#'
#' @param pars a `list` with elements `mu` and `sigma` defining the mean vector and
#' covariance matrix respectively.
#' @param values a vector of observed values to condition on, must be same length as `pars$mu`.
#' Missing values must be represented by an `NA`.
#'
#' @return A list with the conditional distribution parameters:
#'
#' - `mu` - The conditional mean vector.
#' - `sigma` - The conditional covariance matrix.
#'
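#' @details
#' This is the standard conditional distribution result for the multivariate normal:
#' partitioning the outcome into its missing (1) and observed (2) components, and writing
#' the observed values as `a`, the conditional parameters are
#'
#' ```
#' mu_{1|2}    = mu_1 + sigma_12 %*% solve(sigma_22) %*% (a - mu_2)
#' sigma_{1|2} = sigma_11 - sigma_12 %*% solve(sigma_22) %*% sigma_21
#' ```
#'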
get_conditional_parameters <- function(pars, values) {
    q <- is.na(values)

    if (sum(q) == length(values)) return(pars)
    if (sum(q) == 0) return(list(mu = numeric(0), sigma = numeric(0)))

    a <- values[!q]

    mu1 <- matrix(pars$mu[q], nrow = sum(q))
    mu2 <- matrix(pars$mu[!q], nrow = sum(!q))

    sig11 <- pars$sigma[q, q, drop = FALSE]
    sig12 <- pars$sigma[q, !q, drop = FALSE]
    sig21 <- pars$sigma[!q, q, drop = FALSE]
    sig22 <- pars$sigma[!q, !q, drop = FALSE]

    sig22_inv_12 <- sig12 %*% solve(sig22)

    x <- list(
        mu = mu1 + sig22_inv_12 %*% (a - mu2),
        sigma = sig11 - sig22_inv_12 %*% sig21
    )
    return(x)
}


#' Validate user supplied references
#'
#' Checks to ensure that the user-specified references are
#' expected values (i.e. those found within the source data).
#'
#' @param x named character vector.
#' @param control factor variable (should be the `group` variable from the source dataset).
#' @param ... Not used.
#'
#' @return
#' Will error if there is an issue otherwise will return `TRUE`.
#' @export
validate.references <- function(x, control, ...) {
    references <- x
    ref_names <- names(references)

    assert_that(
        is.character(references),
        !is.null(ref_names),
        all(!is.na(references)),
        msg = "`references` should be a non-missing named character vector"
    )

    assert_that(
        all(ref_names != ""),
        msg = "All values of `references` must be named"
    )

    assert_that(
        length(unique(ref_names)) == length(ref_names),
        msg = "`references` must have unique names"
    )

    assert_that(
        is.factor(control),
        msg = "`control` should be a factor vector"
    )

    unique_refs <- unique(c(references, ref_names))
    valid_refs <- unique(as.character(control))

    assert_that(
        all(unique_refs %in% valid_refs),
        msg = paste0(
            "`references` contains values that are not present in the ",
            "`group` variable of your source dataset"
        )
    )
    return(invisible(TRUE))
}


#' Validate user specified strategies
#'
#' Compares the user provided strategies to those that are
#' required (the reference). Will throw an error if not all values
#' of reference have been defined.
#'
#' @param strategies named list of strategies.
#' @param reference list or character vector of strategies that need to be defined.
#'
#' @return
#' Will throw an error if there is an issue otherwise will return `TRUE`.
validate_strategies <- function(strategies, reference) {

    strat_names <- names(strategies)

    assert_that(
        is.list(strategies),
        !is.null(strat_names),
        all(vapply(strategies, is.function, logical(1))),
        msg = "`strategies` must be a named list of functions"
    )

    assert_that(
        length(strat_names) == length(unique(strat_names)),
        msg = "`strategies` must be uniquely named"
    )

    unique_references <- unique(unlist(reference, use.names = FALSE))

    for (ref in unique_references) {
        assert_that(
            ref %in% strat_names,
            msg = sprintf("Required strategy `%s` has not been defined", ref)
        )
    }
    return(invisible(TRUE))
}













#' Create an imputation object
#'
#' This function creates the object that is returned from [impute()]. Essentially
#' it is a glorified wrapper around [list()] ensuring that the required elements have been
#' set and that the class is added as expected.
#'
#' @param imputations An `imputation_list_df` object (i.e. a list of `imputation_df` objects)
#'
#' @param data A `longdata` object as created by [longDataConstructor()]
#'
#' @param method A `method` object as created by [method_condmean()], [method_bayes()],
#'  [method_approxbayes()] or [method_bmlmi()]
#'
#' @param references A named vector. Identifies the references to be used when generating the
#' imputed values. Should be of the form `c("Group" = "Reference", "Group" = "Reference")`.
#'
as_imputation <- function(imputations, data, method, references) {
    x <- list(
        imputations = imputations,
        data = data,
        method = method,
        references = references
    )
    class(x) <- c("imputation", "list")
    return(x)
}


#' @export
validate.imputation <- function(x, ...) {
    assert_that(
        has_class(x$imputations, "imputation_list_df"),
        validate(x$imputations),
        has_class(x$data, "longdata"),
        has_class(x$method, "method"),
        has_class(x$references, "references"),
        validate(x$references, x$data$data[[x$data$vars$group]])
    )
}



#' Print `imputation` object
#'
#' @param x An `imputation` object generated by [impute()].
#' @param ... Not used.
#' @export
print.imputation <- function(x, ...) {

    ### Reference strings i.e.  A -> B
    ref_from <- names(x$references)
    ref_to <- x$references
    width <- max(nchar(ref_from))
    sstring <- paste0("%-", width, "s -> %s")
    ref_strings <- sprintf(sstring, ref_from, ref_to)

    ### % of missing data strings
    is_miss <- matrix(unlist(x$data$is_missing), ncol = length(x$data$visits), byrow = TRUE)
    is_miss_perc <- round((apply(is_miss, 2, sum) / nrow(is_miss)) * 100)
    width <- max(nchar(x$data$visits))
    sstring <- paste0("%-", width, "s: %3s%%")
    miss_strings <- sprintf(sstring, x$data$visits, is_miss_perc)


    n_imp <- length(x$imputations)
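    # For conditional mean methods the first imputed dataset is the imputation of the original
    # dataset, hence displayed as "1 + <number of bootstrap / jackknife samples>"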
    n_imp_string <- ife(
        has_class(x$method, "condmean"),
        sprintf("1 + %s", n_imp - 1),
        as.character(n_imp)
    )

    string <- c(
        "",
        "Imputation Object",
        "-----------------",
        sprintf("Number of Imputed Datasets: %s", n_imp_string),
        "Fraction of Missing Data (Original Dataset):",
        sprintf("    %s", miss_strings),
        "References:",
        sprintf("    %s", ref_strings),
        ""
    )

    cat(string, sep = "\n")
    return(invisible(x))
}
