R/SII_function.R

#' Calculate Slope Index of Inequality using phe_sii
#'
#' @description
#' `phe_sii()` returns the slope index of inequality (SII) statistic for each
#' subgroup of the inputted dataframe, with lower and upper confidence limits
#' based on the specified confidence.
#'
#' @details
#'
#' The Relative Index of Inequality (RII) can also be returned via an optional
#' argument.
#'
#' The SII and RII are two measures of health inequality. They show the relation
#' between the level of health or frequency of a health problem in different
#' population groups and the ranking of these groups on the social scale.
#'
#' The input dataframe should be grouped before passing to the function if an
#' SII/RII for each subgroup is required, and quantiles ordered from least to
#' most advantaged.
#'
#' @section Calculation:
#'
#' The SII is calculated using linear regression (1). To allow for differences
#' in population size between quantiles (e.g. deprivation deciles), each is
#' given a rank score (or relative rank) based on the midpoint of its range in
#' the cumulative distribution of the total area population. The quantiles are
#' first ordered (e.g. from 1 most deprived to 10 least deprived for deprivation
#' deciles). If quantile 1 then contains 12 percent of the total population, its
#' relative rank is \code{0.12/2=0.06}. If quantile 2 includes 10 percent of the
#' population, its relative rank is \code{0.12+(0.10/2)=0.17}. A square root
#' transformation is applied to the regression to account for heteroskedasticity
#' (the tendency for the variances of the quantile values to be related to the
#' size of the values, i.e. larger values will tend to have larger variances). A
#' regression model is fitted to the transformed data:
#' \eqn{Y * \sqrt{a} = \sqrt{a} + b * \sqrt{a}}, where Y is the value of the
#' indicator for the quantile, a is the proportion of the total population in
#' the quantile and b is the relative rank. The model has no separate constant
#' term: the coefficient on \eqn{\sqrt{a}} is the intercept of the fitted line
#' and the coefficient on \eqn{b * \sqrt{a}} is its gradient. The SII is the
#' gradient of the resulting fitted line, and could be positive or negative
#' according to the indicator polarity. Since the relative ranks, by
#' definition, range from 0 to 1, the SII is the difference between the fitted
#' value at \code{x=1} and the fitted value at \code{x=0}. The RII is the ratio
#' of the fitted value at \code{x=1} (\code{Y1}) to the fitted value at
#' \code{x=0} (\code{Y0}), which can be calculated as:
#' \code{RII = (Y0 + SII)/Y0}
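#'
#' As an illustration of the steps above, the following base R sketch computes
#' relative ranks and fits the square-root-transformed regression for five
#' hypothetical quantiles. The populations and indicator values are invented
#' for the example, and the cumulative-midpoint formula is an assumption taken
#' from the description above rather than from the package's internal helper.
#'
#' ```r
#' # Hypothetical quantiles, ordered from least to most advantaged
#' pop   <- c(1200, 1000, 2800, 3000, 2000)   # quantile populations (invented)
#' value <- c(70, 72, 75, 78, 80)             # indicator values (invented)
#'
#' a <- pop / sum(pop)       # proportion of the total population in each quantile
#' b <- cumsum(a) - a / 2    # relative rank: midpoint of each cumulative range
#'
#' # Square-root-transformed terms, fitted without a separate constant term
#' sqrt_a   <- sqrt(a)
#' b_sqrt_a <- b * sqrt_a
#' yvals    <- value * sqrt_a
#' fit <- lm(yvals ~ sqrt_a + b_sqrt_a - 1)
#'
#' y0  <- unname(coef(fit)["sqrt_a"])     # fitted value at x = 0
#' sii <- unname(coef(fit)["b_sqrt_a"])   # gradient of the fitted line (SII)
#' rii <- (y0 + sii) / y0                 # ratio of fitted values at x = 1 and x = 0
#' ```
#'
#' With these populations the first two relative ranks are 0.06 and 0.17,
#' matching the worked figures above. Fitting on the transformed terms gives
#' the same gradient as a least squares fit of \code{value} on \code{b}
#' weighted by \code{a}.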
#'
#' @section Transformations:
#'
#' The indicator type can be specified as 1 (rate), 2 (proportion) or 0 (other),
#' using the \code{value_type} parameter. This setting determines the data
#' transformations that will be applied in the following two parts of the
#' method.
#'
#' Use in conjunction with the \code{transform} parameter in calculation of the
#' SII: It is recommended that rates and proportions are transformed prior to
#' calculation of the SII by setting the \code{transform} parameter to TRUE for
#' these indicator types. This will perform a log transformation for rates, or
#' logit for proportions, and return outputs transformed back to the original
#' units of the indicator. These transformations are recommended to improve the
#' linearity between the indicator values and the quantile, which is an
#' assumption of the method. A user-provided standard error will not be accepted
#' when the \code{transform} parameter is set to TRUE as the confidence limits
#' are required for this transformation.
#'
#' Use in calculation of the standard error: Rates and proportions, and their
#' confidence limits, are transformed prior to calculation of the standard error
#' for each quantile. This is because it is assumed that the confidence interval
#' around the indicator value is non-symmetric for these indicator types. Note
#' that this transformation is not controlled by the \code{transform} parameter
#' and is applied based on the value of the \code{value_type} parameter only.
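#'
#' The transformations described in this section can be sketched in base R as
#' follows. The rate, proportion and confidence limits are invented for the
#' example, and the helper names \code{logit} and \code{inv_logit} are
#' hypothetical; the sketch assumes the input limits are 95 percent limits, as
#' the function does.
#'
#' ```r
#' z <- qnorm(0.975)   # normal quantile for 95 percent confidence limits
#'
#' # Hypothetical helpers for the logit transformation and its inverse
#' logit     <- function(p) log(p / (1 - p))
#' inv_logit <- function(v) exp(v) / (1 + exp(v))
#'
#' # Rate (value_type = 1): standard error calculated on the log scale
#' rate <- 80
#' rate_lcl <- 75
#' rate_ucl <- 85
#' se_log <- (log(rate_ucl) - log(rate_lcl)) / z / 2
#'
#' # Proportion (value_type = 2): standard error calculated on the logit scale
#' prop <- 0.30
#' prop_lcl <- 0.25
#' prop_ucl <- 0.35
#' se_logit <- (logit(prop_ucl) - logit(prop_lcl)) / z / 2
#'
#' # Back-transformation returns results to the original units
#' rate_back <- exp(log(rate))
#' prop_back <- inv_logit(logit(prop))
#' ```
#'
#' Working on the transformed scale lets a confidence interval that is
#' non-symmetric in the original units be treated as symmetric when the
#' standard error is derived from its width.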
#'
#' @section Warning:
#'
#' The SII calculation assumes a linear relationship between indicator value and
#' quantile. Where this is not the case the transform option should be considered.
#' Small populations within quantiles can make the SII unstable. This
#' function does not include checks for linearity or stability; it is the user's
#' responsibility to ensure the input data is suitable for the SII calculation.
#'
#' @param data data.frame containing the required input fields, pre-grouped if
#'   an SII is required for each subgroup; unquoted string; no default
#' @param quantile field name within data that contains the quantile label (e.g.
#'   decile). The number of quantiles should be between 5 and 100; unquoted
#'   string; no default
#' @param population field name within data that contains the quantile
#'   populations (i.e. denominator). Non-zero populations are required for all
#'   quantiles to calculate SII for an area; unquoted string; no default
#' @param x (for indicators that are proportions) field name within data that
#'   contains the members of the population with the attribute of interest (i.e.
#'   numerator). This will be divided by population to calculate a proportion as
#'   the indicator value (if value field is not provided); unquoted string; no
#'   default
#' @param value field name within data that contains the indicator value (this
#'   does not need to be supplied for proportions if x and population are
#'   given); unquoted string; no default
#' @param value_type indicates the indicator type (1 = rate, 2 = proportion, 0 =
#'   other). The \code{value_type} argument is used to determine whether data should
#'   be transformed prior to calculation of the standard error and/or SII. See
#'   the \code{Transformations} section for full details; integer; default 0
#' @param transform option to transform input rates or proportions prior to
#'   calculation of the SII. See the \code{Transformations} section for full
#'   details; logical; default FALSE
#' @param lower_cl field name within data that contains 95 percent lower
#'   confidence limit of indicator value (to calculate standard error of
#'   indicator value). This field is needed if the se field is not supplied;
#'   unquoted string; no default
#' @param upper_cl field name within data that contains 95 percent upper
#'   confidence limit of indicator value (to calculate standard error of
#'   indicator value). This field is needed if the se field is not supplied;
#'   unquoted string; no default
#' @param se field name within data that contains the standard error of the
#'   indicator value. If not supplied, this will be calculated from the 95
#'   percent lower and upper confidence limits (i.e. one or the other of these
#'   fields must be supplied); unquoted string; no default
#' @param multiplier factor to multiply the SII and SII confidence limits by
#'   (e.g. set to 100 to return prevalences on a percentage scale between 0 and
#'   100). If the multiplier is negative, the inverse of the RII is taken to
#'   account for the change in polarity; numeric; default 1
#' @param repetitions number of random samples to perform to return confidence
#'   interval of SII (and RII). Minimum is 1000, no maximum (though the more
#'   repetitions, the longer the run time); numeric; default 100,000
#' @param confidence confidence level used to calculate the lower and upper
#'   confidence limits of SII, expressed as a number between 0.9 and 1, or 90
#'   and 100. It can be a vector of 0.95 and 0.998, for example, to output both
#'   95 percent and 99.8 percent CIs; numeric; default 0.95
#' @param rii option to return the Relative Index of Inequality (RII) with
#'   associated confidence limits as well as the SII; logical; default FALSE
#' @param intercept option to return the intercept value of the regression line
#'   (y value where x=0); logical; default FALSE
#' @param reliability_stat option to carry out the SII confidence interval
#'   simulation 10 times instead of once and return the Mean Average Difference
#'   between the first and subsequent samples (as a measure of the amount of
#'   variation). Warning: this will significantly increase run time of the
#'   function and should first be tested on a small number of repetitions;
#'   logical; default FALSE
#' @param type "full" output includes columns in the output dataset specifying
#'   the parameters the user has input to the function (value_type, multiplier,
#'   CI_confidence, CI_method); character string either "full" or "standard";
#'   default "full"
#'
#' @references
#' (1) Low A & Low A. Measuring the gap: quantifying and comparing local health inequalities.
#' Journal of Public Health; 2004;26:388-395. \cr \cr
#'
#' @import dplyr
#' @import broom
#' @importFrom rlang quo_text
#' @importFrom purrr map
#' @importFrom tidyr nest unnest spread
#' @importFrom stats rnorm qnorm lm
#' @importFrom tidyselect where
#' @importFrom rlang := .data .env
#'
#' @examples
#' library(dplyr)
#'
#' data <- data.frame(area = c(rep("Area1", 10), rep("Area2", 10)),
#'                    decile = c(1:10, 1:10),
#'                    population = c(7291, 7997, 6105, 7666, 5790, 6934, 5918, 5974, 7147, 7534, 21675,
#'                                   20065, 19750, 24713, 20112, 19618, 22408, 19752, 18939, 19312),
#'                    value = c(75.9, 78.3, 83.8, 83.6, 80.5, 81.1, 81.7, 84.2, 80.6, 86.3, 70.5,
#'                               71.6, 72.5, 73.5, 73.1, 76.2, 78.7, 80.6, 80.9, 80),
#'                    lowerCL = c(72.7,75.3,80.9,80.2,77.1,78,79,81.4,75.8,83.2,
#'                                70.1,71.1,72,73.1, 72.7, 75.7, 78.2,80.1,80.4,79.5),
#'                    upperCL = c(79.1,81.4,86.8,87.1,83.8,84.2,84.4,86.9,85.4,
#'                                 89.4,71,72.1,73.2,73.7,75.8,78.8,79.8,81.2,81.3,80.9),
#'                    StandardError = c(1.64,1.58,1.51,1.78,1.7,1.56,1.37,1.4,2.43,
#'                                      1.57,0.23,0.26,0.3,0.16,0.79,0.78,0.4,0.28,0.23,0.35)
#'                    )
#'
#'
#' # Run SII function on the two areas in the data
#' phe_sii(group_by(data, area),
#'         decile,
#'         population,
#'         value_type = 0, # default normal distribution
#'         value = value,
#'         lower_cl = lowerCL,
#'         upper_cl = upperCL,
#'         confidence = 0.95,
#'         rii = TRUE,
#'         type = "standard")
#'
#' # Supplying the standard error instead of the indicator 95 percent confidence limits
#' # gives the same result
#' phe_sii(group_by(data, area),
#'         decile,
#'         population,
#'         value_type = 0,
#'         value = value,
#'         se = StandardError,
#'         confidence = 0.95,
#'         rii = TRUE,
#'         type = "standard")
#'
#' # multiple confidence intervals, log transforming the data if they are rates
#' phe_sii(group_by(data, area),
#'         decile,
#'         population,
#'         value_type = 1,
#'         transform = TRUE,
#'         value = value,
#'         lower_cl = lowerCL,
#'         upper_cl = upperCL,
#'         confidence = c(0.95, 0.998),
#'         repetitions = 10000,
#'         rii = TRUE,
#'         type = "standard")
#'
#' @export
#'
#' @return The SII with lower and upper confidence limits for each subgroup of
#'   the inputted data.frame.
#'
#' @family PHEindicatormethods package functions
# -------------------------------------------------------------------------------------------------

phe_sii <- function(data, quantile, population,  # compulsory fields
                    x = NULL,                    # optional fields
                    value = NULL,
                    value_type = 0,
                    transform = FALSE,
                    lower_cl = NULL,
                    upper_cl = NULL,
                    se = NULL,
                    multiplier = 1,
                    repetitions = 100000,
                    confidence = 0.95,
                    rii = FALSE,
                    intercept = FALSE,
                    reliability_stat = FALSE,
                    type = "full") {

        # Part 1 - Checks on input data ---------------------------------------------

        if (missing(data)| missing(quantile)| missing(population)) {
          stop("function phe_sii requires the arguments: data, quantile, population")
        }
        if (missing(value) & missing(x)) {
          stop("function phe_sii requires value field, or x field if indicator is a proportion of population")
        }
        if (missing(se) & (missing(upper_cl) | missing(lower_cl))) {
          stop("function phe_sii requires either lower_cl and upper_cl fields, or se field")
        }
        if (!(value_type %in% c(0,1,2))) {
          stop("value_type should be 0, 1 or 2")
        }
        if (!(class(multiplier) %in% c("numeric", "integer") & class(repetitions) %in% c("numeric", "integer") & class(confidence) %in% c("numeric", "integer"))) {
          stop("multiplier, repetitions and confidence inputs should be numeric")
        }
        if (repetitions < 1000) {
          stop("number of repetitions must be 1000 or greater. Default is 100,000")
        }
        # if transform is true then value type must be rate or proportion
        if (transform == TRUE & value_type == 0) {
          stop("value_type should be 1 or 2 when transform is true")
        }
        # if transform is true then se cannot be provided
        if (transform == TRUE & !(missing(se))) {
          stop("function phe_sii requires se to be missing when transform is true")
        }
        # if transform is true then upper and lower cls must be provided
        if (transform == TRUE & (missing(upper_cl) | missing(lower_cl))) {
          stop("function phe_sii requires lower_cl and upper_cl fields when transform is true")
        }
        # check on confidence limit requirements
        if (any(confidence < 0.9) | any(confidence > 1 & confidence < 90) | any(confidence > 100)) {
            stop("all confidence levels must be between 90 and 100 or between 0.9 and 1")
        }

        # Use NSE on inputs - apply quotes
        quantile = enquo(quantile)
        population = enquo(population)
        if(!missing(x)) {x = enquo(x)}
        if(!missing(value)) {value = enquo(value)}
        if(!missing(se)) {se = enquo(se)}
        if(!missing(lower_cl)) {lower_cl = enquo(lower_cl)}
        if(!missing(upper_cl)) {upper_cl = enquo(upper_cl)}

        # scale confidence level
        confidence[confidence >= 90] <- confidence[confidence >= 90] / 100

        # check for non numeric inputs
        if(!(class(pull(data, {{ population }})) %in% c("numeric", "integer")
            & ifelse(rlang::quo_text(x) %in% names(data), (class(pull(data, {{ x }})) %in% c("numeric", "integer")), TRUE)
            & ifelse(rlang::quo_text(value) %in% names(data), (class(pull(data, {{ value }})) %in% c("numeric", "integer")), TRUE)
            & ifelse(rlang::quo_text(se) %in% names(data), (class(pull(data, {{ se }})) %in% c("numeric", "integer")), TRUE)
            & ifelse(rlang::quo_text(lower_cl) %in% names(data), (class(pull(data, {{ lower_cl }})) %in% c("numeric", "integer")), TRUE)
            & ifelse(rlang::quo_text(upper_cl) %in% names(data), (class(pull(data, {{ upper_cl }})) %in% c("numeric", "integer")), TRUE))) {
                stop("some input fields in data.frame are non-numeric")
        }

        # check for zero, negative or missing populations
        negative_pops <- data %>%
                filter({{ population }} <= 0 | is.na({{ population }}))

        if (nrow(negative_pops) > 0) {
                stop("some groups have a zero, negative or missing population")
        }

        # check for negative/missing standard errors
        if(rlang::quo_text(se) %in% names(data)) {
                negative_se <- data %>%
                        filter({{ se }} < 0 | is.na({{ se }}))

                if (nrow(negative_se) > 0) {
                        stop("negative or missing standard errors in input dataset")
                }
        }

        # check for missing confidence limits
        if(rlang::quo_text(lower_cl) %in% names(data) & rlang::quo_text(upper_cl) %in% names(data)) {
          negative_cl <- data %>%
            filter(is.na({{ lower_cl }}) | is.na({{ upper_cl }}))

          if (nrow(negative_cl) > 0) {
            stop("missing lower or upper confidence limits in input dataset")
          }
        }

        # checks on PROPORTIONS
        if(value_type == 2) {

            # check for proportions outside (0,1) range
            if(rlang::quo_text(value) %in% names(data)) {
              invalid_prop <- data %>%
                filter({{ value }} < 0 | {{ value }} > 1)

              if (nrow(invalid_prop) > 0) {
                stop("value proportions are not all between 0 and 1")
              }
            }

            # check for lower and upper CLs outside (0,1) range
            if(rlang::quo_text(lower_cl) %in% names(data) &
               rlang::quo_text(upper_cl) %in% names(data)) {
              invalid_prop_cl <- data %>%
                filter({{ lower_cl }} < 0 | {{ lower_cl }} > 1 |
                         {{ upper_cl }} < 0 | {{ upper_cl }} > 1)

              if (nrow(invalid_prop_cl) > 0) {
                stop("confidence limit proportions are not all between 0 and 1")
              }
            }

            # check for zero or negative counts
            if(!(rlang::quo_text(value) %in% names(data)) &
               rlang::quo_text(x) %in% names(data)) {
              negative_x <- data %>%
                filter({{ x }} <= 0)

              if (nrow(negative_x) > 0) {
                stop("some groups have a zero or negative count x")
              }
          }
        }

        # Part 2 - Start calculations ---------------------------------------------

        # extract grouping variables of input dataset (if any)
        grouping_variables <- group_vars(data)

        # Convert factors to character
        data <- data %>%
                 ungroup() %>%
                 mutate(across(where(is.factor), as.character)) %>%
                 group_by(!!! syms(c(grouping_variables)))

        # Extract vector of quantiles and save the number to "no_quantiles"
        quantile_list <- unique(select(ungroup(data), {{ quantile }}))
        no_quantiles <- nrow(quantile_list)

        # Output warning on number of quantiles inputted
        if (no_quantiles < 5 | no_quantiles > 100) {
                stop("Number of quantiles must be between 5 and 100")
        } else if (no_quantiles > 10) {
                warning("WARNING: Small values can make SII unstable when using a large number of quantiles")
        }

        # Remove records with missing essential data
        if (rlang::quo_text(se) %in% names(data)) {

        valid_complete <- data %>%
                             filter({{ population }} > 0,
                                    !is.na({{ se }}))
        } else if (rlang::quo_text(lower_cl) %in% names(data) &
                   rlang::quo_text(upper_cl) %in% names(data)) {

        valid_complete <- data %>%
                                filter({{ population }} > 0,
                                       !is.na({{ lower_cl }}), !is.na({{ upper_cl }}))
        }

        # Not all quantiles may have data for each grouping
        # Start by counting the number of quantiles each area has data for -
        # exclude any areas with missing data (SII cannot be calculated)
        valid_areas <- valid_complete %>%
                         summarise(n = length(unique({{ quantile }}))) %>%
                         filter(n == no_quantiles)

        # Create table of areas to calculate SII for
        valid_deciles <- valid_areas %>%
                         merge(quantile_list, # Merge on list of quantiles
                              all.x = TRUE,
                              all.y = TRUE)

        if (nrow(valid_deciles) != nrow(data)) {
                warning("WARNING: some records have been removed due to incomplete or invalid data")
        }


        # join provided data to valid decile table
        pops_prep <- left_join(valid_deciles, data,
                            by = c(grouping_variables, rlang::quo_text(quantile))) %>%
                     group_by(!!! syms(c(grouping_variables, rlang::quo_text(quantile)))) %>%
                     arrange(!!! syms(c(grouping_variables, rlang::quo_text(quantile))))

        # Calculate indicator value (if not supplied in input data) as proportion for each
        # quantile (x/population)

        if (rlang::quo_text(value) %in% names(pops_prep)) {
                pops_prep <- mutate(pops_prep, value = {{ value }})
        } else if (value_type == 2) {
                pops_prep <- mutate(pops_prep, value = {{ x }} / {{ population }})
        }

        # Transform value if value is a rate or proportion
         pops_prep <- pops_prep %>%
                mutate(value = ifelse(value_type == 1,
                                      log(.data$value),
                               ifelse(value_type == 2,
                                      log(.data$value / (1 - .data$value)),
                                      .data$value)))

        # Transform lower and upper confidence limits in the case of a rate or proportion
         if (rlang::quo_text(lower_cl) %in% names(pops_prep) &
             rlang::quo_text(upper_cl) %in% names(pops_prep)) {

         pops_prep <- pops_prep %>%
                mutate(lower_cl = ifelse(value_type == 0, {{ lower_cl }},
                                      ifelse(value_type == 1, log({{ lower_cl }}),
                                         ifelse(value_type == 2, log({{ lower_cl }} / (1 - {{ lower_cl }})),
                                                NA))),
                       upper_cl = ifelse(value_type == 0, {{ upper_cl }},
                                        ifelse(value_type == 1, log({{ upper_cl }}),
                                                ifelse(value_type == 2, log({{ upper_cl }} / (1 - {{ upper_cl }})),
                                         NA))))
         }

        # Calculate standard error (if not supplied in input data), from lower and upper CLs
        z <- stats::qnorm(0.975) # hard-coded at 95% confidence

        if (rlang::quo_text(se) %in% names(pops_prep)) {
                pops_prep <- mutate(pops_prep,
                                    se_calc = {{ se }})
        } else {
                pops_prep <- mutate(pops_prep,
                                    se_calc = (upper_cl - lower_cl) / z / 2)
        }

        # Calculate a and b vals
        pops_prep_ab <- pops_prep %>%
                group_by(!!! syms(grouping_variables)) %>%
                mutate(a_vals = {{ population }}/ sum({{ population }}), # Proportion of total population of subgroup
                       b_vals = FindXValues({{ population }}, no_quantiles))

        # Calculate sqrt(a), bsqrt(a) and un-transformed y value for regression
        if(transform == FALSE) {
          pops_prep_ab <- pops_prep_ab %>%
                group_by(!!! syms(c(grouping_variables, rlang::quo_text(quantile)))) %>%
                mutate(sqrt_a = sqrt(.data$a_vals),
                       b_sqrt_a = .data$b_vals * .data$sqrt_a,
                       value_transform = ifelse(value_type == 1, exp(.data$value),
                                                ifelse(value_type == 2,
                                                       exp(.data$value) / (1 + exp(.data$value)),
                                                       .data$value)),
                       yvals = .data$sqrt_a * .data$value_transform)
        } else {
          pops_prep_ab <- pops_prep_ab %>%
          group_by(!!! syms(c(grouping_variables, rlang::quo_text(quantile)))) %>%
          mutate(sqrt_a = sqrt(.data$a_vals),
                  b_sqrt_a = .data$b_vals * .data$sqrt_a,
                  value_transform = .data$value,
                  yvals = .data$sqrt_a * .data$value_transform)
        }

        # calculate confidence interval for SII via simulation
        # Repeat this 10 times to get a "variability" measure if requested

        # Nest data (different argument needed for grouped vs. ungrouped dataset)
        if(length(grouping_variables) == 0) {
            popsSII_model <- pops_prep_ab %>%
                tidyr::nest(data = everything())
        } else {
            popsSII_model <- pops_prep_ab %>%
                group_by(!!! syms(grouping_variables)) %>%
                tidyr::nest()
        }

        # Different nest() argument needed for ungrouped dataset
        if(length(grouping_variables) == 0) {
            sim_CI <- pops_prep_ab %>%
                tidyr::nest(data = everything()) %>%
                mutate(CI_params = purrr::map(data, ~ SimulationFunc(data = .,
                                                                     .data$value,
                                                                     value_type,
                                                                     .data$se_calc,
                                                                     repetitions,
                                                                     confidence,
                                                                     multiplier,
                                                                     .data$sqrt_a,
                                                                     .data$b_sqrt_a,
                                                                     rii,
                                                                     transform,
                                                                     reliability_stat)))
        } else {
            sim_CI <- pops_prep_ab %>%
                group_by(!!! syms(grouping_variables)) %>%
              tidyr::nest() %>%
              mutate(CI_params = purrr::map(data, ~ SimulationFunc(data = .,
                                                                   .data$value,
                                                                   value_type,
                                                                   .data$se_calc,
                                                                   repetitions,
                                                                   confidence,
                                                                   multiplier,
                                                                   .data$sqrt_a,
                                                                   .data$b_sqrt_a,
                                                                   rii,
                                                                   transform,
                                                                   reliability_stat)))
        }



        # Perform regression to calculate SII and extract model parameters

        popsSII_model <- popsSII_model %>%
          # perform linear model
          mutate(model = purrr::map(data, function(df)
            stats::lm(yvals ~ sqrt_a + b_sqrt_a - 1, data = df))) %>%
          # extract model coefficients
          mutate(model = purrr::map(.data$model, broom::tidy)) %>%
          tidyr::unnest("model") %>%
          # remove unnecessary fields
          select(!c("std.error", "statistic", "p.value")) %>%
          # create columns for each parameter
          tidyr::pivot_wider(names_from = "term",
                             values_from = "estimate")


        # Format results according to whether transform = T/F

        if(transform == FALSE) { #no anti-transform needed
          # Extract SII and RII values
          popsSII_model <- popsSII_model %>%
            mutate(sii = multiplier * .data$b_sqrt_a,
                   rii = (.data$sqrt_a + .data$b_sqrt_a)/.data$sqrt_a,
                   intercept = .data$sqrt_a) %>%
            # Take inverse of RII if multiplier is negative
            mutate(rii = ifelse(multiplier < 0, 1 / rii, rii)) %>%
            # Select fields to keep
            select(all_of(grouping_variables), "sii", "rii", "intercept")

          # join on dataset with SII/ RII confidence limits
          # Get CIs from first round of reps
          # Unnest confidence limits in a data frame for joining

          sim_CI_rep1 <- sim_CI %>%
            select(!c("data")) %>%
            tidyr::unnest("CI_params") |>
            slice_head(n = 1)

          if (length(grouping_variables) > 0) {
            # (grouped dataset)
            popsSII_model <- popsSII_model %>%
              left_join(sim_CI_rep1, by = grouping_variables)
          } else {
            # ungrouped dataset
            popsSII_model <- popsSII_model %>%
              cbind(sim_CI_rep1)
          }

          # Add reliability stats

          if (isTRUE(reliability_stat)) {

            sim_CI <- rename(sim_CI, "CI_calcs" = "CI_params")

            reliabity_stats <- calc_reliability(
              CI_data = sim_CI,
              confidence = confidence,
              rii = rii
            )

            if (length(grouping_variables) > 0) {
              # (grouped dataset)
              popsSII_model <- popsSII_model %>%
                left_join(reliabity_stats, by = grouping_variables)
            } else {
              # ungrouped dataset
              popsSII_model <- popsSII_model %>%
                cbind(reliabity_stats)
            }

          }

        } else {

          popsSII_model <- popsSII_model %>%
            mutate(sii = .data$b_sqrt_a,
                   intercept = .data$sqrt_a) %>%
            # Select fields to keep
            select(all_of(grouping_variables), "sii", "intercept")

          #Do calculations that can be done outside of loop as they don't need the CI fields
          if (value_type == 1) {#anti-log needed

            popsSII_model <- popsSII_model %>%
              mutate(xequals1 = .data$intercept + .data$sii,
                     xequalshalf = (.data$intercept + .data$xequals1) / 2,
                     antilogintercept = exp(.data$intercept),
                     antilogxequals1 = exp(.data$xequals1),
                     multiplier = .env$multiplier,
                     sii = (.data$antilogxequals1 - .data$antilogintercept) * .data$multiplier,
                     rii = if_else(
                       .data$multiplier < 0,
                       1 / (.data$antilogxequals1 / .data$antilogintercept),
                       .data$antilogxequals1 / .data$antilogintercept
                     ),
                     intercept = .data$antilogintercept * abs(.data$multiplier))

          } else if (value_type == 2) {#anti-logit needed

            popsSII_model <- popsSII_model %>%
              mutate(xequals1 = .data$intercept + .data$sii,
                     xequalshalf = (.data$intercept + .data$xequals1) / 2,
                     antilogintercept = exp(.data$intercept) / (1 + exp(.data$intercept)),
                     antilogxequals1 = exp(.data$xequals1) / (1 + exp(.data$xequals1)),
                     multiplier = .env$multiplier,
                     sii = (.data$antilogxequals1 - .data$antilogintercept) * .data$multiplier,
                     rii = if_else(
                       multiplier < 0,
                       1 / (.data$antilogxequals1 / .data$antilogintercept),
                       .data$antilogxequals1 / .data$antilogintercept
                     ),
                     intercept = .data$antilogintercept * abs(.data$multiplier))
          }


          popsSII_model_CIs <- popsSII_model |>
            select(all_of(grouping_variables), "xequalshalf")

          popsSII_model <- popsSII_model |>
            select(all_of(grouping_variables), "sii", "rii", "intercept")

          # join on dataset with confidence limits
          if (length(grouping_variables) > 0) {
            # (grouped dataset)
            popsSII_model_CIs <- popsSII_model_CIs %>%
              left_join(sim_CI, by = grouping_variables)
          } else {
            # ungrouped dataset
            popsSII_model_CIs <- popsSII_model_CIs %>%
              cbind(sim_CI)
          }

          # Calculate SII and RII for each rep

          popsSII_model_CIs <- popsSII_model_CIs |>
            mutate(
              CI_calcs = purrr::map2(.data$CI_params, .data$xequalshalf, function(data, xequalshalf) {

                map(confidence, function(conf) {

                  conf_formatted <-
                    gsub("\\.", "_", formatC(conf * 100, format = "f", digits = 1))

                  selected_data <- data %>%
                    select(contains(conf_formatted)) |>
                    select(contains("sii")) |>
                    rename(
                      "sii_lower" = contains("sii_lower"),
                      "sii_upper" = contains("sii_upper")
                    )

                  if (value_type == 1) {

                    SII_calculations <- selected_data %>%
                      mutate(interceptlcl = xequalshalf - (.data$sii_lower / 2),
                             interceptucl = xequalshalf - (.data$sii_upper / 2),
                             xequals1lcl = xequalshalf + (.data$sii_lower / 2),
                             xequals1ucl = xequalshalf + (.data$sii_upper / 2),
                             multiplier = .env$multiplier,
                             # A negative multiplier reverses the order of the limits,
                             # so the test is against 0 (as for the point estimates)
                             sii_lower = if_else(
                               .data$multiplier < 0,
                               (exp(.data$xequals1ucl) - exp(.data$interceptucl)) * .data$multiplier,
                               (exp(.data$xequals1lcl) - exp(.data$interceptlcl)) * .data$multiplier
                             ),
                             sii_upper = if_else(
                               .data$multiplier < 0,
                               (exp(.data$xequals1lcl) - exp(.data$interceptlcl)) * .data$multiplier,
                               (exp(.data$xequals1ucl) - exp(.data$interceptucl)) * .data$multiplier
                             )
                      )

                    if (isTRUE(rii)) {
                      SII_calculations <- SII_calculations |>
                        mutate(
                          rii_lower = if_else(
                            .data$multiplier < 0,
                            1 / (exp(.data$xequals1ucl) / exp(.data$interceptucl)),
                            exp(.data$xequals1lcl) / exp(.data$interceptlcl)
                          ),
                          rii_upper = if_else(
                            .data$multiplier < 0,
                            1 / (exp(.data$xequals1lcl) / exp(.data$interceptlcl)),
                            exp(.data$xequals1ucl) / exp(.data$interceptucl)
                          )
                        )
                    }

                  } else if (value_type == 2) {

                    SII_calculations <- selected_data %>%
                      mutate(interceptlcl = xequalshalf - (.data$sii_lower / 2),
                             interceptucl = xequalshalf - (.data$sii_upper / 2),
                             xequals1lcl = xequalshalf + (.data$sii_lower / 2),
                             xequals1ucl = xequalshalf + (.data$sii_upper / 2),
                             multiplier = .env$multiplier,
                             sii_lower = if_else(
                               .data$multiplier < 0,
                               ((exp(.data$xequals1ucl) / (1 + exp(.data$xequals1ucl))) - (exp(.data$interceptucl) / (1 + exp(.data$interceptucl)))) * .data$multiplier,
                               ((exp(.data$xequals1lcl) / (1 + exp(.data$xequals1lcl))) - (exp(.data$interceptlcl) / (1 + exp(.data$interceptlcl)))) * .data$multiplier
                             ),
                             sii_upper = if_else(
                               .data$multiplier < 0,
                               ((exp(.data$xequals1lcl) / (1 + exp(.data$xequals1lcl))) - (exp(.data$interceptlcl) / (1 + exp(.data$interceptlcl)))) * .data$multiplier,
                               ((exp(.data$xequals1ucl) / (1 + exp(.data$xequals1ucl))) - (exp(.data$interceptucl)/(1 + exp(.data$interceptucl)))) * .data$multiplier
                             )
                      )

                    if (isTRUE(rii)) {

                      SII_calculations <- SII_calculations %>%
                        mutate(
                          rii_lower = if_else(
                            .data$multiplier < 0,
                            1 / ((exp(.data$xequals1ucl) / (1 + exp(.data$xequals1ucl))) / (exp(.data$interceptucl) / (1 + exp(.data$interceptucl)))),
                            ((exp(.data$xequals1lcl)/(1 + exp(.data$xequals1lcl))) / (exp(.data$interceptlcl)/(1 + exp(.data$interceptlcl))))),
                          rii_upper = if_else(
                            .data$multiplier < 0,
                            1 / ((exp(.data$xequals1lcl) / (1 + exp(.data$xequals1lcl))) / (exp(.data$interceptlcl)/(1 + exp(.data$interceptlcl)))),
                            ((exp(.data$xequals1ucl) / (1 + exp(.data$xequals1ucl))) / (exp(.data$interceptucl)/(1 + exp(.data$interceptucl))))
                          )
                        )
                    }

                  }

                  SII_calculations <- SII_calculations |>
                    select(any_of(contains(c("sii_lower", "sii_upper",
                                             "rii_lower", "rii_upper")))) |>
                    rename_with(.fn = \(x) paste0(x, conf_formatted))

                }
                ) |>
                  bind_cols()
              }
              )
            ) |>
            select(all_of(grouping_variables), "CI_calcs")

          # Add CIs to model:
          # unnest the SII/RII confidence limits (from the first round of reps)
          # into a data frame, then join them onto the model output

          CI_rep1 <- popsSII_model_CIs %>%
            select(all_of(grouping_variables), "CI_calcs") %>%
            tidyr::unnest("CI_calcs") |>
            slice_head(n = 1)

          if (length(grouping_variables) > 0) {
            # (grouped dataset)
            popsSII_model <- popsSII_model %>%
              left_join(CI_rep1, by = grouping_variables)
          } else {
            # ungrouped dataset
            popsSII_model <- popsSII_model %>%
              cbind(CI_rep1)
          }

          # Add reliability stats
          if (isTRUE(reliability_stat)) {

            reliability_stats <- calc_reliability(
              CI_data = popsSII_model_CIs,
              confidence = confidence,
              rii = rii
            )

            if (length(grouping_variables) > 0) {
              # (grouped dataset)
              popsSII_model <- popsSII_model %>%
                left_join(reliability_stats, by = grouping_variables)
            } else {
              # ungrouped dataset
              popsSII_model <- popsSII_model %>%
                cbind(reliability_stats)
            }

          }

        }

        # Part 3 - Choose and format output fields --------------------------------

        # Remove reliability stat columns (if not requested by user)
        if (reliability_stat == FALSE) {
          popsSII_model <- popsSII_model %>%
            select(-contains("mad"))
        }

        # Remove RII columns (if not requested by user)
        if (rii == FALSE) {
          popsSII_model <- popsSII_model %>%
            select(!contains("rii"))
        }

        # Move intercept to last column of dataframe if requested by user;
        # otherwise remove it
        if (intercept == TRUE) {
          popsSII_model <- popsSII_model %>%
            select(!"intercept", "intercept")
        } else {
          popsSII_model <- popsSII_model %>%
            select(!"intercept")
        }

        # Add metadata columns to output dataset (if requested by user)
        if (type == "full") {

          popsSII_model  <- popsSII_model %>%

            mutate(indicator_type = if_else(value_type == 0, "normal",
                                            if_else(value_type == 1,
                                                  "rate", "proportion")),
                   multiplier = multiplier,
                   transform = if_else(transform == TRUE & value_type == 1, "log",
                                       if_else(transform == TRUE & value_type == 2, "logit",
                                             "none")),
                   CI_confidence = paste0(confidence * 100, "%",
                                          collapse = ", "),
                   CI_method = paste0("simulation ", repetitions, " reps"))

        }

        # return output dataset
        return(popsSII_model)

}
PHEindicatormethods documentation built on April 4, 2025, 1:36 a.m.