R/bal.cs.psa.R

Defines functions bal.cs.psa

Documented in bal.cs.psa

#' Balance for Categorical Covariate: Random Strata as part of a PSA
#'
#' Function provides a measure of the balance achieved between control and
#' treatment groups for a categorical covariate from user defined strata. This
#' statistic is compared to the same measure for randomly permuted strata.
#'
#' This function measures the balance achieved across K strata for a
#' categorical covariate with J categories.  If \eqn{p_{ijk}} is the proportion
#' of cases in stratum k, category j, and treatment i, then the statistic is
#' the sum over all K, J of \eqn{ |\sqrt{p_{0jk} + \epsilon} - \sqrt{p_{1jk} +
#' \epsilon } | }.  A permutation distribution is generated by randomly
#' assigning cases to strata, thus generating B permuted stratifications and
#' the associated B permutation statistics.  The permutation stratifications
#' are generated under a fixed marginals model to retain comparability with the
#' original stratification.  A histogram of the permutation statistics is
#' produced with the original statistic referenced as a red dot.
#'
#' @param categorical Categorical covariate that is being balanced within
#' strata in a PSA. If \code{categorical} has three columns, then the second
#' and third are assumed to be the treatment and strata respectively.  Missing
#' values are not allowed. May be factor or numeric.
#' @param treatment Binary variable of same length as \code{categorical};
#' generally 0 for 'control,' 1 for 'treatment.'
#' @param strata Integer variable; a vector of same length as
#' \code{categorical} indicating the derived strata from estimated propensity
#' scores.
#' @param B Numeric; number of randomly generated iterations of the balance
#' measure are created for the comparison distribution.
#' @param eps Numeric; ensures that weighting is reasonable for small
#' categories.
#' @param main Title passed to \code{histogram}.
#' @param \dots Other graphical parameters passed to \code{histogram}.
#' @return In addition to the histogram, a list with the following components
#' is returned: \item{balance.orig }{Balance measure of user defined strata.}
#' \item{rank.orig}{Rank of original balance measure in comparison with the B
#' randomly generated values.}
#' @author James E. Helmreich \email{ James.Helmreich@@Marist.edu}
#'
#' Robert M. Pruzek \email{RMPruzek@@yahoo.com}
#' @seealso \code{bal.cws.psa}, \code{bal.ms.psa}, \code{bal.ks.psa}
#' @keywords htest
#' @examples
#' #Everything random
#' categorical<-sample(4,1000,replace=TRUE)
#' treatment<-sample(c(0,1),1000,replace=TRUE)
#' strata<-sample(5,1000,replace=TRUE)
#' bal.cs.psa(categorical,treatment,strata)
#'
#' #Perfect balance on 80%, random on last 20%
#' categorical<-rep(sample(5,1000,replace=TRUE),2)
#' treatment<-c(rep(0,1000),rep(1,1000))
#' strat<-sample(6,1200,replace=TRUE)
#' strat<-c(strat[1:1000],strat[1:800],strat[1001:1200])
#' bal.cs.psa(categorical,treatment,strat,B=200)
#'
#'
#' @export bal.cs.psa
bal.cs.psa <-
  function(categorical,
           treatment = NULL,
           strata = NULL,
           B = 1000,
           eps = .02,
           main = NULL,
           ...) {
    #This function provides an ad-hoc measure of the balance achieved between control and treatment
    #groups for a categorical variable from user defined strata. This is compared to the same measure
    #for randomly generated strata. The proportion of each category for a given strata and treatment
    #is calculated. The proportions are transformed slightly so as to give more weight to differences in
    #small categories.  Then the absolute difference between (transformed) treatment proportions calculated for each
    #category and stratum is found and summed, both within and across strata.  This constitutes
    #the basic statistic.  For comparison purposes, strata are randomly generated 'B' times and the same statistic
    #recorded for the random strata.  The percentile rank of the true measure is found in comparison with the randomly
    #generated distribution.  A histogram of the distribution and with the true value highlighted is generated.
    #Output is a list with the true measure, the percentile of the true measure, and the 'B' random replicates.

    #If "categorical" has three columns, treat as c, t, s.
    if (dim(as.data.frame(categorical))[2] == 3) {
      treatment   <- categorical[, 2]
      strata      <-
        categorical[, 3]
      categorical <-
        categorical[, 1]
    }

    n <- length(categorical)
    nstrat <- dim(table(strata))

    #Calculation of the statistic of the original strata
    sum.abs.diff.original <- NULL
    table.cts.original <- table(categorical, treatment, strata)
    sum.strata.01 <- NULL
    for (j in 1:nstrat) {
      sum.strata.01 <-
        rbind(sum.strata.01, apply(table.cts.original[, , j], 2, sum))
    }

    prop.bt <- NULL
    for (j in 1:nstrat) {
      # prop.i is a vector of the proportion of 0/1 by categorical level
      # level for stratum j
      prop.0 <- table.cts.original[, 1, j] / sum.strata.01[j, 1]
      prop.1 <- table.cts.original[, 2, j] / sum.strata.01[j, 2]
      prop.bt <-
        sum(prop.bt, sum(abs((prop.0 + eps) ^ .5 - (prop.1 + eps) ^ .5)))
    }
    sum.abs.diff.original <- prop.bt

    sum.abs.diff <- NULL
    for (i in 1:B) {
      rand.strata <- sample(strata)
      table.cts <- table(categorical, treatment, rand.strata)
      sum.strata.01 <- NULL
      for (j in 1:nstrat) {
        sum.strata.01 <- rbind(sum.strata.01, apply(table.cts[, , j], 2, sum))
      }

      prop.bt <- NULL
      for (j in 1:nstrat) {
        # prop.i is a vector of the proportion of 0/1 by categorical level
        # level for stratum j
        prop.0 <- table.cts[, 1, j] / sum.strata.01[j, 1]
        prop.1 <- table.cts[, 2, j] / sum.strata.01[j, 2]
        prop.bt <-
          sum(prop.bt, sum(abs((prop.0 + .02) ^ .5 - (prop.1 + .02) ^ .5)))
      }
      sum.abs.diff <- c(sum.abs.diff, prop.bt)
    }
    rnk <- rank(c(sum.abs.diff.original, sum.abs.diff))[1]

    hist(
      c(sum.abs.diff.original, sum.abs.diff),
      xlab = paste("Balance of", B, "Randomly Generated Strata"),
      main = main,
      ...
    )
    points(
      sum.abs.diff.original,
      0,
      col = "red",
      cex = 4,
      pch = 20
    )
    legend(
      x = "topleft",
      legend = list(
        paste("Original Balance:", round(sum.abs.diff.original, 2)),
        paste("Rank of Original:", round(rnk, 2), "of", B + 1)
      ),
      pch = c(19, 19),
      col = c(2, 0),
      bty = "n"
    )
    out <- list(balance.orig = sum.abs.diff.original, rank.orig = rnk)
    return(out)
  }

Try the PSAgraphics package in your browser

Any scripts or data that you put into this service are public.

PSAgraphics documentation built on March 31, 2023, 5:30 p.m.