R/SumStat_cl.R
In PSweight: Propensity Score Weighting for Causal Inference with Observational Studies and Randomized Trials

Documented in SumStat_cl

#' Calculate summary statistics for propensity score weighting with clustering (for binary treatment only)
#'
#' \code{SumStat_cl} is used to generate distributional plots of the estimated propensity scores and balance
#' diagnostics after propensity score weighting with two-level data.
#'
#' @param ps.formula an object of class \code{\link{formula}} (or one that can be coerced to that class): a symbolic description of the propensity score model to be fitted. Additional details of model specification are given under "Details".
#' @param trtgrp an optional character defining the "treated" population for estimating the average treatment effect among the treated (ATT). Only necessary if \code{weight = "treated"}. This option can also be used to specify the treatment (in a two-treatment setting). Default value is the last group in the alphebatic order.
#' @param data an data frame containing the variables in the propensity score model. If not found in data, the variables are taken from \code{environment(formula)}.
#' @param weight a character or vector of characters including the types of weights to be used. \code{"IPW"} specifies the inverse probability weights for estimating the average treatment effect among the combined population (ATE). \code{"treated"} specifies the weights for estimating the average treatment effect among the treated (ATT). \code{"overlap"} specifies the (generalized) overlap weights for estimating the average treatment effect among the overlap population (ATO), or population at clinical equipoise. \code{"matching"} specifies the matching weights for estimating the average treatment effect among the matched population (ATM). \code{"entropy"} specifies the entropy weights for the average treatment effect of entropy weighted population (ATEN). Default is \code{"overlap"}.
#' @param delta trimming threshold for estimated (generalized) propensity scores. Should be no larger than 1 / number of treatment groups. Default is 0, corresponding to no trimming.
#' @param nAGQ integer scalar - the number of points per axis for evaluating the adaptive Gauss-Hermite approximation to the log-likelihood.
#' Defaults to 1, corresponding to the Laplace approximation. Please refer to lme4 package for more details.
#' @details A typical form for \code{ps.formula} is \code{treatment ~ terms+1|clusters} where \code{treatment} is the treatment
#' variable, \code{terms} is a series of terms
#' which specifies a linear predictor for \code{treatment}, and \code{clusters} is the cluster indicator. The current version supports two-level models and the random-effects term is required to be the last piece in the formula. \code{ps.formula} specifies a mixed-effects logistic regression
#' model for estimating propensity scores. The treatment group corresponds to the last group in the alphebatic order, unless otherwise specified by \code{trtgrp}.
#'
#' Current version of \code{PSweight} allows for five types of propensity score weights used to estimate ATE (\code{"IPW"}), ATT (\code{"treated"}), and
#' ATO(\code{"overlap"}), ATM (\code{"matching"}) and ATEN (\code{"entropy"}). These weights are members of a larger class of balancing weights defined in Li, Morgan, and Zaslavsky (2018).
#' When there is a practical violation of the positivity assumption, \code{delta} defines the symmetric
#' propensity score trimming rule following Crump et al. (2009). With multiple treatments, \code{delta} defines the
#' multinomial trimming rule introduced in Yoshida et al. (2019). The overlap weights can also be considered as
#' a data-driven continuous trimming strategy without specifying trimming rules, see Li, Thomas and Li (2019).
#' Additional details on balancing weights and generalized overlap weights for multiple treatment groups are provided in
#' Li and Li (2019). For details about matching weights and entropy weights, please refer to Li and Greene (2013) and Zhou, Matsouaka and Thomas (2020).
#'
#' @return SumStat_cl returns a \code{SumStat} object including a list of the following value:
#' treatment group, propensity scores, fitted propensity model, propensity score weights, effective sample sizes,
#' and balance statistics. A summary of \code{SumStat} can be obtained with \code{\link{summary.SumStat}}.
#'
#' \describe{
#' \item{\code{ trtgrp}}{a character indicating the treatment group.}
#'
#' \item{\code{ propensity}}{a data frame of estimated propensity scores.}
#' 
#' \item{\code{ ps.fitObjects}}{the fitted propensity model details}
#'
#' \item{\code{ ps.weights}}{a data frame of propensity score weights.}
#'
#' \item{\code{ ess}}{a table of effective sample sizes. This serves as a conservative measure to
#' characterize the variance inflation or precision loss due to weighting, see Li and Li (2019).}
#'
#' \item{\code{ unweighted.sumstat}}{A list of tables including covariate means and variances
#' by treatment group and standardized mean differences.}
#'
#' \item{\code{ ATE.sumstat}}{If \code{"IPW"} is included in \code{weight}, this is a list of summary statistics using inverse probability weighting.}
#'
#' \item{\code{ ATT.sumstat}}{If \code{"treated"} is included in \code{weight}, this is a list of summary statistics using the ATT weights.}
#'
#' \item{\code{ ATO.sumstat}}{If \code{"overlap"} is included in \code{weight}, this is a list of summary statistics using the overlap weights.}
#'
#' \item{\code{ ATM.sumstat}}{If \code{"matching"} is included in \code{weight}, this is a list of summary statistics using the matching weights.}
#'
#' \item{\code{ ATEN.sumstat}}{If \code{"entropy"} is included in \code{weight}, this is a list of summary statistics using the entropy weights.}
#'
#' \item{\code{ trim}}{If \code{delta > 0}, this is a table summarizing the number of observations before and after trimming.}
#'
#' }
#'
#' @references
#' Crump, R. K., Hotz, V. J., Imbens, G. W., Mitnik, O. A. (2009).
#' Dealing with limited overlap in estimation of average treatment effects. Biometrika, 96(1), 187-199.
#'
#' Li, L., Greene, T. (2013).
#' A weighting analogue to pair matching in propensity score analysis. The International Journal of Biostatistics, 9(2), 215-234.
#'
#' Li, F., Morgan, K. L., Zaslavsky, A. M. (2018).
#' Balancing covariates via propensity score weighting.
#' Journal of the American Statistical Association, 113(521), 390-400.
#'
#' Li, F., Thomas, L. E., Li, F. (2019).
#' Addressing extreme propensity scores via the overlap weights. American Journal of Epidemiology, 188(1), 250-257.
#'
#' Li, F., Li, F. (2019). Propensity score weighting for causal inference with multiple treatments.
#' The Annals of Applied Statistics, 13(4), 2389-2415.
#'
#' Zhou, Y., Matsouaka, R. A., Thomas, L. (2020).
#' Propensity score weighting under limited overlap and model misspecification. Statistical Methods in Medical Research 29(12), 3721-3756.
#'
#' Li, F., Zaslavsky, A. M., & Landrum, M. B. (2013). Propensity score weighting with multilevel data. Statistics in Medicine, 32(19), 3373-3387.
#'
#' @export
#'
#' @examples
#' data("psdata_cl")
#' # the propensity model
#' ps.formula<-trt~cov1+cov2+cov3+cov4+cov5+cov6+(1|clt)
#' 
#' # using SumStat to estimate propensity scores
#' msstat <- SumStat_cl(ps.formula, trtgrp="1", data=psdata_cl,
#'   weight=c("IPW","overlap","treated","entropy","matching"))
#' summary(msstat)
#' 
#' @import nnet SuperLearner gbm lme4
#' @importFrom  stats binomial coef cov formula glm lm model.matrix plogis poisson predict qnorm quantile sd
#' @importFrom  utils capture.output combn
#' @importFrom  graphics hist legend
#'
#'
SumStat_cl<- function(ps.formula=NULL,trtgrp=NULL,data=NULL,weight="overlap",delta=0,nAGQ=1L){

    #extract z name
    zname<-all.vars(ps.formula)[1]
    xname<-all.vars(ps.formula)[-c(1,length(all.vars(ps.formula)))]

    #set ordered group
    facz<-as.factor(unlist(data[,zname]))
    data[,zname]<-facz

    #treatment label
    #creat a dictionary for the original and recoded values in Z
    dic<-levels(facz)
    ncate<-length(dic) #number of categories
    if (!is.null(trtgrp)) trt<-which(dic==trtgrp)

    if (ncate==1) stop("Treatment variable needs to have more than 1 category.","\n")


    #trim the data
    if(delta>0){
      fit.e<-lme4::glmer(formula = ps.formula, data=data, nAGQ=nAGQ,family="binomial")
      e.h <- as.numeric(predict(fit.e,newdata=data,type="response"))
      ps.estimate<-cbind(1-e.h,e.h)
      did<-which(ps.estimate[,1]<delta | ps.estimate[,1]> (1-delta))
      datatmp<-data
      data<-data[-did,]
      trimmed<-table(datatmp[zname])-table(data[zname])
      remained<-table(data[zname])
      trim_sum<-rbind(trimmed,remained)
    }

    #fit propensity score model
    fit.e <-lme4::glmer(formula = ps.formula, data=data, nAGQ=nAGQ,family="binomial")
    e.h <- as.numeric(predict(fit.e,newdata=data,type="response"))
    e.h <- cbind((1-e.h),e.h)
    #post-trimming processing
    z<-as.numeric(data[,zname])
    n<-length(z) #total obs
    data["zindex"]<-z
    covM<-data[,xname]

    #transform factor/categorical column, if any
    cat_index<-c()
    for(i in 1:ncol(covM)){
      if(inherits(covM[,i],"factor")) {
        if(length(table(covM[,i]))==1){
          stop("invariate covariate supplied ","\n")
        }else{
          covmtmp<-covM[,i]
          lev<-unique(as.character(as.factor(covmtmp)))
          cols <- paste0(names(covM)[i], lev)
          covM[cols] <-0
          ncols=length(cols)
          for(j in 1:ncols){
            covM[cols[j]]<-1.*(covmtmp==lev[j])
          }
          cat_index<-c(cat_index,i)
        }
        #stop("Non-numeric values exist in covariates, please transform","\n")
      }
    }

    if(!is.null(cat_index)){
      covM<-covM[,-cat_index]
    }
    covM<-as.matrix(covM)


    #dimension of data
    n<-dim(data)[1]
    ncate<-length(unique(data[[zname]]))

    #correspondence btw Z and zindex
    dic<-rep(NA,ncate)
    for (i in 1:ncate) dic[i]<-as.character(unique(unlist(data[zname])[which(data$zindex==i)]))
    if (!is.null(trtgrp)) trt<-which(dic==trtgrp)


  #use weight_gen() to obtain weights according to user's specification
  weight_gen<-function(AT){
    if(AT =='overlap'){
      tilt.h<-(1/apply(1/e.h,1,sum))
      allwt<-(1/e.h)*tilt.h
      wt<-rep(0,n)
      wt1<-rep(0,n)
      for(i in 1:ncate){
        wt[z==i]<-allwt[z==i,i]
        wt1[z==i]<-allwt[z==i,i]/sum(allwt[z==i,i])}
    }

    else if (AT == 'IPW'){
      tilt.h<-rep(1,n)
      allwt<-1/e.h
      wt<-rep(0,n)
      wt1<-rep(0,n)
      for(i in 1:ncate){
        wt[z==i]<-allwt[z==i,i]
        wt1[z==i]<-allwt[z==i,i]/sum(allwt[z==i,i])}
    }

    else if (AT == 'matching'){
      tilt.h<-apply(e.h, 1, min)
      allwt<-tilt.h/e.h
      wt<-rep(0,n)
      wt1<-rep(0,n)
      for(i in 1:ncate){
        wt[z==i]<-allwt[z==i,i]
        wt1[z==i]<-allwt[z==i,i]/sum(allwt[z==i,i])}
    }

    else if (AT == 'entropy'){
      e.hclip<- pmax(e.h,1e-6)
      tilt.h<-(-apply(e.hclip*log(e.hclip) ,1,sum))
      allwt<-tilt.h/e.hclip
      wt<-rep(0,n)
      wt1<-rep(0,n)
      for(i in 1:ncate){
        wt[z==i]<-allwt[z==i,i]
        wt1[z==i]<-allwt[z==i,i]/sum(allwt[z==i,i])}
    }

    else if (AT == 'treated'){
      if (is.null(trtgrp)){
        tilt.h<-e.h[,ncate]
        allwt<-tilt.h/e.h
        wt<-rep(0,n)
        wt1<-rep(0,n)
        for(i in 1:ncate){
          wt[z==i]<-allwt[z==i,i]
          wt1[z==i]<-allwt[z==i,i]/sum(allwt[z==i,i])
        }
      }
      else{
        if (!trtgrp%in%dic) {
          warning("trtgrp not found in the treatment variable, argument ignored")
          trt<-ncate
        }
        tilt.h<-e.h[,trt]
        allwt<-tilt.h/e.h
        wt<-rep(0,n)
        wt1<-rep(0,n)
        for(i in 1:ncate){
          wt[z==i]<-allwt[z==i,i]
          wt1[z==i]<-allwt[z==i,i]/sum(allwt[z==i,i])
        }
      }
    }

    else if (AT=="none"){
      tilt.h<-w<-rep(1,n)
      allwt<-data.frame(matrix(rep(1,ncate*n),ncol=ncate,nrow=n))
      wt<-rep(0,n)
      wt1<-rep(0,n)
      for(i in 1:ncate){
        wt[z==i]<-allwt[z==i,i]
        wt1[z==i]<-allwt[z==i,i]/sum(allwt[z==i,i])}
    }else{
      stop("weight option not recognized","\n")
    }

    return(data.frame(wt,wt1,tilt.h))
  }

  #use wstat() function to calculate summary statistics--asd
  wstat<-function(covM,z,w,h){
    mres<-vres<-vres0<-NULL
    ncate<-length(table(z))
    hstd<-h/sum(h)

    pres<-as.numeric(colSums(covM*hstd))
    for(k in 1:ncate){
      nk<-sum(z==k)
      covMk<-covM[z==k,,drop=F]
      wk<-w[z==k]
      mk<-as.numeric(colSums(covMk*wk))
      mres<-cbind(mres,mk)
      vks<-as.numeric(colSums((covMk-rep(1,nk)%*%t(mk))^2*wk))
      vk<-sqrt(vks/(1-sum(wk^2)))
      vres<-cbind(vres,vk)
      vk0<-as.numeric(apply(covMk,2,sd))
      vres0<-cbind(vres0,vk0)
    }
    colnames(mres)<-paste0("Mean ",dic[1:ncate])
    colnames(vres)<-paste0("Weighted SD ",dic[1:ncate])
    colnames(vres0)<-paste0("Unweighted SD ", dic[1:ncate])
    rownames(mres)<-rownames(vres)<-rownames(vres0)<-colnames(covM)

    sx2<-1/ncate*rowSums(vres^2)
    sx20<-1/ncate*rowSums(vres0^2)

    #PSD
    psd<-(mres-pres)/sqrt(sx2)
    psd0<-(mres-pres)/sqrt(sx20)
    rownames(psd)<-rownames(psd0)<-colnames(covM)
    colnames(psd)<-paste0("PSD weighted var ",dic[1:ncate])
    colnames(psd0)<-paste0("PSD unweighted var ",dic[1:ncate])

    #ASD
    G<-combn(1:ncate,2)
    asd<-matrix(NA,ncol=dim(G)[2],nrow=dim(covM)[2])
    asd0<-matrix(NA,ncol=dim(G)[2],nrow=dim(covM)[2])
    rownames(asd)<-rownames(asd0)<-colnames(covM)
    colnames(asd)<-paste0("ASD weighted var ", apply(G,2,paste,collapse="-"))
    colnames(asd0)<-paste0("ASD unweighted var ", apply(G,2,paste,collapse="-"))


    for (g in 1:dim(G)[2]){
      asd[,g]<-(mres[,G[,g],drop=F][,1]-mres[,G[,g],drop=F][,2])/sqrt(sx2)
      asd0[,g]<-(mres[,G[,g],drop=F][,1]-mres[,G[,g],drop=F][,2])/sqrt(sx20)
    }

    #return(cbind(mres=mres,vres=vres,vres0=vres0,ASD.weighted.var=asd,
    #            ASD.unweighted.var=asd0,PSD.weighted.var=psd,PSD.unweighted.var=psd0))
    return(cbind(mres,vres,vres0,asd,asd0,psd,psd0))

  }

  #Matrix for effective sample size
  eff.sample.size<-matrix(rep(NA,ncate*(length(weight)+1)),nrow=ncate,ncol=(length(weight)+1))
  colnames(eff.sample.size)<-c("unweighted",weight)
  rownames(eff.sample.size)<-dic
  eff.sample.size[,1]<-as.numeric(table(data$zindex))

  uw<-weight_gen(AT="none")
  unweighted<-wstat(covM,z=data$zindex,w=uw[,2],h=uw[,3])

  ps.weights<-data.frame(Z=data[zname],zindex=data$zindex)
  #add all weight types and output weights and effective sample sizes
  for (i in 1:length(weight)){
    ATweights<-weight_gen(weight[i])
    ps.weights[weight[i]]<-ATweights[,2]

    sum_stat_AT<- wstat(covM,data$zindex,ATweights[,2],ATweights[,3])
    assign(weight[i], sum_stat_AT, envir = as.environment(-1))

    #add effective sample sizes
    for (j in 1:ncate){
      eff.sample.size[j,i+1]<-sum(ATweights[,1]*(data$zindex==j))^2/sum((ATweights[,1]*(data$zindex==j))^2)
    }
  }

  #output
  if (is.null(trtgrp)) trtgrp=dic[ncate]
  output<-list(trtgrp=trtgrp, propensity=e.h, ps.fitObjects = fit.e ,ps.weights=ps.weights, ess=eff.sample.size, unweighted.sumstat=unweighted)

  for (i in 1:(length(weight))){
    output[[paste0(weight[i],".sumstat")]]<-get(weight[i])
  }

  if (delta>0) output[["trim"]]<-trim_sum

  class(output)<-'SumStat'
  output
}
Any scripts or data that you put into this service are public.
PSweight documentation built on April 3, 2025, 10:27 p.m.
rdrr.io home R language documentation Run R code online
CRAN packages Bioconductor packages R-Forge packages GitHub packages
Note that we can't provide technical support on individual packages. You should contact the package authors for that.
PSweight
Propensity Score Weighting for Causal Inference with Observational Studies and Randomized Trials

R/SumStat_cl.R
In PSweight: Propensity Score Weighting for Causal Inference with Observational Studies and Randomized Trials

Defines functions SumStat_cl

Documented in SumStat_cl

Try the PSweight package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

PSweight Propensity Score Weighting for Causal Inference with Observational Studies and Randomized Trials

R/SumStat_cl.R In PSweight: Propensity Score Weighting for Causal Inference with Observational Studies and Randomized Trials

Defines functions SumStat_cl

Documented in SumStat_cl

Try the PSweight package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

PSweight
Propensity Score Weighting for Causal Inference with Observational Studies and Randomized Trials

R/SumStat_cl.R
In PSweight: Propensity Score Weighting for Causal Inference with Observational Studies and Randomized Trials