knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
options(digits = 2)
Given the large amount of data to be summarized, graphics are an integral component of representing the results. Pruzek and Helmreich (2009) introduced a class of graphics for visualizing dependent sample tests (see also Pruzek & Helmreich, 2010; Danielak, Pruzek, Doane, Helmreich, & Bryer, 2011). This framework was then extended to propensity score methods using stratification (Helmreich & Pruzek, 2009). In particular, the representation of confidence intervals relative to the unit line (i.e. the line y = x) provided a new way of determining whether there is a statistically significant difference between two groups. The multilevelPSA
package provides a number of graphing functions that extend these frameworks to multilevel PSA. The figure below presents a multilevel PSA assessment plot with annotations. This graphic shows the results of comparing private and public schools in North America using the Programme of International Student Assessment (PISA; Organisation for Economic Co-Operation and Development, 2009). The PISA data used to create this graphic are included in the multilevelPSA
package, and a more detailed description of how to create this graphic is given in the next section. Additionally, the use of PISA makes certain features of the graphics more visible. As discussed in chapters four and five, the differences between charter and traditional public schools are minimal, and therefore some features of the figures are less apparent there. The following section focuses on the features of this graphic.
In the figure below, the x-axis corresponds to math scores for private schools and the y-axis corresponds to math scores for public schools. Each colored circle (a) is a country, with its size corresponding to the number of students sampled within that country. Each country is projected to the lower left, parallel to the unit line, such that a tick mark is placed on the line with slope -1 (b). These tick marks represent the distribution of differences between private and public schools across countries. Differences are aggregated (and weighted by size) across countries. For math, the overall adjusted mean for private schools is 487 and the overall adjusted mean for public schools is 459; these are represented by the horizontal (c) and vertical (d) blue lines, respectively. The dashed blue line parallel to the unit line (e) corresponds to the overall adjusted mean difference, and the dashed green lines (f) correspond to its confidence interval. Lastly, the rug plots along the right and top edges of the graphic (g) correspond to the distribution of each country's overall mean private and public school math scores, respectively.
The figure represents a large amount of data and provides insight into the results. It conveys the overall results that would be present in a traditional table; for instance, the fact that the green dashed lines do not span the unit line (i.e. y = x) indicates that there is a statistically significant difference between the two groups. However, the figure also conveys information that is difficult to present in tabular format. For example, the rug plots indicate that the spread in the performance of both private and public schools across countries is large. Also observe that Canada, which has the highest PISA scores for both groups, also has the largest difference (in favor of private schools), as represented by its greater distance from the unit line.
knitr::include_graphics('AnnotatedCircPlot.png')
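To make the aggregation concrete, the short sketch below uses made-up sizes and adjusted means to show how a size-weighted overall difference, the quantity drawn as the dashed blue line (e), can be computed. It is an illustration only, not the exact computation the mlpsa function performs; the confidence interval (f) additionally depends on the within-stratum variability.
# Illustration with made-up numbers: a size-weighted aggregate of
# private - public differences across countries.
sizes <- c(1200, 3400, 2200)         # hypothetical numbers of students
private.means <- c(480, 495, 470)    # hypothetical adjusted means
public.means <- c(455, 470, 450)
weighted.mean(private.means - public.means, w = sizes)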
The multilevelPSA
package includes North American data from the Programme of International Student Assessment (PISA; Organisation for Economic Co-Operation and Development, 2009). These data are made freely available for research and are used here so that the R code is reproducible. This example compares the performance of private and public schools clustered by country. Note that PISA provides five plausible values for the academic scores since students complete only a subset of the total assessment. For simplicity, the math score used for analysis is the average of these five plausible values.
library(multilevelPSA)
library(party)
data(pisana)
data(pisa.psa.cols)
# Math score is the average of the five plausible values
pisana$MathScore <- apply(pisana[, paste0('PV', 1:5, 'MATH')], 1, sum) / 5
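Before estimating propensity scores, it can be useful to see how many public and private school students each country contributes; the cross-tabulation below uses only columns already present in pisana.
# Students by country and school type
table(pisana$CNT, pisana$PUBPRIV)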
The mlpsa.ctree
function performs phase I of the propensity score analysis using classification trees, specifically the ctree
function in the party package. The getStrata function returns a data frame with the same number of rows as the original data frame, indicating the stratum (i.e. leaf node) for each student.
mlpsa <- mlpsa.ctree(pisana[, c('CNT', 'PUBPRIV', pisa.psa.cols)],
                     formula = PUBPRIV ~ ., level2 = 'CNT')
mlpsa.df <- getStrata(mlpsa, pisana, level2 = 'CNT')
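A quick way to get a sense of the resulting stratification is to count the number of distinct strata (leaf nodes) within each country; this uses only the CNT and strata columns of the data frame returned by getStrata.
# Number of classification tree strata (leaf nodes) per country
tapply(mlpsa.df$strata, mlpsa.df$CNT, function(x) length(unique(x)))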
Similarly, the mlpsa.logistic
function estimates propensity scores using logistic regression. The getPropensityScores
function returns a data frame with the same number of rows as the original data frame.
mlpsa.lr <- mlpsa.logistic(pisana[, c('CNT', 'PUBPRIV', pisa.psa.cols)],
                           formula = PUBPRIV ~ ., level2 = 'CNT')
mlpsa.lr.df <- getPropensityScores(mlpsa.lr, nStrata = 5)
head(mlpsa.lr.df)
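The nStrata = 5 argument requests five strata per country. A common way to form such strata is to cut the estimated propensity scores at their quantiles; the sketch below illustrates that idea on simulated scores and is not necessarily the exact rule getPropensityScores applies internally.
# Quantile-based stratification on simulated propensity scores (illustration only)
set.seed(2112)
ps <- runif(1000)
strata <- cut(ps, breaks = quantile(ps, probs = seq(0, 1, length.out = 6)),
              include.lowest = TRUE, labels = FALSE)
table(strata)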
The covariate.balance
function calculates balance statistics for each covariate by estimating the effect of each covariate before and after adjustment. The results can be converted to a data frame to view the numeric values, or passed to the plot
function to produce a balance plot. This figure depicts the effect size of each covariate before (blue triangle) and after (red circle) propensity score adjustment. As shown here, the adjusted effect size for nearly all covariates is smaller than the unadjusted effect size. The few exceptions are covariates for which the unadjusted effect size was already small. There is no established threshold for what constitutes a sufficiently small effect size; in general, I recommend adjusted effect sizes of less than 0.1, which corresponds to less than 1% of variance explained.
cv.bal <- covariate.balance(covariates = pisana[, pisa.psa.cols],
                            treatment = pisana$PUBPRIV,
                            level2 = pisana$CNT,
                            strata = mlpsa.df$strata)
head(as.data.frame(cv.bal))
plot(cv.bal)
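The 0.1 guideline can be related to variance explained using the standard conversion between a standardized mean difference d and the correlation coefficient, r = d / sqrt(d^2 + 4).
# An adjusted effect size of d = 0.1 corresponds to r^2 of about 0.0025,
# i.e. well under 1% of variance explained.
d <- 0.1
r <- d / sqrt(d^2 + 4)
r^2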
The mlpsa
function performs phase II of the propensity score analysis and requires four parameters: the response variable, treatment indicator, stratum, and clustering (level two) indicator. The minN
parameter (which defaults to five) specifies the minimum number of students a stratum must contain from both the treatment and control groups to be included in the analysis. For this example, 463 students, or less than one percent, were removed because their stratum (i.e. leaf node of the classification tree) did not contain at least five students from both the treatment and control groups.
results.psa.math <- mlpsa(response = mlpsa.df$MathScore,
                          treatment = mlpsa.df$PUBPRIV,
                          strata = mlpsa.df$strata,
                          level2 = mlpsa.df$CNT)
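The strata affected by the minN threshold can also be inspected directly with base R. The check below counts public and private students in each country-by-stratum cell and flags the cells where either group has fewer than five students; it is an illustrative approximation of the exclusion rule described above, not code from the package.
# Flag country-by-stratum cells with fewer than five students in either group
cell.counts <- table(mlpsa.df$CNT, mlpsa.df$strata, mlpsa.df$PUBPRIV)
small <- apply(cell.counts, c(1, 2), function(x) sum(x) > 0 && any(x < 5))
sum(apply(cell.counts, c(1, 2), sum)[small])   # students in the flagged strata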
The summary
function provides the overall treatment estimates as well as level one and two summaries.
summary(results.psa.math)
The plot
function creates the multilevel assessment plot. Here it is depicted with side panels showing the distribution of math scores across all strata, with public school students to the left and private school students below. These panels can also be plotted separately using the mlpsa.circ.plot
and mlpsa.distribution.plot functions, as sketched after the figure.
plot(results.psa.math)
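The panels can be produced on their own with the calls below; this is a sketch that assumes the treatment levels are labeled 'Public' and 'Private' as in the PUBPRIV variable (see the help pages for the full set of arguments).
# Circular assessment plot only
mlpsa.circ.plot(results.psa.math)
# Distribution of math scores across strata for each group
mlpsa.distribution.plot(results.psa.math, 'Public')
mlpsa.distribution.plot(results.psa.math, 'Private')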
Lastly, the mlpsa.difference.plot
function plots the overall differences. The sd
parameter is optional, but if specified, the x-axis can be interpreted as standardized effect sizes.
mlpsa.difference.plot(results.psa.math, sd = sd(mlpsa.df$MathScore, na.rm = TRUE))