knitr::opts_chunk$set(echo = TRUE, message=F, warning=F)

library(dplyr); library(tibble); library(finalpsm); library(patchwork)

Propensity-score Matching (PSM) with the finalpsm package

Create the matching dataset

Firstly we should ensure all variables of interest are in the correct formats for subsequent propensity-score matching. Variables should either be factors or numerical as appropriate.

We will be using the survival::colon dataset as the basis of our example.

data <- tibble::as_tibble(survival::colon) %>%

  dplyr::filter(etype==2) %>% # Outcome of interest is death
  dplyr::filter(rx!="Obs") %>%  # rx will be our binary treatment variable
  dplyr::select(-etype,-study, -status) %>% # Remove superfluous variables

  # Convert into numeric and factor variables
  dplyr::mutate_at(vars(obstruct, perfor, adhere, node4), function(x){factor(x, levels=c(0,1), labels = c("No", "Yes"))}) %>%
  dplyr::mutate(rx = factor(rx),
                mort365 = cut(time, breaks = c(-Inf, 365, Inf), labels = c("Yes", "No")),
                sex = factor(sex, levels=c(0,1), labels = c("Female", "Male")),
                differ = factor(differ, levels = c(1,2,3), labels = c("Well", "Moderate", "Poor")),
                extent = factor(extent, levels = c(1,2,3, 4), labels = c("Submucosa", "Muscle", "Serosa", "Contiguous Structures")),
                surg = factor(surg, levels = c(0,1), labels = c("Short", "Long"))) %>%

  # Logical value for outcome (for survival analysis)
  dplyr::mutate(status = mort365=="Yes")

knitr::kable(head(data, 10))

   

Propensity-score matching with matchit

With the data pre-processed, we can use the finalpsm matchit() function to create the PSM dataset. This is essentially a wrapper function to the MatchIt matchit() function - one of the most widely used packages for PSM within R. However, the finalpsm matchit() addresses what are felt to be several key issues:

 

 

   

There are 4 types of variables that can be specified in the finalpsm::match function:

All other inputs to the MatchIt::matchit() are accepted (see MatchIt documentation) with the default method of matching being full matching.

 

output <- data %>%
  finalpsm::match(strata = "rx",
                    explanatory = c("age","sex", "obstruct", "differ", "surg"),
                    id = "id",
                    dependent = c("mort365", "time"),
                    method = "full")

 

The outputs from the finalpsm::match() function include:

output$object
head(output$data, 10) %>% knitr::kable()

 

There are 3 additional columns appended to the dataset following propensity-score matching:

   

Assessment of Balance

So we have gone to the effort of matching the sample to allow inference. However, before we start using the matched dataset generated, we should ensure that the PSM process has achieved its goal of creating a balanced sample on our observed variables (as that is a key determinant of the validity of any conclusions based on this data).

This can be achieved in 2 ways: both visual and quantitative methods of assessing balance.

 

Balance Visualisation

There are several ways to visualise balance after PSM, which broadly fall into 2 categories: overall assessment and individual covariate assessment (explanatory).

The balance_plot() function provides the capability to easily generate several ggplots to allow quick assessment.

 

1. Overall balance between treatment groups

This can be visualised as a density or jitter plot to allow comparison of the overall distribution of propensity-scores between the treatment groups.

finalpsm::balance_plot(output, type = "density") / finalpsm::balance_plot(output, type = "jitter")

Well balanced groups should follow approximately the same pattern (as in this case).

 

2. Covariate balance between treatment groups

This can be visualised as a geom_point plot with a line of best fit to allow comparison of the distribution of propensity-scores between treatment groups for the different covariates (explanatory variables).

# https://sejdemyr.github.io/r-tutorials/statistics/tutorial8.html
covariate <- finalpsm::balance_plot(output, type = "covariate")

covariate$factor / covariate$numeric

Well balanced groups should follow approximately the same pattern (as in this case).

This can also be visualized as a Love plot graphically displaying covariate balance before and after adjusting.

# https://sejdemyr.github.io/r-tutorials/statistics/tutorial8.html
finalpsm::balance_plot(output, type = "love", threshold = 0.2)

From the Love plot here, we can see that while some variables have become less balanced as a result of the propensity score matching process, these remain overall within our stated threshold of "good" balance (0.2). The unbalanced variable (sex) has seen a substantial improvement in balance as a result of the PSM process.

 

Balance Table

However, people usually like some objective numbers thrown in (even if it's sometimes about as arbitrary as squinting at a plot).

There are several packages out there that include some measure of formal quantification of the balance in a PSM sample ("balance tables"), MatchIt included. However, these tend to be poorly formatted for readability, comparison and publication.

The cobalt package is far more sophisticated regarding assessment of balance than finalpsm, however does not work well within the intended tidyverse / finalfit workflow to produce tables formatted for publication.

finalpsm::balance_table(output, threshold = 0.2) %>% knitr::kable()

As you can see with the unm_balance and mat_balance columns, the sample is already relatively well balanced in the "unmatched" sample (unm_) with in imbalance in the sex variable (more females in the treated group). However, in the propensity-score matched sample, a much improved balance has been achieved across all variables (all now below the a priori absolute standardised mean difference of 0.2).

   



kamclean/finalpsm documentation built on Oct. 3, 2023, 3:52 a.m.