knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

The pseudo population dataset is computed with a user-defined causal inference approach (e.g., matching or weighting). A covariate balance test is then performed on the pseudo population dataset. Users can specify the covariate balance criteria, activate an adaptive approach, and set the number of attempts allowed to search for a pseudo population dataset that meets those criteria.

Usage

Input parameters:

Y a vector of observed outcome
w a vector of observed continuous exposure
c data frame or matrix of observed baseline covariates
ci_appr The causal inference approach. Options are "matching", "weighting", and "adjusting".
matching_fun specified matching function
scale specified scale parameter to control the relative weight that is attributed to the distance measures of the exposure versus the GPS estimates
delta_n specified caliper parameter on the exposure
covar_bl_method specified covariate balance method
covar_bl_trs specified covariate balance threshold
max_attempt maximum number of attempts to satisfy covariate balance
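
The interplay between covar_bl_trs and max_attempt can be pictured as a simple retry loop. The following is only a minimal sketch under stated assumptions: `search_pseudo_pop`, `mean_abs_corr`, and `toy_generate` are hypothetical helpers, not functions of the package, and what actually changes between attempts is not specified here, so the toy generator merely draws a fresh random sample on each attempt.

```r
# Hypothetical sketch of an attempt loop; none of these helpers are package functions.

# Average absolute correlation between the exposure and each covariate.
mean_abs_corr <- function(w, covars) mean(abs(cor(w, as.matrix(covars))))

# Try up to `max_attempt` candidates and stop at the first one whose
# average absolute correlation falls below `covar_bl_trs`.
search_pseudo_pop <- function(generate, covar_bl_trs = 0.1, max_attempt = 5) {
  for (attempt in seq_len(max_attempt)) {
    pp <- generate(attempt)                 # candidate pseudo population
    bal <- mean_abs_corr(pp$w, pp$c)
    if (bal < covar_bl_trs) {
      return(list(data = pp, balance = bal, attempts = attempt, passed = TRUE))
    }
  }
  # No candidate met the criterion: return the last one, flagged as failed.
  list(data = pp, balance = bal, attempts = max_attempt, passed = FALSE)
}

# Toy generator (assumption): an independent exposure and two covariates.
toy_generate <- function(attempt) {
  set.seed(attempt)
  covars <- matrix(rnorm(400), ncol = 2)
  list(w = rnorm(200), c = covars)
}

res <- search_pseudo_pop(toy_generate, covar_bl_trs = 0.1, max_attempt = 5)
res$passed
res$attempts
```

Returning the last candidate with `passed = FALSE` is a design choice of this sketch; it lets the caller inspect how far the final attempt was from the threshold.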

Technical Details for Matching

The matching algorithm aims to match an observed unit $j$ to each $j'$ at each exposure level $w^{(l)}$.

1) We specify delta_n ($\delta_n$), a caliper around any exposure level $w$ that defines equally sized bins of the form $[w-\delta_n, w+\delta_n]$. Based on the caliper delta_n, we define a predetermined set of $L$ exposure levels $\{w^{(1)}=\min(w)+ \delta_n, w^{(2)}=\min(w)+3 \delta_n, \ldots, w^{(L)} = \min(w)+(2L-1) \delta_n\}$, where $L = \lfloor \frac{\max(w)-\min(w)}{2\delta_n} + \frac{1}{2} \rfloor$. Each exposure level $w^{(l)}$ is the midpoint of an equally sized bin, $[w^{(l)}-\delta_n, w^{(l)}+\delta_n]$.
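
As an illustration of Step 1, the grid of exposure levels can be computed directly from the two formulas above. This is a minimal sketch; `exposure_levels` is a hypothetical helper name, not a function of the package.

```r
# Sketch: build the exposure levels w^(1), ..., w^(L) from a caliper delta_n.
# `exposure_levels` is a hypothetical helper, not a package function.
exposure_levels <- function(w, delta_n) {
  stopifnot(is.numeric(w), delta_n > 0)
  # L = floor((max(w) - min(w)) / (2 * delta_n) + 1/2)
  L <- floor((max(w) - min(w)) / (2 * delta_n) + 0.5)
  # w^(l) = min(w) + (2l - 1) * delta_n, the midpoint of the l-th bin
  min(w) + (2 * seq_len(L) - 1) * delta_n
}

w <- c(0, 2.5, 4, 7.2, 10)
w_grid <- exposure_levels(w, delta_n = 1)
w_grid
#> [1] 1 3 5 7 9
```

With exposures ranging from 0 to 10 and a caliper of 1, the formula gives $L = \lfloor 5.5 \rfloor = 5$ levels, each the midpoint of a bin of width $2\delta_n = 2$.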

2) We implement a nested-loop algorithm, with $l$ in $1,2,\ldots, L$ as the outer loop and $j'$ in $1,\ldots,N$ as the inner loop. The algorithm outputs the final product of our design stage, i.e., a matched set with $N\times L$ units.

for $l = 1,2,\ldots, L$ do

Choose one exposure level of interest $w^{(l)} \in \{w^{(1)}, w^{(2)}, \ldots, w^{(L)}\}$.

for $j' = 1,\ldots,N$ do

2.1 Evaluate the GPS $\hat{e}(w^{(l)}, \mathbf{c}_{j'})$ (for short $e^{(l)}_{j'}$) at $w^{(l)}$ based on the fitted GPS model for each unit $j'$ having observed covariates $\mathbf{c}_{j'}$.

2.2 Implement the matching to find an observed unit, denoted by $j$, that is matched with $j'$ with respect to both the exposure $w_{j}\approx w^{(l)}$ and the estimated GPS $\hat{e}(w_j, \mathbf{c}_{j}) \approx e^{(l)}_{j'}$ (under a standardized Euclidean transformation). More specifically, we find $j$ as
$$ j_{gps}(e^{(l)}_{j'},w^{(l)})=\underset{j:\, w_j \in [w^{(l)}-\delta_n,\,w^{(l)}+\delta_n]}{\arg\min} \ \big\lVert \big( \lambda \hat{e}(w_j,\mathbf{c}_j), (1-\lambda)w_j\big) -\big(\lambda e_{j'}^{(l)}, (1-\lambda) w^{(l)}\big)\big\rVert, $$
where matching_fun ($\lVert\cdot\rVert$) is a pre-specified two-dimensional metric, scale ($\lambda$) is the scale parameter assigning weights to the corresponding two dimensions (i.e., the GPS and the exposure), and delta_n ($\delta_n$) is the caliper defined in Step 1, so that only a unit $j$ with an observed exposure $w_j \in [w^{(l)}-\delta_n,w^{(l)}+\delta_n]$ can be matched.

2.3 Impute $Y_{j'}(w^{(l)})$ as $\hat{Y}_{j'}(w^{(l)})=Y^{obs}_{j_{gps}(e^{(l)}_{j'},w^{(l)})}$.

end for (inner loop over $j'$)

Note: We allow multiple $j'$ (e.g., $j' =1$ and $j' = 5$) to be matched with the same observed unit $j$ throughout the inner loop $j'$ in $1,\ldots,N$ (matching with replacement).

end for (outer loop over $l$)

3) After implementing the matching algorithm, we construct the matched set with $N\times L$ units by combining all $\hat{Y}_{j'}(w^{(l)})$ for $j'=1,\ldots,N$ and for all $w^{(l)} \in \{w^{(1)},w^{(2)},\ldots,w^{(L)}\}$.
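
The three steps can be sketched in a few lines of base R. This is a rough illustration under several assumptions that are not taken from the package: the data are simulated, the GPS is modelled as a normal density around a linear regression of the exposure on the covariates, the two dimensions are standardized by min-max scaling, and the two-dimensional metric is an L1 (Manhattan) distance. All object names are hypothetical.

```r
# Sketch of the nested-loop matching; data, model, and names are assumptions.
set.seed(1)
N <- 200
c_mat <- cbind(c1 = rnorm(N), c2 = rnorm(N))
w <- as.numeric(2 + c_mat %*% c(0.5, -0.3) + rnorm(N))
Y <- as.numeric(1 + 0.8 * w + c_mat[, 1] + rnorm(N))

# Assumed GPS model: w | c ~ Normal(linear predictor, sigma^2)
gps_fit <- lm(w ~ c_mat)
mu_hat <- fitted(gps_fit)
sigma_hat <- summary(gps_fit)$sigma
gps_at <- function(w_val) dnorm(w_val, mean = mu_hat, sd = sigma_hat)
gps_obs <- gps_at(w)            # e_hat(w_j, c_j) at the observed exposures

# Assumed standardization: min-max scaling relative to the observed values
std <- function(x, ref) (x - min(ref)) / (max(ref) - min(ref))

delta_n <- 0.5                  # caliper
lambda <- 0.5                   # the `scale` parameter
L <- floor((max(w) - min(w)) / (2 * delta_n) + 0.5)
w_grid <- min(w) + (2 * seq_len(L) - 1) * delta_n

matched <- vector("list", L)
for (l in seq_len(L)) {                      # outer loop over exposure levels
  in_caliper <- which(abs(w - w_grid[l]) <= delta_n)
  if (length(in_caliper) == 0) next          # no candidate units at this level
  e_l <- gps_at(w_grid[l])                   # 2.1: GPS at w^(l) for every j'
  j_match <- integer(N)
  for (jp in seq_len(N)) {                   # inner loop over units j'
    # 2.2: L1 distance on the (GPS, exposure) pair, restricted to the caliper
    d <- lambda * abs(std(gps_obs[in_caliper], gps_obs) - std(e_l[jp], gps_obs)) +
      (1 - lambda) * abs(std(w[in_caliper], w) - std(w_grid[l], w))
    j_match[jp] <- in_caliper[which.min(d)]  # matching with replacement
  }
  # 2.3: impute the outcome with the matched unit's observed outcome
  matched[[l]] <- data.frame(unit = seq_len(N), w_level = w_grid[l],
                             matched_unit = j_match, Y_hat = Y[j_match])
}
# 3: combine all levels into the matched set
matched_set <- do.call(rbind, Filter(Negate(is.null), matched))
```

The matched set has $N$ rows for every exposure level that has at least one observed unit inside its caliper; levels with an empty bin are skipped in this sketch.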

Technical Details for Covariate Balance

We introduce the absolute correlation measure (covar_bl_method = "absolute") to assess covariate balance for continuous exposures. The absolute correlation between the exposure and each pre-exposure covariate is a global measure and can inform whether the whole matched set is balanced. This measure builds upon the work in [@austin2019assessing], which examines covariate balance conditions with continuous exposures. We adapt it to the proposed matching framework.

In a balanced pseudo population dataset, the correlations between the exposure and the pre-exposure covariates should be close to zero, that is, $E [\mathbf{c}_{i} w_{i} ] \approx \mathbf{0}$. We calculate the absolute correlation in the pseudo population dataset as
\begin{align} \big\lvert \sum_{i=1}^{N\times L} \mathbf{c}_{i} w_{i} \big\rvert. \end{align}

The average absolute correlation is defined as the average of the absolute correlations across all covariates. The covariate balance criterion is \begin{align} \overline{\big\lvert \sum_{i=1}^{N\times L} \mathbf{c}_{i} w_{i} \big\rvert} < \boldsymbol{\epsilon}_1. \end{align} We set a threshold covar_bl_trs ($\boldsymbol{\epsilon}_1$), for example 0.1, on the average absolute correlation as the criterion for covariate balance in the pseudo population dataset.
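
The balance check can be sketched as follows. This is a minimal illustration, assuming that the exposure and covariates are standardized and that the cross-product is normalized by $n-1$, so that each entry equals the absolute Pearson correlation; `abs_corr_balance` is a hypothetical helper, not a function of the package.

```r
# Sketch of the absolute correlation balance check (hypothetical helper).
abs_corr_balance <- function(w, covars, covar_bl_trs = 0.1) {
  covars <- as.matrix(covars)
  w_std <- as.numeric(scale(w))   # standardized exposure
  c_std <- scale(covars)          # standardized covariates, column by column
  # Normalized cross-product of standardized variables = Pearson correlation
  abs_corr <- abs(colSums(c_std * w_std) / (length(w) - 1))
  list(abs_corr = abs_corr,
       mean_abs_corr = mean(abs_corr),
       pass = mean(abs_corr) < covar_bl_trs)
}

# Toy data (assumption): the exposure depends strongly on the first covariate,
# so the balance criterion is expected to fail.
set.seed(42)
c_mat <- cbind(c1 = rnorm(500), c2 = rnorm(500))
w <- c_mat[, "c1"] + rnorm(500)
res <- abs_corr_balance(w, c_mat, covar_bl_trs = 0.1)
res$abs_corr
res$pass
```

In a pseudo population produced by matching or weighting, the same function would be applied to the matched (or weighted) exposure and covariates rather than to the raw data.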

References



wxwx1993/GPSmatching documentation built on March 1, 2023, 9:32 p.m.