Centaur is an R package for the implementation of observational cohort studies. It provides several alternative workflows to control for observed confounding by providing a set of configurable options to compute propensity scores, balance covariates between two exposure groups, evaluate the quality of balance and perform outcome analysis.

Centaur takes a more traditional approach than the OHDSI cohort method: it places the responsibility on the user to include all observed covariates likely to affect the treatment choice or the outcome. The cohort data is currently loaded by the user as an R data frame and can come from any source, so it is left to the user to design the cohort appropriately. In contrast, the OHDSI cohort method creates the cohort dataset through a direct, configurable query to a CDM instance and then includes all possible exposures, conditions, etc. as covariates by default.
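As a rough illustration of the kind of data frame this implies, the sketch below builds a small simulated cohort with one row per subject. The column names (`treatment`, `age`, `diabetes`, `outcome`) are assumptions made for this example and are not names required by Centaur.

```r
# Hypothetical cohort layout: one row per subject.
# All column names here are illustrative, not a Centaur requirement.
set.seed(1)
n <- 200
cohort <- data.frame(
  id        = seq_len(n),
  treatment = rbinom(n, 1, 0.4),        # 1 = exposed, 0 = comparator
  age       = round(rnorm(n, 60, 10)),  # observed covariate
  diabetes  = rbinom(n, 1, 0.3),        # observed covariate
  outcome   = rbinom(n, 1, 0.2)         # binary outcome
)

# Basic sanity checks worth running before any propensity score work
stopifnot(all(cohort$treatment %in% c(0, 1)))
stopifnot(!anyNA(cohort))
table(cohort$treatment)
```

In practice the data frame would come from a file or database extract rather than a simulation; the point is only that exposure, outcome, and every covariate of interest must already be present as columns.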

At the same time, the OHDSI Cohort Method can also be directly called within Centaur to facilitate using regularized regression for cohorts with large numbers of covariates and to compare results using different methods to calculate a propensity score.

- Load any pre-existing cohort file as an R data frame
- Propensity score calculation via logistic regression, generalized boosted models (gbm), which handle interaction terms, or regularized regression as provided in the OHDSI cohort method
- Coefficients of Propensity score model (for logistic regression)
- Propensity Score Trimming
- Balancing via weighting (either SMR or IPTW), matching, or stratification
- Truncation of weights
- Standard balance diagnostics in tabular and graphical form
- Outcome analysis: either odds ratio or hazard ratio
- Recommended default options for many steps, but also highly configurable settings
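The steps in this list can be sketched in base R. The example below is not Centaur's API; it is a minimal illustration on simulated data, with assumed variable names, an illustrative `smd()` helper, and arbitrary 1st/99th percentile truncation cut-offs.

```r
set.seed(42)
n <- 1000
age <- rnorm(n, 60, 10)
diabetes <- rbinom(n, 1, 0.3)
# Treatment assignment depends on the covariates (confounding by design)
treatment <- rbinom(n, 1, plogis(-3 + 0.04 * age + 0.5 * diabetes))
outcome <- rbinom(n, 1, plogis(-2 + 0.5 * treatment + 0.03 * (age - 60)))
cohort <- data.frame(treatment, age, diabetes, outcome)

# 1. Propensity score: P(treatment = 1 | covariates) from logistic regression
ps_model <- glm(treatment ~ age + diabetes, data = cohort, family = binomial())
cohort$ps <- fitted(ps_model)
coef(ps_model)  # coefficients of the propensity score model

# 2. Weights
# IPTW: 1/ps for treated, 1/(1 - ps) for controls
cohort$iptw <- ifelse(cohort$treatment == 1, 1 / cohort$ps, 1 / (1 - cohort$ps))
# SMR: 1 for treated, ps/(1 - ps) for controls
cohort$smr <- ifelse(cohort$treatment == 1, 1, cohort$ps / (1 - cohort$ps))

# 3. Truncate IPTW weights at the 1st and 99th percentiles (illustrative cut-offs)
bounds <- unname(quantile(cohort$iptw, c(0.01, 0.99)))
cohort$iptw_trunc <- pmin(pmax(cohort$iptw, bounds[1]), bounds[2])

# 4. Balance diagnostic: standardized mean difference, optionally weighted
smd <- function(x, treat, w = rep(1, length(x))) {
  m1 <- weighted.mean(x[treat == 1], w[treat == 1])
  m0 <- weighted.mean(x[treat == 0], w[treat == 0])
  # Pooled SD taken from the unweighted groups
  s <- sqrt((var(x[treat == 1]) + var(x[treat == 0])) / 2)
  (m1 - m0) / s
}
smd_before <- smd(cohort$age, cohort$treatment)
smd_after  <- smd(cohort$age, cohort$treatment, cohort$iptw_trunc)
c(before = smd_before, after = smd_after)

# 5. Outcome analysis: weighted odds ratio for treatment
# quasibinomial avoids the non-integer weight warning; its standard errors
# are not robust, so this sketch reports the point estimate only
fit <- glm(outcome ~ treatment, data = cohort,
           weights = iptw_trunc, family = quasibinomial())
odds_ratio <- exp(coef(fit)[["treatment"]])
odds_ratio
```

A standardized mean difference close to zero after weighting suggests the covariate is balanced between the two exposure groups; the same check would be repeated for every covariate.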

- load data via sql query of cdm instance (i.e. replicate
- options to run full workflow with pre-specified settings
- improved plotting

The default available methods are determined by the number of covariates in the dataset, and the total number of subjects. These limits have largely been determined empirically based on performance. Depending on your available hardware, it may be feasible to use a given method with more (or fewer) covariates and/or subjects. Each of these limits can be overridden.

Simple visual inspection of the area of common support.
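One way to examine common support outside of Centaur is sketched below in base R. The data are simulated, and the min/max overlap rule used for trimming is an assumption of this example rather than a description of Centaur's trimming options.

```r
set.seed(7)
n <- 500
x <- rnorm(n)
treatment <- rbinom(n, 1, plogis(0.8 * x))
ps <- fitted(glm(treatment ~ x, family = binomial()))

# Common support: the propensity score range covered by both groups
lower <- max(min(ps[treatment == 1]), min(ps[treatment == 0]))
upper <- min(max(ps[treatment == 1]), max(ps[treatment == 0]))
in_support <- ps >= lower & ps <= upper

# Overlaid densities give the visual check; dotted lines mark the overlap
plot(density(ps[treatment == 0]), main = "Propensity score by group",
     xlab = "Propensity score")
lines(density(ps[treatment == 1]), lty = 2)
abline(v = c(lower, upper), lty = 3)

# Trimming keeps only subjects inside the region of common support
trimmed <- data.frame(treatment, x, ps)[in_support, ]
nrow(trimmed)
```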

"Violin" plots show the distribution of matched and unmatched control and treatment propensity scores.

Using the stratification approach, compare the distribution of a single covariate between the treatment and control groups in each stratum.
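The idea behind this comparison can be sketched in base R as follows. The example assumes simulated data, five strata defined by propensity score quintiles, and a single covariate (`age`); none of these choices is taken from Centaur itself.

```r
set.seed(11)
n <- 1000
age <- rnorm(n, 60, 10)
treatment <- rbinom(n, 1, plogis(-2.4 + 0.04 * age))
ps <- fitted(glm(treatment ~ age, family = binomial()))

# Five strata from propensity score quintiles
breaks <- quantile(ps, probs = seq(0, 1, 0.2))
stratum <- cut(ps, breaks = breaks, include.lowest = TRUE, labels = 1:5)

# Mean age by stratum (rows) and treatment group (columns "0" and "1")
age_by_stratum <- tapply(age, list(stratum = stratum, treatment = treatment), mean)
age_by_stratum

# Treated-minus-control difference within each stratum, compared with
# the difference in the full cohort
within_diff  <- age_by_stratum[, "1"] - age_by_stratum[, "0"]
overall_diff <- mean(age[treatment == 1]) - mean(age[treatment == 0])
within_diff
overall_diff
```

If stratification is working, the within-stratum differences should be noticeably smaller than the difference across the whole cohort.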

?? R package ??

System requirements are highly dependent on the size of the dataset being analyzed. For any "real-world" dataset, we recommend at least a Core i7 (or equivalent) processor and at least 8 GB of RAM.

- AUC
- broom
- data.table
- dplyr
- ff
- gtools
- Hmisc
- MASS
- MatchIt
- plyr
- RJDBC
- SDMTools
- sm
- survival
- twang
- vioplot


- On Windows, make sure RTools is installed.
- In R, use the following commands to download and install Centaur:

```r
install.packages("devtools")
library(devtools)
install_github("ohdsi/Centaur")
```

Read the whitepaper and try the vignette! (Coming soon!)

- Centaur Manual April-2017
- An Introduction to Propensity Score Methods for Reducing the Effects of Confounding in Observational Studies, AUSTIN 2011
- [Variable Selection for Propensity Score Models](https://academic.oup.com/aje/article-lookup/doi/10.1093/aje/kwj149) BROOKHART 2006
- [Matching Methods for Causal Inference: A Review and a Look Forward](https://projecteuclid.org/euclid.ss/1280841730) STUART 2010
- Propensity score estimation with missing values using a multiple imputation missingness pattern(MIMP) approach LIPKOVICH 2009
- Reducing Bias in Treatment Effect Estimation in Observational Studies Suffering From Missing Data, HILL 2004
- A comparison of 12 algorithms for matching on the propensity score, AUSTIN, 2014
- Weight Trimming and Propensity Score Weighting, STUART, 2011
- Reducing Bias in Observational Studies Using Subclassification on the Propensity Score RUBIN 1984
- Using Propensity Scores to Help Design Observational Studies: Application to the Tobacco Litigation RUBIN, 2001
- The performance of different propensity score methods for estimating marginal hazard ratios AUSTIN 2011
- MatchIt: Nonparametric preprocessing for parametric causal inference HO 2013
- Toolkit for weighting and analysis of nonequivalent groups: a tutorial for the **twang** package
- The performance of different propensity score methods for estimating marginal odds ratios, AUSTIN 2007
- A Step-by-Step Guide to Propensity Score Matching in R. BALLOUN 2014
- A Practical Guide for Using Propensity Score Weighting in R GOVINDASAMY 2015

The authors acknowledge the following team from AstraZeneca pharmaceuticals, Robert LoCasale, Michael Goodman, Ramin Arani, Yiduo Zhang, and Sudeep Karve for contributing to the requirements with their expertise in epidemiology, safety informatics, health economics and biostatistics and for reviewing the final product. The authors also acknowledge Jonathan Herz and Pramod Kumar for help with testing early versions of the package.
