The propensity score (PS) is the conditional probability of assignment to a particular treatment given a vector of observed covariates [@rosenbaum_1983]. @hirano_2004 extended the idea to studies with a continuous treatment (or exposure) and called it the generalized propensity score (GPS), which is the conditional probability density function of the treatment given the covariates. In this package, the GPS model is trained as a density estimation procedure [@kennedy_2017], using either a parametric model (a standard linear regression model) or a non-parametric model (a flexible machine learning model). Once the model is trained, GPS values are estimated from the model predictions. The machine learning models are built with the SuperLearner package [@superlearner_2007]. For more details on the problem framework and assumptions, please see @wu_2018.
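To make the parametric case concrete, here is a minimal base R sketch of one common way to turn a linear regression into a GPS estimate. It assumes normally distributed residuals with constant variance, and it is not necessarily how this package implements the step internally. The simulated data and the names `dat`, `c1`, `c2`, `fit`, `sigma_hat`, and `gps` are hypothetical.

```r
# Minimal sketch of parametric GPS estimation using base R only.
# The data are simulated; all object names are illustrative assumptions.
set.seed(1)
n <- 500
dat <- data.frame(c1 = rnorm(n), c2 = rnorm(n))
dat$w <- 1 + 0.5 * dat$c1 - 0.3 * dat$c2 + rnorm(n)  # continuous exposure

# Step 1: model the conditional mean of the exposure given the covariates.
fit <- lm(w ~ c1 + c2, data = dat)

# Step 2: estimate the residual standard deviation (assumed constant).
sigma_hat <- summary(fit)$sigma

# Step 3: evaluate the fitted normal density at each observed exposure.
gps <- dnorm(dat$w, mean = fitted(fit), sd = sigma_hat)

summary(gps)
```

Each element of `gps` is a density value rather than a probability, so it is positive but, unlike a binary-treatment propensity score, not bounded above by 1.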
Whether predictive performance should be the primary criterion when training the GPS model is an open research question. In this package, users have complete control over the hyperparameters and can tune the prediction models to achieve different levels of performance.
Users can use any library in the SuperLearner package. However, to give users control over the internal libraries, the package generates customized wrappers. The following table lists the available customized wrappers and their hyperparameters.
| Package name | `sl_lib` name | prefix | available hyperparameters |
|:------------:|:-------------:|:------:|:-------------------------:|
| XGBoost | `m_xgboost` | `xgb_` | `nrounds`, `eta`, `max_depth`, `min_child_weight`, `verbose` |
| ranger | `m_ranger` | `rgr_` | `num.trees`, `write.forest`, `replace`, `verbose`, `family` |
Both the `XGBoost` and `ranger` libraries are designed for efficient processing on multiple cores. The only requirement is that OpenMP is installed on the system. Users need to pass the number of threads (`nthread`) when running the `estimate_gps` function.
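As a rough illustration of how these settings might be prepared, the sketch below builds a hyperparameter list and picks a thread count. The names in `params` assume that a hyperparameter is addressed by joining the wrapper prefix and the hyperparameter name from the table (for example `xgb_` plus `nrounds`), and the values are arbitrary. The exact arguments of `estimate_gps` are not shown here, so check its help page for the installed version.

```r
# Hypothetical hyperparameter list. The names assume a prefix + hyperparameter
# convention (e.g. "xgb_" + "nrounds"); the values are arbitrary examples.
params <- list(
  xgb_nrounds   = 50,
  xgb_eta       = 0.1,
  xgb_max_depth = 3
)

# Choose a thread count that leaves one core free.
# parallel::detectCores() can return NA, so fall back to a single thread.
n_cores <- parallel::detectCores()
nthread <- if (is.na(n_cores)) 1L else max(1L, n_cores - 1L)

# These objects would then be supplied to estimate_gps() together with the
# wrapper name (sl_lib = "m_xgboost"); see ?estimate_gps for the argument list.
str(params)
nthread
```

Leaving one core free is only a convention for keeping an interactive session responsive; on a dedicated compute node, using all available cores may be preferable.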
In the following section, we conduct several analyses to test scalability and performance. These analyses give a rough estimate of what to expect for different data sizes and computational resources.