ps: Propensity scores


View source: R/ps.R

Description

Summarizes the estimated probabilities of being treated, P(A=1|W), and of not being treated, P(A=0|W), for all subjects and within the subgroups A=1 and A=0. Estimates can be computed with up to two GLM formulas and/or up to two SuperLearner libraries.
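As a rough illustration of the quantities being summarized, the sketch below estimates the treatment mechanism g with a single logistic regression in base R. It does not call ps(); the simulated data and the names g_fit, p_treated, and p_untreated are assumptions made for this example only.

```r
# Simulated data; variable names are illustrative only.
set.seed(1)
n <- 500
W <- data.frame(W1 = rnorm(n), W2 = rnorm(n))
A <- rbinom(n, 1, plogis(0.3 + 0.8 * W$W1 - 0.5 * W$W2))

# Logistic regression for the treatment mechanism g(W) = P(A = 1 | W).
g_fit <- glm(A ~ W1 + W2, family = binomial(), data = data.frame(A = A, W))

p_treated   <- as.numeric(predict(g_fit, type = "response"))  # P(A = 1 | W)
p_untreated <- 1 - p_treated                                  # P(A = 0 | W)

# Five-number summaries plus the mean, for all subjects.
summary(p_treated)
summary(p_untreated)
```

A SuperLearner fit would replace the glm() call with an ensemble prediction; the two probabilities are derived from it in the same way.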

Usage

ps(A, W, gform1 = NULL, gform2 = NULL,
   SL.library1 = NULL, SL.library2 = NULL,
   gbound = 0, verbose = TRUE,
   remove_first_dummy = FALSE, remove_most_frequent_dummy = FALSE)

Arguments

A

binary treatment indicator: 1 = treatment, 0 = control

W

vector, matrix, or dataframe containing baseline covariates

gform1

optional first glm regression formula for g

gform2

optional second glm regression formula for g

SL.library1

first vector of prediction algorithms for data-adaptive estimation of g

SL.library2

second vector of prediction algorithms for data-adaptive estimation of g

gbound

bound used to truncate the predicted probabilities g; a value between 0 and 1 (default = 0)

verbose

if TRUE, print the fit summary of the GLM or SuperLearner models (default = TRUE)

remove_first_dummy

for categorical covariates, if TRUE, remove the first dummy of each covariate so that only n-1 dummies remain. This avoids multicollinearity issues in models (default = FALSE)

remove_most_frequent_dummy

for categorical covariates, if TRUE, remove the most frequently observed category so that only n-1 dummies remain. If several categories are tied for most frequent, the first of them in alphabetical order is removed (default = FALSE)
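The two dummy-removal options can be sketched in base R as follows. This is an illustration of the idea only, not the package's internal code; the factor x and the object names are hypothetical.

```r
# A categorical covariate with levels a, b, c (b is the most frequent).
x <- factor(c("b", "a", "c", "b", "b", "a"))

# Full indicator matrix: one column per level (3 columns here).
full <- model.matrix(~ x - 1)
colnames(full) <- levels(x)

# Option 1: drop the first level ("a"), leaving n-1 dummies.
drop_first <- full[, -1, drop = FALSE]

# Option 2: drop the most frequent level. which.max() returns the first
# maximum, so ties resolve to the earliest level in sorted order.
counts <- table(x)
most_frequent <- names(counts)[which.max(counts)]
drop_frequent <- full[, colnames(full) != most_frequent, drop = FALSE]

colnames(drop_first)
colnames(drop_frequent)
```

In both cases the dropped category becomes the reference level, so the remaining columns are no longer collinear with an intercept.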

Value

probabilities

a dataframe whose columns are the Min., 1st Qu., Median, Mean, 3rd Qu., and Max. values and whose rows are P(A=1|W) and P(A=0|W) for all subjects, P(A=1|W) for the subgroup with A=1, and P(A=0|W) for the subgroup with A=0

fit_summaries

summaries of glm regression or SuperLearner models
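A table with the layout described under probabilities could be assembled along these lines. This is a base-R sketch under stated assumptions: the data are simulated, the symmetric truncation to [gbound, 1 - gbound] is one plausible reading of gbound rather than the package's documented rule, and the row labels are invented for the example.

```r
set.seed(2)
n <- 500
W1 <- rnorm(n)
A <- rbinom(n, 1, plogis(0.5 * W1))
g1 <- as.numeric(fitted(glm(A ~ W1, family = binomial())))  # P(A = 1 | W)

# Hypothetical symmetric truncation to [gbound, 1 - gbound];
# in this sketch gbound = 0 leaves the values unchanged.
gbound <- 0.025
g1 <- pmin(pmax(g1, gbound), 1 - gbound)
g0 <- 1 - g1                                                # P(A = 0 | W)

# Named numeric vector: Min., 1st Qu., Median, Mean, 3rd Qu., Max.
six <- function(p) unclass(summary(p))

tab <- as.data.frame(rbind(
  "P(A=1|W), all"   = six(g1),
  "P(A=0|W), all"   = six(g0),
  "P(A=1|W), A = 1" = six(g1[A == 1]),
  "P(A=0|W), A = 0" = six(g0[A == 0])
))
print(round(tab, 3))
```

Minimum values close to 0 in the last two rows would indicate subjects who received an exposure that the model considered very unlikely, which is the kind of positivity problem this summary is meant to reveal.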

References

1. Bahamyirou A, Blais L, Forget A, Schnitzer ME (2019). Understanding and diagnosing the potential for bias when using machine learning methods with doubly robust causal estimators. Statistical Methods in Medical Research, 28(6), 1637-1650.

Examples

# Example
# User-supplied regression formulas and SuperLearner libraries to estimate g

  # Simulate five baseline covariates
  set.seed(1250)
  n <- 1000
  sigma <- matrix(c(2, 1, 1, 1), ncol = 2)  # used only for its dimension below
  X <- matrix(rnorm(n * 2), ncol = nrow(sigma))
  X <- X + matrix(rep(c(.5, 1), each = n), byrow = FALSE, ncol = 2)
  I1 <- rnorm(n, mean = 1, sd = 2)
  I2 <- rnorm(n, mean = 1, sd = 1.9)
  P1 <- rnorm(n, mean = 1, sd = 1.5)
  W <- data.frame(X, I1, I2, P1)
  colnames(W) <- c("W1", "W2", "I1", "I2", "P1")

  # Simulate the binary treatment from a logistic model in W
  A <- rbinom(n, 1, plogis(0.2 + W[, "W1"] + 0.3 * W[, "I1"] + W[, "W1"] * W[, "I1"] -
                           0.2 * (W[, "W2"] + W[, "I2"])^2))

  # Two GLM formulas and one SuperLearner library
  ps1 <- ps(A, W, gform1 = "A~W2+W1+I1+I2", gform2 = "A~W1+I1+I2+W1*W2",
            SL.library1 = "SL.glmnet", gbound = 0)

# or

  # One GLM formula and two SuperLearner libraries, without printed fit summaries
  ps2 <- ps(A, W, gform1 = "A~W2+W1+I1+I2", SL.library1 = "SL.glmnet",
            SL.library2 = "SL.gam", gbound = 0, verbose = FALSE)

Yan2020729/bdt1 documentation built on March 24, 2021, 8:58 p.m.