ps_pond: weighting on propensity score

View source: R/ps_pond.R

ps_pond R Documentation

weighting on propensity score

Description

Implements weighting on the propensity score, with Matching Weights (MW) or Inverse Probability of Treatment Weighting (IPTW), for all drug exposures of the input drug matrix x that have more than a given number of co-occurrences with the outcome. For each drug exposure retained after filtering, the binary outcome is regressed on that exposure through a classical weighted regression. This yields one p-value per drug, and variable selection is performed on the p-values after correction for multiple comparisons.

Usage

ps_pond(
  x,
  y,
  n_min = 3,
  betaPos = TRUE,
  weights_type = c("mw", "iptw"),
  truncation = FALSE,
  q = 0.025,
  est_type = "bic",
  threshold = 0.05,
  ncore = 1
)

Arguments

x

Input matrix, of dimension nobs x nvars. Each row is an observation vector. Can be in sparse matrix format (inheriting from class "sparseMatrix" as in package Matrix).

y

Binary response variable, numeric.

n_min

Numeric, minimal number of co-occurrences between a drug covariate and the outcome y required to estimate its score. See Details below. Default is 3.

betaPos

Should the covariates selected by the procedure be positively associated with the outcome? Default is TRUE.

weights_type

Character, indicates which type of weighting is implemented. Either "mw" or "iptw".

truncation

Boolean, should the weights be truncated? Default is FALSE.

q

If truncation is TRUE, quantile value for weight truncation. Ignored if truncation is FALSE. Default is 0.025 (2.5%).

est_type

Character, indicates which approach is used to estimate the propensity score. Either "bic", "hdps" or "xgb". Default is "bic".

threshold

Threshold for the p-values. Default is 0.05.

ncore

The number of cores used for parallel computing. Default is 1, meaning no parallelization.

Details

The MW are defined by

mw_i = min(PS_i, 1-PS_i)/[(expo_i) * PS_i + (1-expo_i) * (1-PS_i) ]

and weights from IPTW by

iptw_i = expo_i/PS_i + (1-expo_i)/(1-PS_i)

where expo_i is the drug exposure indicator. The PS can be estimated in different ways: using the lasso-bic approach, the hdps algorithm or gradient tree boosting. The scores are estimated using the default parameter values of the est_ps_bic, est_ps_hdps and est_ps_xgb functions (see their documentation for details).

We apply the same filter and the same multiple testing correction as in the paper UPCOMING REFERENCE. First, PS are estimated only for drug covariates that have more than n_min co-occurrences with the outcome y. Adjustment on the PS is performed for these covariates, and one-sided or two-sided p-values (depending on the betaPos parameter) are obtained. The p-values of the covariates not retained after filtering are set to 1. All these p-values are then adjusted for multiple comparisons with the Benjamini-Yekutieli correction.

The computation can be very long. Since this approach (i) estimates a score for several drug covariates and (ii) performs an adjustment on each of these scores, parallelization is highly recommended.
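As a minimal sketch of the two weight definitions above, the following base R code computes MW and IPTW from a propensity score and an exposure indicator. The names compute_weights, truncate_weights, ps and expo are hypothetical and do not come from the package. The scores are simulated for illustration only, and the symmetric truncation at the q and 1 - q quantiles is an assumption about what truncation could look like, not a description of the package internals.

```r
# Hypothetical helper: weights from a propensity score `ps` and a 0/1 exposure `expo`.
compute_weights <- function(ps, expo, type = c("mw", "iptw")) {
  type <- match.arg(type)
  if (type == "mw") {
    # mw_i = min(PS_i, 1-PS_i) / [expo_i * PS_i + (1-expo_i) * (1-PS_i)]
    pmin(ps, 1 - ps) / (expo * ps + (1 - expo) * (1 - ps))
  } else {
    # iptw_i = expo_i / PS_i + (1-expo_i) / (1-PS_i)
    expo / ps + (1 - expo) / (1 - ps)
  }
}

# Assumed truncation rule: clamp weights to their q and 1 - q quantiles.
truncate_weights <- function(w, q = 0.025) {
  bounds <- quantile(w, probs = c(q, 1 - q), names = FALSE)
  pmin(pmax(w, bounds[1]), bounds[2])
}

set.seed(1)
n <- 200
expo <- rbinom(n, 1, 0.3)      # simulated drug exposure indicator
ps <- runif(n, 0.05, 0.95)     # simulated scores, not estimated from data

mw <- compute_weights(ps, expo, "mw")
iptw <- compute_weights(ps, expo, "iptw")
iptw_trunc <- truncate_weights(iptw, q = 0.025)

summary(mw)
summary(iptw)
summary(iptw_trunc)
```

By construction the matching weights lie in (0, 1], while the IPTW weights are at least 1 and can become large when the PS is close to 0 or 1, which is the situation truncation is meant to handle.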

Value

An object with S3 class "ps", "*", "**", where "*" is "mw" or "iptw", same as the input parameter weights_type, and "**" is "bic", "hdps" or "xgb" according to how the score was estimated.

estimates

Regression coefficients associated with the drug covariates. Numeric, length equal to the number of selected variables with this approach. Some elements could be NA if (i) the corresponding covariate was filtered out, or (ii) the weighted regression did not converge. Estimating the score in a different way could help, but this is not guaranteed.

corrected_pvals

One-sided p-values if betaPos = TRUE, two-sided p-values if betaPos = FALSE, adjusted for multiple testing. Numeric, length equal to nvars.

selected_variables

Character vector, names of the variable(s) selected with the weighting on PS based approach. If betaPos = TRUE, this set is the covariates with a corrected one-sided p-value lower than threshold. Otherwise, this set is the covariates with a corrected two-sided p-value lower than threshold. Covariates are ordered according to their corrected p-value.
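The filtering, correction and selection steps described in this page can be sketched in base R as follows. This is only an illustration under stated assumptions: the variable names (cooc, kept, raw_pvals, corrected, selected) are hypothetical, and the raw p-values are random placeholders standing in for the p-values of the weighted regressions, which are not computed here. Only p.adjust with method = "BY" (Benjamini-Yekutieli) is taken from base R as is.

```r
set.seed(15)
x <- matrix(rbinom(100 * 20, 1, 0.2), nrow = 100, ncol = 20)
colnames(x) <- paste0("drugs", seq_len(ncol(x)))
y <- rbinom(100, 1, 0.3)
n_min <- 3
threshold <- 0.05

# Number of co-occurrences of each drug with the outcome (exposed cases)
cooc <- colSums(x[y == 1, , drop = FALSE])
kept <- cooc > n_min

# Placeholder p-values: covariates filtered out get 1, the others get
# random values standing in for the weighted regression p-values.
raw_pvals <- rep(1, ncol(x))
names(raw_pvals) <- colnames(x)
raw_pvals[kept] <- runif(sum(kept))

# Benjamini-Yekutieli correction over all nvars p-values
corrected <- p.adjust(raw_pvals, method = "BY")

# Covariates below the threshold, ordered by corrected p-value
selected <- names(sort(corrected[corrected < threshold]))
selected
```

Correcting over all nvars p-values, including those set to 1, keeps the length of the corrected vector equal to nvars, as documented for corrected_pvals.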

Author(s)

Emeline Courtois
Maintainer: Emeline Courtois emeline.courtois@inserm.fr

References

Benjamini, Y., & Yekutieli, D. (2001). "The Control of the False Discovery Rate in Multiple Testing under Dependency". The Annals of Statistics, 29(4), 1165–1188, doi: 10.1214/aos/1013699998.

Examples


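set.seed(15)
# Simulated binary drug exposure matrix: 100 observations, 20 drugs
drugs <- matrix(rbinom(100*20, 1, 0.2), nrow = 100, ncol = 20)
colnames(drugs) <- paste0("drugs",1:ncol(drugs))
# Simulated binary outcome (adverse event)
ae <- rbinom(100, 1, 0.3)
# IPTW on the PS, for drugs with more than 10 co-occurrences with the outcome
pondps <- ps_pond(x = drugs, y = ae, n_min = 10, weights_type = "iptw")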



adapt4pv documentation built on May 31, 2023, 6:08 p.m.