hdps_screen: hdps_screen
In lendle/hdps: High-dimensional propensity score algorithm

The hdps_screen function performs part of step 2 (identify_covariates), steps 3 (assess_recurrence) and 4 (prioritize_covariates) of the HDPS algorithm (Schneeweiss et al., 2009).

hdps_screen(outcome, treatment, covars, dimension_names = NULL,
  dimension_indexes = NULL, keep_n_per_dimension = 200,
  keep_k_total = 500, verbose = FALSE, debug = FALSE)

`outcome`	binary vector of outcomes
`treatment`	binary vector of treatments
`covars`	`matrix` or `data.frame` of binary covariates.
`dimension_names`	A character vector of patterns to match against the column names of `covars` to split columns into dimension groups. See details.
`dimension_indexes`	A list of vectors of column indexes corresponding to dimension groups. See details. Cannot be specified with `dimension_names`.
`keep_n_per_dimension`	The maximum number of covariates to be kept per dimension by `identify_covariates`.
`keep_k_total`	Total number of covariates to keep after expanding by `assess_recurrence` and ordering by `prioritize_covariates`.
`verbose`	Should verbose output be printed?
`debug`	Enables some debugging checks, which slow things down but may yield useful warnings or errors.

Step 2. Columns of covars are split by data dimension (as defined in Schneeweiss et al. (2009)) and filtered by identify_covariates.

Dimensions can be specified in two ways. If dimension_names is used, colnames(covars) is searched with grep for each value of dimension_names. If a column name matches more than one pattern, an error is thrown. If a column name matches no pattern, a warning is issued and that column is ignored. For example, suppose the column names of covars are c("drug_1", "drug_2", "proc_1", "proc_2"). Then dimension_names <- c("drug", "proc") splits covars into two dimensions, one for drugs and one for procedures.
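The pattern matching described above can be sketched in base R. This is only an illustration under stated assumptions: split_by_pattern is a hypothetical helper, not a function of the hdps package, and the package's own matching code may differ in detail.

```r
# Hypothetical helper (not part of hdps): assign each column name to the
# single pattern it matches and return column indexes per dimension.
split_by_pattern <- function(col_names, patterns) {
  # Logical matrix: one row per column name, one column per pattern
  hits <- vapply(patterns, function(p) grepl(p, col_names),
                 logical(length(col_names)))
  hits <- matrix(hits, nrow = length(col_names),
                 dimnames = list(col_names, patterns))
  n_hits <- rowSums(hits)
  if (any(n_hits > 1)) {
    stop("Column names matching more than one pattern: ",
         paste(col_names[n_hits > 1], collapse = ", "))
  }
  if (any(n_hits == 0)) {
    warning("Ignoring column names matching no pattern: ",
            paste(col_names[n_hits == 0], collapse = ", "))
  }
  # One vector of column indexes per pattern
  lapply(setNames(seq_along(patterns), patterns),
         function(j) unname(which(hits[, j])))
}

dims <- split_by_pattern(c("drug_1", "drug_2", "proc_1", "proc_2"),
                         c("drug", "proc"))
str(dims)
```

Here dims$drug holds the indexes of the two drug columns and dims$proc those of the two procedure columns.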

Dimensions can also be specified by dimension_indexes which should contain a list of either column indexes or column names for each dimension.
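As a small sketch of the two forms dimension_indexes may take, the following builds the same grouping once by column index and once by column name. The column names are the toy names from the dimension_names example, and the variable names are illustrative only.

```r
col_names <- c("drug_1", "drug_2", "proc_1", "proc_2")

# One list element per dimension, given as column indexes...
dimension_indexes <- list(1:2, 3:4)

# ...or as column names
dimension_indexes_by_name <- list(c("drug_1", "drug_2"),
                                  c("proc_1", "proc_2"))

# Both describe the same grouping of columns
same_groups <- identical(
  lapply(dimension_indexes_by_name, match, table = col_names),
  dimension_indexes
)

# Either list would then be passed as, for example:
# hdps_screen(outcome, treatment, covars,
#             dimension_indexes = dimension_indexes)
```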

If neither dimension_names nor dimension_indexes is specified, all covariates are treated as one dimension.

Step 3. After filtering, remaining covariates are expanded by assess_recurrence.
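To give a rough idea of what recurrence expansion means, the sketch below follows the scheme described in Schneeweiss et al. (2009): each code count is turned into three indicators (occurred at least once, at least the median number of times, at least the 75th percentile). expand_recurrence is a hypothetical helper, and the thresholds are assumed to be computed among subjects with at least one occurrence; assess_recurrence in this package may handle ties, duplicate columns, and edge cases differently.

```r
# Hypothetical helper (not the package's assess_recurrence).
# Assumes `counts` contains at least one positive value.
expand_recurrence <- function(counts) {
  positive <- counts[counts > 0]
  med <- stats::median(positive)
  q75 <- unname(stats::quantile(positive, 0.75))
  cbind(once     = as.integer(counts >= 1),    # code seen at least once
        sporadic = as.integer(counts >= med),  # at least the median count
        frequent = as.integer(counts >= q75))  # at least the 75th percentile
}

counts <- c(0, 0, 1, 1, 2, 3, 5, 0)
expanded <- expand_recurrence(counts)
expanded
```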

If the number of expanded covariates at this point is less than keep_k_total, all expanded covariates are returned.

Step 4. Expanded covariates are ordered with prioritize_covariates.
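In Schneeweiss et al. (2009), prioritization ranks covariates by the absolute log of the multiplicative bias term from the Bross formula. The sketch below illustrates that ranking on toy data; bross_bias is a hypothetical helper, and it should not be read as the exact computation performed by prioritize_covariates, which may treat zero cells and other edge cases differently.

```r
# Hypothetical helper: Bross multiplicative bias for one binary covariate.
bross_bias <- function(outcome, treatment, covar) {
  p_c1 <- mean(covar[treatment == 1])  # covariate prevalence among treated
  p_c0 <- mean(covar[treatment == 0])  # covariate prevalence among untreated
  # Relative risk of the outcome for covariate present vs. absent
  rr_cd <- mean(outcome[covar == 1]) / mean(outcome[covar == 0])
  if (rr_cd < 1) rr_cd <- 1 / rr_cd    # use the reciprocal when RR < 1
  (p_c1 * (rr_cd - 1) + 1) / (p_c0 * (rr_cd - 1) + 1)
}

treatment  <- c(1, 1, 1, 1, 0, 0, 0, 0)
outcome    <- c(1, 1, 0, 1, 1, 0, 0, 0)
toy_covars <- cbind(a = c(1, 1, 1, 0, 1, 0, 0, 0),  # imbalanced across arms
                    b = c(1, 0, 1, 0, 1, 0, 1, 0))  # balanced across arms

bias <- apply(toy_covars, 2,
              function(x) bross_bias(outcome, treatment, x))

# Larger |log(bias)| means higher priority
ranking <- names(sort(abs(log(bias)), decreasing = TRUE))
ranking
```

Covariate a is imbalanced between treatment arms and associated with the outcome, so it ranks ahead of the balanced covariate b.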

Step 5. Not performed by hdps_screen; it can be carried out with predict.hdps_covars on the returned object.

An object of class hdps_covars

Sam Lendle

Schneeweiss, S., Rassen, J. A., Glynn, R. J., Avorn, J., Mogun, H., & Brookhart, M. A. (2009). High-dimensional propensity score adjustment in studies of treatment effects using health care claims data. Epidemiology (Cambridge, Mass.), 20(4), 512-522.

predict.hdps_covars

set.seed(123)
n <- 1000
p <- 10000
out <- rbinom(n, 1, 0.05)
trt <- rbinom(n, 1, 0.5)
covars <- matrix(rbinom(n*p, 3, 0.05), n)
colnames(covars) <- c(paste("drug", 1:(p/2), sep="_"),
                      paste("proc", 1:(p/2), sep="_"))

dimension_names <- c("drug", "proc")

screened_covars_fit <- hdps_screen(out, trt, covars, 
                                   dimension_names = dimension_names,
                                   keep_n_per_dimension = 400,
                                   keep_k_total = 200,
                                   verbose=TRUE)
                                   
screened_covars <- predict(screened_covars_fit)