ps_match_strata: Propensity Score Matching of strata (two or more classes,...

Description Usage Arguments Value Note Examples

View source: R/ps_match_strata.R

Description

This function perfroms propensity score matching of two or more strata.
It could be used when arm1 has 2 or more strata, while strata information is unknown in arm2.
The function will fit a logistic regression (when 2 classes) or multinomial logistic regression (when > 2 classes) based on strata labels in arm1 (model: label~features), then predict strata labels in both arm1 and arm2 based on the fitted model. The predicted probability of being stratum X will be used for propensity score matching. The matching results will then be used to estimate treatment difference of two arms (Hazard ratio for survival endpoint; response rate difference for binary endpoint). When ties are allowed, weights from the ties will be used to calculate the HR or response rate difference.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
ps_match_strata(
  data.in,
  formula,
  indicator.var = "indicator",
  ties = TRUE,
  class.of.int = NULL,
  tte = "AVAL",
  event = "event",
  trt = "trt",
  response = NULL,
  caliper = NULL,
  model = "plain",
  weights = NULL,
  multinom.maxit = 100,
  return.data = TRUE
)

Arguments

data.in

(data.frame) input data

formula

(formula) to input to the logistic or multinomial logistic model (in the form of strata~features)

indicator.var

(string) column name of the strata indicator variable which must be numeric. Assume arm1 has strata labeling and arm2 does not have strata labeling. pts without strata labeling should be indicated as -1 (e.g. pts in the arm1, or pts in arm2 but with missing label). within arm1 (the arm with strata labeling), subclasss should be indicated as 0,1,2...

ties

(logical) TRUE allows for ties.

ties

is TRUE, all samples in the tie will be included. When calculating summary statistics, samples in ties will be assigned a smaller weight. for example, if two samples ties, these two samples will both be included in the summary statistics calculation with weight 0.5.

ties

is FALSE, one random sample will be draw from the tied samples when calculating summary statistics. In this case, it is recommended to run ps_match_strata multiple times with different seeds and take the average or median summary statistics from multiple runs. Note when ties is FALSE, codes were tested less thoroughly and extra caution may be needed.

class.of.int

(list) classes (stratum) of interest. Request to be in list format. It could be subset of classes in arm1; it could also define combined classes. For example: class.of.int = list("class1"=0, "class2"=1, "class3"=2, "class2or3"="c(1,2)"). for "class2or3", Prob(class 2 or 3) will be calculated as Prob(class2) + Prob(class3)

tte

(string) column name of the time to event variable

event

(string) column name of the event variable (1: event, 0: censor)

trt

(string) column name of the treatment variable. The variable is expected to be in factor format and the first level will be considered as reference (control) arm when calculating summary statistics.

response

(string) column name of the response variable. 1 indicates responder and 0 indicates non responder. if response is not NULL, tte and event will be ignored and the function will assume binary outcome.

caliper

A scalar or vector denoting the caliper(s) which should be used when matching. A caliper is the distance which is acceptable for any match. Observations which are outside of the caliper are dropped. If a scalar caliper is provided, this caliper is used for all covariates in X. If a vector of calipers is provided, a caliper value should be provided for each covariate in X. The caliper is interpreted to be in standardized units. For example, caliper=.25 means that all matches not equal to or within .25 standard deviations of each covariate in X are dropped. Note that dropping observations generally changes the quantity being estimated.

model

(string) one of (plain, dwc, wri).

"plain"

when 2 levels are specified in indicator variable, a binomial glm will be fitted; when more than 2 levels are specified, a multinomial glm will be fitted;

"dwc"

Doubly weighted control: Two separated models will be fitted: one is binomial glm of 2 vs. (1, 0), the other one is bionomial glm of 1 vs 0. The probability of being each class is then calculated by aggregating these two models. Note this is similar to the plain method but with different (less rigid) covariance assumptions.

"wri"

Weight regression imputation: the current status is going to be learned from the next status. Indicator of the next status should be specified using indicator.next. Currently "wri" only support the case where there are only two non-missing strata. In indicator variable, the two nonmissing strata should be coded as 0 and 1, the missing group should be coded as 2.

weights

(numeric) weights of each subject. If not NULL, the estimated probabilities will be reweightsed to ensure sum(probability) of a subject = the subject's weights. If weights is not NULL, quasibinomial model will be used.

multinom.maxit

see parameter maxit in nnet::multinom, default is 100

return.data

(logical) whether to return data with estimated probabilities.

Value

return a list containing the following components:

Note

Different from the original version, iter is no longer a parameter if tie = FALSE is specified, user need to run for loops of sapply outside of this function to get results from multiple seeds. Three elements in the output list - the data element is a data frame that contains input data and estimated probabilities. The stat element contains estimated treatment difference between 2 arms, in each of the strata of interest. The converge element indicates whether the model converged (taking from $converged from glm and $convergency from multinom) if return.data is FALSE, data won't be returned. model = "wri" is not supported in ps_match_strata

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
 
library(dplyr)
 # example 1:  Impute NA as one stratum in experimental arm; default model 
clinical_1 <- clinical %>% mutate( 
  indicator = case_when(
    STRATUM == "strata_1" ~ 0, 
    STRATUM == "strata_2" ~ 1,
    is.na(STRATUM) & ARM == "experimental" ~ 1,
    TRUE ~ -1 
  ),
  ARM  = factor(ARM, levels = c("control","experimental")),
  BNLR = case_when(
    is.na(BNLR) ~ median(BNLR, na.rm = TRUE),
    TRUE ~ BNLR
  )
)
ps_res1 <- ps_match_strata(
  data.in = clinical_1, formula = indicator ~ BECOG + SEX + BNLR,
  indicator.var = "indicator", tte = "OS_MONTH", event = "OS_EVENT", trt = "ARM",
  class.of.int = list("strata_1" = 1, "strata_2" = 0)
 )
 ## Weighted HRs
 ps_res1$stat
 
 # example 2: "doubly weighted control" model
clinical_2 <- clinical %>% mutate( 
  indicator = case_when(
    STRATUM == "strata_1" ~ 0, 
    STRATUM == "strata_2" ~ 1,
    is.na(STRATUM) & ARM == "experimental" ~ 2,
    TRUE ~ -1
  ),
  ARM  = factor(ARM, levels = c("control","experimental")),
  BNLR = case_when(
    is.na(BNLR) ~ median(BNLR, na.rm = TRUE),
    TRUE ~ BNLR
  )
)

ps_res2 <- ps_match_strata(
  data.in = clinical_2, formula = indicator ~ BECOG + SEX + BNLR, model = "dwc",
  indicator.var = "indicator", tte = "OS_MONTH", event = "OS_EVENT", trt = "ARM",
  class.of.int = list("strata_1" = 0, "strata_2" = 1, "missing" = 2)
 ) 
 ps_res2$stat
 ps_res2$converged 
  

PropensitySub documentation built on July 29, 2021, 9:07 a.m.