bagged | R Documentation |
Perform exploratory subgroup analysis using bootstrap bias adjustment as described in Rosenkranz (2016). The function fits a GLM or a Cox model in the data and then performs bootstrap samples to correct for bias.
bagged(resp, trt, subgr, covars = NULL, data, fitfunc = c("lm", "glm", "glm.nb", "survreg", "coxph", "rlm"), event, exposure, level = 0.1, B = 2000, mc.cores = 1, stratified = TRUE, select.by = c("BIC", "AIC"), quietly = FALSE, ...)
resp |
Character giving the name of the response variable. The variable can be
either defined in the global environment or in the data-set |
trt |
Character giving the name of the treatment variable. The variable can
be either defined in the global environment or in the data-set
|
subgr |
Character vector giving the variable names in |
covars |
Formula, specifying additional covariates to be included in the models
(need to be available in |
data |
Data frame containing the variables referenced in |
fitfunc |
Model fitting functions. Currrently one of |
event |
Character giving the name of the event variable. Has to be specified
when using fit functions |
exposure |
Character giving the name of the exposure variable, needed for
negative binomial regression, when using fit functions
|
level |
Significance level for confidence intervals will be calculated for treatment effect estimates. |
B |
A numeric input. The number of bootstrap samples to perform. |
mc.cores |
A numeric input. This argument is passed to the |
stratified |
Should the bootstrap resampling be done stratifying by treatment group? (default: TRUE). |
select.by |
Should the model selection be done using BIC or AIC? (default: BIC). |
quietly |
A logical. By default (quietly = FALSE), |
... |
Other arguments passed to the model fitting function. |
In the generalized linear model case, P generalized linear models are fitted such that
M_p: h(μ_{pi}) = α_p + β_p z_i + (γ_p + δ_p z_i) s_{pi} + ∑_{k = 1}^{K} τ_k x_{ik}
where h is the link function, μ_{pi} = E_p[Y_i] is the expectation of the response Y_i under model M_p and x_{ik} are additional covariates we control for. For survival data, a proportional hazards model can be used:
M_p: λ_{pi}(t)= λ_{p0}(t) \exp≤ft\{β_p z_i + (γ_p + δ_p z_i)s_{pi} + ∑_{k = 1}^{K} τ_k x_{ik} \right\}
The focus of estimation is the difference in the treatment effect between a subgroup and its complement, the treatment by subgroup interaction δ_p.
Consider now B bootstrap samples from the original data. Let (Y_{b1}^*, ..., Y_{bN}^*) be a bootstrap sample from the original data. Let (z_{b1}^*, ..., z_{bN}^*), (s_{b1}^*, ..., s_{bN}^*), and (x_{b1k}^*, ..., x_{bNk}^*) be the corresponding treatment indicators, group indicators, and covariates in the bootstrap samples, respectively. For each p=1, ..., P and b=1, ..., B we fit the model:
h(E_p[Y_{bi}^*]) = α^{*}_{bp} + β^{*}_{bp} z_{bi}^* + (γ^{*}_{bp} + δ^{*}_{bp} z_{bi}^*) s^{*}_{bpi} + ∑_{k = 1}^{K} τ_k x^*_{ik}
An estimator of δ_{p} given that subgroup S_p provided the best fit can be calculated as
\overline{δ}^{*}_p = \frac{∑_{b=1}^B u_{bp} \hat{δ}^{*}_{bp} }{∑_{b=1}^B u_{bp}}
where \hat{δ}^{*}_{bp} is the usual estimator of δ_{bp} and u_{bp} = 1 if the subgroup p provides the best fit for bootstrap sample b and 0 otherwise.
An bias-reduced estimator of δ_{p} can be obtained as:
\check{δ}^{*}_p = 2 \hat{δ}_p - \overline{δ}^{*}_p
A bias-reduced estimator with decreased variability is obtained by replacing the maximum likelihood estimator by the bagging estimator \hat{δ}^{*}_{bp}:
\hat{δ}^{*}_p = \frac{1}{B}∑_{b=1}^B \hat{δ}^{*}_{bp}
so that the bias reduced estimator is
\tilde{δ}^{*}_p = 2 \hat{δ}^*_p - \overline{δ}^{*}_p
An object of class subtee. A list containing a dataframe (model_fits) with the estimates using the original data, and a dataframe (bagged_results) with the bootstrap estimates with their percent of selection. The latter contains the following columns: 'percent_selected': the relative proportion for selection of each subgroup, 'bagg': the (uncorrected) bagged estimate \hat{δ}^{*}_p 'boot_red': the bias reduced bootstrap estimate \check{δ}^{*}_p 'bagg_red': the bias reduced bootstrap estimate with decreased variability by bagging \tilde{δ}^{*}_p and the respective standard deviations of the estimates.
Ballarini, N. Thomas, M., Rosenkranz, K. and Bornkamp, B. (2021) "subtee: An R Package for Subgroup Treatment Effect Estimation in Clinical Trials" Journal of Statistical Software, 99, 14, 1-17, doi: 10.18637/jss.v099.i14
Rosenkranz, G.(2016) "Exploratory subgroup analysis in clinical trials by model selection", Biometrical Journal, 58, 1007-1259. doi: 10.1002/bimj.201500147
glm
, coxph
## Not run: ## toy example calls using the simulated datnorm data-set without ## treatment and subgroup effect, see ?datnorm for details data(datnorm) head(datnorm) ## first need to create candidate subgroups (if not already defined in data-set) ## here generate candidate subgroups manually (need to be numeric 0-1 variables) groups <- data.frame(labvalL.5=as.numeric(datnorm$labvalue < 0.5), regUS=as.numeric(datnorm$region == "US"), hgtL175=as.numeric(datnorm$height < 175)) fitdat <- cbind(datnorm, groups) # bind subgroup variables to main data ## subgroups of interest subgr <- c("labvalL.5", "regUS", "hgtL175") res <- bagged(resp = "y", trt = "treat", subgr = subgr, data = fitdat, covars = ~ x1 + x2, fitfunc = "lm") res ## generate candidate subgroups using the subbuild function ## semi-automatically i.e. some groups specified directly (height and ## smoker), for region and labvalue subbuild generates subgroups (see ## ?subbuild) cand.groups <- subbuild(datnorm, height < 175, smoker == 1, region, labvalue) head(cand.groups) fitdat <- cbind(datnorm, cand.groups) subgr <- colnames(cand.groups) res <- bagged(resp = "y", trt = "treat", subgr = subgr, data = fitdat, covars = ~ x1 + x2, fitfunc = "lm") res ## toy example call for binary data on simulated datbin data-set data(datbin) cand.groups <- subbuild(datbin, height < 175, smoker == 1, region, labvalue) fitdat <- cbind(datbin, cand.groups) subgr <- colnames(cand.groups) res <- bagged(resp = "y", trt = "treat", subgr = subgr, data = fitdat, covars = ~ x1 + x2, fitfunc = "glm", family = binomial(link = "logit")) ## scale of the treatment effect estimate: difference on log-odds scale res ## toy example call for parametric and semi-parametric survival data on ## datsurv data-set data(datsurv) cand.groups <- subbuild(datsurv, height < 175, smoker == 1, region, labvalue) fitdat <- cbind(datsurv, cand.groups) subgr <- colnames(cand.groups) res.survreg <- bagged(resp = "y", trt = "treat", subgr = subgr, data = fitdat, covars = ~ x1 + x2, fitfunc = "survreg", event = "event", dist = "exponential") ## parametric survival model (here exponential distribution) ## scale of treatment effect estimate: log scale (see ?survreg for details) res.survreg # Decreased B for a reduction in computational time res.cox <- bagged(resp = "y", trt = "treat", subgr = subgr, data = fitdat, fitfunc = "coxph", event = "event", B = 20) # B=2000 should be used ## scale of treatment effect estimate: difference in log-hazard rate res.cox ## toy example call overdispersed count data on datcount data-set data(datcount) cand.groups <- subbuild(datcount, height < 175, smoker == 1, region, labvalue) fitdat <- cbind(datcount, cand.groups) subgr <- colnames(cand.groups) # Decreased B for a reduction in computational time res <- bagged(resp = "y", trt = "treat", subgr = subgr, data = fitdat, fitfunc = "glm.nb", exposure = "exposure", B = 20) # B=2000 should be used ## scale of treatment effect estimate: difference on log scale res ## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.