This README is prepared for journal peer review of the ODACH paper. The proposed ODACH algorithm can be used to fit stratified Cox proportional hazards regression using multi-center data and requires each site to communicate only some aggregate data (AD), but achieves better estimates than the meta-analysis estimates.
To demonstrate its usage, we use a lung cancer survival dataset of 228 patients (survival::lung) and assume the patients come from 3 sites. We analyze the association of survival time with age and sex.
ODACH is a distributed algorithm under the PDA (Privacy-preserving Distributed Algorithms) framework. The package in this repo covers the whole PDA framework, including other distributed algorithms.
To implement the ODACH algorithm, we need to install the pda package. Before installing it, make sure you have a C++ compiler, as pda requires Rcpp. This is a technical requirement of pda and is not related to the ODACH algorithm.
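Before installing, it may help to confirm that a compiler toolchain is visible to R. The check below is a minimal sketch using only base R. The tool names (`g++`, `clang++`, `make`) are assumptions about a typical setup and may differ on your system (for example, Rtools on Windows).

```r
# Look up common build tools on the PATH; an empty string means "not found".
# The tool names are assumptions about a typical toolchain, not a list required by pda.
tools <- Sys.which(c("g++", "clang++", "make"))
print(tools)

# TRUE if at least one candidate C++ compiler was found
has_cxx <- any(nzchar(tools[c("g++", "clang++")]))
has_cxx
```

If `has_cxx` is `FALSE`, installing from source is likely to fail when Rcpp code is compiled.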
In RStudio, create a new project: File -> New Project... -> New Directory -> New Project.
Execute the following R code:
install.packages("devtools")
library(devtools)
devtools::install_github("penncil/pda")
# Or you can install from CRAN:
install.packages("pda")
The installation takes about 20 seconds via GitHub and 5 seconds via CRAN (on a normal personal computer).
In the toy example below, we analyze the association of lung cancer survival time with age and sex using Cox PH regression, assuming site-specific baseline hazards. We demonstrate that PDA ODACH can obtain more accurate estimates than meta-analysis (i.e., the inverse variance weighted average of the individual estimates from each site).
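To make the meta-analysis baseline concrete, here is a minimal sketch of an inverse variance weighted average of site-specific Cox estimates, next to a pooled stratified fit. It uses only the survival package. The site assignment below is hypothetical (it is not the split stored in lung2), so the numbers will not match the results reported in this README.

```r
library(survival)

# Hypothetical site assignment for illustration only (not the split used in lung2)
dat <- survival::lung[, c("time", "status", "age", "sex")]
dat$site <- rep(c("site1", "site2", "site3"), length.out = nrow(dat))

# Fit a separate Cox model within each site
fits <- lapply(split(dat, dat$site), function(d)
  coxph(Surv(time, status) ~ age + sex, data = d))

# Per-coefficient inverse variance weights (rows: age, sex; columns: sites)
b <- sapply(fits, coef)
w <- 1 / sapply(fits, function(f) diag(vcov(f)))
b_meta <- rowSums(w * b) / rowSums(w)

# Pooled reference: one Cox model with site-specific baseline hazards
fit_pool <- coxph(Surv(time, status) ~ age + sex + strata(site), data = dat)

round(cbind(b.pool = coef(fit_pool), b.meta = b_meta), 4)
```

The weighting treats each coefficient separately and ignores covariances between coefficients, which is a simplifying assumption of this sketch.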
We run the example in a local directory. In an actual collaboration, an account and password for the pda server will be assigned to the sites at https://pdamethods.org/. Each site can then use a web browser to check the communication of the summary statistics. Use demo(ODACH) for an automatic demo of the example.
Step 0: load related R packages and prepare sample data
## load packages
require(pda)
require(survival)
## data
data(lung2)
## split the data to 3 separate sets (patient-level data)
lung_split <- split(lung2, lung2$site)
Step 1: Initialization
## setup pda control
sites = c('site1', 'site2', 'site3')
control <- list(project_name = 'Lung cancer study',
                step = 'initialize',
                sites = sites,
                heterogeneity = TRUE,
                model = 'ODAC',
                family = 'cox',
                outcome = "Surv(time, status)",
                variables = c('age', 'sex'),
                optim_maxit = 100,
                lead_site = 'site1',
                upload_date = as.character(Sys.time()))
## specify your working directory, default is the current working dir
mydir <- getwd()
pda(site_id = 'site1', control = control, dir = mydir)
# you now can see control.json in the working dir
## site3 communicates its individual estimate: after review, enter "1" to allow transferring to the lead site
pda(site_id = 'site3', ipdata = lung_split[[3]], dir=mydir)
# you now can see site3_initialize.json in the working dir
## site2 communicates its individual estimate: after review, enter "1" to allow transferring to the lead site
pda(site_id = 'site2', ipdata = lung_split[[2]], dir=mydir)
# you now can see site2_initialize.json in the working dir
## site1 communicates its individual estimate: after review, enter "1" to allow transferring to the lead site
pda(site_id = 'site1', ipdata = lung_split[[1]], dir=mydir)
# you now can see site1_initialize.json in the working dir
# all the individual estimates are ready, control.json is also automatically updated to the next step
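The steps above exchange information through JSON files in the working directory. As a rough illustration of that mechanism, the sketch below writes a control-like list to a JSON file and reads it back. It assumes the jsonlite package is installed, and the file name `control_demo.json` is hypothetical. The exact layout that pda writes to control.json may differ.

```r
library(jsonlite)

# A reduced control list, using the same field names as in Step 1
control_demo <- list(project_name = "Lung cancer study",
                     step = "initialize",
                     sites = c("site1", "site2", "site3"),
                     lead_site = "site1")

# Write to a temporary file (hypothetical name, so the real control.json is untouched)
path <- file.path(tempdir(), "control_demo.json")
write_json(control_demo, path, auto_unbox = TRUE, pretty = TRUE)

# Any site with access to the file can read the current step and the site list
ctl <- fromJSON(path)
ctl$step
ctl$sites
```

Reading the shared control file in this way is how a site could check which step the project is currently in before running its part.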
Step 2: calculate derivatives (AD)
## site3 communicates its derivatives as AD: after review, enter "1" to allow transferring AD
pda(site_id = 'site3', ipdata = lung_split[[3]], dir=mydir)
# you now can see site3_derivatives.json in the working dir
## site2 communicates its derivatives as AD: after review, enter "1" to allow transferring AD
pda(site_id = 'site2', ipdata = lung_split[[2]], dir=mydir)
# you now can see site2_derivatives.json in the working dir
## site1 communicates its derivatives as AD: after review, enter "1" to allow transferring AD
pda(site_id = 'site1', ipdata = lung_split[[1]], dir=mydir)
# you now can see site1_derivatives.json in the working dir
# all the individual AD are ready, control.json is also automatically updated to the next step
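To give a sense of what the derivatives in Step 2 contain, the sketch below computes the gradient and Hessian of a Cox log partial likelihood for one site at a given coefficient value. It is a hand-written illustration with Breslow handling of ties, not the pda implementation. The function name `cox_derivatives`, the single hypothetical site (the first 80 patients of survival::lung), and the value of `bbar` are all assumptions made for this example.

```r
library(survival)

# One hypothetical site: the first 80 patients of survival::lung (illustration only)
d <- survival::lung[1:80, c("time", "status", "age", "sex")]
d$event <- as.integer(d$status == 2)   # in survival::lung, status 2 means death
X <- as.matrix(d[, c("age", "sex")])

# Gradient and Hessian of the Breslow log partial likelihood at beta
cox_derivatives <- function(beta, time, event, X) {
  w <- exp(drop(X %*% beta))
  g <- setNames(numeric(ncol(X)), colnames(X))
  H <- matrix(0, ncol(X), ncol(X), dimnames = list(colnames(X), colnames(X)))
  for (i in which(event == 1)) {
    r    <- time >= time[i]                   # risk set at this event time
    Xr   <- X[r, , drop = FALSE]
    s0   <- sum(w[r])
    xbar <- colSums(w[r] * Xr) / s0           # risk-weighted covariate mean
    g <- g + X[i, ] - xbar
    H <- H - (crossprod(sqrt(w[r]) * Xr) / s0 - tcrossprod(xbar))
  }
  list(gradient = g, hessian = H)
}

# Sanity check against coxph: at the maximizer the gradient should vanish
# and the inverse of the negative Hessian should match the variance estimate
fit <- coxph(Surv(time, event) ~ age + sex, data = d, ties = "breslow")
ad_mle <- cox_derivatives(coef(fit), d$time, d$event, X)
max(abs(ad_mle$gradient))
max(abs(solve(-ad_mle$hessian) - vcov(fit)))

# Aggregate data a site might share, evaluated at an assumed initial value
bbar <- c(age = 0.01, sex = -0.5)
ad <- cox_derivatives(bbar, d$time, d$event, X)
ad
```

The shared object is a length-2 vector and a 2 x 2 matrix, regardless of how many patients the site has.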
Step 3: estimation using AD
## the lead site obtains the surrogate estimate using the ADs
pda(site_id = 'site1', ipdata = lung_split[[1]], dir=mydir)
# You now can see site1_estimate.json in the working dir
## the PDA ODACH is now completed!
## All the sites can still run their own surrogate estimates and broadcast them.
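## compare the surrogate estimate with the pooled and meta estimates
## fit.pool is not defined earlier; a pooled stratified Cox fit is assumed here
fit.pool <- coxph(Surv(time, status) ~ age + sex + strata(site), data = lung2)
config <- getCloudConfig(site_id = 'site1', dir = mydir)
fit.pda <- pdaGet(name = 'site1_estimate', config = config)
control <- pdaGet('control', config)
cbind(b.pool = fit.pool$coef,
      b.meta = control$beta_init,
      b.odach = fit.pda$btilde)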
Estimated log hazard ratios:
|     | b.pool  | b.meta  | b.odach |
|-----|---------|---------|---------|
| age | 0.0170  | 0.0156  | 0.0170  |
| sex | -0.5569 | -0.7876 | -0.5564 |
The running time in each site in this demo is about 9-10 secs.
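The estimation in Step 3 can be sketched outside of pda as follows. The lead site keeps its own log partial likelihood exact and approximates the other sites' log partial likelihoods by a second-order Taylor expansion around the initial (meta) estimate, using only their gradients and Hessians. This form of the surrogate is an assumption made for illustration; the exact weighting used inside pda may differ. The site assignment, the helper `cox_derivatives`, and the Newton loop are likewise hypothetical, so the output will not reproduce the table above.

```r
library(survival)

# Hypothetical three-site split of survival::lung (illustration only; not lung2)
dat <- survival::lung[, c("time", "status", "age", "sex")]
dat$event <- as.integer(dat$status == 2)
dat$site <- rep(c("site1", "site2", "site3"), length.out = nrow(dat))
sites <- split(dat, dat$site)

# Gradient (g) and Hessian (H) of one site's Breslow log partial likelihood at beta
cox_derivatives <- function(beta, d) {
  X <- as.matrix(d[, c("age", "sex")])
  w <- exp(drop(X %*% beta))
  g <- setNames(numeric(2), colnames(X))
  H <- matrix(0, 2, 2)
  for (i in which(d$event == 1)) {
    r    <- d$time >= d$time[i]
    Xr   <- X[r, , drop = FALSE]
    s0   <- sum(w[r])
    xbar <- colSums(w[r] * Xr) / s0
    g <- g + X[i, ] - xbar
    H <- H - (crossprod(sqrt(w[r]) * Xr) / s0 - tcrossprod(xbar))
  }
  list(g = g, H = H)
}

# Initial value: inverse variance weighted average of the site-specific estimates
fits <- lapply(sites, function(d)
  coxph(Surv(time, event) ~ age + sex, data = d, ties = "breslow"))
w <- sapply(fits, function(f) 1 / diag(vcov(f)))
bbar <- rowSums(w * sapply(fits, coef)) / rowSums(w)

# Aggregate data from the non-lead sites, evaluated at bbar
ad <- lapply(sites[c("site2", "site3")], function(d) cox_derivatives(bbar, d))
g_other <- Reduce(`+`, lapply(ad, `[[`, "g"))
H_other <- Reduce(`+`, lapply(ad, `[[`, "H"))

# Newton iterations on the surrogate: exact lead-site terms plus Taylor terms
beta <- bbar
for (iter in 1:25) {
  loc  <- cox_derivatives(beta, sites$site1)
  grad <- loc$g + g_other + drop(H_other %*% (beta - bbar))
  hess <- loc$H + H_other
  step <- solve(hess, grad)
  beta <- beta - step
  if (max(abs(step)) < 1e-10) break
}
b_tilde <- beta

# Compare with the pooled stratified fit and the meta estimate
fit_pool <- coxph(Surv(time, event) ~ age + sex + strata(site),
                  data = dat, ties = "breslow")
round(cbind(b.pool = coef(fit_pool), b.meta = bbar, b.surrogate = b_tilde), 4)
```

Only `g_other` and `H_other` cross site boundaries in this sketch; patient-level rows from site2 and site3 are never used by the lead site's optimization.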
Define control by specifying the model, data, outcome, variables, site names, and local site, then run the pda function as illustrated above.
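When adapting the workflow to your own data, a quick check that the outcome and covariate columns exist and are complete can catch problems before anything is communicated. The helper below is a hypothetical convenience function, not part of pda, and it assumes that complete data for the model variables is what you want.

```r
library(survival)

# Hypothetical pre-flight check; check_site_data is not a pda function
check_site_data <- function(ipdata, time_var, status_var, variables) {
  needed <- c(time_var, status_var, variables)
  absent <- setdiff(needed, names(ipdata))
  if (length(absent) > 0) {
    stop("Columns not found: ", paste(absent, collapse = ", "))
  }
  n_na <- colSums(is.na(ipdata[, needed, drop = FALSE]))
  if (any(n_na > 0)) {
    stop("Missing values in: ", paste(names(n_na)[n_na > 0], collapse = ", "))
  }
  invisible(TRUE)
}

# The columns used in this README are complete in survival::lung
check_site_data(survival::lung, "time", "status", c("age", "sex"))
```

Calling the same function with a column that contains missing values (for example `wt.loss` in survival::lung) stops with an error naming that column.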