Figure \@ref(fig:opalOmic) describes the different types of omic association analyses that can be performed using the DataSHIELD client functions implemented in the r Githubpkg("isglobal-brge/dsOmicsClient") package. Data (omic data and phenotypes/covariates) can be stored at different sites (http, ssh, AWS S3, local, ...) and are managed with Opal through the r Githubpkg("obiba/resourcer") package and its extensions implemented in r Githubpkg("isglobal-brge/dsOmics").

knitr::include_graphics(tools::file_path_as_absolute("../fig/dsOmics_A.jpg"))

The dsOmicsClient package then supports two types of analysis: "virtually" pooled analysis and federated meta-analysis. Both fit a separate generalized linear model (GLM) for each feature when assessing the association between omic data and the phenotype/trait/condition of interest. Non-disclosive omic data analysis of a single study can also be performed.
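The idea of fitting one GLM per feature can be sketched in base R on a single, local data set. This is only an illustration under stated assumptions: the data are simulated, and the object names (`expr`, `pheno`, `fit_feature`) are hypothetical and not part of dsOmicsClient.

```r
# Simulated data (hypothetical): 100 samples, 5 omic features,
# a binary phenotype and one covariate
set.seed(1)
n <- 100
expr <- matrix(rnorm(n * 5), nrow = n,
               dimnames = list(NULL, paste0("feature", 1:5)))
pheno <- data.frame(case = rbinom(n, 1, 0.5), age = rnorm(n, 50, 10))

# Fit one GLM per feature (feature ~ case + age) and keep the row of the
# coefficient table that corresponds to the phenotype of interest
fit_feature <- function(j) {
  dat <- cbind(pheno, y = expr[, j])
  fit <- glm(y ~ case + age, data = dat, family = gaussian())
  coef(summary(fit))["case", ]
}

# One row per feature: estimate, standard error, test statistic, p-value
results <- t(sapply(colnames(expr), fit_feature))
results
```

With thousands of features the same loop runs thousands of model fits, which is why the cost of each individual fit matters in the approaches described next.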

The "virtually" pooled approach (Figure \@ref(fig:omicAnal1)) is recommended when the user wants to analyze omic data from different sources and obtain results as if the data were located on a single computer. Note that this can be very time consuming when analyzing many features, since it repeatedly calls a base DataSHIELD function (ds.glm), and that it is not recommended when data are not properly harmonized (e.g. gene expression normalized using different methods, GWAS data from different platforms, ...). In addition, this approach cannot be used when it is necessary to remove unwanted variability (for transcriptomic and epigenomic analyses) or to control for population stratification (for GWAS analyses), because methods to compute surrogate variables (to remove unwanted variability) or principal components (to address population stratification) in a non-disclosive way still need to be developed.
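To see why a pooled fit can match a single-computer analysis without moving individual-level data, consider the simplest case, a Gaussian linear model. The sketch below is a simplified illustration, not the ds.glm implementation: it assumes each site shares only aggregated cross-products, and the site names and helper functions (`make_site`, `site_summary`) are hypothetical.

```r
# Two simulated sites (hypothetical) with the same variables
set.seed(2)
make_site <- function(n) {
  x <- rnorm(n)
  data.frame(x = x, y = 1 + 0.5 * x + rnorm(n))
}
sites <- list(siteA = make_site(80), siteB = make_site(120))

# Each site returns only aggregate statistics, never individual rows
site_summary <- function(d) {
  X <- cbind(1, d$x)                      # design matrix with intercept
  list(XtX = crossprod(X), Xty = crossprod(X, d$y))
}
summaries <- lapply(sites, site_summary)

# The client adds the aggregates and solves the normal equations
XtX <- Reduce(`+`, lapply(summaries, `[[`, "XtX"))
Xty <- Reduce(`+`, lapply(summaries, `[[`, "Xty"))
beta_pooled <- drop(solve(XtX, Xty))

# Reference: the same model fitted on the physically pooled data
beta_reference <- unname(coef(lm(y ~ x, data = do.call(rbind, sites))))
all.equal(beta_pooled, beta_reference)    # should agree up to numerical tolerance
```

For non-Gaussian GLMs the aggregation has to be repeated at every iteration of the fitting algorithm, which adds client-server round trips for each feature and helps explain the running time mentioned above.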

The federated meta-analysis approach (Figure \@ref(fig:omicAnal2)) overcomes the limitations of the pooled analysis. First, the computational issue is addressed by using scalable and fast methods to analyze data at whole-genome level on each server. The transcriptomic and epigenomic data analyses use the widely used r Biocpkg("limma") package, which relies on the ExpressionSet or RangedSummarizedExperiment Bioconductor infrastructures to handle omic and phenotypic data (e.g. covariates). The genomic data are analyzed using r Biocpkg("GWASTools") and r Biocpkg("GENESIS"), which are designed to perform quality control (QC) and GWAS using the GDS infrastructure.
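In a federated meta-analysis each server returns per-feature summary statistics, and the client combines them. This section does not state which combining rule dsOmicsClient applies, so the sketch below assumes a fixed-effect, inverse-variance weighted meta-analysis, which is one common choice. The data are simulated and the names (`site_fit`, `per_site`) are hypothetical.

```r
# Each (hypothetical) site fits the model locally for one feature and
# returns only the estimate and its standard error
set.seed(3)
site_fit <- function(n) {
  case <- rbinom(n, 1, 0.5)
  y <- 0.4 * case + rnorm(n)
  coef(summary(lm(y ~ case)))["case", c("Estimate", "Std. Error")]
}
per_site <- rbind(siteA = site_fit(80), siteB = site_fit(120))

# Fixed-effect meta-analysis: weight each estimate by its inverse variance
w <- 1 / per_site[, "Std. Error"]^2
beta_meta <- sum(w * per_site[, "Estimate"]) / sum(w)
se_meta <- sqrt(1 / sum(w))
p_meta <- 2 * pnorm(-abs(beta_meta / se_meta))

c(estimate = beta_meta, se = se_meta, p.value = p_meta)
```

Because only one estimate and one standard error per feature leave each site, every server can preprocess and analyze its own data with fast local methods before anything is combined.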

Next, we describe how both approaches are implemented:

knitr::include_graphics(tools::file_path_as_absolute("../fig/dsOmics_B.jpg"))
knitr::include_graphics(tools::file_path_as_absolute("../fig/dsOmics_C.jpg"))


isglobal-brge/dsOmicsClient documentation built on March 20, 2023, 3:52 p.m.