knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/"
)

An R package that bridges Bioconductor's S4 containers for complex genomic data and mlr, a meta-package that aggregates machine learning methods.
Bioc2mlr converts the Bioconductor S4 assay data containers SummarizedExperiment and MultiAssayExperiment into the format expected by a general-purpose machine learning environment.
Bioconductor's S4 data containers for genomic assays are popular, well established data structures. Their data architecture facilitates the application of common analytical procedures and well established statistical methodologies to large assay data. They are extensible to encompass new emerging technologies and analytical methods. However, the S4 system enforces strict constraints on the data and these constraints raise barriers for interoperability and integration with software and packages outside of Bioconductor's repository.
mlr is a comprehensive package for machine learning. It aggregates hundreds of supervised and unsupervised models and facilitates analytics such as resampling, benchmarking, tuning, and ensemble. The mlrCPO package extends mlr's pre-processing and feature engineering functionality via composable Preprocessing Operators (CPO) 'pipelines'.
Bioc2mlr is a compact utility package designed to bridge these two approaches. It transforms SummarizedExperiment and MultiAssayExperiment S4 data structures into the format mlr expects. It also implements popular Bioconductor feature selection (filtering) methods, such as those used by the limma package, as CPOs. The vignettes compare Bioc2mlr with the MLInterfaces package, which has similar goals, and present workflows for popular publicly available genomic datasets such as those in curatedTCGAData.
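To make the transformation concrete, here is a minimal base-R sketch of the general idea: an assay matrix stored as features by samples is transposed and joined with the sample annotation, giving the samples-by-features table that a machine learning package expects. The toy data and the helper `assay_to_df()` are hypothetical and are not part of Bioc2mlr; the package's own functions may work differently.

```r
# Toy stand-ins for the two pieces an SE-like container holds (hypothetical data).
set.seed(1)
assay_mat <- matrix(rnorm(5 * 6), nrow = 5,
                    dimnames = list(paste0("gene", 1:5), paste0("s", 1:6)))
col_data <- data.frame(ALL.AML = factor(rep(c("ALL", "AML"), each = 3)),
                       row.names = paste0("s", 1:6))

# Hypothetical helper (not a Bioc2mlr function):
# features-by-samples matrix + sample annotation
# -> samples-by-features data frame with the outcome as one column.
assay_to_df <- function(assay_mat, col_data, target) {
  # Samples must be in the same order in both inputs
  stopifnot(identical(colnames(assay_mat), rownames(col_data)))
  df <- as.data.frame(t(assay_mat))
  df[[target]] <- col_data[[target]]
  df
}

df <- assay_to_df(assay_mat, col_data, target = "ALL.AML")
dim(df)

# With mlr installed, such a table could then be wrapped as a task, e.g.:
# task <- mlr::makeClassifTask(data = df, target = "ALL.AML", positive = "ALL")
```

The sample-order check matters: once the matrix is transposed, nothing else ties a row of features back to its outcome label.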
knitr::include_graphics("man/figures/vision.jpg")
mlr was chosen because it supports functional / clustered data.
Other machine learning frameworks can be supported too. This is a work in progress; stay tuned:
caret: Bioc2caret
SuperLearner: Bioc2SuperLearner
tidymodels: Bioc2tidymodels
# Install development version from GitHub
devtools::install_github("drorberel/Bioc2mlr")

# TBA: Install release version from CRAN
# install.packages("Bioc2mlr")
Two Bioconductor assay containers are currently implemented: SummarizedExperiment for a single assay (which may still hold multiple sub-assay slots), and MultiAssayExperiment for multiple assays. Within the machine learning framework, two main steps are adapted: pre-processing, followed by (multivariate) model fitting.
Tools will be demonstrated for each of these 4 combinations.
| S4 assay data container     | Pre-processing (TBA) | Model (multivariate) |
|-----------------------------|:--------------------:|---------------------:|
| SummarizedExperiment (SE)   | limmaCPO             | Fun_SE_to_taskFunc   |
| MultiAssayExperiment (MAE)  | UnivCPO              | Fun_MAE_to_taskFunc  |
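The pre-processing column refers to univariate feature filtering. As a rough illustration of that idea only, the base-R sketch below ranks features by a per-feature two-sample t-test and keeps the top few. It uses an ordinary t-test rather than limma's moderated statistics, and the helper `univ_filter()` and the simulated data are hypothetical; this is not the implementation of limmaCPO or UnivCPO.

```r
set.seed(42)
n_per_group <- 10
y <- factor(rep(c("ALL", "AML"), each = n_per_group))

# Simulated samples-by-features matrix (hypothetical data)
x <- matrix(rnorm(2 * n_per_group * 50), ncol = 50,
            dimnames = list(NULL, paste0("gene", 1:50)))

# Plant a group difference in the first three features
x[y == "AML", 1:3] <- x[y == "AML", 1:3] + 3

# Hypothetical helper: rank features by per-feature t-test p-value
# and return the names of the top_k features.
univ_filter <- function(x, y, top_k = 5) {
  pvals <- apply(x, 2, function(col) t.test(col ~ y)$p.value)
  names(sort(pvals))[seq_len(top_k)]
}

selected <- univ_filter(x, y, top_k = 5)
selected
```

In a real workflow the filter should be learned on the training samples only and then applied unchanged to the test samples, which is the kind of train/apply separation that wrapping the step as a CPO is meant to provide.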
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE
)
library(tidyverse)
library(magrittr)
library(Bioc2mlr)
library(mlr)
select <- dplyr::select
data(Golub_Merge, package = 'golubEsets') # ExpressionSet
smallG <- Golub_Merge[200:259, ]
smallG

library(SummarizedExperiment)
smallG_SE <- makeSummarizedExperimentFromExpressionSet(smallG)

# functional:
task_SE_Functional <- Fun_SE_to_taskFunc(smallG_SE,
                                         param.Y.name = 'ALL.AML',
                                         param.covariates = NULL,
                                         param_positive_y_level = 'ALL',
                                         task_return_format = 'functional',
                                         task_type = 'classif') ## will work with either 1 or multiple assays
task_SE_Functional

task_train <- task_SE_Functional %>% subsetTask(subset = 1:40)
task_test  <- task_SE_Functional %>% subsetTask(subset = 41:72)

classif.lrn <- makeLearner("classif.knn")
model <- train(classif.lrn, task_train)
Predict <- model %>% predict(task_test)
Predict %>% calculateConfusionMatrix()
library(MultiAssayExperiment)
miniACC
# miniACC %>% sampleMap %>% data.frame %>% dplyr::select(primary, assay) %>% table # no replicates within same assay

task_Functional_MAE <- Fun_MAE_to_taskFunc(miniACC,
                                           param.Y.name = 'vital_status',
                                           param.covariates = c('gender', 'days_to_death'),
                                           param_positive_y_level = '1',
                                           task_type = 'classif')
task_Functional_MAE

library(bartMachine)
classif_lrn_bartMachine <- makeLearner("classif.bartMachine")
model_bartMachine <- train(classif_lrn_bartMachine, task_Functional_MAE)
Predict_bartMachine <- model_bartMachine %>% predict(task_Functional_MAE)
Predict_bartMachine %>% calculateConfusionMatrix()
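With several assays, the extra step is aligning samples across assays before building one table. The base-R sketch below shows one plausible way to do that: keep the samples shared by all assays and the clinical data, prefix each feature with its assay name, and bind the blocks side by side. The helper `assays_to_wide()` and the toy data are hypothetical, and this is not necessarily what Fun_MAE_to_taskFunc does internally. For real MultiAssayExperiment objects, that package's own `wideFormat()` covers a similar need.

```r
set.seed(7)

# Two toy assays (features x samples) measured on overlapping sample sets
rna  <- matrix(rnorm(3 * 5), nrow = 3,
               dimnames = list(paste0("g", 1:3), paste0("p", 1:5)))
prot <- matrix(rnorm(2 * 4), nrow = 2,
               dimnames = list(paste0("pr", 1:2), paste0("p", 2:5)))

# Toy clinical table with the outcome, one row per sample
clinical <- data.frame(vital_status = factor(c(1, 0, 1, 0, 1)),
                       row.names = paste0("p", 1:5))

# Hypothetical helper: named list of assays + clinical data
# -> one samples-by-features data frame restricted to shared samples.
assays_to_wide <- function(assays, clinical, target) {
  # Samples present in the clinical table and in every assay
  shared <- Reduce(intersect, lapply(assays, colnames), rownames(clinical))
  blocks <- lapply(names(assays), function(nm) {
    block <- t(assays[[nm]][, shared, drop = FALSE])
    # Prefix feature names with the assay name to avoid clashes
    colnames(block) <- paste(nm, colnames(block), sep = "_")
    block
  })
  wide <- as.data.frame(do.call(cbind, blocks))
  wide[[target]] <- clinical[shared, target]
  wide
}

wide <- assays_to_wide(list(rna = rna, prot = prot), clinical, "vital_status")
dim(wide)
```

Dropping samples that are missing from any assay is the simplest policy; keeping them and imputing the gaps is an alternative with its own trade-offs.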
CAVD DataSpace is an online resource for accessing and analyzing HIV vaccine experimental assay data. The data are annotated and accessible through either an online tool or the R API, DataSpaceR.
The CAVDmetaMAE package implements a hypothesis-free approach to finding the best candidate immune biomarkers associated with experimental groups, both within each study separately and across all studies together (meta-analysis).
Within each study, immune biomarkers will be analyzed by both single assays, and combinations across multiple assays.
https://github.com/drorberel/CAVDmetaMAE
Private repo. Access permission by request.
knitr::include_graphics("man/figures/CAVDmetaMAE_ridge.jpg")
knitr::include_graphics("man/figures/pheatmaps_DT.png")
A. NCBI/GEO -> SEs -> MAE -> task (DataPackageR)
B. Biomarker discovery:
B.1 Feature selection:
Fun_lrn_univ_only_makePrep_MaG
Fun_lrn_univ_Clusters_All_makePrep_MaG
B.2 Sensitivity analysis
C. (TBA) UnivCPO, UnivClustCPO (refactoring the above makePreprocWrapper())
TCGA curatedTCGAData
Microbiome curatedMetagenomicData
Omicade4CPO Omicade4
mixomicsCPO mixomics
Utilize MAE to collapse genes to sets/modules/pathways
Fortified: pheatmaps, ggfortify
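The planned item on collapsing genes to sets/modules/pathways can be pictured with a small base-R sketch: each set score is the per-sample mean of its member genes. The helper `collapse_to_sets()`, the gene sets, and the data are all hypothetical, and averaging is only one of several possible summaries; nothing here reflects a committed design.

```r
set.seed(3)

# Toy expression matrix, features x samples (hypothetical data)
expr <- matrix(rnorm(6 * 4), nrow = 6,
               dimnames = list(paste0("gene", 1:6), paste0("s", 1:4)))

# Hypothetical gene-set membership
gene_sets <- list(pathwayA = c("gene1", "gene2", "gene3"),
                  pathwayB = c("gene4", "gene5", "gene6"))

# Hypothetical helper: average member genes per sample,
# returning a sets-by-samples matrix.
collapse_to_sets <- function(expr, gene_sets) {
  t(sapply(gene_sets, function(genes) colMeans(expr[genes, , drop = FALSE])))
}

set_scores <- collapse_to_sets(expr, gene_sets)
dim(set_scores)
```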