This workshop will focus on the challenges encountered when applying machine learning techniques to complex, high-dimensional biological data. In particular, we will focus on biomarker discovery from pharmacogenomic data, which consists of developing predictors of the response of cancer cell lines to chemical compounds based on their genomic features. From a methodological viewpoint, biomarker discovery is strongly linked to variable selection, through methods such as supervised learning with sparsity-inducing norms (e.g., ElasticNet) or techniques that account for the complex correlation structure of biological features (e.g., mRMR). Yet the main focus of this talk will be on the sound use of such methods in a pharmacogenomics context, their validation, and the correct interpretation of the results they produce. We will discuss how to assess the quality of both the input and output data. We will illustrate the importance of unified analytical platforms and of data and code sharing in bioinformatics and biomedical research, as the data generation process becomes increasingly complex and requires a high level of replication to achieve robust results. This is particularly relevant as our portfolio of machine learning techniques keeps growing, each technique with its own set of hyperparameters that can be tuned in a multitude of ways, increasing the risk of overfitting when developing multivariate predictors of drug response.
The following resources might be useful to read:
Participants are expected to have the following required packages installed on their machines so that they can run the commands along with the instructors:

* PharmacoGx and Biobase from Bioconductor
* mRMRe, caret, glmnet, and randomForest from CRAN
* bhklab/mci and bhklab/PharmacoGx-ML from GitHub
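The installation could be scripted roughly as follows. This is a minimal sketch, not an official setup script: the workshop text names only the packages and where they come from, so the use of `BiocManager` and `remotes` as installers is an assumption, as is the final check with `requireNamespace`.

```r
# Assumption: BiocManager and remotes are used as the installers; the workshop
# description only lists the packages and their sources.
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
if (!requireNamespace("remotes", quietly = TRUE)) install.packages("remotes")

# Bioconductor packages
BiocManager::install(c("PharmacoGx", "Biobase"))

# CRAN packages
install.packages(c("mRMRe", "caret", "glmnet", "randomForest"))

# GitHub packages (repository names taken from the list above)
remotes::install_github("bhklab/mci")
remotes::install_github("bhklab/PharmacoGx-ML")

# Report any Bioconductor/CRAN package that still cannot be loaded.
# The GitHub packages are not checked here because their installed
# package names are not stated in the workshop description.
pkgs <- c("PharmacoGx", "Biobase", "mRMRe", "caret", "glmnet", "randomForest")
not_installed <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
if (length(not_installed) > 0) {
  message("Still missing: ", paste(not_installed, collapse = ", "))
}
```

Running this before the session leaves time to resolve compilation or dependency problems, which are easier to fix ahead of the workshop than during it.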
Time outline for the workshop:
| Activity                                    | Time |
|---------------------------------------------|------|
| Introduction                                | 10m  |
| Basic functionalities of PharmacoGx         | 15m  |
| Consistency assessment between datasets     | 15m  |
| Machine learning and biomarker discovery    | 20m  |