1. Introduction

Single-cell morphological phenotypes, including shape, size, intensity, and texture of cellular compartments have been shown to change in response to perturbations. Image-based morphological profiling has been used to associate phenotypic changes with alterations in cellular function and to infer molecular mechanisms of action. Recently, the Library of Integrated Network-based Cellular Signatures (LINCS) Project has measured gene expression and performed morphological profiling on 9,515 unique drug/cell-line combinations. This data provide an opportunity to study the interdependence between transcription and cell morphology. Previous methods to investigate cell phenotypes have focused on targeting candidate genes as components of known pathways, RNAi morphological profiling, and cataloging morphological defects; however, these methods do not provide an explicit model to link transcriptomic changes with corresponding alterations in morphology. To address this, we propose a cell morphology enrichment analysis to assess the association between transcriptomic alterations and changes in cell morphology. In our approach we: (1) map a query profile of transcriptomic alterations against the LINCS repository, (2) create query specific gene sets for cell morphological features (CM), (3) identify the top CMs associated with the query, and (4) model a regulatory network associated with an enriched morphological phenotype. For a new transcriptomic query, our approach can be used to predict associated changes in cellular morphology. We demonstrate the utility of our pipeline by applying it to expression changes in human bone osteosarcoma cells. The pipeline is implemented as an R package, called CMEA (acronym of the cell Morphological Enrichment Analysis).

2. Input data

We demonstrate the application of the CMEA package by using the transcriptomic and single cell morphological profiles of human bone osteosarcoma cells (U-2 OS cells) in response to the 9515 drugs or small molecule compounds (1). The raw data include the transcripotimic profiles of 20340 drugs and small compound molecules, and single cell morpohological profiles of 19864 drugs and small compound molecules. These data sets were published in 2014 and are available as supplementary files of (2). We used a processed version of this data, which is available as supplementary of (3). The intersection of transcriptomic and single-cell morphological profiles includes 9515 drugs and small compound molecules, and use as a backend repository of CMEA package. The transcriptomic profile in backend repository is 9515*978 matric, including the fold change of the 978 landmark gene expression in response to the treatment of U-2 OS cells with 9515 drugs and small compound molecules.

library(CMEA)
knitr::kable(head(Transcriptomic_Profile)[1:7])

The single-cell morphological profile in backend repository is 9515*812 matric, including the fold change of 812 cell morphological features in response to the treatment of U-2 OS cells with 9515 drugs and small compund molecules.

library(CMEA)
knitr::kable(head(Cell_Morphology_Profile)[1:3])

The complete data set, including raw and processed data, is available at (https://isarnassiri@bitbucket.org/isarnassiri/cmeadata).

3. Mapping query transcriptomic profile against the reference repository

We use the Mapping function to map the query profile (Q) of fold change of gene expression versus reference repository of transcriptomic data (T) and detect similar profiles (s).

Example of application: identification of similar drugs with the transcription profile of "BRD-K37798499":

library(CMEA)
data(Transcriptomic_Profile)
data(Cell_Morphology_Profile)
data(Query_Transcriptomic_Profile)
mappingResults <- mappingQueryTranscriptomic(Query_Transcriptomic_Profile, Transcriptomic_Profile, Cell_Morphology_Profile)
knitr::kable(head(selectedDrugs))

You can find the results in your current directory.

4. Cell morphology enrichment analysis

For cell morphology enrichment analysis, we use the cellMorphologyEnrichmentAnalysis() function for variable selection and fit a sparse model based on the strong correlation between the transcription and single-cell morphological profiles. We present the results of the cell morphology enrichment analysis as cross-tabulation of landmark genes, and single-cell morphological features including the direction of effects (up or down regulation).

Example of application:

library(CMEA)
data(Transcriptomic_Profile)
data(Cell_Morphology_Profile)
data(Query_Transcriptomic_Profile)
cellMorphologyEnrichmentAnalysis <- cellMorphologyEnrichmentAnalysis(20, Query_Transcriptomic_Profile, Transcriptomic_Profile, Cell_Morphology_Profile)    
knitr::kable(head(cellMorphologyEnrichmentAnalysis[,1:3]))

Definition of parameters: The number of top cell morphological features, ranked based on the Strength Centrality Score (SCS) for enrichment analysis (e.g. 20). This parameter specifies the number of first top cell morphological features which are used for enrichment sets of landmark genes.

You can find the results in your current directory.

5. Association rule mining of transcriptomic and cell morphological profiles

We use an association mining model to detect the associations between the genes and to infer a gene regulatory network from phenotypic experimental data. To infer a gene regulatory network for a cell morphological phenotype, first, we mine association rules between genes based on the cell morphological gene sets (Section 3). Next, we apply the Context Likelihood (CLR) network inference algorithm to identify the interactions between genes based on the transcriptomic profiles (Ts,978). CLR infers the interactions between genes by using a scoring function based on the empirical distribution of mutual information. We select interactions are shared between the association rules and CLR to represent a cell morphological phenotype gene regulatory network (GRN).

Example of application:

geneInteractionNetwrok <- geneInteractionNetwrok(10, 0.1, 0.6, Query_Transcriptomic_Profile, Transcriptomic_Profile, Cell_Morphology_Profile) 
knitr::kable(head(matrixOfInteractions))

Definition of parameters: -The number of first top cell morphological features (e.g. 10): We use the top cell morphological features, ranked based on the Strength Centrality Score (SCS) for enrichment analysis. This parameter specifies the number of first top cell morphological features which are used for enrichment sets of landmark genes. -Support parameter (e.g. 0.1): We use an association mining model to detect the associations between the genes and to infer a gene regulatory network from phenotypic experimental data. In order to select significant interactions between any two genes from the complete digraph of interactions between genes, we use minimum thresholds on support and confidence measures. Support measures the frequency of an indicated gene in the cell morphological gene sets. -Confidence parameter (e.g. 0.6): We use an association mining model to detect the associations between the genes and to infer a gene regulatory network from phenotypic experimental data. In order to select significant interactions between any two genes from the complete digraph of interactions between genes, we use minimum thresholds on support and confidence measures. Confidence shows how often a given association rule between two genes has been found in the dataset.

You can find the results in your current directory.

References:

  1. Lamb, J., Crawford, E.D., Peck, D., Modell, J.W., Blat, I.C., Wrobel, M.J., Lerner, J., Brunet, J.P., Subramanian, A., Ross, K.N. et al. (2006) The connectivity map: Using gene-expression signatures to connect small molecules, genes, and disease. Science, 313, 1929-1935.

  2. Wawer, M.J., Jaramillo, D.E., Dancik, V., Fass, D.M., Haggarty, S.J., Shamji, A.F., Wagner, B.K., Schreiber, S.L. and Clemons, P.A. (2014) Automated Structure-Activity Relationship Mining: Connecting Chemical Structure to Biological Profiles. Journal of Biomolecular Screening, 19,738-748.

  3. Wang, Z., Clark, N.R. and Ma'ayan, A. (2016) Drug-induced adverse events prediction with the LINCS L1000 data. Bioinformatics, 32, 2338-2345.



isarnassiri/CMEA documentation built on May 18, 2021, 4:42 a.m.