knitr::opts_chunk$set(echo = FALSE, warning = FALSE, message = FALSE, error = TRUE)
i. About OmicsON i. OmicsON workflow * 1. Set up mapping files * 2. Data input * 3. Decorate data by Reactome data * 4. Decorate data by STRING data * 5. Functional Ineractions DF * 7. Multivariate Statistical Analysis * a. CCA - Canonical Correlation Analysis * b. PLS - Partial Least Squares Regression
OmicsON provides knowedge driven data regularisation to facilitate multivariate analysis of 'omics' data. Current release is targetting ChEBI, Reactome and STRING ontologies.
Below you will find OmicsON workflow described step by step. By following these steps and using examplary data included in this package user should get the same results as shown in this vignette.
It is important to set up OmicsON mapping. You can set it up by invoking function presented below. ChEBI to Reactome, Ensembl to Reactome and UniProt to Reactome mappings are required.
OmicsON::setUpReactomeMapping(ChEBI2ReactomeFileURL = "https://reactome.org/download/current/ChEBI2Reactome.txt", Ensembl2ReactomeFileURL = "https://reactome.org/download/current/Ensembl2Reactome.txt", UniProt2ReactomeFileURL = "https://reactome.org/download/current/UniProt2Reactome.txt")
After setting up mapping, to start work with OmicsON you need to provide two 'omics' data sets in data frame form. Data frame can be created from files as in \extdata directory under package insallation directory. This files are in tab delimited files with headers (colnames and rownames). Files are in two sets; minimal and normal. Minimal was created artificialy to fast vignette processing and to help understand basis of OmicsON. Normal is a real data file, use it after minimal example please. Files are named respectively nm-transcriptomics-min.txt
, nm-lipidomics-min.txt
and nm-transcriptomics.txt
, nm-lipidomics.txt
.
Below you can find first few lines of minimal data files presented in form of data frame. As you can see all of them have heades and colnames. To find them localization please run system.file(package="OmicsON")
.
You can use below snippet of code to load mentioned files into R environment:
pathToFileWithLipidomicsData <- system.file(package="OmicsON", "extdata", "nm-lipidomics-min.txt") lipidomicsInputData <- read.table(pathToFileWithLipidomicsData, header = TRUE)
knitr::kable(lipidomicsInputData[1:4, 1:7], caption = "Lipidomisc data")
pathToFileWithTranscriptomicsData <- system.file(package="OmicsON", "extdata", "nm-transcriptomics-min.txt") transcriptomicsInputData <- read.table(pathToFileWithTranscriptomicsData, header = TRUE)
knitr::kable(transcriptomicsInputData[1:10, 1:7], caption = "Transcriptomics data")
As soon as you set up OmincsON and input data are loaded, you are ready to decorate data by data presented in Reactome database. It is done by searching of ontologically related molecules presented in Reactome's pathways. For further processes in that vignette we are using minimal input data. Rows are chosen to present possible border cases in a short time. CHEBI:28661 has representation in Reactome pathways. CHEBI:36036, CHEBI:61205 are not not presented in Reactome mapping, but we can find representat of that group by ontology in ChEBI database - CHEBI:53460, CHEBI:53487. CHEBI:35465 is not interesting case, not presented and no representants in ChEBI ontology, so mapping and decoration is empty for that id.
decoratedByReactome <- OmicsON::decorateByReactomeData( chebiMoleculesDf = lipidomicsInputData, chebiIdsColumnName = "ChEBI", organismTaxonomyId = '9606')
We are not printing decoratedByReactome
here, because of problematic format. You can print it yourself in R console, then it is presented much better.
What algorithm is behind the ontologies mapping? The first two columns of result table represent mapping of all small molecules to respective parents and children of ChEBI ontology: "root" denotes source IDs; if ID already exists in Reactome ontologyId column has the same value, if not but there is an alternative in the form of child or parent in ChEBI ontology, OmicsON put its id under "ontologyId" column. If we can not find root id and can not find any parent or children for it, OmicsON leave it empty.
Full result's data frame contains respectively:
When you have results from Reactome step, then you are ready to use decoration by STRING DB. In this part you search for any extra interactions of gens which you find in Reactome. STRING calls them neighbours. To do it just put results achived from Reactome's decoration step to OmicsON::decorateByStringDbData
method and set listOfEnsembleIdColumnName
attribute to proper value - ensembleIds
or uniProtIds
. This function produce data frame. Below we have presented two data frames respectively for ensembleIds
and uniProtIds
. This step is time consuming, so be patient please.
decoratedByStringBaseOnEnsembleIds <- OmicsON::decorateByStringDbData( chebiIdsToReactomePathways = decoratedByReactome, listOfEnsembleIdColumnName = 'ensembleIds') decoratedByStringBaseOnUniProtIds <- OmicsON::decorateByStringDbData( chebiIdsToReactomePathways = decoratedByReactome, listOfEnsembleIdColumnName = 'uniProtIds')
Data frame returned from this method introduces three new columns. Them are, respectively:
Data frames resulted from decoration by String DB is also not printed because of unusual formatting. You can print it easy by evaluating the vignette yourself, step by step.
Please go to Functional Ineractions DF to know how to convert above decorated structures to easily operative data frames.
If you want to know how to traverse through data in decorated structure? I will present it on example. Moste importan is to understand that many cells include list of vectors! Mapping to Ensemble can give different results than mapping to UniProt, so it is worth to perform both.
as.character(decoratedByStringBaseOnEnsembleIds[4, "root"]) as.character(decoratedByStringBaseOnUniProtIds[4, "root"])
decoratedByStringBaseOnEnsembleIds[4, "ensembleIds"][[1]] decoratedByStringBaseOnUniProtIds[4, "uniProtIds"][[1]]
decoratedByStringBaseOnEnsembleIds[2, "stringGenesSymbolsNarrow"][[1]] decoratedByStringBaseOnUniProtIds[2, "stringGenesSymbolsNarrow"][[1]]
You can create a functional interactions data frame by using this method:
ontology2EnsembleFunctionalInteractions <- OmicsON::createFunctionalInteractionsDataFrame( chebiToReactomeDataFrame = decoratedByReactome, singleIdColumnName = 'ontologyId', idsListColumnName = 'ensembleIds') ontology2UniProtFunctionalInteractions <- OmicsON::createFunctionalInteractionsDataFrame( chebiToReactomeDataFrame = decoratedByReactome, singleIdColumnName = 'ontologyId', idsListColumnName = 'uniProtIds') ontology2GenesSymboleFromEnsembleFunctionalInteractions <- OmicsON::createFunctionalInteractionsDataFrame( chebiToReactomeDataFrame = decoratedByReactome, singleIdColumnName = 'ontologyId', idsListColumnName = 'genesSymbolsFromEnsemble') ontology2GenesSymboleFromUniProtFunctionalInteractions <- OmicsON::createFunctionalInteractionsDataFrame( chebiToReactomeDataFrame = decoratedByReactome, singleIdColumnName = 'ontologyId', idsListColumnName = 'genesSymbolsFromUniProt') ontology2GenesSymboleFromStringExpandFunctionalInteractions <- OmicsON::createFunctionalInteractionsDataFrame( chebiToReactomeDataFrame = decoratedByStringBaseOnUniProtIds, singleIdColumnName = 'ontologyId', idsListColumnName = 'stringGenesSymbolsExpand') ontology2GenesSymboleFromStringNarrowFunctionalInteractions <- OmicsON::createFunctionalInteractionsDataFrame( chebiToReactomeDataFrame = decoratedByStringBaseOnUniProtIds, singleIdColumnName = 'ontologyId', idsListColumnName = 'stringGenesSymbolsNarrow')
knitr::kable(head(ontology2EnsembleFunctionalInteractions, 6)) knitr::kable(head(ontology2UniProtFunctionalInteractions, 6)) knitr::kable(head(ontology2GenesSymboleFromEnsembleFunctionalInteractions, 6)) knitr::kable(head(ontology2GenesSymboleFromUniProtFunctionalInteractions, 6)) knitr::kable(head(ontology2GenesSymboleFromStringExpandFunctionalInteractions, 6)) knitr::kable(head(ontology2GenesSymboleFromStringNarrowFunctionalInteractions, 6))
OmicsON provides two statistical methods to analyse those data:
Calculate CCA on data decorated by String DB:
ccaResultsNarrow1 <- OmicsON::makeCanonicalCorrelationAnalysis( xNamesVector = ontology2GenesSymboleFromStringNarrowFunctionalInteractions$stringGenesSymbolsNarrow, yNamesVector = ontology2GenesSymboleFromStringNarrowFunctionalInteractions$root, XDataFrame = transcriptomicsInputData, YDataFrame = lipidomicsInputData, xCutoff = 1, yCutoff = 1) ccaResultsNarrow2 <- OmicsON::makeCanonicalCorrelationAnalysis( xNamesVector = ontology2GenesSymboleFromStringNarrowFunctionalInteractions$stringGenesSymbolsNarrow, yNamesVector = ontology2GenesSymboleFromStringNarrowFunctionalInteractions$root, XDataFrame = transcriptomicsInputData, YDataFrame = lipidomicsInputData, xCutoff = 0.5, yCutoff = 1)
If you fall into "singular matrix" problem presented bellow;
ccaResultsExpand1 <- OmicsON::makeCanonicalCorrelationAnalysis( xNamesVector = ontology2GenesSymboleFromStringExpandFunctionalInteractions$stringGenesSymbolsExpand, yNamesVector = ontology2GenesSymboleFromStringExpandFunctionalInteractions$root, XDataFrame = transcriptomicsInputData, YDataFrame = lipidomicsInputData)
Think about xCutoff, yCutoff options. You can use it to remove highly correlated variables. Below the xCutoff is set to 0.7 and it makes the problem solvable.
ccaResultsExpand1 <- OmicsON::makeCanonicalCorrelationAnalysis( xNamesVector = ontology2GenesSymboleFromStringExpandFunctionalInteractions$stringGenesSymbolsExpand, yNamesVector = ontology2GenesSymboleFromStringExpandFunctionalInteractions$root, XDataFrame = transcriptomicsInputData, YDataFrame = lipidomicsInputData, xCutoff = 0.7)
Below we presented the list of problems thet could be solved in the same way, by manipulation of xCutoff and yCutoff arguments. It is important to know that sometimes you have to increase cutoff. Best practice is to start from 1 and go down by 0.1 ticks on both. If it not help then try this approach to one cutoff, for example change x with frozen y value.
Ploting CCA results. First plot presents same CCA results on the same data sets, but with different xCutoff
argument, set respectively to 1 and 0.5.
Second plot represents CCA results on Expand data set.
par(mfrow=c(1,2)) OmicsON::plotCanonicalCorrelationAnalysisResults(ccaResults = ccaResultsNarrow1) OmicsON::plotCanonicalCorrelationAnalysisResults(ccaResults = ccaResultsNarrow2) par(mfrow=c(1,1)) OmicsON::plotCanonicalCorrelationAnalysisResults(ccaResults = ccaResultsExpand1)
Calculate PLS on data decorated by String DB:
PLSResults <- OmicsON::makePartialLeastSquaresRegression( xNamesVector = ontology2GenesSymboleFromStringNarrowFunctionalInteractions$stringGenesSymbolsNarrow, yNamesVector = ontology2GenesSymboleFromStringNarrowFunctionalInteractions$root, XDataFrame = transcriptomicsInputData, YDataFrame = lipidomicsInputData, ncompValue = 5)
How to plot results from PLS? Below you can find code which represents PLS results in user friendly form:
OmicsON::plotRmsepForPLS(PLSResults)
OmicsON::plotRegression(PLSResults, ncompValue = 5)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.