All ligand-receptor interaction prediction methods in LIANA require a resource of literature and database knowledge about intercellular signaling. LIANA, together with OmniPath web service, provide a number functionalities to obtain this prior knowledge.
Here we show how to customize and quality filter the prior knowledge from OmniPath, and how to use LIANA with a custom database resource.
OmniPath combines data from more than 20 resources to depict the roles of proteins in intercellular communication, and it uses around 50 network resources to connect these proteins by signaling interactions. In its intercellular communication annotation (intercell) database, OmniPath aims to cover the broadest range of information, and contains numerous false positive records. Hence, for using in prediction methods, as we do in LIANA, quality filtering OmniPath is highly recommended. We do quality filtering based on 1) the amount of evidences (number of resources and references), 2) the consensus between resources for each annotation record, 3) the localization of proteins (e.g. secreted, plasma membrane). The design of the OmniPath database is described in Turei et al. Mol Syst Biol (2021)17:e9923, while you can read more about the quality filtering options here, here and here.
We further provide a list of curated interactions, which we define as those which come from manually-curated resources in the context of CCC, and come with a corresponding Pubmed ID, available via OmnipathR::curated_ligand_receptor_interactions.
Some resources in OmniPath, such as CellChatDB, CellPhoneDB and ICELLNET, provide information about protein complexes in intercellular communication. Besides these, most of the protein complex annotations in OmniPath are in silico inferred, based on the members of the complex. Network interactions of protein complexes are very sparse, as only few resources provide this kind of data.
OmniPath data is distributed by the web service at
OmniPath web service, and the
OmnipathR
R/Bioconductor package
offer a direct access in R. In OmnipathR
, the
import_intercell_network
and
filter_intercell_network
functions represent the interface to the intercellular communication
network. In liana
, the
generate_omni
function wraps the Omnipath
functions and ensures the output is suitable
for the downstream methods in LIANA.
get_curated_omni
function
which is used to generate the curated consensus (default) resource that LIANA uses.
We suggest using the latest OmnipathR version for any resource-generation purposes.
remotes::install_github('saezlab/OmnipathR')
library(tidyverse) library(OmnipathR) library(liana) library(purrr) library(magrittr) liana_path <- system.file(package = "liana") testdata <- readRDS(file.path(liana_path , "testdata", "input", "testdata.rds"))
By default, LIANA uses the Consensus
resource, which is composed solely from the
consensus interactions set coming from the manually-curated (in the context of CCC) "CellPhoneDB",
"CellChatDB", "ICELLNET", "connectomeDB2020", and "CellTalkDB" resources.
These interactions are also filtered by localisation and curation.
For more information about how we generate this resource please refer to the get_curated_omni
function.
consensus_omni <- select_resource("Consensus")[[1]] %>% glimpse
One could also obtain a list with all manually-curated interactions with corresponding PubMed IDs in the context of CCC from OmnipathR. We will use this list to double-check the curation effort of our Consensus resource.
# Obtain purely curated resources curated <- OmnipathR::curated_ligand_receptor_interactions( cellphonedb = TRUE, cellinker = TRUE, talklr = TRUE, signalink = TRUE) %>% liana:::decomplexify() %>% select(source_genesymbol, target_genesymbol) %>% distinct() %>% mutate(curated_flag = 1) consensus_omni %>% liana:::decomplexify() %>% # dissociates complexes select(source_genesymbol, target_genesymbol) %>% distinct() %>% left_join(curated, by = c("source_genesymbol", "target_genesymbol")) %>% distinct() %>% mutate(curated_flag = as.factor(replace_na(curated_flag, 0))) %>% mutate(total = n()) %>% group_by(curated_flag) %>% mutate(num = n()) %>% ungroup() %>% select(curated_flag, total, num) %>% distinct() %>% rowwise() %>% mutate(perc = num/total * 100)
We see that our Consensus resource is highly (>90%) curated.
It is, nevertheless, worth keeping in mind that this check obtains the information from the same resources, so please consider the percentages with a grain of salt :). Yet this approach provides us a reasonable quality reference, applicable to any custom-made resource.
Note that we plan to continuously update this resource, and we welcome any feedback here.
We could also restrict our query to cell-surface (i.e. membrane bound) interactions, and also alter some further parameters: we set a more stringent threshold for consensus of resources on localization, we include also protein complexes, and we simplify the data frame to keep only the most relevant columns.
pm_omni <- generate_omni( loc_consensus_percentile = 51, # increase localisation consensus threshold consensus_percentile = NULL, # include only PM-bound proteins transmitter_topology = c( 'plasma_membrane_transmembrane', 'plasma_membrane_peripheral' ), receiver_topology = c( 'plasma_membrane_transmembrane', 'plasma_membrane_peripheral' ), min_curation_effort = 1, ligrecextra = FALSE, remove_complexes = FALSE, # keep complexes simplify = TRUE # do simplify ) # check categories of ligands (category_intercell_source) pm_omni$category_intercell_source %>% unique() # and receptors (category_intercell_target) pm_omni$category_intercell_target %>% unique() # remove some categories pm_omni %<>% filter( !category_intercell_source %in% c( 'activating_cofactor', 'ligand_regulator', 'inhibitory_cofactor' ) ) # an overview of the resulted data frame: pm_omni %>% glimpse()
In the first example, we filter OmniPath as described in the LIANA biorxiv: Only interactions with literature curation Interactions only where the receiver protein is plasma membrane transmembrane or peripheral, according to at least 30% of the localisation annotations * Only interactions between single proteins (interactions between complexes were included in that version of OmniPath).
# generate_omni returns a tibble with CCC OmniPath cust_omni <- generate_omni( loc_consensus_percentile = 30, consensus_percentile = NULL, transmitter_topology = c( 'secreted', 'plasma_membrane_transmembrane', 'plasma_membrane_peripheral' ), receiver_topology = c( 'plasma_membrane_transmembrane', 'plasma_membrane_peripheral' ), min_curation_effort = 1, ligrecextra = FALSE, remove_complexes = TRUE, simplify = FALSE )
Note that this resource was subsequently improved by manually excluding certain duplicated or ambiguous interactions, and later it was finally deprecated all together. LIANA currently uses the curated resource presented above.
select_resource("OmniPath")
# Run liana with the custom resource # liana Wrap liana_test <- liana_wrap( testdata, resource='custom', external_resource = pm_omni ) %>% liana_aggregate() liana_test %>% glimpse
In reality, generate_omni
is just a wrapper function that calls the
appropriate OmnipathR
functions, provide a convenient interface and
ensures a liana-appropriate format. Here we give an insight into this
process.
# reproduce cust_op as from above with OmniPathR alone # import the OmniPathR intercell network component ligrec <- OmnipathR::import_intercell_network() # filter out the complexes ligrec %<>% filter( entity_type_intercell_source != 'complex' & entity_type_intercell_target != 'complex' ) # apply filtering according to curation and localisation ligrec %<>% OmnipathR::filter_intercell_network( loc_consensus_percentile = 30, consensus_percentile = NULL, transmitter_topology = c( 'secreted', 'plasma_membrane_transmembrane', 'plasma_membrane_peripheral' ), receiver_topology = c( 'plasma_membrane_transmembrane', 'plasma_membrane_peripheral' ), min_curation_effort = 1, ligrecextra = FALSE ) # remove duplicate LRs ligrec %<>% distinct_at( .vars = c( 'source_genesymbol', 'target_genesymbol' ), .keep_all = TRUE ) all_equal(ligrec, cust_omni)
To make use of the full potential of OmnipathR
we kindly refer the user
to its documentation.
This tutorial only presents the intercellular component of OmniPath, in the future we will add more details about intracellular signaling and gene regulation.
options(width = 120) sessioninfo::session_info()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.