filter_intercell_network: Quality filter an intercell network

filter_intercell_network    R Documentation

Quality filter an intercell network

Description

The intercell database of OmniPath covers a very broad range of possible ways of cell-to-cell communication, and the pieces of information, such as localization, topology, function and interaction, are combined from many, often independent sources. This unavoidably results in some unexpected combinations which are false positives in the context of intercellular communication. intercell_network provides a shortcut (high_confidence) for basic quality filtering. This function is offered for custom filtering or for experimenting with the parameters.

Usage

filter_intercell_network(
  network,
  transmitter_topology = c("secreted", "plasma_membrane_transmembrane",
    "plasma_membrane_peripheral"),
  receiver_topology = "plasma_membrane_transmembrane",
  min_curation_effort = 2,
  min_resources = 1,
  min_references = 0,
  min_provenances = 1,
  consensus_percentile = 50,
  loc_consensus_percentile = 30,
  ligand_receptor = FALSE,
  simplify = FALSE,
  unique_pairs = FALSE,
  omnipath = TRUE,
  ligrecextra = TRUE,
  kinaseextra = FALSE,
  pathwayextra = FALSE,
  ...
)

Arguments

network

An intercell network data frame, as provided by intercell_network when called without the simplify option.

transmitter_topology

Character vector: topologies allowed for the entities in transmitter role. Abbreviations allowed: "sec", "pmtm" and "pmp".

receiver_topology

Same as transmitter_topology for the entities in the receiver role.

min_curation_effort

Numeric: a minimum value of curation effort (resource-reference pairs) for network interactions. Use zero to disable filtering.

min_resources

Numeric: minimum number of resources for interactions. The value 1 means no filtering.

min_references

Numeric: minimum number of references for interactions. Use zero to disable filtering.

min_provenances

Numeric: minimum number of provenances (either resources or references) for interactions. Use zero or one to disable filtering.

consensus_percentile

Numeric: percentile threshold for the consensus score of generic categories in intercell annotations. The consensus score is the number of resources supporting the classification of an entity into a category, based on the combined information of many resources. Here you can apply a cut-off, keeping only the annotations supported by a higher number of resources than a certain percentile of each category. If NULL, no filtering will be performed. The value is either in the 0-1 range, or will be divided by 100 if greater than 1. The percentiles are calculated against the generic composite categories and then applied to their resource-specific annotations and specific child categories.

loc_consensus_percentile

Numeric: similar to consensus_percentile for major localizations. For example, with a value of 50, the secreted, plasma membrane transmembrane or peripheral attributes will be TRUE only where at least 50 percent of the resources support these.

ligand_receptor

Logical. If TRUE, only ligand and receptor annotations will be used instead of the more generic transmitter and receiver categories.

simplify

Logical: keep only the most often used columns. This function combines a network data frame with two copies of the intercell annotation data frame, each of which already has many columns. With this option we keep only the names of the interacting pair, their intercellular communication roles, and minimal information about the origin of both the interaction and the annotations.

unique_pairs

Logical: instead of having separate rows for each pair of annotations, drop the annotations and reduce the data frame to unique interacting pairs. See unique_intercell_network for details.

omnipath

Logical: shortcut to include the omnipath dataset in the interactions query.

ligrecextra

Logical: shortcut to include the ligrecextra dataset in the interactions query.

kinaseextra

Logical: shortcut to include the kinaseextra dataset in the interactions query.

pathwayextra

Logical: shortcut to include the pathwayextra dataset in the interactions query.

...

If simplify or unique_pairs is TRUE, additional column names can be passed here to dplyr::select on the final data frame. Otherwise ignored.
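
The percentile handling described for consensus_percentile can be illustrated with a minimal base R sketch. It is not the package's implementation: the helper normalise_percentile, the toy scores, and the inclusive comparison (>=) are all assumptions made for illustration.

```r
# Hypothetical helper (not part of OmnipathR): values above 1 are
# interpreted as percentages and divided by 100; NULL disables filtering.
normalise_percentile <- function(p) {
    if (is.null(p)) {
        return(NULL)
    }
    if (p > 1) p / 100 else p
}

# Made-up consensus scores (number of supporting resources) for the
# entities annotated with one generic category.
scores <- c(A = 1, B = 2, C = 3, D = 5, E = 8)

p <- normalise_percentile(50)

# Threshold at the requested percentile of this category's scores.
threshold <- stats::quantile(scores, probs = p, names = FALSE)

# Keep annotations at or above the threshold (inclusive comparison
# is an assumption of this sketch).
kept <- scores[scores >= threshold]
```

In the real function the threshold is computed per generic category and then applied to the resource-specific and child categories; the sketch shows only a single category.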

Value

A filtered intercell network data frame.

See Also

  • intercell_network

  • unique_intercell_network

  • simplify_intercell_network

  • intercell

  • intercell_categories

  • intercell_generic_categories

  • intercell_summary

Examples

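library(OmnipathR)

# Retrieve the unfiltered intercell network
# (requires access to the OmniPath web service)
icn <- intercell_network()

# Apply custom quality filters and keep only the most often used columns
icn_f <- filter_intercell_network(
    icn,
    consensus_percentile = 75,
    min_provenances = 3,
    simplify = TRUE
)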
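
To give a feel for what the min_curation_effort, min_resources and min_references thresholds amount to, here is a small base R sketch on a toy table. The data are made up, the column names (n_resources, n_references, curation_effort) are assumed for illustration, and the code is not the package's own filtering logic. It runs without network access.

```r
# Toy interaction table; names and numbers are invented for illustration.
interactions <- data.frame(
    source = c("L1", "L2", "L3", "L4"),
    target = c("R1", "R2", "R3", "R4"),
    n_resources = c(1, 3, 2, 5),
    n_references = c(0, 4, 1, 9),
    curation_effort = c(0, 6, 1, 14),
    stringsAsFactors = FALSE
)

# Same values as the defaults listed under Usage.
min_curation_effort <- 2
min_resources <- 1
min_references <- 0

# Keep only the rows that satisfy every threshold.
kept <- interactions[
    interactions$curation_effort >= min_curation_effort &
        interactions$n_resources >= min_resources &
        interactions$n_references >= min_references,
]
```

With these defaults only min_curation_effort removes rows, since min_resources = 1 and min_references = 0 do not filter anything.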


saezlab/OmnipathR documentation built on Nov. 10, 2024, 11:02 p.m.