# xCrosstalk: Function to identify a pathway crosstalk

In hfang-bristol/XGR: Exploring Genomic Relations for Enhanced Interpretation Through Enrichment, Similarity, Network and Annotation Analysis

## Description

xCrosstalk identifies maximum-scoring pathway crosstalk from an input graph whose nodes carry significance information (measured as p-values or fdr). It returns an object of class "cPath".

## Usage

    xCrosstalk(data, entity = c("Gene", "GR"), significance.threshold = NULL,
      score.cap = NULL, build.conversion = c(NA, "hg38.to.hg19", "hg18.to.hg19"),
      crosslink = c("genehancer", "PCHiC_combined", "GTEx_V6p_combined",
      "nearby"), crosslink.customised = NULL, cdf.function = c("original",
      "empirical"), scoring.scheme = c("max", "sum", "sequential"),
      nearby.distance.max = 50000, nearby.decay.kernel = c("rapid", "slow",
      "linear", "constant"), nearby.decay.exponent = 2, networks = c("KEGG",
      "KEGG_metabolism", "KEGG_genetic", "KEGG_environmental", "KEGG_cellular",
      "KEGG_organismal", "KEGG_disease", "REACTOME", "PCommonsDN_Reactome"),
      seed.genes = T, subnet.significance = 0.01, subnet.size = NULL,
      ontologies = c("KEGGenvironmental", "KEGG", "KEGGmetabolism",
      "KEGGgenetic", "KEGGcellular", "KEGGorganismal", "KEGGdisease"),
      size.range = c(10, 2000), min.overlap = 10, fdr.cutoff = 0.05,
      crosstalk.top = NULL, glayout = layout_with_kk, verbose = T,
      RData.location = "http://galahad.well.ox.ac.uk/bigdata")

## Arguments

• data: a named input vector containing the significance level for genes (gene symbols) or genomic regions (GR). The element names are gene symbols or GR (in the format 'chrN:start-end', where N is 1-22 or X and start/end are genomic positions; for example, 'chr1:13-20'), and the element values are the significance levels (measured as p-value or fdr). Alternatively, it can be a matrix or data frame with two columns: the 1st column for gene symbols or GR, the 2nd column for the significance level. Input with GR only (without the significance level) is also supported

• entity: the entity. It can be either "Gene" or "GR"

• significance.threshold: the given significance threshold. By default, it is set to NULL, meaning there is no constraint on the significance level when transforming the significance level into scores. If given, those below this threshold are considered significant and thus scored positively, whereas those above it are considered insignificant and thus receive no score

• score.cap: the maximum score being capped. By default, it is set to NULL, meaning that no capping is applied

• build.conversion: the conversion from one genome build to another. The conversions supported are "hg38.to.hg19" and "hg18.to.hg19". By default it is NA (no conversion needed)

• crosslink: the built-in crosslink info with a score quantifying the link of a GR to a gene. See xGR2xGenes for details

• crosslink.customised: the crosslink info with a score quantifying the link of a GR to a gene. A user-input matrix or data frame with 4 columns: 1st column for genomic regions (formatted as "chr:start-end", genome build 19), 2nd column for genes, 3rd for the crosslink score (crosslinking a genomic region to a gene, such as -log10 significance level), and 4th for contexts (optional; if not provided, it will be added as 'C'). Alternatively, it can be a file containing these 4 columns. Required, otherwise it will return NULL

• cdf.function: a character specifying how to transform the input crosslink score. It can be one of 'original' (no such transformation) and 'empirical' for the empirical Cumulative Distribution Function (cdf; the score is thereby converted into a p-value-like value in [0,1])

• scoring.scheme: the method used to calculate seed gene scores under a set of GR (also over contexts if many). It can be one of "sum" for adding up, "max" for the maximum, and "sequential" for the sequential weighting. The sequential weighting is done via $\sum_{i=1} \frac{R_{i}}{i}$, where $R_{i}$ is the $i^{th}$ rank (in a decreasing order)

• nearby.distance.max: the maximum distance between genes and GR. Only those genes no further away than this distance will be considered as seed genes. This parameter will influence the distance-component weights calculated for nearby GR per gene

• nearby.decay.kernel: a character specifying a decay kernel function. It can be one of 'slow' for slow decay, 'linear' for linear decay, and 'rapid' for rapid decay. If no distance weight is used, please select 'constant'

• nearby.decay.exponent: a numeric specifying a decay exponent. By default, it is set to 2

• networks: the built-in network. For direct (pathway-merged) interactions sourced from KEGG, it can be 'KEGG' for all, 'KEGG_metabolism' for pathways grouped into 'Metabolism', 'KEGG_genetic' for 'Genetic Information Processing' pathways, 'KEGG_environmental' for 'Environmental Information Processing' pathways, 'KEGG_cellular' for 'Cellular Processes' pathways, 'KEGG_organismal' for 'Organismal Systems' pathways, and 'KEGG_disease' for 'Human Diseases' pathways. It can be 'REACTOME' for protein-protein interactions derived from Reactome pathways. For the Pathway Commons pathway-merged network from individual sources, it can be "PCommonsDN_Reactome" for those from Reactome

• seed.genes: logical to indicate whether the identified network is restricted to seed genes (i.e. input genes with the significance level). By default, it is set to TRUE

• subnet.significance: the given significance threshold. The default is 0.01. If set to NULL, there is no constraint on nodes/genes. If given, those nodes/genes with p-values below this threshold are considered significant and thus scored positively, whereas those with p-values above it are considered insignificant and thus scored negatively

• subnet.size: the desired number of nodes constrained to the resulting subnet. If it is not NULL, a wide range of significance thresholds will be scanned to find the optimal significance threshold leading to the desired number of nodes in the resulting subnet. Notably, the given significance threshold will be overwritten by this option

• ontologies: the ontologies supported currently. It can be 'AA' for AA-curated pathways, KEGG pathways (including 'KEGG' for all, 'KEGGmetabolism' for 'Metabolism' pathways, 'KEGGgenetic' for 'Genetic Information Processing' pathways, 'KEGGenvironmental' for 'Environmental Information Processing' pathways, 'KEGGcellular' for 'Cellular Processes' pathways, 'KEGGorganismal' for 'Organismal Systems' pathways, and 'KEGGdisease' for 'Human Diseases' pathways), 'REACTOME' for REACTOME pathways or 'REACTOME_x' for its sub-ontologies (where x can be 'CellCellCommunication', 'CellCycle', 'CellularResponsesToExternalStimuli', 'ChromatinOrganization', 'CircadianClock', 'DevelopmentalBiology', 'DigestionAndAbsorption', 'Disease', 'DNARepair', 'DNAReplication', 'ExtracellularMatrixOrganization', 'GeneExpression(Transcription)', 'Hemostasis', 'ImmuneSystem', 'Metabolism', 'MetabolismOfProteins', 'MetabolismOfRNA', 'Mitophagy', 'MuscleContraction', 'NeuronalSystem', 'OrganelleBiogenesisAndMaintenance', 'ProgrammedCellDeath', 'Reproduction', 'SignalTransduction', 'TransportOfSmallMolecules', 'VesicleMediatedTransport')

• size.range: the minimum and maximum size of members of each term in consideration. By default, it is set to a minimum of 10 but no more than 2000

• min.overlap: the minimum number of overlaps. Only those terms with members that overlap with input data at least min.overlap (10 by default) will be processed

• fdr.cutoff: fdr cutoff used to declare the significant terms. By default, it is set to 0.05

• crosstalk.top: the number of the top paths to be returned. By default, it is NULL, meaning no such restriction

• glayout: either a function or a numeric matrix configuring how the vertices will be placed on the plot. If it is a function, this function will be called with the graph as the single parameter to determine the actual coordinates. This function can be one of "layout_nicely" (previously "layout.auto"), "layout_randomly" (previously "layout.random"), "layout_in_circle" (previously "layout.circle"), "layout_on_sphere" (previously "layout.sphere"), "layout_with_fr" (previously "layout.fruchterman.reingold"), "layout_with_kk" (previously "layout.kamada.kawai"), "layout_as_tree" (previously "layout.reingold.tilford"), "layout_with_lgl" (previously "layout.lgl"), "layout_with_graphopt" (previously "layout.graphopt"), "layout_with_sugiyama" (previously "layout.sugiyama"), "layout_with_dh" (previously "layout.davidson.harel"), "layout_with_drl" (previously "layout.drl"), "layout_with_gem" (previously "layout.gem"), "layout_with_mds", and "layout_as_bipartite". A full explanation of these layouts can be found in http://igraph.org/r/doc/layout_nicely.html

• verbose: logical to indicate whether the messages will be displayed on the screen. By default, it is set to TRUE for display

• RData.location: the characters to tell the location of built-in RData files. See xRDataLoader for details
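The three options of scoring.scheme can be illustrated with a small base-R sketch. The helper name `combine_scores` and the input scores are hypothetical and are not part of XGR; the sketch also assumes that R_i denotes the i-th largest score, which is one reading of "the i-th rank (in a decreasing order)".

```r
# Hypothetical helper (not an XGR function): combine the crosslink scores of
# all GR linked to a single gene under the three documented schemes.
combine_scores <- function(scores, scheme = c("max", "sum", "sequential")) {
  scheme <- match.arg(scheme)
  # Sort in decreasing order so that ranked[1] is the largest score (R_1)
  ranked <- sort(scores, decreasing = TRUE)
  switch(scheme,
    max        = ranked[1],
    sum        = sum(ranked),
    # Sequential weighting: sum over i of R_i / i
    sequential = sum(ranked / seq_along(ranked))
  )
}

# Made-up scores for three GR linked to the same gene
scores <- c(0.3, 0.9, 0.6)
res <- sapply(c("max", "sum", "sequential"),
              function(s) combine_scores(scores, s))
print(res)
```

Under this reading, "sequential" sits between "max" and "sum": the top-ranked score counts in full, while each further GR contributes a progressively smaller share.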
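The input forms accepted by the data argument can be prepared in base R as sketched below. The regions and significance values are made up, and `is_valid_gr` is a hypothetical check written for this illustration (it is not an XGR function); it only encodes the 'chrN:start-end' format described for data, with N being 1-22 or X.

```r
# Made-up genomic regions and significance values (illustration only)
gr  <- c("chr1:13-20", "chrX:1000-2000", "chr22:500-500")
sig <- c(1e-9, 3e-4, 0.02)

# Form 1: named vector (names are GR, values are significance levels)
data_vec <- setNames(sig, gr)

# Form 2: two-column data frame (1st column GR, 2nd column significance)
data_df <- data.frame(GR = gr, Sig = sig, stringsAsFactors = FALSE)

# Form 3: GR only, without significance levels
data_gr <- gr

# Hypothetical format check: 'chrN:start-end' with N in 1-22 or X
is_valid_gr <- function(x) {
  grepl("^chr([1-9]|1[0-9]|2[0-2]|X):[0-9]+-[0-9]+$", x)
}

valid <- is_valid_gr(c(gr, "chr23:1-2", "chrY:1-2", "1:13-20"))
print(valid)
```

Checking region strings before calling xCrosstalk with entity="GR" is a cheap way to catch malformed names early; how XGR itself treats malformed regions is not covered here.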
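To give a feel for how nearby.distance.max, nearby.decay.kernel and nearby.decay.exponent could interact, here is a minimal sketch of distance-based weights. The helper `decay_weight` and the kernel formulas are assumptions chosen for illustration, not taken from this page, and XGR's actual definitions may differ; see xGR2xGenes for the authoritative behaviour.

```r
# Hypothetical helper (not an XGR function). The formulas below are
# illustrative assumptions for 'rapid', 'slow', 'linear' and 'constant'.
decay_weight <- function(dist, distance.max = 50000,
                         kernel = c("rapid", "slow", "linear", "constant"),
                         exponent = 2) {
  kernel <- match.arg(kernel)
  # Relative distance, clipped to [0, 1]
  x <- pmin(dist / distance.max, 1)
  w <- switch(kernel,
    rapid    = (1 - x)^exponent,   # drops quickly near the gene
    slow     = 1 - x^exponent,     # stays high, then drops near the limit
    linear   = 1 - x,
    constant = rep(1, length(x))   # no distance weighting
  )
  # Genes further away than distance.max are not considered at all
  w[dist > distance.max] <- 0
  w
}

d <- c(0, 25000, 50000, 60000)
weights <- sapply(c("rapid", "slow", "linear", "constant"),
                  function(k) decay_weight(d, kernel = k))
rownames(weights) <- d
print(weights)
```

In this sketch all kernels except 'constant' give weight 1 at distance 0 and weight 0 at nearby.distance.max; they differ only in how quickly the weight falls in between.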

## Value

an object of class "cPath", a list with the following components:

• ig_paths: an object of class "igraph". It has graph attributes (enrichment, and/or evidence, gp_evidence and membership if entity is 'GR') and node attributes (crosstalk)

• gp_paths: a 'ggplot' object for pathway crosstalk visualisation

• gp_heatmap: a 'ggplot' object for pathway member gene visualisation

• ig_subg: an object of class "igraph".

## See Also

xDefineNet, xCombineNet, xSubneterGenes, xGR2xNet, xEnricherGenesAdv, xGGnetwork, xHeatmap
## Examples

    ## Not run: 
    # Load the XGR package and specify the location of built-in data
    library(XGR)
    RData.location <- "http://galahad.well.ox.ac.uk/bigdata/"

    # 1) at the gene level
    data(Haploid_regulators)
    ## only PD-L1 regulators and their significance info (FDR)
    data <- subset(Haploid_regulators, Phenotype=='PDL1')[,c('Gene','FDR')]
    ## pathway crosstalk
    cPath <- xCrosstalk(data, entity="Gene", networks="KEGG",
        subnet.significance=0.05, subnet.size=NULL,
        ontologies="KEGGenvironmental", RData.location=RData.location)
    cPath
    ## visualisation
    pdf("xCrosstalk_Gene.pdf", width=7, height=8)
    gp_both <- gridExtra::grid.arrange(grobs=list(cPath$gp_paths,cPath$gp_heatmap),
        layout_matrix=cbind(c(1,1,1,1,2)))
    dev.off()

    # 2) at the genomic region (SNP) level
    data(ImmunoBase)
    ## all ImmunoBase GWAS SNPs and their significance info (p-values)
    ls_df <- lapply(ImmunoBase, function(x) as.data.frame(x$variant))
    df <- do.call(rbind, ls_df)
    data <- unique(cbind(GR=paste0(df$seqnames,':',df$start,'-',df$end),
        Sig=df$Pvalue))
    ## pathway crosstalk
    df_xGenes <- xGR2xGenes(data[as.numeric(data[,2])<5e-8,1],
        format="chr:start-end", crosslink="PCHiC_combined", scoring=T,
        RData.location=RData.location)
    mSeed <- xGR2xGeneScores(data, significance.threshold=5e-8,
        crosslink="PCHiC_combined", RData.location=RData.location)
    subg <- xGR2xNet(data, significance.threshold=5e-8,
        crosslink="PCHiC_combined", network="KEGG", subnet.significance=0.1,
        RData.location=RData.location)
    cPath <- xCrosstalk(data, entity="GR", significance.threshold=5e-8,
        crosslink="PCHiC_combined", networks="KEGG", subnet.significance=0.1,
        ontologies="KEGGenvironmental", RData.location=RData.location)
    cPath
    ## visualisation
    pdf("xCrosstalk_SNP.pdf", width=7, height=8)
    gp_both <- gridExtra::grid.arrange(grobs=list(cPath$gp_paths,cPath$gp_heatmap),
        layout_matrix=cbind(c(1,1,1,1,2)))
    dev.off()

    # 3) at the genomic region (without the significance info) level
    Age_CpG <- xRDataLoader(RData.customised='Age_CpG',
        RData.location=RData.location)[-1,1]
    CgProbes <- xRDataLoader(RData.customised='CgProbes',
        RData.location=RData.location)
    ind <- match(Age_CpG, names(CgProbes))
    gr_CpG <- CgProbes[ind[!is.na(ind)]]
    data <- xGRcse(gr_CpG, format='GRanges')
    ## pathway crosstalk
    df_xGenes <- xGR2xGenes(data, format="chr:start-end",
        crosslink="PCHiC_combined", scoring=T, RData.location=RData.location)
    subg <- xGR2xNet(data, crosslink="PCHiC_combined", network="KEGG",
        subnet.significance=0.1, RData.location=RData.location)
    cPath <- xCrosstalk(data, entity="GR", crosslink="PCHiC_combined",
        networks="KEGG", subnet.significance=0.1,
        ontologies="KEGGenvironmental", RData.location=RData.location)
    cPath

    ## End(Not run)