infoSource: Get summary statistics on graphs and variables

View source: R/mainFunctions.R

Description

The infoSource function summarizes the results of a source set analysis from two perspectives: the individual input graphs and the individual variables.

Usage

infoSource(sourceObj, map.name.variable = NULL, method = "fdr")

Arguments

sourceObj

a SourceSetObj object, i.e. the output of the sourceSet function

map.name.variable

a named list of custom labels for the gene names. Each list element must contain a single value (the new label), and the name of each element must match one of the gene names given as input to the sourceSet function (the column names of the data input argument). Genes without a mapped label keep their original names

method

correction method for p-values calculated on graphs. The adjustment methods allowed are: fdr (default), holm, hochberg, hommel, bonferroni, BH, BY or none. For more details refer to p.adjust.
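Both arguments can be illustrated with base R alone. The sketch below is not SourceSet code. The gene names, labels and p-values are made up, and `relabel()` is a hypothetical helper that only mimics the documented rule for map.name.variable (one label per named element, with fallback to the original name). The p-value part assumes that `method` is passed on to base R's p.adjust, as the reference to p.adjust suggests.

```r
## map.name.variable: a named list with exactly one label per element
# Hypothetical gene names, standing in for the column names of `data`
genes <- c("gene1", "gene2", "gene3")
map.name.variable <- list(gene1 = "LABEL_A", gene3 = "LABEL_C")

# Hypothetical helper (not part of SourceSet): replace mapped names with
# their labels and keep the original name for unmapped genes
relabel <- function(gene.names, label.map) {
  stopifnot(is.list(label.map), all(lengths(label.map) == 1))
  out <- gene.names
  mapped <- gene.names %in% names(label.map)
  out[mapped] <- unlist(label.map[gene.names[mapped]], use.names = FALSE)
  out
}
new.names <- relabel(genes, map.name.variable)
print(new.names)  # c("LABEL_A", "gene2", "LABEL_C")

## method: graph-level p-value correction
# Hypothetical raw p-values for four graphs (not SourceSet output)
raw.p <- c(g1 = 0.01, g2 = 0.02, g3 = 0.03, g4 = 0.04)

# The documented values of `method` coincide with base R's p.adjust.methods
print(p.adjust.methods)

adj.fdr  <- p.adjust(raw.p, method = "fdr")         # default; alias of "BH"
adj.holm <- p.adjust(raw.p, method = "holm")
adj.bonf <- p.adjust(raw.p, method = "bonferroni")  # pmin(1, 4 * p)
adj.none <- p.adjust(raw.p, method = "none")        # p-values left unchanged
print(rbind(fdr = adj.fdr, holm = adj.holm, bonferroni = adj.bonf))
```

With these four p-values, "fdr" gives 0.04 for every graph and "bonferroni" gives 0.04, 0.08, 0.12 and 0.16. This shows how the choice of `method` changes the adj.pvalue column.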

Value

The function helps the user identify interesting variables. It returns a list with two elements:

  • graph: a data frame that summarizes the results of the individual input graphs, with the following columns:

    • n.primary: number of genes belonging to the source set;

    • n.secondary: number of genes belonging to the secondary set;

    • n.graph: number of genes within the graph;

    • n.cluster: number of connected components of the graph;

    • primary.impact: relative size of the estimated source set. This index quantifies the proportion of the graph impacted by primary dysregulation;

    • total.impact: relative size of the set of genes impacted by dysregulation. This index quantifies the proportion of the graph impacted by either primary or secondary dysregulation;

    • adj.pvalue: multiplicity-adjusted p-value for the null hypothesis that the two distributions associated with the given graph are equal

  • variable: a data frame that summarizes the results of the individual variables, with the following columns:

    • n.primary: number of input graphs in which the gene appears in the associated source set;

    • n.secondary: number of input graphs in which the gene appears in the associated secondary set;

    • n.graph: number of input graphs (pathways) in which the gene is annotated;

    • specificity: percentage of input graphs containing the given gene, relative to the total number of input graphs;

    • primary.impact: percentage of input graphs in which the given gene belongs to the estimated source set, relative to the total number of input graphs in which the gene appears;

    • total.impact: percentage of input graphs in which the given gene is affected by some form of dysregulation (primary or secondary), relative to the total number of input graphs in which the gene appears;

    • relevance: percentage of input graphs in which the given variable belongs to the estimated source set, relative to the total number of input graphs. It is a general measure of the importance of the gene across the chosen pathways;

    • score: a number ranging from 0 (low significance) to +Inf (maximal significance), obtained by combining the p-values of all components (across all input graphs) that contain the given variable
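The count-based indices above can be reproduced from the n.* columns. The sketch below is a reading of the definitions, not SourceSet's implementation. The counts are hypothetical, the ratios are expressed as fractions in [0, 1] (the package may report them on a different scale), and the sketch assumes that the source and secondary sets of a graph are disjoint. score is not reproduced because the rule for combining the p-values is not specified here.

```r
# Hypothetical counts, NOT output of SourceSet; 3 input graphs in total
n.input.graphs <- 3

## Graph-level indices
graph <- data.frame(
  n.primary   = c(4, 0, 2),
  n.secondary = c(6, 0, 3),
  n.graph     = c(20, 15, 10)
)
# Share of each graph in the source set, and in source or secondary set
# (assumes the two sets are disjoint within a graph)
graph$primary.impact <- graph$n.primary / graph$n.graph
graph$total.impact   <- (graph$n.primary + graph$n.secondary) / graph$n.graph

## Variable-level indices
variable <- data.frame(
  n.primary   = c(2, 1, 0),
  n.secondary = c(1, 0, 1),
  n.graph     = c(3, 1, 2),
  row.names   = c("geneA", "geneB", "geneC")
)
# Graphs containing the gene, over all input graphs
variable$specificity    <- variable$n.graph / n.input.graphs
# Graphs where the gene is primary, over graphs containing the gene
variable$primary.impact <- variable$n.primary / variable$n.graph
# Graphs where the gene is primary or secondary, over graphs containing it
variable$total.impact   <- (variable$n.primary + variable$n.secondary) / variable$n.graph
# Graphs where the gene is primary, over all input graphs
variable$relevance      <- variable$n.primary / n.input.graphs

print(graph)
print(variable)
```

The only difference between primary.impact and relevance is the denominator. One uses the graphs that contain the gene, and the other uses all input graphs.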

Note

Ideally, variables affected by primary dysregulation belong to the source set of every input graph that contains them, and therefore have high values of primary.impact and score. However, these indices can be deceptive: a variable that appears in a single graph and belongs to its source set already reaches the maximum primary.impact.

For this reason, relevance helps identify variables that, besides being good candidates for primary genes, also appear frequently in the input graphs. Which index to prefer depends on the objective of the analysis. For exploratory analyses, we suggest relying on relevance.
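A toy illustration of the caveat in this note follows, using hypothetical counts rather than SourceSet output and the ratio definitions from the Value section, expressed as fractions. geneB appears in one graph and is primary there. geneA is primary in 2 of the 3 graphs that contain it. The two indices rank the genes in opposite orders.

```r
# Hypothetical counts, NOT output of SourceSet; 3 input graphs in total
n.input.graphs <- 3
variable <- data.frame(
  n.primary = c(2, 1),
  n.graph   = c(3, 1),
  row.names = c("geneA", "geneB")
)
variable$primary.impact <- variable$n.primary / variable$n.graph
variable$relevance      <- variable$n.primary / n.input.graphs

# primary.impact puts geneB first (1 vs 2/3), although it appears in a single graph
by.impact    <- rownames(variable)[order(variable$primary.impact, decreasing = TRUE)]
# relevance puts geneA first (2/3 vs 1/3), since it is primary in more input graphs
by.relevance <- rownames(variable)[order(variable$relevance, decreasing = TRUE)]
print(by.impact)
print(by.relevance)
```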

Examples

## Load the SourceSetObj obtained from the source set analysis of ALL dataset

# see vignette for more details
print(load(file = system.file("extdata", "ALLsourceresult.RData", package = "SourceSet")))
class(results.all)

info.all <- infoSource(sourceObj = results.all)
## results of individual input graphs
info.all$graph

## results of individual variables
# ..that appear in more than one graph and with relevance > 0
info.all.genes <- info.all$variable[info.all$variable$n.graph > 1 &
                                    info.all$variable$relevance > 0, ]
# ..ordered by decreasing relevance
ind.ord <- order(info.all.genes$relevance, decreasing = TRUE)
info.all.genes[ind.ord, ]

SourceSet documentation built on Nov. 21, 2022, 5:06 p.m.