infoSource: Get summary statistics on graphs and variables
In SourceSet: A Graphical Model Approach to Identify Primary Genes in Perturbed Biological Pathways

infoSource

R Documentation

Get summary statistics on graphs and variables

Description

The infoSource function provides a summary of the results by focusing on either variables or graphs.

Usage

infoSource(sourceObj, map.name.variable = NULL, method = "fdr")

Arguments

`sourceObj`	a `SourceSetObj` object, i.e. the output of the `sourceSet` function
`map.name.variable`	a list of customized labels to be associated with the names of the genes. Each list element must contain only one value (i.e. the new label), and the name of each element must be associated with the names of the genes given as input to the `sourceSet` function (column names of `data` input argument). If a label is not mapped, the original name is used
`method`	correction method for p-values calculated on graphs. The adjustment methods allowed are: `fdr` (default), `holm`, `hochberg`, `hommel`, `bonferroni`, `BH`, `BY` or `none`. For more details refer to `p.adjust`.
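As an illustration of the two optional arguments, the sketch below builds a label map in the shape described for `map.name.variable` and compares two of the `method` choices using `p.adjust` directly. The gene identifiers, labels, and p-values are made up, and the commented `infoSource` call assumes that a `SourceSetObj` named `results` already exists.

```r
# Hypothetical gene identifiers and labels (not taken from any real dataset).
# Each element holds exactly one label; element names are the original gene names.
label.map <- list("1001" = "GENE_A", "1002" = "GENE_B")
stopifnot(all(lengths(label.map) == 1))

# Made-up p-values, one per graph, to compare correction methods.
p.raw  <- c(0.001, 0.012, 0.030, 0.200)
p.fdr  <- p.adjust(p.raw, method = "fdr")
p.bonf <- p.adjust(p.raw, method = "bonferroni")

# In p.adjust, "fdr" is an alias for "BH", so both give the same values.
p.bh <- p.adjust(p.raw, method = "BH")

# With a SourceSetObj called `results` (assumed to exist), the call would look like:
# info <- infoSource(results, map.name.variable = label.map, method = "bonferroni")
```

Bonferroni is the more conservative of the two, so its adjusted p-values are never smaller than the `fdr` ones.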

Value

The function guides the user in identifying interesting variables by returning two objects:

graph: a data frame that summarizes the results of the individual input graphs, composed as follows:
- n.primary: number of genes belonging to the source set;
- n.secondary: number of genes belonging to the secondary set;
- n.graph: number of genes within the graph;
- n.cluster: number of connected components of the graph;
- primary.impact: relative size of the estimated source set. This index quantifies the proportion of the graph impacted by primary dysregulation;
- total.impact: relative size of the set of genes impacted by dysregulation. This index quantifies the proportion of the graph impacted by either primary or secondary dysregulation;
- adj.pvalue: multiplicity-adjusted p-value for the hypothesis of equality of the two distributions associated with the given graph.
variable: a data frame that summarizes the results of the individual variables, composed as follows:
- n.primary: number of input graphs in which the gene appears in the associated source set;
- n.secondary: number of input graphs in which the gene appears in the associated secondary set;
- n.graph: number of pathways in which the gene is annotated;
- specificity: percentage of input graphs containing the given gene with respect to the total number of input graphs;
- primary.impact: percentage of input graphs, such that the given gene belongs to their estimated source set, with respect to the total number of input graphs in which the gene appears;
- total.impact: percentage of input graphs, such that the given gene is affected by some form of dysregulation in the considered graph, with respect to the total number of input graphs in which the gene appears;
- relevance: percentage of the input graphs such that the given variable belongs to their estimated source set, with respect to the total number of input graphs. It is a general measure of the importance of the gene based on the chosen pathways;
- score: a number ranging from 0 (low significance) to +Inf (maximal significance), computed as the combination of the p-values of all components (of all the input graphs) containing the given variable
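To make the variable-level indices concrete, here is a small worked example on made-up counts for a single gene. It is a sketch of the definitions as worded above, not the package's internal code. It assumes that the source and secondary sets of a graph do not overlap and that percentages are reported on a 0 to 100 scale; the score is left out because the rule for combining p-values is not specified here.

```r
# Toy membership summary for one gene across 4 input graphs (made-up numbers).
n.total     <- 4  # total number of input graphs
n.graph     <- 3  # graphs that contain the gene
n.primary   <- 2  # graphs where the gene is in the source set
n.secondary <- 1  # graphs where the gene is in the secondary set

# Indices computed from the definitions in the list above (assumed 0-100 scale).
specificity    <- 100 * n.graph / n.total
primary.impact <- 100 * n.primary / n.graph
total.impact   <- 100 * (n.primary + n.secondary) / n.graph
relevance      <- 100 * n.primary / n.total

round(c(specificity = specificity, primary.impact = primary.impact,
        total.impact = total.impact, relevance = relevance), 1)
```

Note that primary.impact and relevance share the same numerator (n.primary) and differ only in the denominator: graphs containing the gene versus all input graphs.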

Note

Ideally, variables of the primary dysregulation will be elements of the source set in all input graphs that contain them and will thus have high values of primary.impact and score. However, if a given variable appears in a single graph and belongs to its source set, these indices can be deceptive.

For this reason, relevance serves to identify variables that, apart from being good candidates for primary genes, also appear frequently in the input graphs. Which index is to be preferred depends on the objective of the analysis: for exploratory analysis, we suggest relying on relevance.
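The point of this note can be seen on a toy table with two hypothetical genes. The counts are invented, and the two indices are recomputed by hand from their definitions in the Value section (assuming a 0 to 100 scale) rather than taken from `infoSource` output.

```r
# Made-up counts: geneA appears in a single graph, geneB in 8 of 10 graphs.
n.total <- 10
toy <- data.frame(
  n.primary = c(1, 6),   # graphs where the gene is in the source set
  n.graph   = c(1, 8),   # graphs that contain the gene
  row.names = c("geneA", "geneB")
)

# Impact is relative to the graphs containing the gene,
# relevance is relative to all input graphs.
toy$primary.impact <- 100 * toy$n.primary / toy$n.graph
toy$relevance      <- 100 * toy$n.primary / n.total

# The top-ranked gene differs depending on the index used.
top.by.impact    <- rownames(toy)[which.max(toy$primary.impact)]
top.by.relevance <- rownames(toy)[which.max(toy$relevance)]
toy
```

Here geneA reaches the maximum primary.impact on the strength of one graph, while geneB, which is in the source set of six graphs, ranks first by relevance.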

Examples

## Load the SourceSetObj obtained from the source set analysis of the ALL dataset

# see vignette for more details
print(load(file=system.file("extdata","ALLsourceresult.RData",package = "SourceSet")))
class(results.all)

info.all<-infoSource(sourceObj = results.all)
## results of individual input graphs
info.all$graph

## results of individual variables
# ..that appear in more than one graph and with relevance > 0
keep <- info.all$variable$n.graph > 1 & info.all$variable$relevance > 0
info.all.genes <- info.all$variable[keep, ]
# ..ordered by relevance (highest first)
ind.ord <- order(info.all.genes$relevance, decreasing = TRUE)
info.all.genes[ind.ord, ]

SourceSet documentation built on Nov. 21, 2022, 5:06 p.m.
