knitr::opts_knit$set(root.dir = ".")
knitr::opts_chunk$set(collapse = TRUE, warning = TRUE)
BiocStyle::markdown()
library("BiocStyle")
library("GSEAdv")

Introduction

This vignette assumes that the reader is familiar with the infrastructure built in Bioconductor around gene sets, mainly `r Biocpkg("GSEABase")`. It provides examples of how to use the package to find relationships between genes and gene sets.

A GeneSetCollection is a collection of GeneSets, which may or may not be related to each other. All over-representation analysis (ORA) methods, such as the hypergeometric test or the chi-square test, assume that each gene is independent of the other genes. This vignette shows how to check and describe the relationships between genes and gene sets, and between gene sets.
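
To make the ORA setting concrete, here is a minimal sketch of a hypergeometric test in base R. The counts are invented for illustration and do not come from any collection used in this vignette.

```r
# Hypothetical counts, chosen only for illustration
universe <- 1000  # genes in the background
set_size <- 50    # genes in the gene set
selected <- 100   # genes in the list of interest
overlap  <- 12    # selected genes that are in the gene set

# P(X >= overlap) when `selected` genes are drawn without replacement
p_ora <- phyper(overlap - 1, set_size, universe - set_size, selected,
                lower.tail = FALSE)
p_ora
```

This calculation ignores any relationship among genes or among gene sets, which is what the methods below help to describe.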

Preparation

First let's load a GeneSetCollection. From the `r Biocpkg("GSEABase")` package we can read the Broad example gene sets directly into gss; the Info object, a second GeneSetCollection, is used in later examples:

fl <- system.file("extdata", "Broad.xml", package = "GSEABase")
gss <- getBroadSets(fl)
gss
Info

This GeneSetCollection will be used in this vignette to show the new methods implemented in this package.

With `r Biocpkg("GSEABase")` we can see what the collection contains, but not how its genes and gene sets relate to each other.

Summary

A first approach would be to use summary to find more information about that relationship:

summary(gss)

This provides some information we already knew (it was printed when using gss), like the number of genes and pathways. However, it also shows the biggest pathway, and that each gene is in only one gene set.

Genes per pathway

As a GeneSetCollection is a collection of gene sets, we can see how many genes each gene set has. If the sizes of all the gene sets add up to more than the number of unique genes in the GeneSetCollection, then some genes are in several gene sets, which breaks the independence between gene sets.

(gpp <- genesPerPathway(gss))

Number of genes

Here genesPerPathway returns the number of genes each pathway has, so adding them up gives the total number of genes only if no gene is in several gene sets. To count the genes directly we have the method nGenes, which returns `r nGenes(gss)` genes across both pathways. So each gene is in only one pathway.
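
The comparison between the summed sizes and the number of unique genes can be sketched in base R on a toy named list of gene sets. The object `sets` and its identifiers are hypothetical, and this is not necessarily how the package computes its results.

```r
# Toy collection: hypothetical gene sets as a named list of gene identifiers
sets <- list(pathA = c("g1", "g2", "g3"),
             pathB = c("g3", "g4"))

sizes    <- lengths(sets)                 # genes per gene set
n_unique <- length(unique(unlist(sets)))  # unique genes in the collection

# If the sizes add up to more than the unique genes, some gene is shared
sum(sizes) > n_unique
```

In this toy collection the check is true because "g3" belongs to both gene sets.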

Pathways per gene

We can check if each gene is really in one GeneSet or not by using pathwaysPerGene:

ppg <- pathwaysPerGene(gss)
head(ppg)

As expected, every gene is in a single gene set, which can be confirmed with all(ppg == 1).
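
For intuition, counting the gene sets each gene belongs to can be sketched in base R with table(). The toy list `sets` is hypothetical, and the sketch assumes no gene is listed twice within the same gene set.

```r
# Toy collection: hypothetical gene sets as a named list of gene identifiers
sets <- list(pathA = c("g1", "g2", "g3"),
             pathB = c("g3", "g4"))

# Each occurrence of a gene corresponds to one gene set containing it
per_gene <- table(unlist(sets, use.names = FALSE))
per_gene

# FALSE here, because "g3" is in two gene sets
all(per_gene == 1)
```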

Number of pathways

If genes were repeated in several pathways we could still count the pathways with the nPathways method, which returns what we already knew: there are `r nPathways(gss)` pathways in total.

Checking independence between gene sets

We can also test whether gene sets are independent of each other with independence(); in this case: `r independence(gss)`.

Duplicated genes or pathways

We can also check if there are duplicated genes (genes that are exactly in the same pathways), and duplicated pathways (pathways with the same genes):

duplicatedGenes(gss)
duplicatedPathways(gss)
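
One way to picture these two checks is through a gene by gene-set incidence matrix: duplicated pathways are identical columns and duplicated genes are identical rows. The base R sketch below uses a hypothetical toy list `sets` and only illustrates the idea; the output format of the package methods may differ.

```r
# Toy collection: pathA and pathB hold the same genes in a different order
sets <- list(pathA = c("g1", "g2"),
             pathB = c("g2", "g1"),
             pathC = c("g3"))

genes <- sort(unique(unlist(sets)))

# Incidence matrix: rows are genes, columns are gene sets
inc <- sapply(sets, function(s) genes %in% s)
rownames(inc) <- genes

dup_sets  <- duplicated(inc, MARGIN = 2)  # columns equal to an earlier column
dup_genes <- duplicated(inc, MARGIN = 1)  # rows equal to an earlier row
dup_sets
dup_genes
```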

Nested pathways

We can also see whether some pathways are inside another pathway using the nested method, here on the Info object:

nested(Info)

This would mean that pathway "1430728" is in pathways "156580", "156582" and "211859".
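
The notion of one pathway being inside another can be sketched as a subset check in base R. The toy list `sets` and the names `candidate` and `container` are hypothetical; this shows the idea, not the package implementation.

```r
# Toy collection: every gene of "small" is also in "big"
sets <- list(small = c("g1", "g2"),
             big   = c("g1", "g2", "g3"),
             other = c("g4"))

# is_in[i, j] is TRUE when every gene of set i is also in set j
is_in <- sapply(sets, function(container) {
  sapply(sets, function(candidate) all(candidate %in% container))
})
diag(is_in) <- FALSE  # a set is trivially inside itself
is_in
```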

More information about the content of the GeneSetCollection

We can also measure the information content of the pathways and genes, and the probability of observing them, with the pathway and gene methods:

pathway(gss)
head(gene(Info)) # To avoid a long table

Shared

If you want to explore the number of genes shared with other pathways or genes you can use the following functions:

distrGenes(Info)
distrPathways(Info)

Each returns a matrix: the number of pathways each gene shares with other genes, and the number of genes each pathway shares with other pathways, respectively.

Networks

If you need to use the information on a GeneSetCollection in a network you can use the adjacency function to retrieve which genes are connected to which ones. The method assumes that all genes in a gene set are connected.

adjacency(Info)

You can also use nEdges to count the connections (omitting the connection between a gene and itself): `r nEdges(Info)`.
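
Under the stated assumption that all genes in a gene set are connected, a gene by gene adjacency matrix and its edge count can be sketched in base R. The toy list `sets` is hypothetical and the result is not meant to reproduce the exact output of the package methods.

```r
# Toy collection: hypothetical gene sets as a named list of gene identifiers
sets <- list(pathA = c("g1", "g2", "g3"),
             pathB = c("g3", "g4"))

genes <- sort(unique(unlist(sets)))

# Incidence matrix: rows are genes, columns are gene sets
inc <- sapply(sets, function(s) as.integer(genes %in% s))
rownames(inc) <- genes

# Two genes are connected when they share at least one gene set
adj <- tcrossprod(inc) > 0
diag(adj) <- FALSE  # omit the connection between a gene and itself

# Count each undirected edge once
n_edges <- sum(adj[upper.tri(adj)])
n_edges
```

Here pathA contributes three edges and pathB one, so the toy network has four edges.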

Other vignettes

Some other new functions are explained in the following vignettes:

Compare and estimate the relationship between genes and gene sets.

Given some approximations of the relationships between genes and gene sets creates a new gene set collection.

Session info {.unnumbered}

sessionInfo()


llrs/GSEAdv documentation built on May 29, 2019, 6 p.m.