knitr::opts_knit$set(root.dir = ".") knitr::opts_chunk$set(collapse = TRUE, warning = TRUE) BiocStyle::markdown() library("BiocStyle") library("GSEAdv")
This vignette assumes that the reader is familiar with the infraestructure build in Bioconductor around gene sets, mainly r Biocpkg("GSEABase")
. It provides examples of how to use the package to find relationships between genes and gene sets.
A GeneSetCollection
is a collection of GeneSet
s which can be totally unrelated or related between them. All the over representative analysis (ORA), such as the hypergeometric test or chi-square test, assume that each gene is independent of other genes. This vignette shows how to check and describe the relationship between the genes and gene sets and between gene sets.
First let's load a GeneSetCollection
, from the r Biocpkg("GSEABase")
package we can load directly the Info
object:
fl <- system.file("extdata", "Broad.xml", package = "GSEABase") gss <- getBroadSets(fl) gss Info
This GeneSetCollection
will be used on this vignette to show the new methods implemented in this package.
With r Biocpkg("GSEABase")
we can see what the collection has but we don't know what is the relationship between genes and gene sets.
A first approach would be to use summary to find more information about that relationship:
summary(gss)
This provide us with some information we already knew (and printed when using gss
), like the number of genes and pathways. However it shows also the biggest pathway, and that all genes are in a single gene set.
As a GeneSetCollection
is a collection of gene sets, we can see how many genes are for each gene sets. If the number of genes in all the gene sets is greater than the number of genes in the GeneSetCollection
then there are some genes in several gene sets breaking the independence between gene sets.
(gpp <- genesPerPathway(gss))
Here genesPerPathways returns the number of genes each pathway have, thus adding them will summarize the total number of genes if they are not in several gene sets. In this case we have the method nGenes
, which return r nGenes(gss)
genes in both pathways. So each gene is only in one pathway.
We can check if each gene is really in one GeneSet
or not by using pathwaysPerGene
:
ppg <- pathwaysPerGene(gss) head(ppg)
As expected all the genes all(ppg == 1)
are in a single gene set.
If genes would be repeated in several pathways we could use the nPathways
method which returns what we knew, that there are r nPathways(gss)
pathways in total.
We can also test if gene sets are independent from each other with independence()
, in this case: r independence(gss)
.
We can also check if there are duplicated genes (genes that are exactly in the same pathways), and duplicated pathways (pathways with the same genes):
duplicatedGenes(gss) duplicatedPathways(gss)
We can also see if some pathways are inside another pathway using nested
method, on the Info
object.
nested(Info)
This would mean that pathway "1430728" is in the pathway "156580" and "156582" and "211859".
We can also measure the information content and the probability to have such a pathways and genes with gene
and pathway
:
pathway(gss) head(gene(Info)) # To avoid a long table
If you want to explore the number of genes shared with other pathways or genes you can use the following functions:
distrGenes(Info) distrPathways(Info)
Which returns a matrix with the amount of pathways shared with other genes and the gens of each pathway shared with others pathways, respectively.
If you need to use the information on a GeneSetCollection in a network you can use the adjacency
function to retrieve which genes are connected to which ones.
The method assumes that all genes in a gene set are connected.
adjacency(Info)
You can also use the nEdges
to r nEdges(Info)
to calculate the connections (omitting the connection between a gene and itself)
Some other new functions are explained on the following vignettes:
r Biocpkg("GSEAdv", "vignette_evaluate.html", "*Evaluation*")
Compare and estimate the relationship between genes and gene sets.
r Biocpkg("GSEAdv", "vignette_simulate.html", "*Simulate*")
Given some approximations of the relationships between genes and gene sets creates a new gene set collection.
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.