knitr::opts_chunk$set(echo = TRUE)

normal RNA-Seq analysis

Before we start, this part of the package relies on the DESeq2 package. Check out their vignette. Starting material: a raw count table, e.g. the output from featureCounts. The seeqR package provides an example table (sqCounts).

library(seeqR)
counts <- sqCounts
head(counts)

We let the package DESeq2 calculate the differential expression from the raw counts. The seekDEseq function includes all necessary steps. From the colnames of the counts table we can see that there are 3 samples from conditionA and 4 from conditionB and also conditionC. We want to compare conditionC (test) vs. conditionA (control). However, all samples from conditionA are from batch 1 and all samples from conditionC are from batch2, so all the differences could just come from the batch effect. Luckily we have conditionB which has 2 samples from batch 1 & 2 each. This is used automatically to correct for batch effect when we do out comparison. We have to declare conditions and batches in the same order as are the columns in the count table. We provide two conditions we want to compare, cond0 being the control (e.g. WT, untreated, etc.) and cond1 being the test condition (e.g. KO, treated, etc.). The strings have to match the condition strings. Optionally we can also let seekDEseq translate the gene IDs that we have to others. Here we start with EntrezIDs and (id_in) and demand gene symbols (id_out). In order for this to work we have to provide a species (here mouse). Another option would be to set getBiotype=T in order to get an additional output column, which however can take a minute or so to finish the whole process.

conditions <- rep(c("condA","condB","condC"), c(3,4,4))
batches <- c(1,1,1, 1,1,2,2, 2,2,2,2) 
DE <- seekDEseq(counts, conditions, batches, cond0="condA", cond1="condC", id_in="entrezid", id_out="symbol", Species="mouse")
head(DE[[1]])

The result is a list of 3 items: 1) the result table changed and expanded for easy access within seeqR, 2) the original results table from DESeq2, 3) the dds object that is an intermediary step in DESeq2 and used for some analysis options. Let's look at clustering. We follow the DESeq2 advice and check different methods.

seekDEtransformations(DE[["dds"]])

We see that vst works best, so for the cluster vizualisation, we use this instead of the default rlog. Note that this is only important for this vizualisation, not for the differential expression analysis which is already done anyways. The graphs we finally get should give an impression on how the samples cluster together. The upper two graphs are directly from DESeq2, while the lower one is a simple PCA plot.

seekDEcluster(DE[["dds"]], transformation="vst")

A MAplot (Also from DESeq) can be done in the same way. Running the function without providing a comparison will output the possible comparisons. Take the last one and use for the compare option.

seekDEma(DE[["dds"]])
seekDEma(DE[["dds"]], compare="condition_condC_vs_condA")

Lastly we can get a quick overview over the differential regulation via a volcano plot. The seekDEvolcano function has many options to modify the plot, but we can also just use all its default options. The seekDEvolcano function input needs to be a table with certain column names, which are default to the seekDEseq output: symbol, log2FC, padj, log10padj.

seekDEvolcano(DE[[1]])

GSEA analysis

First we generate a ranked list from the differential expression table (or in R terms, a sorted vector). seeqR provides an example table (sqDE) that should actually be exactly the same table that we produces just now. We use it to calculate gene set enrichment analysis via the fgsea package. In this case we choose to have the ranked list consist only of genes that have a significant adjusted p value (in default mode, all genes will go in the list). In the seekGSEA function, "BP" (biological process) is one of the options available to us. The others are "CC" (cellular component) and "MF" (molecular function). This is true as long as we are using "GO" as our genesetList. We can also use custom genesets for genesetList. Custom genesets are just list objects, where each item is a vector of strings, each string being an EntrezID. You can find some on the GSEA website: https://www.gsea-msigdb.org/gsea/msigdb/index.jsp.

de <- DE[[1]]
#OR
de <- sqDE

rl <- seekGSErl(de, pFilter=0.05)
gsea <- seekGSEA(rl, "mouse", genesetList="GO", Ont="BP")
gsea[1:5,1:6]

We can get a quick overview over some top enriched terms via the seekGSEoverview function. By default it shows the top 20 terms, but it will only show those with a significant p value, so in this case it shows less.

seekGSEoverview(gsea)

We can get the typical GSEA plot via the seekGSEplot function. The GSEA plot is very similar to the one in the fgsea package, but the code was adjusted a bit.

seekGSEplot(rl, "GO:0007264", Species="mouse")

If we provide some more info we will get statistics to the graph as well. The set ID (here we chose GO:0007264) can be looked up in the gsea table, column 1.

seekGSEplot(rl, "GO:0007264", Species="mouse", setList="BP", gseaTable=gsea)


Solatar/seeqR documentation built on Feb. 19, 2021, 8:07 p.m.