# TODO keep in mind to save (as an rda in data) a copy of the count table
startQuest("Differential Expression Analysis - DESeq2")
h1("Sexual dimorphism study Differential Expression")
h2("Doing the differential expression analysis.")
library(RnaSeqTutorial) counts <- readAbundance("~/share/sex/kallisto") samples <- read.csv("~/share/sex/doc/samples.csv") dds <- createDESeqDataSet(counts,samples)
The QA revealed that we have a confounding factor. Luckily enough, the grouping of sex is balanced between sampling years and hence we will be able to block the year effect to investigate the sex effect.
h3("DESeq2 differential expression analyses")
Let's take a look at our DESeqDataSet
object dds
.
# the object dds # the metadata colData(dds) # the design design(dds)
Now, we can run the differential expression analysis. The steps will be:
This sounds like a lot, but it is really just one command!
dds <- DESeq(dds)
And that is it! Now, let us look at the results.
resultsNames(dds)
The model resulted in three comparisons, as we considered the variables to be independent.
We are interested in the sex
effect, so let's define a significance cutoff (
but keep in mind our discussion on the "non"-importance of statistical
significance [@Amrhein2019]).
We set a FDR cuttof at 1% (that significance level is often denoted 'alpha').
As shown by Schurch et al. [@Schurch2016], DESeq2
is capable at controlling its False Positive Rate (so that choosing a 1% FDR
means that for 100 positive test results, one will be a false positive), even
for as few as 3 replicates, provided the log2 fold changes (log2FC) are larger
than 0.5. Hence, we also set a cutoff on the log2FC.
alpha=0.01 log2FC=0.5 res <- results(dds,name = "sex_M_vs_F")
Let's take a look at the results
res
It is a table with 6 columns:
Here, we look at the comparison M
vs. F
, in other words the ratio M/F
,
or on a log scale log(M) - log(F)
. The baseline is hence the expression level
in the female samples.
Let's quickly check how many genes are selected by our criteria.
res[abs(res$log2FoldChange) >= log2FC & ! is.na(res$padj) & res$padj <= alpha,]
Only 2 genes, that is very little and rather unexpected. Let's do some validation, looking at an MA plot.
plotMA(as(res,"DataFrame"), alpha=alpha, log2FC=log2FC)
The MA plot looks as expected, the vast majority of the genes as shown in the density plot (upper panel) and as a gradient (grey: sparse to red-yellow: dense) are not differentially expressed, so the main DE assumption holds. Let's take a look at the level of significance vs. the log2 fold change, using a volcano plot.
volcanoPlot(res,alpha=alpha,log2FC=log2FC)
The volcano plot looks pretty dead, apart from the 2 identified genes. There again most of the genes have no change in expression and these have no statistical significance. There seem to be no artefact from the analysis point of view and the conclusion is that the seem to be no sexual differences between Male and Female trees.
Let's make some plots to look at the expression of these genes.
# Potri.019G047300, grouping by sex dotplot(counts(dds,normalized=TRUE)["Potri.019G047300",], groups=samples$sex,col=c("pink","lightblue"),pch=19, xlab="library size corrected counts") # Potri.019G047300, grouping by date dotplot(counts(dds,normalized=TRUE)["Potri.019G047300",], groups=samples$date,col=c("darkgreen","brown"),pch=19, xlab="library size corrected counts")
We know that Potri.019G047300 is the sex determination locus [@Pakull]. So, it
seems highly likely that 226.1
has been mis-sexed.
# Potri.014G155300, grouping by sex dotplot(counts(dds,normalized=TRUE)["Potri.014G155300",], groups=samples$sex,col=c("pink","lightblue"),pch=19, xlab="library size corrected counts") # Potri.014G155300, grouping by date dotplot(counts(dds,normalized=TRUE)["Potri.014G155300",], groups=samples$date,col=c("darkgreen","brown"),pch=19, xlab="library size corrected counts")
Potri.014G155300 should be considered with caution. Albeit there seem to be a clear sex effect, 229 behaves more like a Female and 226.1 which harvest a Male sex determination locus, also. Moreover, the date do not appear as random as for the sex determination locus.
endQuest() startQuest("Before the practical")
Re-visit the analyses we did above, but this time, take a look at the date
results, i.e. the date_Y2010_vs_Y2008
contrast.
endQuest()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.