Introduction

This presentation will show the basic usage of edgeR:

# Load the data
library(ABC2017)
library(ggplot2)
theme_set(theme_minimal())
library(edgeR)
data("zebrafish")

Trimming and normalizing the data.

Just as before, we started by loading, trimming and normalizing the data:

# Trim 
EM_zebra <- subset(zebrafish$Expression, rowSums(zebrafish$Expression >= 5) >= 3)

# Calculate normalization factors
dge_zebra <- DGEList(EM_zebra)
dge_zebra <- calcNormFactors(dge_zebra, method="TMM")

Building the model matrix

We use model.matrix to make a simple model matrix:

mod <- model.matrix(~gallein, data=zebrafish$Design)
mod

Estimating the dispersion

Now we move on to estimating dispersion or BCV:

disp_zebra <- estimateDisp(dge_zebra, design=mod, robust=TRUE)

The 3-step estimation is all done by a single function in the newest version of edgeR: estimateDisp, which replaces the three previous functions of

Plotting the dispersion

We can then plot the estimates:

plotBCV(y=disp_zebra)

Fiting gene-wise GLMs and testing a coefficient.

Now we can fit gene-wise GLMs:

fit_zebra <- glmFit(disp_zebra, design=mod)

Finally, we can test a coefficient in the model for DE:

# Using the name of the coefficient
gallein <- glmLRT(fit_zebra, coef="galleintreated")

# Using the index of the coefficient
gallein <- glmLRT(fit_zebra, coef=2)

Inspecting the results

edgeR has several useful functions for inspecting and plotting the results:

Inspecting the top hits with topTags:

topTags(gallein)

Inspecting the results

Summarize the number of up- and down-regulated genes with decideTestsDGE:

is_de <- decideTestsDGE(gallein, p.value=0.1)
head(is_de)
summary(is_de)

Inspecting the results

MA-plot, showing relation between mean expression and logFC, :

plotSmear(gallein, de.tags=rownames(gallein)[is_de != 0])

Inspecting the results

There is no built-in function for making a volcano plot, but it's easy to do yourself:

plot(-log10(PValue)~logFC, data=topTags(gallein, n=Inf, sort.by = "none"), col=factor(decideTestsDGE(gallein) != 0), pch=16)

Quasi-likelihood: Fitting models

Recently, edgeR introduced an alternative DE pipeline using Quasi-likelihood. It should be slightly more conservative than the standard edgeR pipeline. Code-wise, though, they are almost identical:

fitQL <- glmQLFit(y=disp_zebra, design=mod, robust=TRUE)
galleinQL <- glmQLFTest(fitQL, coef="galleintreated")
summary(decideTestsDGE(galleinQL, p.value=0.1))

Quasi-likehood: Plotting dispersions

The QL-pipeline estimates slightly different dispersions:

plotQLDisp(fitQL)

Final exercises:

Use the above code as a template for conducting your own analysis. For the pasilla dataset:

If you have time:

Once again - the next couple of slides contain cheat sheets...

Cheatsheet: Experimenting with model matrices

# Load the data
data("pasilla")

# This will makes "treated the intercept!"
mod <- model.matrix(~condition, data=pasilla$Design)
mod

Cheatsheet: Experimenting with model matrices

# Relevel to make "untreated" intercept
pasilla$Design$condition <- relevel(pasilla$Design$condition, "untreated")
mod <- model.matrix(~condition, data=pasilla$Design)
mod

Cheatsheet: Experimenting with model matrices

# Relevel to make "untreated" intercept
pasilla$Design$condition <- relevel(pasilla$Design$condition, "untreated")
mod_batch <- model.matrix(~condition+type, data=pasilla$Design)
mod_batch

Cheatsheet: Fitting the models

Fit and test model without batch correction:

# Trim 
EM <- subset(pasilla$Expression, rowSums(pasilla$Expression >= 5) >= 3)

# Calculate normalization factors
dge <- calcNormFactors(DGEList(EM), method="TMM")

# Calculate dispersion
disp <- estimateDisp(dge, design=mod, robust=TRUE)

# Fit models
fit <- glmFit(disp, design=mod)

# Perform the test of the given coefficient
res <- glmLRT(fit, coef="conditiontreated")

Cheatsheet: Fitting the models

Fit and test model without batch correction:

# Calculate dispersion
disp_batch <- estimateDisp(dge, design=mod_batch, robust=TRUE)

# Fit models
fit_batch <- glmFit(disp_batch, design=mod_batch)

# Perform the test of the given coefficient
res_batch <- glmLRT(fit_batch, coef="conditiontreated")

Cheatsheet: Inspecting the results

Plot dispersions

par(mfrow=c(1,2))
plotBCV(disp)
plotBCV(disp_batch)

Cheatsheet: Inspecting the results

Number of DE genes

summary(decideTestsDGE(res))
summary(decideTestsDGE(res_batch))

Cheatsheet: Inspecting the results

Top genes

topTags(res)
topTags(res_batch)

Cheatsheet: Inspecting the results

MA-plots:

par(mfrow=c(1,2))
plotSmear(res_batch, de.tags=rownames(res)[decideTestsDGE(res) != 0])
plotSmear(res_batch, de.tags=rownames(res_batch)[decideTestsDGE(res_batch) != 0])

Cheatsheet: Inspecting the results

Volcano plots:

par(mfrow=c(1,2))
plot(-log10(PValue)~logFC, data=topTags(res, n=Inf, sort.by = "none"), col=factor(decideTestsDGE(res) != 0), pch=16)
plot(-log10(PValue)~logFC, data=topTags(res_batch, n=Inf, sort.by = "none"), col=factor(decideTestsDGE(res_batch) != 0), pch=16)

Cheatsheet: Quasi-likehood

Trying the more conservative QL-pipeline

fitQL_batch <- glmQLFit(y=disp_batch, design=mod_batch, robust=TRUE)
testQL_batch <- glmQLFTest(fitQL_batch, coef=2)
summary(decideTestsDGE(testQL_batch, p.value=0.05))

Cheatsheet: Quasi-likehood

The QL-pipeline estimates slightly different dispersions:

plotQLDisp(fitQL)


MalteThodberg/ABC2017 documentation built on Nov. 18, 2017, 7:51 a.m.