knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(vic3PCD)
suppressMessages(library(DESeq2))
suppressMessages(library(dplyr))
suppressMessages(library(RColorBrewer))
suppressMessages(library(ggplot2))
suppressMessages(library(kableExtra))

When we get sequencing data our first question is there a difference in gene expression between Control and Treatment.
Before we dive into specic genes and probable molecular mechanisms we must evaluate the effect of the experimental conditions on gene expression.

In this step we look at expression counts from a sample perspective. Basically, we ask a question how different our samples are?

In this analysis we have 2 types of samples:

In Differential expression analysis we using: - 4 Control samples: P74-3, EP155, DZ-66 and P74-3+DZ-66 - 1 Test sample: P74-3+EP155

Differential expression with DESeq2

1. Load Summarized Experiment (SE) object.

The SE object contains count table and associated metadata.

se <- readRDS(system.file("extdata", "se_vic3_2020.RData", package = "vic3PCD"))

To create your own metadata table and add it to SE object Create a data frame where row names correspond to sample names as in SE object.

tbl <- as.data.frame(colData(se))

kable(tbl, escape = F, linesep = "", booktabs = T, longtable = F, caption = "Summarized Experiment Metadata") %>%
  kable_styling(latex_options = c("scale_down", "repeat_header"), bootstrap_options = c("condensed"), font_size = 12, full_width = F)

2. Process raw data with DESeq2.

Now that we have counts and metadata we proceed with Differential expression analysis.

dds <- DESeqDataSet(se, design = ~ condition)

# Collapse replicas
dds <- collapseReplicates(dds, dds$exset)

# To make sure we have right category used as reference in the analysis
dds$condition <- relevel(dds$condition, ref = "Control")

# DESeq analysis
dds <- DESeq(dds, test = "Wald", sfType = "poscounts", useT = FALSE, minReplicatesForReplace = 7)

# Filter genes with more than 10 aligned reads
keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]

# Normised counts
rld <- rlog(dds, blind = F)

3. Principal Component Analysis

PCA allows to get an overall view on gene expression between samples. If PC1 and PC2 combined variance is over 50% and clusters are clearly separated we can proceed to following analysis.

# Manual color
clr = c("dodgerblue3", "coral3")
names(clr) <- unique(colData(rld)[,"condition"])

vic3_plotPCA(counts = assay(rld), group_vector = colData(rld)[,"condition"]) +
  scale_colour_manual(values=clr, name = "Condition:") +
  geom_text(label = colData(rld)[,"exset"], size = 3, colour = "black", fontface = "italic", hjust = 0, nudge_x = 3) +
  xlim(-20,60) +
  ylim(-25,25) 

Raw counts distribution

vic3_boxplot(counts = assay(rld))


anabeloff/vic3PCD documentation built on Dec. 2, 2020, 11:03 a.m.