knitr::opts_chunk$set(echo = TRUE)

Overview

Here is a link to the text of these exercises.

Question 1

The drug_resistant2_counts and drug_resistant2_coldata objects contain a count matrix and metadata for an experiment in which two parental cell lines and their drug resistant derivatives were sequenced. You're interested in determining the changes in gene expression associated with acquired resistance to the drug. Build a DESeqDataSet from these objects and calculate differential gene expression. For now, ignore the contribution of cell line identity (use design = ~drug). Convert the results to a tibble and use dyplr verbs to determine the number of genes that are differentially expressed between the parent and resistant group (padj <= 0.01).

Strategy


Interpretation

Question 2

The small number of differentially expressed genes in question 1 is unexpected in this experiment. Apply a regularized log transform to the DESeqDataSet from question 1 and generate a PCA plot to examine the data. What appears to be wrong with these data? Hint: it will be easier to interpret the PCA plot if the intgroups argument is a character vector containing both variables in colData.

Strategy


Interpretation

Question 3

There are two reasonable ways to deal with the problem identified in question 2. One involves dropping samples from the DESeq object as in the pseudo-code below...

dds <- dds[,-c(?,?)]

...and the other requires only editing the colData...

colData(dds)$drug[?] <- "parent"
colData(dds)$drug[?] <- "resistant"

Use the approach that you think is most appropriate and generate a new, corrected DESeqDataSet. Justify your choice briefly in the interpretation. Calculate differential gene expression and report the new number of genes differentially expressed between the parent and resistant group.

Strategy


Interpretation

Question 4

In addition to the problem corrected in question 3, the PCA plot in question 2 revealed that the majority of variance between groups in this experiment is not due to drug resistance. What is the primary source of variance? Build the DESeqDataSet once more, this time including a term in the design (design = ~? + drug) that will correct for the major source of uninteresting variance in this experiment. As before, calculate differential gene expression and report the new number of genes differentially expressed between the parent and resistant group.

Strategy


Interpretation

Question 5

From the results obtained from Q4, pull out a reasonable gene list.

Write down the top 5 enriched annotations in one of the Gene Ontology term categories (BP, CC, or MF)

Make some interpretation, relating back to the RNA-seq experiment design.

Strategy


Interpretation



IDPT7810/practical-data-analysis documentation built on July 8, 2022, 9:32 p.m.