knitr::opts_chunk$set(echo = TRUE)

Overview

Here is a link to the text of these exercises.

Question 1

A second 10X Genomics single cell RNA-seq dataset is included in the pbda package. This dataset is derived from a human solid tumor. Get its path using the code below:

tumor_10x <- system.file("extdata", "tumor_gene_bc_matrices", "hg19", package = "pbda")

Using the new data, create a new Seurat object named tumor. Next, extract the list of mitochondrial genes in the matrix, calculate the percentage of reads that are derived from the mitochondrial genome in each cell, and add this to a metadata slot called "percent.mito" in the tumor object.

Now, using the GenePlot() function, plot nUMI vs. nGene and nUMI vs. percent.mito. Based on these plots, filter the data to remove dead cells and likely doublets (using FilterCells()).

Strategy


Interpretation

Question 2

Now that you've filtered your cells, normalize the data (NormalizeData()) then identify variable genes using the code below (the cutoffs have been adjusted for this dataset - don't worry about interpreting the VMR plot):

tumor <- FindVariableGenes(tumor,
                          mean.function = ExpMean,
                          dispersion.function = LogVMR, 
                          x.low.cutoff = 0.025,
                          x.high.cutoff = 3,
                          y.cutoff = 1)

Next, scale the data and run PCA on your tumor object using the variable genes you've just identified.

Finally, use an elbow plot and/or PC heatmaps to select the prinicpal components to use for tSNE and clustering. Justify your choice of PCs (do they capture the full variation present in the data?).

Strategy


Interpretation

Question 3

Using the PCs selected in Question 2, run the tSNE algorithm on tumor and identify clusters using FindClusters(). Plot the resulting tSNE. How many clusters are identified? Does this appear reasonable based on the structure present in the tSNE projection?

Strategy


Interpretation

Question 4

Included in the package is a matrix called tx_rates containing transcription rates for ~7000 genes. Transcription rates were measured every 15 minutes for 3 hours after exposure of dendritic cells to LPS, which activates innate immunity responses. The processed data was obtained from GEO

For this exercise perform the following steps:

1) Exclude genes that have rates of 0 across all samples

2) Log transform the data (add a pseudocount of 1 to avoid Infinite values)

3) Normalize the data to the 0 hour time point (i.e. subtract the "rate at 0" from all of the columns )

4) run kmeans clustering on the log transformed and normalized matrix. Use k = between 5 and 10 clusters.

5) Make a heatmap with rows split based on the clusters identified in step 4. This may take a few minutes to plot, depending on your computers resources. Use the code below to generate a color palette. Discuss any interesting patterns in your clusters in the interpretation.

cols_to_use <- circlize::colorRamp2(c(-4, 0, 4), 
                                    c("blue", "white", "red"))

# pseudocode
Heatmap(mat, col = cols_to_use, ...

Strategy


Interpretation

Bonus super fun but optional exercise:

Using your tidyverse skills plot the average transcription rate for each cluster identified in Q4 using ggplot. Facet by cluster. (Hint gather and tidy the matrix, then join with the cluster assignments)



IDPT7810/practical-data-analysis documentation built on July 8, 2022, 9:32 p.m.