DUBStepR Vignette
In DUBStepR: Correlation-Based Feature Selection for Single-Cell RNA Sequencing Data

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Setting Up DUBStepR

Installing DUBStepR

DUBStepR requires your R version to be >= 3.5.0. Once you've ensured that, you can install DUBStepR from CRAN using the following command:

if (!require(DUBStepR))
  install.packages("DUBStepR")

The installation should take ~ 30 seconds, not including dependencies.

Loading DUBStepR into your environment

After installation, load DUBStepR using the following command:

library(DUBStepR)

Using DUBStepR

For users new to single-cell RNA sequencing data analysis, we recommend using DUBStepR with the Seurat (Satija, R. et al. Nature Biotechnol. 33.5(2015):495.) package to analyze single-cell RNA seq data. Below is a tutorial using the Seurat package.

Install and load Seurat

Seurat can be installed and loaded into your R environment using the following commands:

library(Seurat)
library(dplyr)

Prepare Seurat object

Here, we use a publicly available PBMC dataset generated by 10X Genomics. Here's a link to the dataset. We use the Feature / cell matrix HDF5 (filtered) file.

Locate the file in your working directory and load the data in to your Seurat object using the hdf5r package in the following manner (For easy loading, we start directly from the Seurat object in this package) :

load("counts.rda")
seuratObj <- CreateSeuratObject(counts = counts, assay = "RNA", project = "10k_PBMC")
seuratObj

For DUBStepR, we recommend log-normalizing your data. That can be performed in Seurat using the following command:

seuratObj <- NormalizeData(object = seuratObj, normalization.method = "LogNormalize")

Run DUBStepR to identify feature genes

DUBStepR can be inserted into the Seurat workflow at this stage, and we recommend that be done in the following manner:

dubstepR.out <- DUBStepR(input.data = seuratObj@assays$RNA@data, min.cells = 0.05*ncol(seuratObj), optimise.features = TRUE, k = 10, num.pcs = 20, error = 0)
seuratObj@assays$RNA@var.features <- dubstepR.out$optimal.feature.genes
seuratObj

This step could take upto 1 minute on a normal desktop computer.

Visualize and cluster cells

Following Seurat's recommendations, we scale the gene expression data and run Principal Component Analysis (PCA). We then visualize the standard deviation of PCs using an elbow plot and select the number of PCs we think is sufficient to explain the variance in the dataset.

seuratObj <- ScaleData(seuratObj, features = rownames(seuratObj))
seuratObj <- RunPCA(seuratObj, features = VariableFeatures(object = seuratObj), npcs = 30)
ElbowPlot(seuratObj, ndims = 30)

We select the first few feature genes selected by DUBStepR to show cell type specific expression, using 10 PCs to compute UMAP coordinates.

seuratObj <- RunUMAP(seuratObj, dims = 1:10, n.components = 2, seed.use = 2019)
FeaturePlot(seuratObj, features = VariableFeatures(object = seuratObj)[1:9], cols = c("lightgrey", "magenta"))

Using known marker genes, we show cell type specific regions of the UMAP

FeaturePlot(seuratObj, features = c("MS4A1", "NKG7", "CD3E", "IL7R", "CD8A", "CD14", "CST3", "FCGR3A", "PPBP"))

We select 10 PCs for clustering, and visualize the cells in a 2D UMAP.

seuratObj <- FindNeighbors(seuratObj, reduction = "pca", dims = 1:10)
seuratObj <- FindClusters(seuratObj)
DimPlot(seuratObj, reduction = "umap", label = TRUE, pt.size = 0.5, repel = T, label.size = 5)

Identifying top 10 marker genes of each cluster

top.10.markers <- FindAllMarkers(object = seuratObj, assay = "RNA", logfc.threshold = 0.5, min.pct = 0.5, only.pos = TRUE) %>% filter(p_val_adj < 0.1) %>% group_by(cluster) %>% top_n(n = 10, wt = avg_log2FC)
DoHeatmap(object = seuratObj, features = unique(top.10.markers$gene), size = 5)

Annotating clusters using gene expression

cell.types <- c("0" = "CD14+ Monocytes", "5" = "Inflammatory CD14+ Monocytes", "1" = "Naive CD4+ T cells", "3" = "Memory CD4+ T cells", "4" = "Naive CD8+ T cells", "2" = "B cells", "6" = "NK cells", "7" = "CD16+ Monocytes", "8" = "Platelets")
seuratObj <- RenameIdents(seuratObj, cell.types)
DimPlot(seuratObj, reduction = "umap", label = TRUE, pt.size = 1, repel = TRUE, label.size = 5) + NoLegend()

Any scripts or data that you put into this service are public.

DUBStepR documentation built on Oct. 5, 2021, 5:08 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

DUBStepR
Correlation-Based Feature Selection for Single-Cell RNA Sequencing Data

DUBStepR Vignette
In DUBStepR: Correlation-Based Feature Selection for Single-Cell RNA Sequencing Data

Setting Up DUBStepR

Installing DUBStepR

Loading DUBStepR into your environment

Using DUBStepR

Install and load Seurat

Prepare Seurat object

Run DUBStepR to identify feature genes

Visualize and cluster cells

Try the DUBStepR package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

DUBStepR Correlation-Based Feature Selection for Single-Cell RNA Sequencing Data

DUBStepR Vignette In DUBStepR: Correlation-Based Feature Selection for Single-Cell RNA Sequencing Data

Setting Up DUBStepR

Installing DUBStepR

Loading DUBStepR into your environment

Using DUBStepR

Install and load Seurat

Prepare Seurat object

Run DUBStepR to identify feature genes

Visualize and cluster cells

Try the DUBStepR package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

DUBStepR
Correlation-Based Feature Selection for Single-Cell RNA Sequencing Data

DUBStepR Vignette
In DUBStepR: Correlation-Based Feature Selection for Single-Cell RNA Sequencing Data