In tanaylab/repsc: Single-cell expression analysis of transposable elements

knitr::opts_chunk$set(eval = FALSE)
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")

Workflow: mouse 10x scRNA-seq dataset (5')

In this tutorial, we use 5' scRNA-seq data from mouse embryonic stem cells (mESCs) and day 2 embryoid bodies (EBs). By following this workflow, you will learn the specifics of Repsc needed to adapt it to your own single-cell dataset.

Getting started

We start the workflow by loading Repsc and the mouse mm10 BSgenome object into our R environment:

Sys.time()
library(Repsc)
library(BSgenome.Mmusculus.UCSC.mm10)

# Repdata contains the gene and TE annotation files; alternatively, download these files and define their paths manually (see below)
# load Repdata from a local source checkout (adjust the path to your copy)
devtools::load_all('/net/mraid14/export/tgdata/users/davidbr/src/Repdata/')

Deduplicate Reads (parallel)

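We remove duplicate reads by creating artificial contigs along the chromosomes, followed by deduplication with UMI-tools. To run this step in parallel, each BAM file is first split by chromosome, and each per-chromosome BAM is then deduplicated as a separate job.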

# path to BAM/SAM files containing mapped reads
bam_paths <- dir("~/tgdata/data/eseb/10x/data/combined/aligned", full.names = TRUE, pattern = 'bam.bam$')

# split each BAM by chromosome
for (bam in bam_paths)
{
  Reputils::splitBAM(bam)
}

# deduplicate the per-chromosome BAMs in parallel on an SGE cluster
# (%<-% and %packages% are operators of the future package, which must be attached;
#  queue name and resources are specific to your cluster)
library(future)
bam_paths <- dir("~/tgdata/data/eseb/10x/data/combined/aligned", full.names = TRUE, pattern = 'chr[0-9, X, Y, MT]*.bam$')
future::plan(future.batchtools::batchtools_sge(resources = list(queue = "all.q", threads = 3, memory = 25), workers = Inf))
res <- listenv::listenv()
for (bam in bam_paths)
{
  print(bam)
  res[[bam]] %<-% Reputils::deduplicateBAM(bam, paired = TRUE, ncores = 3, align_dist = 1e3) %packages% "data.table"
}
# collect the results of all deduplication jobs
as.list(res)
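As a simplified illustration of what deduplication removes, the base-R sketch below collapses reads that share the same cell barcode, UMI, chromosome, position and strand, keeping one read per molecule. The read table and its column names are hypothetical. This exact-match rule corresponds only to the simplest `unique` method of UMI-tools; its default `directional` method additionally merges UMIs that likely differ by sequencing errors, and `Reputils::deduplicateBAM` operates on BAM files rather than on a data.frame.

```r
# Toy table of aligned reads (hypothetical columns and values).
reads <- data.frame(
  barcode = c("AAAC", "AAAC", "AAAC", "TTTG", "TTTG"),
  umi     = c("GGTA", "GGTA", "CCAT", "GGTA", "GGTA"),
  chrom   = c("chr1", "chr1", "chr1", "chr1", "chr1"),
  pos     = c(1000,   1000,   1000,   1000,   2000),
  strand  = c("+",    "+",    "+",    "+",    "+"),
  stringsAsFactors = FALSE
)

# Reads that agree on barcode, UMI, chromosome, position and strand are
# treated as PCR copies of one molecule: keep only the first occurrence.
key_cols <- c("barcode", "umi", "chrom", "pos", "strand")
dedup <- reads[!duplicated(reads[, key_cols]), ]

# Rows 1 and 2 are identical, so one of them is dropped.
nrow(dedup)
```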

Create scSet

We then import the gene and TE annotation files as GRanges objects, followed by Repsc-specific curation and formatting with the curateGenes and curateTEs functions.

# path to the GENCODE vM22 GTF file (provided with Repdata)
gene_path <- system.file(package = 'Repdata',
                         'extdata',
                         'mm10',
                         'genes',
                         'gencode.vM22.annotation.gtf.gz')

# path to the RepeatMasker mm10 repeat annotation (provided with Repdata)
rmsk_path <- system.file(package = 'Repdata',
                         'extdata',
                         'mm10',
                         'tes',
                         'mm10.fa.out.gz')

# creating the scSet
sc <- createScSet(genome   = Mmusculus,
                  protocol = 'fiveprime',
                  tes      = rmsk_path,
                  genes    = gene_path)
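The TE annotation passed to createScSet is RepeatMasker output in its standard `.out` text format. The sketch below is not how Repsc parses the file; it is a base-R illustration, on made-up lines, of which columns such a file holds: columns 5 to 7 give the chromosome and the 1-based, inclusive start and end, column 9 the strand (`+`, or `C` for the complement strand), column 10 the repeat name and column 11 its class/family.

```r
# Made-up lines in RepeatMasker .out layout: two header lines, a blank line, then records.
rm_lines <- c(
  "   SW   perc perc perc  query  position in query  matching  repeat  position in repeat",
  "score   div. del. ins.  sequence  begin  end  (left)  repeat  class/family  begin  end  (left)  ID",
  "",
  "  463  20.1  2.0  1.5  chr1  3000001  3000156  (195471815)  C  L1_Mus3  LINE/L1  (3)  6413  6258  1",
  "  250  10.5  0.0  0.0  chr1  3010000  3010080  (195461891)  +  (CA)n  Simple_repeat  1  81  (0)  2"
)

# Skip the three header lines; fields are separated by runs of whitespace.
rmsk <- read.table(text = rm_lines, skip = 3, header = FALSE,
                   fill = TRUE, stringsAsFactors = FALSE)

tes <- data.frame(
  chrom  = rmsk$V5,
  start  = rmsk$V6,                             # 1-based, inclusive
  end    = rmsk$V7,
  strand = ifelse(rmsk$V9 == "C", "-", "+"),    # "C" marks the complement strand
  name   = rmsk$V10,
  class  = sub("/.*$", "", rmsk$V11),           # part before "/" (or the whole field)
  family = sub("^.*/", "", rmsk$V11),           # part after "/" (or the whole field)
  stringsAsFactors = FALSE
)
tes
```

If GenomicRanges is installed, `GenomicRanges::makeGRangesFromDataFrame(tes, keep.extra.columns = TRUE)` turns such a table into a GRanges object.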

Create the input data.frame

Here we list the deduplicated BAM files and define the import parameters for each of them.

# paths to the deduplicated BAM files containing mapped reads
bam_paths <- dir("~/tgdata/data/eseb/10x/data/combined/aligned/",
                  pattern = 'dedup.bam$',
                  full.names = TRUE)

# Cell Ranger HDF5 matrices of filtered barcodes (not used further in this workflow)
hdf5_paths <- dir('/net/mraid14/export/data/users/davidbr/proj/eseb/data/',
                  recursive = TRUE,
                  pattern = 'filtered_gene_bc_matrices_h5.h5',
                  full.names = TRUE)

# create a data.frame specifying import parameters
input_df    <- data.frame(paths   = bam_paths,
                          paired  = TRUE,       # use FALSE for single-end libraries
                          mate    = 'first',    # import only the first mate of properly aligned read pairs; set to NA for single-end libraries
                          barcode = 'CB',       # BAM tag holding the 10x cell barcode
                          chunk   = Reputils:::chunkFiles(bam_paths, 20),         # assign files to import chunks (internal Reputils helper)
                          meta    = gsub('/', '', substring(bam_paths, 57, 64)),  # sample label from fixed character positions of each path; adjust for your paths
                          stringsAsFactors = FALSE)

# check the input data.frame before counting
checkInput(input_df)

sc <- addCounts(sc,
                bams     = input_df,
                bin_size = 25,
                use_gcluster = TRUE)
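The `meta` column above is taken from fixed character positions (57 to 64) of each path, so it depends on the exact length of the directory path. A sketch of a more portable alternative follows, assuming sample names are the part of the file name before the first dot (a hypothetical naming scheme), combined with the single-end settings described in the comments above (`paired = FALSE`, `mate = NA`). The paths and the helper `sample_from_path` are hypothetical, and the `chunk` column is omitted here; add it as in the main example before calling `checkInput`.

```r
# Hypothetical deduplicated BAM paths; real names depend on how the
# splitting and deduplication steps name their output files.
bam_paths <- c("/data/aligned/mesc_rep1.chr1.dedup.bam",
               "/data/aligned/eb_d2_rep1.chr1.dedup.bam")

# Hypothetical helper: drop the directory, then keep everything before the first dot.
sample_from_path <- function(paths) sub("\\..*$", "", basename(paths))

input_df_se <- data.frame(paths   = bam_paths,
                          paired  = FALSE,   # single-end library
                          mate    = NA,      # no mate selection for single-end reads
                          barcode = 'CB',    # BAM tag holding the 10x cell barcode
                          meta    = sample_from_path(bam_paths),
                          stringsAsFactors = FALSE)
input_df_se
```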

Call cells

To distinguish real cells from empty droplets, we use the emptyDrops function from the DropletUtils package [1].

plotCells(sc)
sc_f <- selectCells(sc, min_size = 1e4, max_mito = 0.05)
plotCells(sc_f)
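The `min_size` and `max_mito` arguments of `selectCells` suggest two thresholds: a minimum library size per barcode and a maximum fraction of mitochondrial reads. Assuming that interpretation (check the function's documentation for the exact definitions), the base-R sketch below applies both filters to a small made-up gene-by-barcode count matrix, identifying mouse mitochondrial genes by their `mt-` symbol prefix.

```r
# Made-up genes x barcodes count matrix, filled column by column.
genes  <- c("Actb", "Pou5f1", "Nanog", "mt-Co1", "mt-Nd1")
counts <- matrix(c(9000, 1500,  800,  200,  100,    # bc1: large library, little mito
                    200,   50,   20,   30,   10,    # bc2: small library (likely empty droplet)
                   5000, 2000, 1000, 1500, 1000),   # bc3: large library, high mito
                 nrow = 5,
                 dimnames = list(genes, c("bc1", "bc2", "bc3")))

# Library size and mitochondrial fraction per barcode.
lib_size  <- colSums(counts)
mito_frac <- colSums(counts[grepl("^mt-", rownames(counts)), , drop = FALSE]) / lib_size

# Keep barcodes with at least 1e4 counts and at most 5% mitochondrial counts,
# mirroring min_size = 1e4 and max_mito = 0.05 under the assumption above.
keep     <- lib_size >= 1e4 & mito_frac <= 0.05
counts_f <- counts[, keep, drop = FALSE]
colnames(counts_f)
```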

Mapping

plotMapping(sc_f)

Call peaks

sc_f <- selectPeaks(sc_f)
plotPeaks(sc_f)

Feature selection

sc_f <- selectFeatures(sc_f)
plotFeatures(sc_f)

Export results

export(sc_f, outdir = tempdir())

References

[1]

Session information

sessionInfo()

tanaylab/repsc documentation built on Jan. 9, 2021, 9:39 a.m.
