Introduction to scPloidy

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

has_pkg = requireNamespace("GenomicRanges", quietly = TRUE) &&
  requireNamespace("IRanges", quietly = TRUE) &&
  requireNamespace("readr", quietly = TRUE)
knitr::opts_chunk$set(eval = has_pkg)

Introduction

scPloidy is a package to compute ploidy of single cells (or nuclei) based on single-cell (or single-nucleus) ATAC-seq data. In ATAC-seq, open chromatin regions are excised and sequenced. For any site on the genome, ATAC-seq could read 0, 1 or 2 DNA fragments, if the cell was diploid. If the cell was tetraploid, ATAC-seq could read 0, 1, 2, 3 or 4 fragments from the same site. This is the basic idea used in scPloidy. We model the depth of DNA sequencing at one site by binomial distribution.

Usage

library(scPloidy)
library(GenomicRanges)
library(IRanges)
library(readr)

See description.

?fragmentoverlapcount
?ploidy

Analyzing sample data

In this section, we demonstrate the package by using a dataset included in the package. See next section on how to analyze your own data.

Compute overlaps of DNA fragments

We use small dataset for single-nucleus ATAC-seq of rat liver. To limit the file size, it only includes 10 cells and fragments from chromosomes 19 and 20. The fragments were mapped to the rat rn7 genome.

We first set the regions where overlaps are counted. This is usually all of the autosomes.

targetregions =
  GenomicRanges::GRanges(
    c("chr19", "chr20"),
    IRanges::IRanges(c(1, 1), width = 500000000))

Simple repeats in the genome can generate false overlaps. We exclude such regions. The regions were downloaded from USCS genome browser.

simpleRepeat = readr::read_tsv(
  system.file("extdata", "simpleRepeat.chr19_20.txt.gz", package = "scPloidy"),
  col_names = c("chrom", "chromStart", "chromEnd"))
simpleRepeat[, 2] = simpleRepeat[, 2] + 1 # convert from 0-based position to 1-based
simpleRepeat = GenomicRanges::makeGRangesFromDataFrame(
  as.data.frame(simpleRepeat),
  seqnames.field = "chrom",
  start.field    = "chromStart",
  end.field      = "chromEnd")

Now compute the overlaps. The input file SHR_m154211.10cells.chr19_20.fragments.txt.gz records the fragments sequenced in ATAC-seq.

fragmentoverlap =
  fragmentoverlapcount(
    system.file("extdata", "SHR_m154211.10cells.chr19_20.fragments.txt.gz", package = "scPloidy"),
    targetregions,
    excluderegions = simpleRepeat,
    Tn5offset = c(4, -5))
fragmentoverlap
rm(fragmentoverlap)

Infer ploidy

Instead of the small dataset computed above, we here use a complete dataset for single-nucleus ATAC-seq of a rat liver.

data(SHR_m154211)
?SHR_m154211
fragmentoverlap = SHR_m154211$fragmentoverlap
fragmentoverlap

Compute ploidy, assuming a mixture of 2x (diploid), 4x (tetraploid) and 8x cells. We recommend using ploidy.moment which is based on moment method.

p = ploidy(fragmentoverlap,
           c(2, 4, 8))
head(p)

The cell type of the cells are stored in dataframe cells. There are many hepatocytes of 4x and 8x, but other cell types are mostly 2x.

cells = SHR_m154211$cells
table(cells$celltype,
      p$ploidy.moment[match(cells$barcode, p$barcode)])

Analyzing your own data

Using fragments.tsv.gz generated by 10x Cell Ranger

In the Cell Ranger output directory, you should have files fragments.tsv.gz and fragments.tsv.gz.tbi. The file fragments.tsv.gz can be processed with the fragmentoverlapcount function, specifying the option Tn5offset = c(1, 0)

Using BAM file

Although this requires several steps, you can start from a BAM file and generate fragments file for scPloidy. You need to install samtools, bgzip and tabix, and run the following in your shell.

First generate a BAM file in which the cell barcode is prepended to read name, separated by ':'. For example, suppose in your input file bap.bam your barcode BCxxxxxx was stored in the field with DB tag as DB:Z:atac_possorted_bam_BCxxxxxx. We generate a BAM file snap.bam.

```{bash, eval = FALSE}

extract the header file

samtools view bap.bam -H > header.sam

create a bam file with the barcode embedded into the read name

cat <( cat header.sam ) \ <( samtools view bap.bam | awk '{for (i=12; i<=NF; ++i) { if ($i ~ "^DB:Z:atac_possorted_bam_BC"){ td[substr($i,1,2)] = substr($i,25,length($i)-24); } }; printf "%s:%s\n", td["DB"], $0 }' ) \ | samtools view -bS - > snap.bam

We next sort the reads by barcode,
and obtain the file `snap.nsrt.bam`.

```{bash, eval = FALSE}
samtools sort -n -@ 20 -m 10G snap.bam -o snap.nsrt.bam

Finally, we extract fragments from the name-sorted BAM file, and obtain the file fragments.txt.gz.

```{bash, eval = FALSE} samtools view -f 0x2 -F 0x900 -q 30 snap.nsrt.bam \ | samtofragmentbed.pl \ | perl -ne 'chomp; @a=split; $a[3]=~s/:.*//; print join("\t",@a), "\n"' \ | LC_ALL=C sort -T ./ -S 50% --parallel=12 -k1,1 -k2,3n -k4,4 -t$'\t' | uniq \ | bgzip > fragments.txt.gz tabix -b 2 -e 3 fragments.txt.gz

The Perl script `samtofragmentbed.pl` is included in this package
as this file:

```r
system.file("perl", "samtofragmentbed.pl", package = "scPloidy")

The file fragments.txt.gz can be processed with the fragmentoverlapcount function, specifying the option Tn5offset = c(4, -5)



Try the scPloidy package in your browser

Any scripts or data that you put into this service are public.

scPloidy documentation built on May 29, 2024, 10:37 a.m.