knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) has_pkg = requireNamespace("GenomicRanges", quietly = TRUE) && requireNamespace("IRanges", quietly = TRUE) && requireNamespace("readr", quietly = TRUE) knitr::opts_chunk$set(eval = has_pkg)
scPloidy
is a package to compute ploidy of single cells (or nuclei) based on
single-cell (or single-nucleus) ATAC-seq data.
In ATAC-seq, open chromatin regions are excised and sequenced.
For any site on the genome, ATAC-seq could read 0, 1 or 2 DNA fragments,
if the cell was diploid.
If the cell was tetraploid, ATAC-seq could read 0, 1, 2, 3 or 4 fragments from the same site.
This is the basic idea used in scPloidy
.
We model the depth of DNA sequencing at one site by binomial distribution.
library(scPloidy) library(GenomicRanges) library(IRanges) library(readr)
See description.
?fragmentoverlapcount ?ploidy
In this section, we demonstrate the package by using a dataset included in the package. See next section on how to analyze your own data.
We use small dataset for single-nucleus ATAC-seq of rat liver. To limit the file size, it only includes 10 cells and fragments from chromosomes 19 and 20. The fragments were mapped to the rat rn7 genome.
We first set the regions where overlaps are counted. This is usually all of the autosomes.
targetregions = GenomicRanges::GRanges( c("chr19", "chr20"), IRanges::IRanges(c(1, 1), width = 500000000))
Simple repeats in the genome can generate false overlaps. We exclude such regions. The regions were downloaded from USCS genome browser.
simpleRepeat = readr::read_tsv( system.file("extdata", "simpleRepeat.chr19_20.txt.gz", package = "scPloidy"), col_names = c("chrom", "chromStart", "chromEnd")) simpleRepeat[, 2] = simpleRepeat[, 2] + 1 # convert from 0-based position to 1-based simpleRepeat = GenomicRanges::makeGRangesFromDataFrame( as.data.frame(simpleRepeat), seqnames.field = "chrom", start.field = "chromStart", end.field = "chromEnd")
Now compute the overlaps.
The input file SHR_m154211.10cells.chr19_20.fragments.txt.gz
records the fragments sequenced in ATAC-seq.
fragmentoverlap = fragmentoverlapcount( system.file("extdata", "SHR_m154211.10cells.chr19_20.fragments.txt.gz", package = "scPloidy"), targetregions, excluderegions = simpleRepeat, Tn5offset = c(4, -5)) fragmentoverlap rm(fragmentoverlap)
Instead of the small dataset computed above, we here use a complete dataset for single-nucleus ATAC-seq of a rat liver.
data(SHR_m154211) ?SHR_m154211 fragmentoverlap = SHR_m154211$fragmentoverlap fragmentoverlap
Compute ploidy, assuming a mixture of 2x (diploid), 4x (tetraploid) and 8x cells.
We recommend using ploidy.moment
which is based on moment method.
p = ploidy(fragmentoverlap, c(2, 4, 8)) head(p)
The cell type of the cells are stored in dataframe cells
.
There are many hepatocytes of 4x and 8x, but other cell types are mostly 2x.
cells = SHR_m154211$cells table(cells$celltype, p$ploidy.moment[match(cells$barcode, p$barcode)])
In the Cell Ranger output directory, you should have files
fragments.tsv.gz
and fragments.tsv.gz.tbi
.
The file fragments.tsv.gz
can be processed with the fragmentoverlapcount
function,
specifying the option Tn5offset = c(1, 0)
Although this requires several steps, you can start from a BAM file
and generate fragments file for scPloidy
.
You need to install samtools
, bgzip
and tabix
,
and run the following in your shell.
First generate a BAM file in which the cell barcode is prepended to read name,
separated by ':'.
For example, suppose in your input file bap.bam
your barcode BCxxxxxx
was stored in the field with DB
tag as
DB:Z:atac_possorted_bam_BCxxxxxx
.
We generate a BAM file snap.bam
.
```{bash, eval = FALSE}
samtools view bap.bam -H > header.sam
cat <( cat header.sam ) \ <( samtools view bap.bam | awk '{for (i=12; i<=NF; ++i) { if ($i ~ "^DB:Z:atac_possorted_bam_BC"){ td[substr($i,1,2)] = substr($i,25,length($i)-24); } }; printf "%s:%s\n", td["DB"], $0 }' ) \ | samtools view -bS - > snap.bam
We next sort the reads by barcode, and obtain the file `snap.nsrt.bam`. ```{bash, eval = FALSE} samtools sort -n -@ 20 -m 10G snap.bam -o snap.nsrt.bam
Finally, we extract fragments from the name-sorted BAM file,
and obtain the file fragments.txt.gz
.
```{bash, eval = FALSE} samtools view -f 0x2 -F 0x900 -q 30 snap.nsrt.bam \ | samtofragmentbed.pl \ | perl -ne 'chomp; @a=split; $a[3]=~s/:.*//; print join("\t",@a), "\n"' \ | LC_ALL=C sort -T ./ -S 50% --parallel=12 -k1,1 -k2,3n -k4,4 -t$'\t' | uniq \ | bgzip > fragments.txt.gz tabix -b 2 -e 3 fragments.txt.gz
The Perl script `samtofragmentbed.pl` is included in this package as this file: ```r system.file("perl", "samtofragmentbed.pl", package = "scPloidy")
The file fragments.txt.gz
can be processed with the fragmentoverlapcount
function,
specifying the option Tn5offset = c(4, -5)
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.