FindPeaks: Perform splice-aware peak calling on a BAM file produced from...

View source: R/count_polyA.R

FindPeaksR Documentation

Perform splice-aware peak calling on a BAM file produced from a scRNA-seq experiment

Description

Takes as input a BAM file produced from barcoded scRNA-seq experiment, the reference (GTF) file used during alignment and a BED file of junctions produced by regtools. For each gene in the reference file, the peak calling process first splits the read coverage into 'across junction' and 'no junction' subsets. Within each subset, the site of maximum coverage is identified and a peak called, by fitting a Gaussian to the read coverage, from a 600bp window around this region. After calling a peak, the local read coverage is removed and the next site of maximum coverage is identified. This process runs iteratively until at least one of two stopping criteria are reached. The first criteria is defined as the maximum read coverage a minimum cutoff (min.cov.cutoff) and proportion (min.cov.prop). The second critera is the size of the peak, including a absolute threshold (min.peak.cutoff) and a relative threshold (min.peak.prop).

Usage

FindPeaks(
  output.file,
  gtf.file,
  bamfile,
  junctions.file,
  min.jcutoff = 50,
  min.jcutoff.prop = 0.05,
  min.cov.cutoff = 500,
  min.cov.prop = 0.05,
  min.peak.cutoff = 200,
  min.peak.prop = 0.05,
  ncores = 1,
  chr.names = NULL,
  filter.chr = FALSE,
  gene.symbol.ref = "gene_name",
  fit.method = "NLS"
)

Arguments

output.file

a file containing polyA sites

gtf.file

reference (GTF) file

bamfile

scRNA-seq BAM file

junctions.file

BED file (as produced by regtools) or SJ.out.tab file (STAR aligner) containing splice junction coordinates

min.jcutoff

minimum number of spliced reads across a junction for it to be considered (default: 50).

min.jcutoff.prop

minimum proportion of junction reads out of all junction reads for that gene (default: 0.05)

min.cov.cutoff

minimum number of reads to consider a peak (default: 500)

min.cov.prop

minimum proportion of reads to consider a peak (default: 0.05)

min.peak.cutoff

minimum peak height (default: 200)

min.peak.prop

minimum ratio of current peak height relative to maximum peak height for this gene (default: 0.05)

ncores

number of cores to use

chr.names

names of chromosomes

filter.chr

names of chromosomes to filter

gene.symbol.ref

field in the GTF file containing the gene symbol

Value

NULL. Writes counts to file.

Examples


extdata_path <- system.file("extdata",package = "Sierra")
reference.file <- paste0(extdata_path,"/Vignette_cellranger_genes_subset.gtf")
junctions.file <- paste0(extdata_path,"/Vignette_example_TIP_sham_junctions.bed")

bamfile <- c(paste0(extdata_path,"/Vignette_example_TIP_sham.bam"),
             paste0(extdata_path,"/Vignette_example_TIP_mi.bam") )
             
peak.output.file <- c("Vignette_example_TIP_sham_peaks.txt",  "Vignette_example_TIP_MI_peaks.txt")


FindPeaks(output.file=peak.output.file[1], gtf.file = reference.file, 
             bamfile=bamfile[1], junctions.file=junctions.file)


VCCRI/Sierra documentation built on July 3, 2023, 6:39 a.m.