fragmentoverlapcount: Count Overlap of ATAC-seq Fragments

fragmentoverlapcount R Documentation

Count Overlap of ATAC-seq Fragments

Description

Count Overlap of ATAC-seq Fragments

Usage

fragmentoverlapcount(
  file,
  targetregions,
  excluderegions = NULL,
  targetbarcodes = NULL,
  Tn5offset = c(1, 0),
  barcodesuffix = NULL,
  dobptonext = FALSE
)

Arguments

file

Name of the file containing ATAC-seq fragments. The file must be block-gzipped (using the bgzip command) and accompanied by an index file (made using the tabix command). The uncompressed file must be tab-delimited, with one fragment per row. The first four columns are the chromosome name, start position, end position, and barcode (i.e., name) of the cell containing the fragment. The remaining columns are ignored. See the vignette for details.
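As an illustration of the expected layout, the following base R sketch writes a toy four-column fragment table. The file name, coordinates, and barcodes are made up for the example, and compression and indexing are left to the bgzip and tabix command-line tools, shown only as comments.

```r
# Toy fragments: chromosome, start, end, cell barcode (all values hypothetical).
frags <- data.frame(
  chrom   = c("chr1", "chr1", "chr1"),
  start   = c(1000L, 1200L, 5000L),
  end     = c(1180L, 1450L, 5210L),
  barcode = c("AAAC-1", "AAAC-1", "TTTG-1"),
  stringsAsFactors = FALSE
)

# tabix needs the rows sorted by chromosome and then by start position.
frags <- frags[order(frags$chrom, frags$start), ]

# Write tab-delimited text without header, row names, or quotes.
out <- file.path(tempdir(), "toy.fragments.tsv")
write.table(frags, out, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Then, in a shell (not run here):
#   bgzip toy.fragments.tsv
#   tabix -p bed toy.fragments.tsv.gz

# Read the file back to confirm the four-column layout.
check <- read.delim(out, header = FALSE, stringsAsFactors = FALSE)
```

Any further columns in a real fragment file would be ignored by the function, so only the first four need to follow this layout.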

targetregions

GRanges object for the regions where overlaps are counted, usually all of the autosomes. If memory is a problem, split each chromosome into smaller chunks, for example of 10 Mb. The function loads the elements of targetregions one at a time, so smaller elements require less memory.
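One way to build such chunks is sketched below. The chromosome name and length are hypothetical placeholders (real lengths depend on the genome build), and the conversion to GRanges runs only if the GenomicRanges package is installed.

```r
# Hypothetical chromosome length; real lengths depend on the genome build.
chrom_name   <- "chr1"
chrom_length <- 25000000
chunk_width  <- 10000000  # 10 Mb

# 1-based, inclusive chunk boundaries; the last chunk is truncated at the end.
starts <- seq(1, chrom_length, by = chunk_width)
ends   <- pmin(starts + chunk_width - 1, chrom_length)
chunks <- data.frame(seqnames = chrom_name, start = starts, end = ends)

# Convert to GRanges if GenomicRanges is installed.
if (requireNamespace("GenomicRanges", quietly = TRUE)) {
  targetregions <- GenomicRanges::makeGRangesFromDataFrame(chunks)
}
```

With these placeholder values the chromosome is covered by three chunks, the last of which is 5 Mb long.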

excluderegions

GRanges object for the regions to be excluded. Simple repeats in the genome should be listed here, because repeats can cause false overlaps. A fragment is discarded if its 5' or 3' end is located in excluderegions. If NULL, fragments are not excluded by this criterion.
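The exclusion rule can be pictured with a small base R sketch. It only illustrates the rule as stated above and is not the package's implementation; the coordinates are hypothetical and all intervals are assumed to lie on the same chromosome with inclusive bounds.

```r
# Hypothetical fragments and one excluded repeat interval on the same chromosome.
frag_start <- c(100, 400, 900)
frag_end   <- c(250, 520, 1100)
excl_start <- c(500)
excl_end   <- c(600)

# TRUE if a position lies inside any excluded interval (inclusive bounds).
in_excluded <- function(pos) {
  vapply(pos, function(p) any(p >= excl_start & p <= excl_end), logical(1))
}

# Discard a fragment when either of its ends falls in an excluded interval.
discard <- in_excluded(frag_start) | in_excluded(frag_end)
kept    <- which(!discard)
```

Here the second fragment is discarded because its end (520) lies inside the excluded interval. Under this reading of the rule, a fragment that spans an excluded interval without ending inside it would be kept.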

targetbarcodes

Character vector for the barcodes of cells to be analyzed, such as those passing quality control. If NULL, all barcodes in the input file are analyzed.

Tn5offset

Numeric vector of length two. The enzyme for ATAC-seq is a homodimer of Tn5. The transposition sites of the two Tn5 proteins are 9 bp apart, and the (representative) site of accessibility lies in between. If the start and end positions in your input file are taken from a BAM file, set the parameter to c(4, -5) to adjust for the offset. Alternatively, values such as c(0, -9) could generate similar results; what matters most is the difference between the two numbers. The fragments.tsv.gz file generated by 10x Cell Ranger is already adjusted for the shift but is recorded in BED format. In this case, use c(1, 0) (the default value). If unsure, set to "guess", in which case the program returns a guess.
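The arithmetic behind the two suggested BAM settings can be checked with a short sketch. It assumes that the first offset is added to the start position and the second to the end position; the helper apply_offset and the coordinates are hypothetical and not part of the package.

```r
# Hypothetical BAM-derived fragment coordinates.
start <- 1000
end   <- 1200

# Assumed rule: add the first offset to the start and the second to the end.
apply_offset <- function(start, end, offset) {
  c(start = start + offset[1], end = end + offset[2])
}

a <- apply_offset(start, end, c(4, -5))
b <- apply_offset(start, end, c(0, -9))

# Both settings shorten the interval by 9 bp; they differ only by a 4 bp shift.
len_a <- unname(a["end"] - a["start"])
len_b <- unname(b["end"] - b["start"])
```

Under this assumption both settings give an adjusted interval of the same length, which is consistent with the statement that the difference between the two numbers matters most.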

barcodesuffix

Add suffix to barcodes per targetregions.

dobptonext

(Experimental feature) Whether to compute the smoothed distance to the next fragment (regardless of barcode, BC) as bptonext, which is the inverse of chromatin accessibility, and append it as the 9th to 14th columns.

Value

A tibble with each row corresponding to a cell. For each cell, its barcode, the total count of the fragments nfrag, and the count distinguished by overlap depth are given.
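A call might look like the following sketch. The file path and chromosome names are placeholders that must match your own data, and the call is skipped unless scPloidy, GenomicRanges, and IRanges are installed and the file exists.

```r
# Hypothetical path; replace with a bgzipped, tabix-indexed fragment file.
fragment_file <- "fragments.tsv.gz"

have_pkgs <- requireNamespace("scPloidy", quietly = TRUE) &&
  requireNamespace("GenomicRanges", quietly = TRUE) &&
  requireNamespace("IRanges", quietly = TRUE)

if (have_pkgs && file.exists(fragment_file)) {
  # Two whole chromosomes; the width is simply larger than any chromosome.
  targetregions <- GenomicRanges::GRanges(
    c("chr1", "chr2"),
    IRanges::IRanges(c(1, 1), width = 500000000)
  )
  result <- scPloidy::fragmentoverlapcount(
    fragment_file,
    targetregions,
    Tn5offset = c(1, 0)  # default, for 10x Cell Ranger fragments
  )
  # One row per cell: barcode, nfrag, and counts by overlap depth.
  print(head(result))
}
```

The chromosome names in targetregions need to use the same naming scheme as the first column of the fragment file (for example "chr1" versus "1").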


scPloidy documentation built on May 29, 2024, 10:37 a.m.