runCountReads: Aligns the reads from the BAM file to the variable binning...

runCountReads R Documentation

Aligns the reads from the BAM file to the variable binning pipeline.

Description

runCountReads applies the variable binning (VarBin) algorithm to a series of BAM files resulting from short-read sequencing.

Usage

runCountReads(
  dir,
  genome = c("hg38", "hg19"),
  resolution = c("220kb", "55kb", "110kb", "195kb", "280kb", "500kb", "1Mb", "2.8Mb"),
  remove_Y = FALSE,
  min_bincount = 10,
  is_paired_end = FALSE,
  BPPARAM = bpparam()
)

Arguments

dir

Path to the directory containing BAM files from short-read sequencing.

genome

Name of the genome assembly, either 'hg38' or 'hg19'. Default: 'hg38'.

resolution

The resolution of the VarBin method. Default: '220kb'.

remove_Y

If set to TRUE, removes chrY information from the dataset. Default: FALSE.

min_bincount

A numeric value indicating the minimum mean bin count a cell must have to remain in the dataset. Default: 10.

is_paired_end

A boolean indicating whether the BAM files are from paired-end sequencing (TRUE) or single-read sequencing (FALSE). Default: FALSE.

BPPARAM

A BiocParallelParam specifying how the function should be parallelized.
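To make the min_bincount argument concrete, here is a minimal base R sketch of a filter on the mean bin count per cell. The count matrix is simulated, the cell names are hypothetical, and the use of `>=` at the threshold is an assumption; this is an illustration, not CopyKit's internal code.

```r
# Illustrative filter on the mean bin count per cell (simulated data).
# Assumption: a cell is kept when its mean bin count is >= min_bincount.
set.seed(1)
n_bins <- 500
min_bincount <- 10

# Rows are bins, columns are cells; cell_c simulates a low-coverage cell.
bin_counts <- cbind(
  cell_a = rpois(n_bins, lambda = 60),
  cell_b = rpois(n_bins, lambda = 25),
  cell_c = rpois(n_bins, lambda = 3)
)

# Mean bin count per cell, then keep only cells at or above the threshold.
mean_counts <- colMeans(bin_counts)
keep <- mean_counts >= min_bincount
filtered <- bin_counts[, keep, drop = FALSE]

colnames(filtered)
```

With these simulated values, the low-coverage cell_c falls below the threshold and is dropped, while the other two cells are retained.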

Details

runCountReads takes as input duplicate-marked BAM files from whole-genome sequencing and runs the variable binning (VarBin) pipeline. It is important that the BAM files are duplicate marked.

Briefly, the genome is split into predetermined bins, whose size is controlled by the argument resolution. With VarBin, each bin is expected to receive an equal number of reads in a diploid cell, which controls for mappability. A lowess function is applied to perform GC correction across the bins.

The argument genome can be set to 'hg38' or 'hg19' to select the scaffolds for the genome assembly. The scaffolds are GenomicRanges objects. Information regarding the alignment of the reads to the bins and from the BAM files is stored in the colData.

min_bincount indicates the minimum mean bin count a cell must have to be kept in the dataset. Cells with low bin counts generally present bin dropouts due to low read count and will be poorly segmented.
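The lowess-based GC correction mentioned in the Details can be sketched in base R as follows. Everything here is simulated: the GC content values, the bias model, and the smoother span `f = 0.05` are assumptions chosen for illustration, and the code is not taken from CopyKit's source.

```r
# Illustrative GC correction with lowess on simulated bin counts.
# Assumptions: simulated GC content, a linear GC bias, and span f = 0.05.
set.seed(42)
n_bins <- 1000
gc_content <- runif(n_bins, min = 0.30, max = 0.65)

# Simulate a GC-dependent bias on top of a flat, diploid-like signal.
gc_bias <- 1 + 1.5 * (gc_content - 0.45)
raw_counts <- rpois(n_bins, lambda = 100 * gc_bias)

# Normalize counts to a ratio centered near 1 (add 1 to avoid log(0)).
ratio <- (raw_counts + 1) / mean(raw_counts + 1)

# Fit the log-ratio as a smooth function of GC content.
fit <- lowess(gc_content, log(ratio), f = 0.05)

# Interpolate the fitted trend back to each bin and subtract it.
trend <- approx(fit$x, fit$y, xout = gc_content, ties = mean)$y
gc_corrected <- exp(log(ratio) - trend)

# Correlation with GC content before and after the correction.
cor_before <- cor(gc_content, ratio)
cor_after <- cor(gc_content, gc_corrected)
```

Comparing `cor_before` with `cor_after` shows how much of the GC-dependent trend the smoother removes from the simulated ratios.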

Value

A matrix of bin counts stored within the scCNA object, which can be accessed with bincounts.

References

Navin, N., Kendall, J., Troge, J. et al. Tumour evolution inferred by single-cell sequencing. Nature 472, 90–94 (2011). https://doi.org/10.1038/nature09807

Baslan, T., Kendall, J., Ward, B. et al. Optimizing sparse sequencing of single cells for highly multiplex copy number profiling. Genome Research 25(5), 714–724 (2015). https://doi.org/10.1101/gr.188060.114

Author(s)

Darlan Conterno Minussi

Examples

## Not run: 
copykit_obj <- runCountReads("/PATH/TO/BAM/FILES")

## End(Not run)
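A variant of the example with every argument from the Usage section spelled out might look like the sketch below. The path is the same placeholder as above, the argument values and the worker count are arbitrary choices for illustration, and the guard simply skips the call when the copykit or BiocParallel packages, or the directory, are unavailable.

```r
# Sketch of a call with explicit arguments; all values are illustrative.
bam_dir <- "/PATH/TO/BAM/FILES"  # placeholder path, replace with a real directory

# Only run when both packages are installed and the directory exists.
can_run <- requireNamespace("copykit", quietly = TRUE) &&
  requireNamespace("BiocParallel", quietly = TRUE) &&
  dir.exists(bam_dir)

if (can_run) {
  copykit_obj <- copykit::runCountReads(
    dir = bam_dir,
    genome = "hg38",
    resolution = "220kb",
    remove_Y = TRUE,
    min_bincount = 10,
    is_paired_end = TRUE,
    # Worker count is arbitrary; MulticoreParam forks, so it does not
    # parallelize on Windows.
    BPPARAM = BiocParallel::MulticoreParam(workers = 2)
  )

  # Inspect the first rows of the bin count matrix.
  head(copykit::bincounts(copykit_obj))
}
```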


navinlabcode/copykit documentation built on Oct. 16, 2024, 2:55 p.m.