binReads: Convert aligned reads from various file formats into read...

View source: R/binReads.R

binReadsR Documentation

Convert aligned reads from various file formats into read counts in equidistant bins

Description

Convert aligned reads in .bam or .bed(.gz) format into read counts in equidistant windows.

Usage

binReads(file, assembly, ID = basename(file), bamindex = file,
  chromosomes = NULL, pairedEndReads = FALSE, min.mapq = 10,
  remove.duplicate.reads = TRUE, max.fragment.width = 1000,
  blacklist = NULL, outputfolder.binned = "binned_data", binsizes = 1e+06,
  stepsizes = NULL, reads.per.bin = NULL, reads.per.step = NULL,
  bins = NULL, variable.width.reference = NULL, save.as.RData = FALSE,
  calc.complexity = TRUE, call = match.call(), reads.store = FALSE,
  outputfolder.reads = "data", reads.return = FALSE,
  reads.overwrite = FALSE, reads.only = FALSE, use.bamsignals = FALSE)

Arguments

file

A file with aligned reads. Alternatively a GRanges-class with aligned reads.

assembly

Please see getChromInfoFromUCSC for available assemblies. Only necessary when importing BED files. BAM files are handled automatically. Alternatively a data.frame with columns 'chromosome' and 'length'.

ID

An identifier that will be used to identify the file throughout the workflow and in plotting.

bamindex

BAM index file. Can be specified without the .bai ending. If the index file does not exist it will be created and a warning is issued.

chromosomes

If only a subset of the chromosomes should be binned, specify them here.

pairedEndReads

Set to TRUE if you have paired-end reads in your BAM files (not implemented for BED files).

min.mapq

Minimum mapping quality when importing from BAM files. Set min.mapq=NA to keep all reads.

remove.duplicate.reads

A logical indicating whether or not duplicate reads should be removed.

max.fragment.width

Maximum allowed fragment length. This is to filter out erroneously wrong fragments due to mapping errors of paired end reads.

blacklist

A GRanges-class or a bed(.gz) file with blacklisted regions. Reads falling into those regions will be discarded.

outputfolder.binned

Folder to which the binned data will be saved. If the specified folder does not exist, it will be created.

binsizes

An integer vector with bin sizes. If more than one value is given, output files will be produced for each bin size.

stepsizes

A vector of step sizes the same length as binsizes. Only used for method="HMM".

reads.per.bin

Approximate number of desired reads per bin. The bin size will be selected accordingly. Output files are produced for each value.

reads.per.step

Approximate number of desired reads per step.

bins

A named list with GRanges-class containing precalculated bins produced by fixedWidthBins or variableWidthBins. Names must correspond to the binsize.

variable.width.reference

A BAM file that is used as reference to produce variable width bins. See variableWidthBins for details.

save.as.RData

If set to FALSE, no output file will be written. Instead, a GenomicRanges object containing the binned data will be returned. Only the first binsize will be processed in this case.

calc.complexity

A logical indicating whether or not to estimate library complexity.

call

The match.call() of the parent function.

reads.store

If TRUE processed read fragments will be saved to file. Reads are processed according to min.mapq and remove.duplicate.reads. Paired end reads are coerced to single end fragments. Will be ignored if use.bamsignals=TRUE.

outputfolder.reads

Folder to which the read fragments will be saved. If the specified folder does not exist, it will be created.

reads.return

If TRUE no binning is done and instead, read fragments from the input file are returned in GRanges-class format.

reads.overwrite

Whether or not an existing file with read fragments should be overwritten.

reads.only

If TRUE only read fragments are stored and/or returned and no binning is done.

use.bamsignals

If TRUE the bamsignals package will be used for binning. This gives a tremendous performance increase for the binning step. reads.store and calc.complexity will be set to FALSE in this case.

Details

Convert aligned reads from .bam or .bed(.gz) files into read counts in equidistant windows (bins). This function uses GenomicRanges::countOverlaps to calculate the read counts.

Value

The function produces a list() of GRanges-class or GRangesList objects with meta data columns 'counts', 'mcounts', 'pcounts' that contain the total, minus and plus read count. This binned data will be either written to file (save.as.RData=FALSE) or given as return value (save.as.RData=FALSE).

See Also

binning

Examples

## Get an example BED file with single-cell-sequencing reads
bedfile <- system.file("extdata", "KK150311_VI_07.bam.bed.gz", package="AneuFinderData")
## Bin the BED file into bin size 1Mb
binned <- binReads(bedfile, assembly='mm10', binsize=1e6,
                  chromosomes=c(1:19,'X','Y'))
print(binned)


ataudt/aneufinder documentation built on April 18, 2023, 4:20 a.m.