Convert aligned reads from various file formats into read counts in equidistant bins

Description

Convert aligned reads in .bam or .bed(.gz) format into read counts in equidistant windows.

Usage

binReads(file, assembly, ID = basename(file), bamindex = file,
  chromosomes = NULL, pairedEndReads = FALSE, min.mapq = 10,
  remove.duplicate.reads = TRUE, max.fragment.width = 1000,
  blacklist = NULL, outputfolder.binned = "binned_data", binsizes = 1e+06,
  reads.per.bin = NULL, bins = NULL, variable.width.reference = NULL,
  save.as.RData = FALSE, calc.complexity = TRUE, call = match.call(),
  reads.store = FALSE, outputfolder.reads = "data", reads.return = FALSE,
  reads.overwrite = FALSE, reads.only = FALSE, use.bamsignals = FALSE)

Arguments

file

A file with aligned reads. Alternatively a GRanges with aligned reads if format is set to 'GRanges'.

assembly

Please see fetchExtendedChromInfoFromUCSC for available assemblies. Only necessary when importing BED files. BAM files are handled automatically. Alternatively a data.frame with columns 'chromosome' and 'length'.
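
As a minimal sketch of the data.frame form of this argument, the following builds a table with the two required columns. The chromosome names and lengths are toy placeholders, not real assembly values, and whether names should carry a 'chr' prefix depends on your input file, which is an assumption here.

```r
# Toy chromosome table with the two columns named in the description.
# The lengths are placeholders, NOT real assembly values.
custom_assembly <- data.frame(
  chromosome = c("1", "2", "X"),
  length = c(1000000, 800000, 500000),
  stringsAsFactors = FALSE
)

print(custom_assembly)
```

An object like this could then be passed as the assembly argument in place of an assembly name.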

ID

An identifier that will be used to identify the file throughout the workflow and in plotting.

bamindex

BAM index file. Can be specified without the .bai ending. If the index file does not exist it will be created and a warning is issued.

chromosomes

If only a subset of the chromosomes should be binned, specify them here.

pairedEndReads

Set to TRUE if you have paired-end reads in your BAM files (not implemented for BED files).

min.mapq

Minimum mapping quality when importing from BAM files. Set min.mapq=NULL to keep all reads.

remove.duplicate.reads

A logical indicating whether or not duplicate reads should be removed.

max.fragment.width

Maximum allowed fragment length. This filters out erroneous fragments that result from mapping errors in paired-end reads.

blacklist

A GRanges or a bed(.gz) file with blacklisted regions. Reads falling into those regions will be discarded.

outputfolder.binned

Folder to which the binned data will be saved. If the specified folder does not exist, it will be created.

binsizes

An integer vector with bin sizes. If more than one value is given, output files will be produced for each bin size.

reads.per.bin

Approximate number of desired reads per bin. The bin size will be selected accordingly. Output files are produced for each value.
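
One plausible way to derive a bin size from a desired number of reads per bin is to divide that number by the average read density per base pair. The sketch below shows only this arithmetic; the read count and genome length are made-up values, and the exact rule and rounding used inside binReads are not stated on this page, so treat this as an assumption.

```r
# Made-up inputs for illustration only.
total_reads   <- 2e6     # number of reads that passed filtering
genome_length <- 2.7e9   # summed length of all binned chromosomes, in bp
reads_per_bin <- 100     # desired average number of reads per bin

# Average number of reads per base pair.
reads_per_bp <- total_reads / genome_length

# Bin size (in bp) that yields roughly reads_per_bin reads on average.
binsize <- round(reads_per_bin / reads_per_bp)
print(binsize)
```

With these inputs the bin size comes out at 135000 bp, so a sparser library leads to proportionally larger bins.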

bins

A named list with GRanges containing precalculated bins produced by fixedWidthBins or variableWidthBins. Names must correspond to the binsize.

variable.width.reference

A BAM file that is used as reference to produce variable width bins. See variableWidthBins for details.

save.as.RData

If set to FALSE, no output file will be written. Instead, a GenomicRanges object containing the binned data will be returned. Only the first binsize will be processed in this case.

calc.complexity

A logical indicating whether or not to estimate library complexity.

call

The match.call() of the parent function.

reads.store

If TRUE, processed read fragments will be saved to file. Reads are processed according to min.mapq and remove.duplicate.reads. Paired-end reads are coerced to single-end fragments. Ignored if use.bamsignals=TRUE.

outputfolder.reads

Folder to which the read fragments will be saved. If the specified folder does not exist, it will be created.

reads.return

If TRUE, no binning is done; instead, read fragments from the input file are returned in GRanges format.

reads.overwrite

Whether or not an existing file with read fragments should be overwritten.

reads.only

If TRUE, only read fragments are stored and/or returned, and no binning is done.

use.bamsignals

If TRUE, the bamsignals package will be used for binning, which substantially speeds up the binning step. reads.store and calc.complexity will be set to FALSE in this case.

Details

Convert aligned reads from .bam or .bed(.gz) files into read counts in equidistant windows (bins). This function uses countOverlaps to calculate the read counts.
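
As a rough illustration of what counting reads in equidistant bins means, the sketch below uses base R on toy data. It is not the package's implementation: binReads uses countOverlaps on genomic ranges, whereas this sketch assigns each read to a bin by its start position only. All variable names and read positions are made up for illustration.

```r
# Toy data: read start positions on a single chromosome (made-up values).
chrom_length <- 5000
read_starts  <- c(10, 250, 999, 1000, 1001, 2500, 2501, 4999)
binsize      <- 1000

# Equidistant bins [1,1000], [1001,2000], ...; an incomplete last bin is dropped.
n_bins     <- chrom_length %/% binsize
bin_starts <- seq(1, by = binsize, length.out = n_bins)
bin_ends   <- bin_starts + binsize - 1

# Discard reads that start beyond the last complete bin.
kept <- read_starts[read_starts <= n_bins * binsize]

# findInterval() returns, for each read, the index of the bin whose start
# is the largest one not exceeding the read start.
bin_index <- findInterval(kept, bin_starts)

# Count reads per bin; bins without reads get a count of 0.
counts <- tabulate(bin_index, nbins = n_bins)

binned <- data.frame(start = bin_starts, end = bin_ends, reads = counts)
print(binned)
```

Unlike an overlap-based count, this start-position approach never counts a read that spans a bin boundary in two bins.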

Value

The function produces a list() of GRanges objects with one metadata column 'reads' that contains the read count. This binned data will be either written to file (save.as.RData=TRUE) or given as return value (save.as.RData=FALSE).

See Also

binning

Examples

## Get an example BED file with single-cell-sequencing reads
bedfile <- system.file("extdata", "KK150311_VI_07.bam.bed.gz", package="AneuFinderData")
## Bin the BED file into bin size 1Mb
binned <- binReads(bedfile, assembly='mm10', binsizes=1e6,
                  chromosomes=c(1:19,'X','Y'))
print(binned)
