fullCoverage    R Documentation

Load the unfiltered coverage information from a group of BAM files and a list of chromosomes


Description

For a group of samples, this function reads the coverage information for several chromosomes directly from the BAM files. For each chromosome, it merges the unfiltered coverage by sample into a DataFrame. The end result is a list with one such DataFrame object per chromosome.


Usage

fullCoverage(
  files,
  chrs,
  bai = NULL,
  chrlens = NULL,
  outputs = NULL,
  cutoff = NULL,
  ...
)



Arguments

files
A character vector with the full path to the sample BAM files (or BigWig files). The names are used for the column names of the DataFrame. Check rawFiles for constructing files. files can also be a BamFileList object created with BamFileList or a BigWigFileList object created with BigWigFileList.


chrs
The chromosomes of the files to read. The format has to match the one used in the input files.


bai
The full path to the BAM index files. If NULL, it is assumed that the BAM index files are in the same location as the BAM files and that they have the .bai extension. Ignored if files is a BamFileList object.


chrlens
The chromosome lengths in base pairs. If NULL, the chromosome lengths are extracted from the BAM files. Otherwise, it should have the same length as chrs.


outputs
This argument is passed to the output argument of loadCoverage. If NULL or 'auto' it is then recycled.


cutoff
This argument is passed to filterData.
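To give a feel for what a coverage cutoff does, here is a minimal base R sketch on a toy matrix. It assumes the behavior of filterData's default filter, where a base is kept when at least one sample has coverage strictly greater than the cutoff. The values and the helper keep_base are invented for illustration and are not part of derfinder.

```r
## Toy coverage matrix: 6 bases (rows) by 2 samples (columns).
## These values are invented for illustration only.
toy_cov <- cbind(
    sampleA = c(0, 1, 5, 0, 2, 0),
    sampleB = c(0, 0, 3, 4, 2, 1)
)

## Hypothetical helper (not a derfinder function): keep a base when at
## least one sample has coverage strictly greater than the cutoff.
keep_base <- function(cov, cutoff) {
    rowSums(cov > cutoff) > 0
}

## Logical vector marking the bases that pass the cutoff
kept <- keep_base(toy_cov, cutoff = 1)

## Subset the matrix to the bases that passed
filtered <- toy_cov[kept, , drop = FALSE]
```

With real data, filterData does this work on the DataFrame of coverage, so a helper like this should not be needed.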


...
Arguments passed to other methods and/or advanced arguments. Advanced arguments:


verbose
If TRUE, basic status updates will be printed along the way.


mc.cores
How many cores to use for reading the chromosome information. There is no benefit to using a number greater than the number of chromosomes. Also note that your hard disk speed will limit how much you gain from a higher mc.cores value.


mc.cores.load
Controls the number of cores to be used per chromosome for loading the data, which is only useful if you are loading the data in genome tiles. If not supplied, it uses mc.cores for loadCoverage. Default: 1. If you are using genome tiles, the total number of cores used will be mc.cores times mc.cores.load.

Other arguments are passed to loadCoverage, define_cluster and extendedMapSeqlevels. Note that filterData is used internally by loadCoverage (and hence fullCoverage) and has the important arguments totalMapped and targetSize, which can be used to normalize the coverage by library size. See getTotalMapped for calculating totalMapped.
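The library size normalization mentioned above can be pictured with a small arithmetic sketch in base R. It assumes that the normalization amounts to scaling each sample's coverage by targetSize / totalMapped. The numbers and variable names below are hypothetical and chosen only to make the arithmetic easy to follow.

```r
## Toy per-base coverage for one sample (invented values)
raw_cov <- c(0, 10, 20, 40)

## Hypothetical library size for this sample and target library size
total_mapped <- 40e6
target_size <- 80e6

## Scale the coverage so it is comparable to a library of target_size reads
norm_cov <- raw_cov * target_size / total_mapped
```

Here the sample has half as many mapped reads as the target library size, so its coverage is doubled. In practice totalMapped and targetSize would be supplied through the ... argument rather than computed by hand.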


Value

A list with one element per chromosome. Each element is a DataFrame with the coverage information produced by loadCoverage.
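A short sketch of how the returned object might be inspected, assuming derfinder and its bundled example data are installed. The file lookup mirrors the Examples section; passing verbose = FALSE through ... is only there to keep the output quiet.

```r
library("derfinder")

## Locate the example BAM files that ship with derfinder
datadir <- system.file("extdata", "genomeData", package = "derfinder")
files <- rawFiles(
    datadir = datadir, samplepatt = "*accepted_hits.bam$",
    fileterm = NULL
)
names(files) <- gsub("_accepted_hits.bam", "", names(files))

## Load one sample for two chromosomes
fullCov <- fullCoverage(
    files = files[1], chrs = c("21", "22"), verbose = FALSE
)

## One list element per requested chromosome
length(fullCov)

## Each element is a DataFrame with one column per sample
class(fullCov[[1]])
ncol(fullCov[[1]])
```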


Author(s)

Leonardo Collado-Torres

See Also

loadCoverage, filterData, getTotalMapped


Examples

library("derfinder")

datadir <- system.file("extdata", "genomeData", package = "derfinder")
files <- rawFiles(
    datadir = datadir, samplepatt = "*accepted_hits.bam$",
    fileterm = NULL
)
## Shorten the column names
names(files) <- gsub("_accepted_hits.bam", "", names(files))

## Read the unfiltered coverage, only for 1 file
fullCov <- fullCoverage(files = files[1], chrs = c("21", "22"))
## Not run: 
## You can then use filterData() to filter the data if you want to.
## Use bplapply() if you want to do so with multiple cores as shown below.
library("BiocParallel")
p <- SnowParam(2L)
bplapply(fullCov, function(x) {
    ## Each SnowParam worker is a fresh R session, so load derfinder there
    library("derfinder")
    filterData(x, cutoff = 0)
}, BPPARAM = p)

## End(Not run)

lcolladotor/derfinder documentation built on May 4, 2024, 5:38 p.m.