binMedians: Computes a median normalized coverage across samples for each...

View source: R/preprocess-utils.R

binMedians    R Documentation

Computes a median normalized coverage across samples for each bin.

Description

This function reduces bin-to-bin variation in normalized coverage that is often correlated between samples.

Usage

binMedians(files, nchunks = 50)

Arguments

files

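character vector of file paths, one per sample. In the example below, each file is an RDS file holding the GC-adjusted, normalized coverage computed from one BAM file.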

nchunks

The matrix of normalized coverage (bins x samples) is potentially very large. To reduce the required RAM, the matrix can be read in subsets. nchunks is an integer specifying the number of subsets the matrix is divided into. Increasing nchunks reduces the required RAM at the expense of increased computation time.
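The trade-off described for nchunks can be illustrated with a minimal base R sketch. This is not the trellis implementation: chunked_bin_medians is a hypothetical helper, and it assumes each file is an RDS file holding one numeric vector per sample, with one value per bin and the same length in every file.

```r
# Hypothetical helper (an assumption, not part of trellis): per-bin medians
# computed one chunk of bins at a time.
chunked_bin_medians <- function(files, nchunks = 50) {
  # Assumption: every file holds a vector of the same length (one value per bin).
  nbins <- length(readRDS(files[1]))
  nchunks <- max(1L, min(as.integer(nchunks), nbins))
  # Assign each bin index to one of nchunks contiguous groups.
  chunk_id <- ceiling(seq_len(nbins) * nchunks / nbins)
  chunks <- split(seq_len(nbins), chunk_id)
  medians <- numeric(nbins)
  for (idx in chunks) {
    # Only a (bins in chunk) x (samples) slice is kept at once; each file is
    # re-read for every chunk, which is where the extra time comes from.
    slice <- vapply(files, function(f) as.numeric(readRDS(f)[idx]),
                    numeric(length(idx)))
    slice <- matrix(slice, nrow = length(idx))
    medians[idx] <- apply(slice, 1, median)
  }
  medians
}

# Toy data: 3 samples, 4 bins.
files <- replicate(3, tempfile())
vals <- list(c(1, 5, 9, 2), c(2, 4, 7, 8), c(3, 6, 8, 5))
for (i in seq_along(files)) saveRDS(vals[[i]], files[i])
chunked_bin_medians(files, nchunks = 2)
# 2 5 8 5
```

The result does not depend on nchunks. Only the size of the slice held in memory and the number of times each file is read change.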

Examples

library(Rsamtools)
library(svbams)
library(svfilters.hg19)
library(trellis)  # the package documented here; provides binMedians
data(bins, package="svbams")
bins <- head(bins, 100)
extdir <- system.file("extdata", package="svbams", mustWork=TRUE)
bamfile <- file.path(extdir, "cgov10t.bam")
## Assume we had 5 BAM files
bamfiles <- rep(bamfile, 5)
## One temporary file per sample to hold its normalized coverage
tempfiles <- replicate(length(bamfiles), tempfile())
for(i in seq_along(bamfiles)){
  bviews <- BamViews(bamRanges=bins, bamPaths=bamfiles[i])
  ## Count reads in each bin, normalize, and correct for GC content
  bins$cnt <- binnedCounts(bviews)
  std_cnt <- binNormalize(bins)
  bins$std_cnt <- std_cnt
  gc.adj <- binGCCorrect(bins)
  ## Scale by 1000 and round to integer before saving
  gc.adj.int <- as.integer(round(gc.adj*1000, 0))
  saveRDS(gc.adj.int, file=tempfiles[i])
}
## Median across the 5 samples for each of the 100 bins
binMedians(tempfiles, nchunks=1)

cancer-genomics/trellis documentation built on Aug. 20, 2024, 5:48 p.m.