scan_spiked_bam: pretty much what it says: scan standard chroms + spike...

View source: R/scan_spiked_bam.R

scan_spiked_bamR Documentation

pretty much what it says: scan standard chroms + spike contigs from a BAM

Description

Note: behind the scenes, this is being refactored into scan_spike_contigs and scan_genomic_contigs. Once that is done, perhaps before release, the default workflow will switch to

Usage

scan_spiked_bam(
  bam,
  spike,
  mapq = 20,
  binwidth = 300L,
  bins = NULL,
  how = c("max", "mean"),
  dupe = FALSE,
  paired = TRUE,
  standard = TRUE,
  ...
)

Arguments

bam

the BAM file

spike

the spike-in reference database (e.g. data(spike))

mapq

minimum mapq value to count a pair (20)

binwidth

width of the bins for chromosomal tiling (300)

bins

a pre-tiled GRanges for binning coverage (NULL)

how

how to record spike read coverage (max or mean)? (max)

dupe

unique (FALSE), duplicte (TRUE), or all (NA) reads? (FALSE)

paired

restrict coverage to that from properly paired reads? (TRUE)

standard

restrict non-spike contigs to "standard" chromosomes? (TRUE)

...

additional arguments to pass to scanBamFlag()

Details

  1. scan spike contigs and count fragments per contig or per bin.

  2. fit the appropriate model for adjusting genomic contigs based on spikes.

  3. scan and adjust binned fragment tallies along genomic contigs per above.

This approach decouples binning schemes from model generation (using spikes) and model-based adjustment (using genomic fragment counts), decreasing code complexity while increasing the opportunities for caching & parallelization.

For a more realistic example (not run), one might do something like:

data(spike, package="spiky"); bam <- "2021_ctl.hg38_withSpikes.bam"; ssb_res <- scan_spiked_bam(bam, mapq=20, spike=spike);

An extract from the resulting ssb_res object is available via

data(ssb_res, package="spiky");

The full ssb_res is a GRangesList object with 300bp-binned coverage on the standard (chr1-22, chrX, chrY, chrM) chromosomes (as determined by the GenomeInfoDb::standardChromosomes() function against the assembly defined in the BAM or CRAM file, by default; if desired, a user can scan all genomic contigs by setting standard=FALSE when calling the function). By default, the mean base-level coverage of genomic bins is reported, and the maximum spike-level coverage is reported, though this can also be adjusted as needed. The results then inform the reliability of measurements from replicate samples in multiple labs, as well as the adjusted quantitative coverage in each bin once the absolute quantity of captured cell-free methylated DNA has been fit by model_glm_pmol and predict_pmol. In some sense, this function converts BAMs/CRAMs into usable data structures for high-throughput standardized cfMeDIP experiments.

The data extract used in other examples is the same as the full version, with the sole difference being that genomic bins are limited to chr22.

Value

     a CompressedGRangesList with bin- and spike-level coverage

See Also

    GenomeInfoDb::keepStandardChromosomes
    Rsamtools::ScanBamParam

Examples

library(GenomicRanges)
data(spike, package="spiky")
sb <- system.file("extdata", "example.spike.bam", package="spiky",
                  mustWork=TRUE)
res <- scan_spiked_bam(sb, spike=spike, bins=GRanges())
summary(res$spikes$coverage)


trichelab/spiky documentation built on Sept. 17, 2022, 8:44 a.m.