pileupCall: Call bases from pileup file


Description

Reads a pileup-formatted file (pileupCallFile) or all pileup files in a folder (pileupCallRun) created by samtools mpileup and calls bases for each chromosome listed. Base calling is controlled by the coverage and frequency parameters described in the Note section.

Usage

pileupCallRun(
  min.cov.call,
  min.cov.freq,
  min.base.freq,
  min.ins.freq,
  min.prob.freq,
  min.binom.prob,
  folder = ".",
  pattern = "\\.pileup$",
  label = NULL,
  num.cores = NULL
)

pileupCallFile(
  fname,
  min.cov.call,
  min.cov.freq,
  min.base.freq,
  min.ins.freq,
  num.cores = NULL
)

Arguments

min.cov.call

minimum coverage for base calling. Sites with coverage below this are assigned N's.

min.cov.freq

minimum coverage at or above which min.base.freq is applied. Sites with coverage below this value but >= min.cov.call will only be called if all reads agree.

min.base.freq

minimum frequency of either the reference or alternate base for calling. If both bases are below this frequency, an N is assigned.

min.ins.freq

minimum frequency of insertion.

min.prob.freq

minimum frequency for binomial probability.

min.binom.prob

minimum probability from binomial distribution.

folder

folder containing pileup files from a run

pattern

text pattern for pileup files. The default is that the file ends in ".pileup".

label

label for run output files.

num.cores

number of cores to use during processing. If NULL, will default to parallel::detectCores() - 1.

fname

filename of pileup file
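
As a rough illustration of how folder, pattern, and num.cores might be resolved, the sketch below filters a set of made-up file names with the default pattern and computes the documented default core count. This is an assumption about the behavior, not the package's internal code: resolveCores() and the file names are hypothetical, and the guard against a result below 1 is not documented.

```r
# Hypothetical file names; only those ending in ".pileup" match the default pattern.
files <- c("sample1.pileup", "sample2.pileup", "sample1.pileup.bak", "notes.txt")
pattern <- "\\.pileup$"
pileup.files <- files[grepl(pattern, files)]

# Hypothetical helper mirroring the documented default for num.cores.
resolveCores <- function(num.cores = NULL) {
  if (is.null(num.cores)) {
    # Documented default: one fewer than the number of detected cores.
    n <- parallel::detectCores() - 1
    # Assumed guard: detectCores() can return NA or 1 on some systems.
    if (is.na(n) || n < 1) n <- 1
    return(n)
  }
  num.cores
}
```

On a real folder, list.files(folder, pattern = pattern, full.names = TRUE) would produce the equivalent selection from disk.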

Value

list with the following elements:

cons.seq a DNAbin format list of sequences.
plp data frame of reference, consensus base, and base frequencies at each reference position.

Note

The input pileup file should be the result of a call to samtools mpileup on a single BAM file.

For each position, bases are called according to the following logic within a single pileup file using pileupCallFile():

  1. If coverage is < min.cov.call, assign N.

  2. If min.cov.call <= coverage < min.cov.freq, assign N unless all reads contain the same base.

  3. If coverage >= min.cov.freq, then assign N unless a base occurs at frequency > min.base.freq.

  4. When a set of pileup files is processed together using pileupCallRun(), an additional step is applied. For positions designated N under condition 3 above, a base may still be called if either (a) the pooled frequency for that base (pool.prop) is > 0.5 and the frequency in the individual (read.prop) is > pool.prop, or (b) pool.prop <= 0.5, the binomial probability of that base (given the coverage at that site) is > min.binom.prob, and read.prop lies above the line defined by ((1 - (min.prob.freq / 0.5)) * pool.prop) + min.prob.freq.
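
Conditions 1 to 3 above can be sketched for a single site as follows. This is a minimal illustration, not the package's implementation: callSite() and its base.counts argument (a named vector of per-base read counts) are hypothetical, the sketch considers only the most frequent base rather than distinguishing reference from alternate, and returning NA as the code for a called site is an assumption.

```r
# Hypothetical per-site caller following conditions 1-3.
callSite <- function(base.counts, min.cov.call, min.cov.freq, min.base.freq) {
  coverage <- sum(base.counts)
  # Condition 1: coverage too low to call anything.
  if (coverage == 0 || coverage < min.cov.call) {
    return(list(base = "N", n.code = 1L))
  }
  top <- names(base.counts)[which.max(base.counts)]
  if (coverage < min.cov.freq) {
    # Condition 2: low coverage, so call only if every read agrees.
    if (max(base.counts) == coverage) {
      return(list(base = top, n.code = NA_integer_))
    }
    return(list(base = "N", n.code = 2L))
  }
  # Condition 3: enough coverage, so apply the frequency threshold.
  if (max(base.counts) / coverage > min.base.freq) {
    return(list(base = top, n.code = NA_integer_))
  }
  list(base = "N", n.code = 3L)
}

# Example: 20 reads, 12 A and 8 C, with a 0.8 frequency threshold.
res <- callSite(c(A = 12, C = 8, G = 0, T = 0),
                min.cov.call = 3, min.cov.freq = 10, min.base.freq = 0.8)
```

Here the top base has frequency 0.6, which does not exceed 0.8, so the site is assigned N with code 3.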

The condition numbers above are recorded in the n.code column of the output plp data frame to identify why an N was assigned at a given position.
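
The pooled rule in condition 4 can be sketched as below. The line in that rule runs from min.prob.freq at pool.prop = 0 up to 0.5 at pool.prop = 0.5, so the required read.prop rises with the pooled frequency. This is an illustration under assumptions: poolThreshold() and canRescue() are hypothetical names, and the binomial probability is taken as an input (binom.prob) because the page does not specify how it is computed.

```r
# Threshold line from condition 4: min.prob.freq at pool.prop = 0, rising to 0.5 at pool.prop = 0.5.
poolThreshold <- function(pool.prop, min.prob.freq) {
  ((1 - (min.prob.freq / 0.5)) * pool.prop) + min.prob.freq
}

# Hypothetical check of whether a site left as N under condition 3 may still be called.
canRescue <- function(read.prop, pool.prop, binom.prob,
                      min.prob.freq, min.binom.prob) {
  if (pool.prop > 0.5) {
    # Base is common in the pool: the individual must exceed the pooled frequency.
    read.prop > pool.prop
  } else {
    # Base is not a majority in the pool: require both the binomial
    # probability and the position above the threshold line.
    binom.prob > min.binom.prob &&
      read.prop > poolThreshold(pool.prop, min.prob.freq)
  }
}

# Example: pool.prop = 0.2 and min.prob.freq = 0.3 give a threshold of 0.38.
ok <- canRescue(read.prop = 0.45, pool.prop = 0.2, binom.prob = 0.99,
                min.prob.freq = 0.3, min.binom.prob = 0.95)
```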

Author(s)

Eric Archer eric.archer@noaa.gov

See Also

pileupRead


EricArcher/swfscGenetics documentation built on May 25, 2021, 3:46 a.m.