create_background_panel: Creates a background panel from a list of bam files

Description Usage Arguments Details Value See Also Examples

View source: R/create_background_panel.R

Description

This function scans the targets regions in the list of bam files, and reports the number of reference, non-reference reads for each loci in addition to the non-reference (VAF) allele frequency. Loci with VAF higher than vaf_threshold are masked with NA.

Usage

1
2
3
4
create_background_panel(bam_list, targets, reference,
  vaf_threshold = 0.05, bam_list_tags = rep("", length(bam_list)),
  min_base_quality = 10, max_depth = 1e+05, min_mapq = 20,
  substitution_specific = TRUE)

Arguments

bam_list

A character vector of paths to bam files.

targets

The targets data frame must have the columns chr, start and end.

reference

The reference genome as BSgenome object.

vaf_threshold

Loci with the fraction of non-reference reads above this value are masked with NA.

bam_list_tags

RG tags for the list of bam files. By default, the whole bam file will be used.

min_base_quality

The minimum base quality to count a read for a loci.

max_depth

Maximum depth for the pileup

min_mapq

The minimum mapping quality to count a read for a loci

substitution_specific

logical, whether to have the loci by substitutions.

Details

Extracts the depth, variant allele counts and variant allele frequency (VAF) for each genomic position in the input targets across a panel of bam files (e.g. from healthy samples to represent only technical noise). The extracted information can be fed to create_black_list in order to extract a black listed loci according to defined criteria

The function support two modes, either loci-specific regardless of the basepair substitution, or substitution-specific where each substitution class (e.g. C>T, C>G) are quantified separately. This behavior is controlled by the substitution_specific parameter.

VAF above vaf_threshold parameters are masked with NA, to exclude real SNPs/mutations.

Since this function can take a long time when the bam_list comprises a large number of bam files, the function supports multi-threading using the furrr and future R packages. All you need to do is call 'plan(multiprocess)' or other multi-threading strategies before calling this function.

Value

A named list having depth, alt and vaf data frames. Each has the same order of loci in rows and the input samples in columns.

See Also

create_black_list test_ctDNA

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
## Load example data
data("targets", package = "ctDNAtools")
bamN1 <- system.file("extdata", "N1.bam", package = "ctDNAtools")
bamN2 <- system.file("extdata", "N2.bam", package = "ctDNAtools")
bamN3 <- system.file("extdata", "N3.bam", package = "ctDNAtools")

## Use human reference genome from BSgenome.Hsapiens.UCSC.hg19 library
suppressMessages(library(BSgenome.Hsapiens.UCSC.hg19))

## Use a black list based on loci
bg_panel <- create_background_panel(
  bam_list = c(bamN1, bamN2, bamN3),
  targets = targets, reference = BSgenome.Hsapiens.UCSC.hg19,
  substitution_specific = FALSE
)

bl1 <- create_black_list(bg_panel,
  mean_vaf_quantile = 0.99,
  min_samples_one_read = 2, min_samples_two_reads = 1
)

## Use a substitution-specific black list
bg_panel <- create_background_panel(
  bam_list = c(bamN1, bamN2, bamN3),
  targets = targets, reference = BSgenome.Hsapiens.UCSC.hg19,
  substitution_specific = TRUE
)

bl2 <- create_black_list(bg_panel,
  mean_vaf_quantile = 0.99,
  min_samples_one_read = 2, min_samples_two_reads = 1
)

## Multi-threading (commented)
## library(furrr)
## plan(multiprocess)
## plan(multiprocess, workers = 3)
bg_panel <- create_background_panel(
  bam_list = c(bamN1, bamN2, bamN3),
  targets = targets, reference = BSgenome.Hsapiens.UCSC.hg19,
  substitution_specific = TRUE
)

ctDNAtools documentation built on March 26, 2020, 7:39 p.m.