Description Usage Arguments Details Value See Also Examples
View source: R/create_background_panel.R
This function scans the targets regions in the list of bam files, and reports the number of reference, non-reference reads for each loci in addition to the non-reference (VAF) allele frequency. Loci with VAF higher than vaf_threshold are masked with NA.
1 2 3 4 | create_background_panel(bam_list, targets, reference,
vaf_threshold = 0.05, bam_list_tags = rep("", length(bam_list)),
min_base_quality = 10, max_depth = 1e+05, min_mapq = 20,
substitution_specific = TRUE)
|
bam_list |
A character vector of paths to bam files. |
targets |
The targets data frame must have the columns chr, start and end. |
reference |
The reference genome as BSgenome object. |
vaf_threshold |
Loci with the fraction of non-reference reads above this value are masked with NA. |
bam_list_tags |
RG tags for the list of bam files. By default, the whole bam file will be used. |
min_base_quality |
The minimum base quality to count a read for a loci. |
max_depth |
Maximum depth for the pileup |
min_mapq |
The minimum mapping quality to count a read for a loci |
substitution_specific |
logical, whether to have the loci by substitutions. |
Extracts the depth, variant allele counts and variant allele frequency (VAF) for each genomic position in the input targets
across a panel of bam files (e.g. from healthy samples to represent only technical noise). The extracted information can be fed to
create_black_list
in order to extract a black listed loci according to defined criteria
The function support two modes, either loci-specific regardless of the basepair substitution, or substitution-specific where each substitution class (e.g. C>T, C>G) are quantified separately. This behavior is controlled by the substitution_specific parameter.
VAF above vaf_threshold parameters are masked with NA, to exclude real SNPs/mutations.
Since this function can take a long time when the bam_list comprises a large number of bam files, the function supports multi-threading using the furrr and future R packages. All you need to do is call 'plan(multiprocess)' or other multi-threading strategies before calling this function.
A named list having depth, alt and vaf data frames. Each has the same order of loci in rows and the input samples in columns.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 | ## Load example data
data("targets", package = "ctDNAtools")
bamN1 <- system.file("extdata", "N1.bam", package = "ctDNAtools")
bamN2 <- system.file("extdata", "N2.bam", package = "ctDNAtools")
bamN3 <- system.file("extdata", "N3.bam", package = "ctDNAtools")
## Use human reference genome from BSgenome.Hsapiens.UCSC.hg19 library
suppressMessages(library(BSgenome.Hsapiens.UCSC.hg19))
## Use a black list based on loci
bg_panel <- create_background_panel(
bam_list = c(bamN1, bamN2, bamN3),
targets = targets, reference = BSgenome.Hsapiens.UCSC.hg19,
substitution_specific = FALSE
)
bl1 <- create_black_list(bg_panel,
mean_vaf_quantile = 0.99,
min_samples_one_read = 2, min_samples_two_reads = 1
)
## Use a substitution-specific black list
bg_panel <- create_background_panel(
bam_list = c(bamN1, bamN2, bamN3),
targets = targets, reference = BSgenome.Hsapiens.UCSC.hg19,
substitution_specific = TRUE
)
bl2 <- create_black_list(bg_panel,
mean_vaf_quantile = 0.99,
min_samples_one_read = 2, min_samples_two_reads = 1
)
## Multi-threading (commented)
## library(furrr)
## plan(multiprocess)
## plan(multiprocess, workers = 3)
bg_panel <- create_background_panel(
bam_list = c(bamN1, bamN2, bamN3),
targets = targets, reference = BSgenome.Hsapiens.UCSC.hg19,
substitution_specific = TRUE
)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.