svAF: Compute allele frequencies at germline heterozygous positions...

View source: R/baf-utils.R

svAF R Documentation

Description

Compute allele frequencies at germline heterozygous positions. Given a set of genomic coordinates, this function identifies heterozygous positions in the bam file given by normalBam and returns the minor allele frequency at these positions in the bam file specified by tumorBam.

Usage

svAF(
  normalBam,
  tumorBam,
  genome,
  positions,
  region,
  n = 50000,
  minCovNormal = 20,
  minCovTumor = 20,
  minMafNormal = 0.3,
  minMafTumor = 0,
  min_base_quality = 0
)

Arguments

normalBam

The path to the bam file for the normal sample, in which germline heterozygous positions are identified.

tumorBam

The path to the bam file for the tumor sample, in which minor allele frequencies are computed.

genome

The genome build of tumorBam and normalBam. Possible values are "hg38", "hg19", and "hg18".

positions

A GRanges object consisting of the genomic regions of interest. The default set of positions is the snps object from svfilters.hg38, svfilters.hg19 or svfilters.hg18, depending on the user-specified genome argument. The snps object contains 1,000,000 positions that are frequently seen as heterozygous, so fewer positions need to be examined to find a sufficient number of heterozygous positions. The snps object contains more than enough positions for tumor purity/ploidy analysis on well-covered WGS bam files. For bam files generated from targeted capture-based sequencing data, it is recommended to use all or most of the dbsnp150_snps object in svfilters.hg38, svfilters.hg19 or svfilters.hg18 to find a sufficient number of heterozygous positions, as these objects contain over 12,000,000 positions.

region

If region is specified, only SNPs in positions that fall within region will be used. This argument is useful for capture-based sequencing technologies (e.g. whole-exome sequencing), where coverage tends to be high enough to call SNPs accurately only in the targeted regions. By supplying a GRanges object containing the targeted regions, this function avoids calculating coverage metrics and allele frequencies for low-coverage off-target regions. Alternatively, users can provide the full path to a file in BED format.
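The restriction to targeted regions described above can be sketched with GenomicRanges. The object names (toy_positions, toy_region) and all coordinates are made up for illustration, and the use of subsetByOverlaps() is an assumption about how such a filter could be written, not a statement about what svAF does internally.

```r
library(GenomicRanges)

# Toy SNP positions (hypothetical coordinates, not taken from svfilters)
toy_positions <- GRanges(
  seqnames = c("chr3", "chr3", "chr5"),
  ranges = IRanges(start = c(100, 5000, 250), width = 1)
)

# Toy capture target on chr3 (hypothetical)
toy_region <- GRanges(
  seqnames = "chr3",
  ranges = IRanges(start = 1, end = 1000)
)

# Keep only the positions that overlap the targeted region
on_target <- subsetByOverlaps(toy_positions, toy_region)
```

Only the chr3 position at coordinate 100 falls inside the toy target, so it is the only one retained.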

n

The number of positions to use for pileup in normalBam. The snps objects in svfilters.hg38, svfilters.hg19 and svfilters.hg18 contain 1 million frequently heterozygous positions. When n is specified, a random sample of n positions from the positions object is used. For tumor purity/ploidy analysis, 10,000 heterozygous positions well spread across the genome are typically plenty. The default value of n = 50000 is generally sufficient to achieve this on a 30X WGS bam file. For sequencing data from targeted capture protocols, it is recommended to use the full set of SNPs in dbsnp150_snp in svfilters.hg38, svfilters.hg19 or svfilters.hg18 to identify a sufficient number of SNPs with high enough coverage.
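The random subsampling controlled by n can be illustrated in base R. This is a minimal sketch on a made-up data.frame standing in for the positions object. The sampling scheme shown (without replacement, then sorted) is an assumption and may differ from what svAF does internally.

```r
set.seed(1)  # for a reproducible sample

# Toy stand-in for a positions object: 1,000 hypothetical SNP coordinates
toy_positions <- data.frame(
  Chrom = rep(c("chr3", "chr5"), each = 500),
  Pos = seq(1000, by = 1000, length.out = 1000)
)

n <- 100

# Draw n row indices without replacement; sorting keeps the original row order
idx <- sort(sample(nrow(toy_positions), n))
sampled <- toy_positions[idx, ]
```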

minCovNormal

The minimum coverage of a position in normalBam to be considered.

minCovTumor

The minimum coverage of a position in tumorBam to be considered.

minMafNormal

The minimum minor allele frequency (MAF) in order to consider a position as heterozygous in normalBam.

minMafTumor

The minimum minor allele frequency (MAF) in order to consider a position as heterozygous in tumorBam. It is recommended to set this value to at least 0.05 when setting normalBam = NULL to avoid outputting homozygous positions.

min_base_quality

The minimum Phred score of a base for it to be counted.
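How the coverage and MAF thresholds combine can be shown on toy allele counts. In this sketch the column names (pos, ref, alt) and all counts are hypothetical. The minor allele frequency is assumed to be the smaller of the two allele counts divided by their sum, and the thresholds are assumed to be inclusive.

```r
# Toy pileup counts in a normal sample (hypothetical values)
normal <- data.frame(
  pos = c(101, 202, 303, 404),
  ref = c(15, 30, 48, 6),
  alt = c(14, 2, 40, 5)
)

minCovNormal <- 20
minMafNormal <- 0.3

# Assumed definitions of coverage and minor allele frequency
coverage <- normal$ref + normal$alt
maf <- pmin(normal$ref, normal$alt) / coverage

# A position is kept only if it is well covered and looks heterozygous
het <- normal[coverage >= minCovNormal & maf >= minMafNormal, ]
```

Here position 202 is dropped because its minor allele frequency is too low, and position 404 is dropped because its coverage is below the minimum.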

Details

If using this function to generate allele frequencies in a tumor sample at germline heterozygous positions identified in a matched normal sample then tumorBam should point to the bam file for the tumor sample and normalBam should point to the bam file for its matched normal.

Value

A data.frame with the following columns:

Chrom: The chromosome of the event
Pos: The coordinate of the event
RefBase: The base in the reference genome (corresponds to the refUCSC column in dbSNP build 150)
AltBase: The other observed base
Normal.Mut.Count: The coverage of AltBase in normalBam
Normal.Coverage: The distinct coverage of RefBase + AltBase in normalBam
Tumor.Mut.Count: The coverage of AltBase in tumorBam
Tumor.Coverage: The distinct coverage of RefBase + AltBase in tumorBam
Tumor.MAF: The minor allele frequency of the event in tumorBam
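To make the relationship between the count columns and Tumor.MAF concrete, the following sketch builds a toy data.frame with a subset of the columns listed above. All values are made up, and the formula shown (the frequency of the less common of the two alleles) is an assumed definition of minor allele frequency, not one confirmed by the package source.

```r
# Toy rows mimicking some of the columns described above (values are made up)
af <- data.frame(
  Chrom = c("chr3", "chr3", "chr5"),
  Pos = c(1200, 8800, 450),
  Tumor.Mut.Count = c(12, 30, 25),
  Tumor.Coverage = c(40, 40, 50)
)

# Fraction of reads supporting AltBase at each position
alt_frac <- af$Tumor.Mut.Count / af$Tumor.Coverage

# Assumed definition: frequency of the less common of the two alleles
af$Tumor.MAF <- pmin(alt_frac, 1 - alt_frac)
```

Under this definition Tumor.MAF never exceeds 0.5, regardless of which allele is labelled AltBase.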

Examples

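library(trellis)
library(GenomeInfoDb)  # provides keepSeqlevels()

## Example bam files shipped with the svbams package
extdir <- system.file("extdata", package="svbams")
bam1 <- file.path(extdir, "cgov10t.bam")
bam2 <- file.path(extdir, "cgov44t_revised.bam")

## Restrict the hg19 SNP set to chr3 and chr5
data(snps, package = "svfilters.hg19")
snps <- keepSeqlevels(snps, c("chr3", "chr5"), pruning.mode = "coarse")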
## Not run: 
svAF(normalBam=bam1,
     tumorBam=bam2,
     genome="hg19",
     positions = snps,
     n = 1000,
     minCovNormal = 10,
     minCovTumor = 10,
     minMafNormal = 0.3,
     minMafTumor = 0)

## End(Not run)

cancer-genomics/trellis documentation built on Feb. 2, 2023, 7:04 p.m.