filter_model_input: Filter Model Input

View source: R/filter_model_input.R

Description

Filters variants to remove flagged alleles, polymorphisms, COSMIC mutations, and high-VAF variants prior to error model generation. In addition to these filters, this function detects and removes contextual outliers, that is, non-reference alleles with an exceptionally high read count compared to the second most abundant non-reference allele.
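The contextual-outlier idea can be illustrated with a small base R sketch. This is not the ECSI implementation: the helper name `flag_contextual_outlier`, the `ratio_cutoff` argument, and the ratio-based rule itself are assumptions made only for illustration, since this page does not state how "exceptionally high" is defined or how alleles are grouped.

```r
# Hypothetical read counts for the non-reference alleles being compared.
alt_counts <- c(A = 250, G = 4, T = 3)

# ASSUMPTION: an allele is an outlier when its read count exceeds
# `ratio_cutoff` times the count of the second most abundant allele.
# The real criterion used by filter_model_input() may differ.
flag_contextual_outlier <- function(counts, ratio_cutoff = 10) {
  # With fewer than two alleles there is no runner-up to compare against
  if (length(counts) < 2) {
    return(character(0))
  }
  sorted <- sort(counts, decreasing = TRUE)
  # Multiplying instead of dividing avoids a division by zero
  if (sorted[[1]] > ratio_cutoff * sorted[[2]]) {
    names(sorted)[1]
  } else {
    character(0)
  }
}

outlier <- flag_contextual_outlier(alt_counts)
balanced <- flag_contextual_outlier(c(A = 5, G = 4, T = 3))
```

Here the first call flags `A`, whose count dwarfs the runner-up, while the second call flags nothing because the counts are comparable.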

Usage

filter_model_input(
  model_input,
  flagged_alleles = NA,
  MAF_cutoff = 0.001,
  VAF_cutoff = 0.05,
  MAPQ_cutoff = 59,
  recurrent_mutations = NA
)

Arguments

model_input

VRanges object annotated with mutation context and population minor allele frequency

flagged_alleles

VRanges object of high-VAF alleles flagged as being present in too many samples. Default is NA.

MAF_cutoff

Population Minor Allele Frequency cutoff: variants at or above this cutoff are excluded. Default is 0.001. This excludes polymorphisms resulting from germline mutations or sample-to-sample contamination from the error model.

VAF_cutoff

Sample Variant Allele Frequency cutoff: variants at or above this cutoff are excluded. Default is 0.05. This excludes obvious somatic mutations or private germline mutations from the error model.

MAPQ_cutoff

Minimum acceptable MAPQ score: positions below this cutoff are excluded. Default is 59.

recurrent_mutations

VRanges object with chr, pos, ref, and alt of frequently mutated alleles to remove from model input. GRanges objects are also accepted, in which case filtering occurs by position only. Default is NA.
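The three numeric cutoffs described above can be sketched on a plain data frame. This is only a minimal illustration of the stated inclusion rules, not the package's code: the helper `apply_cutoffs` and the column names `MAF`, `VAF`, and `MAPQ` are hypothetical stand-ins for the metadata carried by the VRanges input, and missing values are not handled.

```r
# Toy variant table standing in for an annotated VRanges object
# (hypothetical column names; the real object stores these differently).
variants <- data.frame(
  id   = c("v1", "v2", "v3", "v4", "v5", "v6"),
  MAF  = c(0,    0.2,  0,    0.001, 0,    0.0005),
  VAF  = c(0.01, 0.02, 0.40, 0.01,  0.01, 0.049),
  MAPQ = c(60,   60,   60,   60,    40,   59),
  stringsAsFactors = FALSE
)

# Hypothetical helper applying the documented rules:
#   MAF at or above MAF_cutoff  -> excluded
#   VAF at or above VAF_cutoff  -> excluded
#   MAPQ below MAPQ_cutoff      -> excluded
apply_cutoffs <- function(df, MAF_cutoff = 0.001, VAF_cutoff = 0.05,
                          MAPQ_cutoff = 59) {
  keep <- df$MAF < MAF_cutoff &
    df$VAF < VAF_cutoff &
    df$MAPQ >= MAPQ_cutoff
  df[keep, , drop = FALSE]
}

filtered <- apply_cutoffs(variants)
```

With the default cutoffs only `v1` and `v6` remain: `v2` and `v4` fail the MAF rule (`v4` sits exactly at the cutoff), `v3` fails the VAF rule, and `v5` fails the MAPQ rule.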

Value

This function returns a filtered VRanges object.

Examples

## Not run: 
# Get flagged alleles and cosmic mutations
hemeCOSMIC_10 <- load_recurrent_mutations("example_data/COSMIC_heme_freq10.txt", genome = "hg19")
flagged_alleles <- get_flagged_alleles(all_sample_names, all_sample_paths,
    exclude_cosmic_mutations = TRUE, cosmic_mutations = hemeCOSMIC_10, cosmic_mut_frequency = 3)

# Load and annotate sample
samp <- load_as_VRanges(sample_name = "pt123",
    sample_path = "./patient_123_pileup2cns", genome = "hg19", metadata = TRUE)
samp <- sequence_context(samp)
library(MafDb.gnomADex.r2.1.hs37d5)
annotated_samp <- annotate_MAF(varscan_output = samp,
    MAF_database = MafDb.gnomADex.r2.1.hs37d5, genome = "hg19")

# Filter model input
samp_model_input <- filter_model_input(model_input = annotated_samp,
    flagged_alleles = flagged_alleles, recurrent_mutations = hemeCOSMIC_10)

## End(Not run)

andygxzeng/ECSI documentation built on Feb. 6, 2021, 8:53 a.m.