docs/analysis/filter-calls.md

description: Step 2 -- filtering

Filter Calls

The second step takes all the genotypes generated in the first step and organizes them into a patient-level variants table with VAFs and call status for each variant in each sample.

Each call is subjected to:

  1. Read depth filter (hotspot vs non-hotspot)
  2. Systematic artifact filter
  3. Germline filters
    1. If any normal exists (buffy coat and DMP normal) -- 2:1 rule
    2. If not -- ExAC frequency < 0.01% and VAF < 30%
  4. CH tag
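
The germline filter branch can be sketched as follows. This is a minimal illustration in Python, not the code in `R/filter_calls.R`. The function name and arguments are hypothetical, and the reading of the 2:1 rule (plasma VAF at least twice the VAF in every matched normal) is an assumption, since the rule is not spelled out here.

```python
def passes_germline_filter(plasma_vaf, normal_vafs=(), exac_freq=0.0):
    """Return True if a call survives the germline filter.

    ASSUMPTION: the 2:1 rule is interpreted as "plasma VAF is at least
    twice the VAF observed in every matched normal".
    """
    if normal_vafs:
        # A matched normal exists (buffy coat and/or DMP normal): 2:1 rule.
        return all(plasma_vaf >= 2 * normal_vaf for normal_vaf in normal_vafs)
    # No matched normal: ExAC frequency < 0.01% and VAF < 30%.
    return exac_freq < 0.0001 and plasma_vaf < 0.30
```

With a matched normal, a call at 1% VAF in plasma and 0.1% in the normal passes under this reading, while the same call with 0.8% in the normal does not.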

Usage

Rscript R/filter_calls.R -h                                         
usage: R/filter_calls.R [-h] [-m MASTERREF] [-o RESULTSDIR] [-dmpk DMPKEYPATH]
                        [-ch CHLIST] [-c CRITERIA]

optional arguments:
  -h, --help            show this help message and exit
  -m MASTERREF, --masterref MASTERREF
                        File path to master reference file
  -o RESULTSDIR, --resultsdir RESULTSDIR
                        Output directory
  -ch CHLIST, --chlist CHLIST
                        List of signed out CH calls [default]
  -c CRITERIA, --criteria CRITERIA
                        Calling criteria [default]

Default

Default options can be found here

What filter_calls.R does

Generate a reference of systematic artifacts -- any call that occurs in 2 or more donor samples (an occurrence is defined as 2 or more duplex reads supporting the call)

{% hint style="info" %} We suggest that you filter out anything with duplex_support_num >= 2 {% endhint %}
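
As an illustration of the artifact reference described above, the counting logic might look like the sketch below. It is written in Python for brevity and is not the R implementation. The input layout (one tuple per variant per donor sample) and the names `build_artifact_reference`, `MIN_DUPLEX_READS` and `MIN_DONOR_SAMPLES` are hypothetical.

```python
MIN_DUPLEX_READS = 2   # duplex reads needed to count as an occurrence in one donor
MIN_DONOR_SAMPLES = 2  # donor samples with an occurrence needed to flag an artifact


def build_artifact_reference(donor_calls):
    """Return the set of variants that look like systematic artifacts.

    donor_calls: iterable of (variant_key, donor_sample_id, duplex_alt_reads).
    The variant_key format is a placeholder chosen for this sketch.
    """
    donors_per_variant = {}
    for variant, sample, duplex_reads in donor_calls:
        if duplex_reads >= MIN_DUPLEX_READS:
            # Use a set so a donor sample is counted once per variant.
            donors_per_variant.setdefault(variant, set()).add(sample)
    return {
        variant
        for variant, samples in donors_per_variant.items()
        if len(samples) >= MIN_DONOR_SAMPLES
    }
```

A variant seen with 3 duplex reads in one donor and 2 in another would be flagged, while a variant with 5 duplex reads in one donor and only 1 in another would not.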

For each patient:

  1. Read in sample sheets -- reference for downstream analysis
  2. Generate a preliminary patient-level variants table
  3. Read in and merge hotspots, DMP signed-out calls and occurrence in donor samples
  4. Call status annotation
    1. All calls passing the read depth/genotype filter are annotated as 'Called' or 'Genotyped'
    2. Any call not satisfying the germline filters is overwritten with 'Not Called'
        1. Calls with zero coverage in the plasma sample are also annotated as 'Not Covered'
  5. Final processing
    1. Combine duplex and simplex read counts
    2. CH tags
  6. Write out table
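
The order in which call status labels are overwritten during annotation can be sketched as below. This is a hedged Python illustration rather than the script's actual logic: the function name and arguments are hypothetical, and letting 'Not Covered' take precedence over the other labels is an assumption made for this example.

```python
def annotate_call_status(initial_status, passes_germline, plasma_total_depth):
    """Apply the status overwrites in sequence.

    initial_status: 'Called' or 'Genotyped' for calls that passed the
    read depth/genotype filter, otherwise 'Not Called'.
    """
    status = initial_status
    if status in ("Called", "Genotyped") and not passes_germline:
        # Calls failing the germline filters are overwritten.
        status = "Not Called"
    if plasma_total_depth == 0:
        # ASSUMPTION: zero plasma coverage overrides any other label.
        status = "Not Covered"
    return status
```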

Example of the patient level table:

| Hugo_Symbol | Start_position | Variant_Classification | Other variant descriptions | ... | C-xxxxxx-L001-d___duplex.called | C-xxxxxx-L001-d___duplex.total | C-xxxxxx-L002-d___duplex.called | C-xxxxxx-L002-d___duplex.total | C-xxxxxx-N001-d___unfilterednormal | P-xxxxxxx-T01-IM6___DMP_Tumor | P-xxxxxxx-T01-IM6___DMP_Normal |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| KRAS | xxxxxx | Missense Mutation | ... | ... | Called | 15/1500(0.01) | Not Called | 0/1800(0) | 0/200(0) | 200/800(0.25) | 1/700(0.001) |
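
For downstream use, the count cells in this table can be split back into numbers. The sketch below assumes the cell layout is `alt_reads/total_reads(VAF)`, which is inferred from the example row (15/1500 gives 0.01, 200/800 gives 0.25) rather than stated in the documentation. The helper names are hypothetical and the code is Python, not part of the R pipeline.

```python
import re

# Matches cells such as "15/1500(0.01)" or "0/1800(0)".
_COUNT_CELL = re.compile(r"^(\d+)/(\d+)\(([0-9.eE+-]+)\)$")


def parse_count_cell(cell):
    """Split a cell into (alt_reads, total_reads, reported_vaf)."""
    match = _COUNT_CELL.match(cell.strip())
    if match is None:
        raise ValueError(f"unexpected cell format: {cell!r}")
    return int(match.group(1)), int(match.group(2)), float(match.group(3))


def recompute_vaf(alt_reads, total_reads):
    """Recompute the VAF from raw counts, returning 0.0 when there is no coverage."""
    return alt_reads / total_reads if total_reads else 0.0
```

Recomputing the VAF from the counts is useful because the value in parentheses appears to be rounded (1/700 is shown as 0.001).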



msk-access/access_data_analysis documentation built on Nov. 13, 2023, 12:43 p.m.