mutFilterCan: mutFilterCan

View source: R/mutFilterCan.R

mutFilterCan    R Documentation

mutFilterCan

Description

Apply common filtering strategies on a MAF data frame for different cancer types.

Usage

mutFilterCan(
  maf,
  cancerType,
  PONfile,
  PONformat = "vcf",
  panel = "Customized",
  tumorDP = 0,
  normalDP = 0,
  tumorAD = 0,
  normalAD = Inf,
  VAF = 0,
  VAFratio = 0,
  SBmethod = "SOR",
  SBscore = Inf,
  maxIndelLen = Inf,
  minInterval = 0,
  tagFILTER = NULL,
  dbVAF = 0.01,
  ExAC = FALSE,
  Genomesprojects1000 = FALSE,
  ESP6500 = FALSE,
  gnomAD = FALSE,
  dbSNP = FALSE,
  keepCOSMIC = FALSE,
  keepType = "all",
  bedFile = NULL,
  bedFilter = TRUE,
  bedHeader = FALSE,
  mutFilter = FALSE,
  selectCols = FALSE,
  report = TRUE,
  reportFile = "FilterReport.html",
  reportDir = "./",
  TMB = FALSE,
  progressbar = TRUE,
  codelog = FALSE,
  codelogFile = "mutFilterCan.log",
  verbose = TRUE
)

Arguments

maf

An MAF data frame.

cancerType

Cancer type whose filtering parameters are used. Options are: "COADREAD", "BRCA", "LIHC", "LAML", "LCML", "UCEC", "UCS", "BLCA", "KIRC" and "KIRP".

PONfile

Panel-of-Normals files, which can be either obtained through GATK (https://gatk.broadinstitute.org/hc/en-us/articles/360035890631-Panel-of-Normals-PON-) or generated by users. Should have at least four columns: CHROM, POS, REF, ALT
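As an illustration of the four-column layout described above, the following sketch writes a tiny PON table in base R and reads it back. The variant rows, the temporary file and the tab delimiter are assumptions made for this example; the delimiter that mutFilterCan expects for PONformat = "txt" is not stated here.

```r
# Hypothetical PON records; only the four column names come from the text above.
pon <- data.frame(
  CHROM = c("chr1", "chr2"),
  POS   = c(12345L, 67890L),
  REF   = c("A", "G"),
  ALT   = c("T", "C"),
  stringsAsFactors = FALSE
)

# Assumption: a tab-delimited text file with a header row.
ponPath <- tempfile(fileext = ".txt")
write.table(pon, ponPath, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the file back with explicit column types and check the required columns.
ponCheck <- read.table(
  ponPath, header = TRUE, sep = "\t",
  colClasses = c("character", "integer", "character", "character")
)
hasRequired <- all(c("CHROM", "POS", "REF", "ALT") %in% colnames(ponCheck))
```

A file like this could then be passed as PONfile together with PONformat = "txt", as in the Examples section.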

PONformat

The format of PON file, either "vcf" or "txt". Default: "vcf"

panel

The sequencing panel applied to the dataset. Parameters for the mutFilterQual function are set differently for different panels. Options: "MSKCC", "WES" and "Customized". Default: "Customized".

tumorDP

Threshold of tumor total depth. Default: 0

normalDP

Threshold of normal total depth. Default: 0

tumorAD

Threshold of tumor alternative allele depth. Default: 0

normalAD

Threshold of normal alternative allele depth. Default: Inf

VAF

Threshold of VAF value. Default: 0

VAFratio

Threshold of VAF ratio (tVAF/nVAF). Default: 0
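To make the depth and VAF thresholds concrete, this sketch computes tumor VAF, normal VAF and their ratio for a few made-up variants, assuming the usual definition VAF = alternative allele depth / total depth. The column names, the threshold values and the comparison direction (keep when a value is at or above its threshold) are assumptions for illustration, not the package's documented behavior.

```r
# Hypothetical read counts for three variants (t = tumor, n = normal).
calls <- data.frame(
  tDP = c(100, 40, 15),  # tumor total depth
  tAD = c(20, 2, 6),     # tumor alternative allele depth
  nDP = c(80, 50, 30),   # normal total depth
  nAD = c(1, 1, 0)       # normal alternative allele depth
)

# Assumption: VAF is the alternative allele depth divided by total depth.
calls$tVAF <- calls$tAD / calls$tDP
calls$nVAF <- calls$nAD / calls$nDP

# VAF ratio as described above (tVAF/nVAF); it is Inf when nVAF is 0.
calls$VAFratio <- calls$tVAF / calls$nVAF

# Illustrative thresholds only; these are not values taken from CaMutQC.
keep <- calls$tDP >= 20 & calls$tAD >= 5 &
  calls$tVAF >= 0.05 & calls$VAFratio >= 5
```

With these made-up numbers only the first variant passes: the second has too few alternative reads and the third has too little tumor depth.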

SBmethod

Method used to detect strand bias, either 'SOR' or 'Fisher'. Default: 'SOR'. SOR: StrandOddsRatio (https://gatk.broadinstitute.org/hc/en-us/articles/360041849111-StrandOddsRatio)

SBscore

Cutoff strand bias score used to filter variants. Default: Inf

maxIndelLen

Maximum length of indel accepted to be included. Default: Inf

minInterval

Minimum distance between an SNV and an indel for the variant to be included. Default: 0
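The minInterval argument concerns the distance between an SNV and a nearby indel. The sketch below shows one way such a check could be written in base R for variants on a single chromosome. The positions are made up, and flagging SNVs whose nearest indel lies within minInterval bases (using <=) is an assumption, not a description of how CaMutQC implements it.

```r
# Hypothetical positions on one chromosome.
snvPos   <- c(100L, 150L, 300L)
indelPos <- c(105L, 500L)
minInterval <- 10L  # illustrative value

# Distance from each SNV to its nearest indel.
nearest <- vapply(
  snvPos,
  function(p) min(abs(indelPos - p)),
  integer(1)
)

# Assumption: an SNV is flagged when an indel lies within minInterval bases.
tooClose <- nearest <= minInterval
```

Here only the SNV at position 100 is flagged, because the indel at 105 is 5 bases away.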

tagFILTER

Variants with a specific tag in the FILTER column will be kept. Default: NULL

dbVAF

Threshold of VAF of certain population for variants in database. Default: 0.01.

ExAC

Whether to filter variants listed in ExAC with VAF higher than the cutoff (set in the dbVAF parameter). Default: FALSE.

Genomesprojects1000

Whether to filter variants listed in Genomesprojects1000 with VAF higher than the cutoff (set in the dbVAF parameter). Default: FALSE.

ESP6500

Whether to filter variants listed in ESP6500 with VAF higher than the cutoff (set in the dbVAF parameter). Default: FALSE.

gnomAD

Whether to filter variants listed in gnomAD with VAF higher than the cutoff (set in the dbVAF parameter). Default: FALSE.

dbSNP

Whether to filter variants listed in dbSNP. Default: FALSE.

keepCOSMIC

Whether to keep variants in COSMIC even if they are present in a germline database. Default: FALSE.
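The interaction between the population-database filters and keepCOSMIC can be sketched as follows. The data frame, its column names (popAF, inCOSMIC) and the exact comparison are hypothetical and only illustrate the idea of removing variants that are common in a germline database unless they are also listed in COSMIC.

```r
# Hypothetical annotation for three variants.
variants <- data.frame(
  id       = c("v1", "v2", "v3"),
  popAF    = c(0.0001, 0.05, 0.2),  # made-up population allele frequency
  inCOSMIC = c(FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

dbVAF <- 0.01       # population frequency cutoff
keepCOSMIC <- TRUE  # rescue variants that are listed in COSMIC

# Assumption: a variant looks germline when its population frequency exceeds dbVAF.
germlineLike <- !is.na(variants$popAF) & variants$popAF > dbVAF

# Variants rescued because they also appear in COSMIC.
rescued <- keepCOSMIC & variants$inCOSMIC

variants$removed <- germlineLike & !rescued
```

In this toy table v2 exceeds the cutoff but is retained because it is in COSMIC, while v3 is removed.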

keepType

The group of variant classifications to keep: 'exonic', 'nonsynonymous' or 'all'. Default: 'all'.

bedFile

A file in bed format that contains region information. Default: NULL

bedFilter

Whether to filter the regions in the bed file so that only segments in Chr1-Chr22, ChrX and ChrY remain. Default: TRUE
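The effect of restricting a bed file to the autosomes plus X and Y can be sketched in base R as below. The regions are made up, and the lowercase "chr" prefix and the column names are assumptions about how the bed file is written; this is not the package's own code.

```r
# Hypothetical bed regions, including contigs outside chr1-chr22, chrX and chrY.
bed <- data.frame(
  chrom = c("chr1", "chr22", "chrX", "chrM", "chrUn_gl000220"),
  start = c(100L, 200L, 300L, 1L, 50L),
  end   = c(500L, 900L, 800L, 16000L, 400L),
  stringsAsFactors = FALSE
)

# Chromosome names that are kept: chr1 to chr22, chrX and chrY.
allowed <- paste0("chr", c(1:22, "X", "Y"))

bedKept <- bed[bed$chrom %in% allowed, ]
```

The mitochondrial and unplaced contigs are dropped, leaving the chr1, chr22 and chrX rows.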

bedHeader

Whether the input bed file has a header or not. Default: FALSE.

mutFilter

Whether to directly return a filtered MAF data frame. If FALSE, a simulated filtration is run, and the original MAF data frame with tags in the CaTag column is returned together with a filter report. If TRUE, a filtered MAF data frame and a filter report are generated. Default: FALSE

selectCols

Columns to keep in the filtered data frame. If TRUE, the first 13 columns and the 'Tumor_Sample_Barcode' column are kept; alternatively, a vector of column names to keep can be supplied. Default: FALSE

report

Whether to generate report automatically. Default: TRUE

reportFile

File name of the report. Default: 'FilterReport.html'

reportDir

Path to the output report file. Default: './'

TMB

Whether to calculate TMB. Default: FALSE

progressbar

Whether to show a progress bar when running this function. Default: TRUE

codelog

If TRUE, your code, along with the parameters you set, will be exported to a log file, which makes it convenient to repeat experiments. Default: FALSE

codelogFile

Where to store the codelog, only useful when codelog is set to TRUE. Default: "mutFilterCan.log"

verbose

Whether to generate message/notification during the filtration process. Default: TRUE.

Value

An MAF data frame after common strategy filtration for a cancer type.

A filter report in HTML format

Examples

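library(CaMutQC)

# Convert a VEP-annotated VCF shipped with CaMutQC into an MAF data frame
maf <- vcfToMAF(system.file("extdata",
    "WES_EA_T_1_mutect2.vep.vcf", package="CaMutQC"))

# Apply the BRCA filtering parameters, using a text-format PON file
mafF <- mutFilterCan(maf, cancerType='BRCA',
    PONfile=system.file("extdata", "PON_test.txt", package="CaMutQC"),
    PONformat="txt", TMB=FALSE)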

likelet/CaMutQC documentation built on Aug. 17, 2024, 4 a.m.