filterSNPs: Filter SNPs based on read depth and quality


Description

Uses filtering parameters to remove SNPs with very high or very low read depth, as well as SNPs with low Genotype Quality as defined by GATK. All filters are optional but recommended.

Usage

filterSNPs(SNPset, refAlleleFreq, filterAroundMedianDepth, minTotalDepth,
  maxTotalDepth, minSampleDepth, depthDifference, minGQ, verbose = TRUE)

Arguments

SNPset

The data frame imported by ImportFromGATK

refAlleleFreq

A numeric < 1. Filters out SNPs with a Reference Allele Frequency less than refAlleleFreq or greater than 1 - refAlleleFreq. E.g., refAlleleFreq = 0.3 will keep SNPs with 0.3 <= REF_FRQ <= 0.7.
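As a rough illustration of the rule just described, the following base R sketch applies the same inequality to a toy data frame. It is not the package's own code: the data values are invented, and only the REF_FRQ column name is taken from the text above.

```r
# Toy data; the REF_FRQ column name follows the text above, the values are invented.
snps <- data.frame(
  id      = c("snp1", "snp2", "snp3", "snp4"),
  REF_FRQ = c(0.10, 0.30, 0.55, 0.85),
  stringsAsFactors = FALSE
)

refAlleleFreq <- 0.3

# Keep SNPs whose reference allele frequency lies within
# [refAlleleFreq, 1 - refAlleleFreq], bounds included.
keep <- snps$REF_FRQ >= refAlleleFreq & snps$REF_FRQ <= 1 - refAlleleFreq
kept <- snps[keep, ]
```

Here snp1 and snp4 fall outside the 0.3 to 0.7 window and are dropped.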

filterAroundMedianDepth

Filters on the total SNP read depth across both bulks. The median and median absolute deviation (MAD) of depth are calculated, and SNPs whose read depth lies more than filterAroundMedianDepth MADs above or below the median are filtered out.
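The median-and-MAD rule can be sketched in base R as follows. This is a minimal illustration under assumptions, not the package's implementation: the depth values are invented, the variable names are hypothetical, and it assumes that stats::mad is called with its default scaling constant and that a depth exactly at the threshold is kept.

```r
# Invented total depths for seven SNPs (both bulks combined).
total_depth <- c(50, 52, 48, 51, 49, 200, 5)
filterAroundMedianDepth <- 3

depth_median <- median(total_depth)
depth_mad <- mad(total_depth)  # stats::mad, default scaling constant 1.4826

# Keep SNPs within filterAroundMedianDepth MADs of the median.
keep <- abs(total_depth - depth_median) <= filterAroundMedianDepth * depth_mad
kept <- total_depth[keep]
```

With these values the median is 50 and the MAD is about 2.97, so the two outlying depths (200 and 5) are removed.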

minTotalDepth

The minimum total read depth for a SNP (counting both bulks)

maxTotalDepth

The maximum total read depth for a SNP (counting both bulks)

minSampleDepth

The minimum read depth for a SNP in each bulk

depthDifference

The maximum absolute difference in read depth between the bulks.
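The four depth thresholds above can be combined into one logical mask, as in this hedged base R sketch. The column names depth_high and depth_low are hypothetical stand-ins for the per-bulk depth columns, the data are invented, and treating every bound as inclusive is an assumption.

```r
# Invented per-bulk read depths; column names are hypothetical.
snps <- data.frame(
  depth_high = c(30, 10, 45, 25, 60),
  depth_low  = c(25, 40, 40, 22, 20)
)

minTotalDepth   <- 40
maxTotalDepth   <- 80
minSampleDepth  <- 20
depthDifference <- 30

total <- snps$depth_high + snps$depth_low

keep <- total >= minTotalDepth &                      # enough reads overall
  total <= maxTotalDepth &                            # not suspiciously deep
  snps$depth_high >= minSampleDepth &                 # each bulk covered
  snps$depth_low  >= minSampleDepth &
  abs(snps$depth_high - snps$depth_low) <= depthDifference  # bulks comparable

kept <- snps[keep, ]
```

In this toy set, row 3 exceeds the total depth cap, row 2 has too few reads in one bulk, and row 5 has too large a difference between bulks.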

minGQ

The minimum Genotype Quality as set by GATK. This is a measure of how confident GATK was in the assigned genotype (i.e. homozygous ref, heterozygous, homozygous alt). See "What is a VCF and how should I interpret it?" for details.

verbose

Logical. If TRUE, reports the number of SNPs filtered in each step.

Value

Returns the subset of the supplied data frame that meets the filtering conditions set by the selected parameters. If verbose is TRUE, the function reports the number of SNPs filtered in each step, as well as the initial number of SNPs, the total number of SNPs filtered, and the number remaining.
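The kind of step-by-step reporting described here could look like the sketch below. It is only an illustration: apply_step is a hypothetical helper, the column names and data are invented, and the message wording is not the package's actual output.

```r
# Hypothetical helper: apply one logical filter and report how many rows it removed.
apply_step <- function(data, keep, label, verbose = TRUE) {
  if (verbose) {
    message("Filtering by ", label, ": removed ", sum(!keep), " SNPs")
  }
  data[keep, , drop = FALSE]
}

# Invented data; column names are hypothetical.
snps <- data.frame(
  total_depth = c(55, 50, 85, 47, 80),
  gq          = c(99, 60, 99, 99, 99)
)

n_start <- nrow(snps)
out <- apply_step(snps, snps$total_depth <= 80, "maxTotalDepth")
out <- apply_step(out, out$gq >= 99, "minGQ")
n_removed <- n_start - nrow(out)

message("Started with ", n_start, " SNPs, removed ", n_removed,
        ", ", nrow(out), " remaining")
```

Because each step runs on the output of the previous one, the per-step counts add up to the total number removed.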

See Also

See mad for an explanation of how the median absolute deviation is calculated. See "What is a VCF and how should I interpret it?" for more information on GATK Fields and Genotype Fields.

Examples

df_filt <- filterSNPs(
    df,
    refAlleleFreq = 0.3,
    minTotalDepth = 40,
    maxTotalDepth = 80,
    minSampleDepth = 20,
    minGQ = 99,
    verbose = TRUE
)

bmansfeld/QTLseqr documentation built on Jan. 24, 2020, 3:56 p.m.