forestDNM is an R package built around a classifier trained to predict true de novo germline mutations (DNMs) from features derived from family genotype data in a VCF. The classifier was trained on putative DNMs from 10 families with monozygotic twins. These candidates had undergone extensive experimental validation, and the classifier was trained to predict their validation status. On an independent test set of held-out data from the same 10 families, sensitivity was > 95% with an FDR < 10%. You can tune the balance between sensitivity and FDR by adjusting the threshold applied to the classifier score, which is the random forest (RF) vote proportion. The default cutoff is 0.2. The vignette includes plots showing how FDR and sensitivity vary with the classifier score. SNVs in this training/test set were genotyped with GATK 2.1-13 (UnifiedGenotyper and VQSR).
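The sketch below illustrates the score-threshold trade-off. It is written in Python rather than R and does not use forestDNM's API. The `Candidate` record and the `vote_proportion`, `call_dnms`, and `sensitivity_fdr` helpers are hypothetical names. The vote counts and validation labels are made-up toy values. Treating a score exactly equal to the cutoff as a DNM call (`>=`) is also an assumption. The sketch computes the RF vote proportion as the fraction of trees voting "true DNM", calls candidates at a given cutoff, and measures sensitivity and FDR against validation status.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one putative DNM (not a forestDNM type)."""
    votes_dnm: int   # number of trees voting "true DNM"
    n_trees: int     # total number of trees in the forest
    validated: bool  # experimental validation status (ground truth)


def vote_proportion(c: Candidate) -> float:
    """Classifier score: the fraction of trees voting for the DNM class."""
    return c.votes_dnm / c.n_trees


def call_dnms(cands, cutoff=0.2):
    """Call a candidate a DNM if its score reaches the cutoff (>= is an assumption)."""
    return [vote_proportion(c) >= cutoff for c in cands]


def sensitivity_fdr(cands, cutoff=0.2):
    """Return (sensitivity, FDR) of the calls made at `cutoff`."""
    calls = call_dnms(cands, cutoff)
    tp = sum(1 for c, k in zip(cands, calls) if k and c.validated)
    fp = sum(1 for c, k in zip(cands, calls) if k and not c.validated)
    fn = sum(1 for c, k in zip(cands, calls) if not k and c.validated)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    fdr = fp / (tp + fp) if tp + fp else 0.0  # no calls means no false discoveries
    return sens, fdr


# Toy candidates from a hypothetical 500-tree forest (made-up values).
cands = [
    Candidate(450, 500, True),   # score 0.90
    Candidate(150, 500, True),   # score 0.30
    Candidate(120, 500, False),  # score 0.24
    Candidate(60, 500, True),    # score 0.12
    Candidate(20, 500, False),   # score 0.04
]

for cutoff in (0.1, 0.2, 0.5):
    sens, fdr = sensitivity_fdr(cands, cutoff)
    print(f"cutoff={cutoff}: sensitivity={sens:.2f}, FDR={fdr:.2f}")
```

With these toy values, raising the cutoff from 0.1 to 0.5 lowers sensitivity from 1.00 to 0.33 and lowers FDR from 0.25 to 0.00. The actual curves for the forestDNM classifier are the ones plotted in the package vignette.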
Author: Jacob J. Michaelson
Maintainer: Jake Michaelson <[email protected]>
Package repository: GitHub
Install the latest version of this package by entering the following in R: