mbelmadani/forestDNM-archive: Predicts de novo SNVs when provided a VCF containing variants of a genotyped family.

forestDNM is an R package built around a random forest (RF) classifier trained to predict true de novo germline mutations (DNMs) from features derived from the family genotype data in a VCF. The classifier was trained on 10 families with monozygotic twins whose putative DNMs had undergone extensive experimental validation; the training target was validation status. On an independent test set of held-out data from the same 10 families, sensitivity was > 95% at a false discovery rate (FDR) < 10%. The balance between sensitivity and FDR can be tuned by adjusting the threshold applied to the classifier score (the RF vote proportion). The default cutoff is 0.2; the vignette includes plots showing how FDR and sensitivity vary with the classifier score. SNVs in the training/test set were genotyped using GATK 2.1-13 (UnifiedGenotyper and VQSR).
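
The trade-off between sensitivity and FDR at a given score cutoff can be illustrated with a minimal base R sketch. The scores and validation labels below are made-up toy values, `threshold_metrics` is a hypothetical helper rather than a forestDNM function, and calling a site when its score is at or above the cutoff is an assumption about how the threshold is applied.

```r
# Hypothetical toy data: classifier scores (RF vote proportions) and
# validation status for eight candidate sites. Not forestDNM results.
score     <- c(0.95, 0.80, 0.60, 0.35, 0.25, 0.15, 0.10, 0.05)
validated <- c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE)

# Sensitivity and FDR when every site with score >= cutoff is called a DNM.
# `threshold_metrics` is an illustrative helper, not part of the package.
threshold_metrics <- function(score, validated, cutoff = 0.2) {
  called <- score >= cutoff
  tp <- sum(called & validated)    # called and validated
  fp <- sum(called & !validated)   # called but not validated
  fn <- sum(!called & validated)   # validated but missed
  c(cutoff      = cutoff,
    sensitivity = tp / (tp + fn),
    fdr         = if (tp + fp > 0) fp / (tp + fp) else NA_real_)
}

# Sweep a few cutoffs; each row of `metrics` describes one cutoff.
metrics <- t(sapply(c(0.1, 0.2, 0.5), threshold_metrics,
                    score = score, validated = validated))
print(metrics)
```

Raising the cutoff in this toy example lowers the FDR at the cost of sensitivity, which is the balance the default of 0.2 is meant to strike.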

Package details

Author: Jacob J. Michaelson
Maintainer: Jake Michaelson <[email protected]>
Package repository: GitHub (mbelmadani/forestDNM-archive)

Installation

Install the latest version of this package by entering the following in R:
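
A minimal sketch of the install step, assuming the package is installed from the GitHub repository named in the title using the `remotes` package; the exact command is not preserved on this page, so treat this as one plausible form rather than the maintainer's instructions.

```r
# Assumption: install from GitHub via the remotes package.
# The repository path is taken from the page title.
install.packages("remotes")
remotes::install_github("mbelmadani/forestDNM-archive")
```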
mbelmadani/forestDNM-archive documentation built on Dec. 8, 2017, 12:24 a.m.