View source: R/variantAnnotation.R
Raw VCF files are annotated using the Variant Effect Predictor (VEP). VEP uses the coordinates and alleles in the VCF file to infer the biological context of each variant, including its location, its biological consequence (for example, frameshift or silent mutation), and the affected genes. The following databases are used for VCF annotation:
Ensembl database: protein-coding and non-coding genes, splice variants, cDNA and protein sequences, and non-coding RNAs, among other annotations.
GENCODE (https://www.gencodegenes.org/): high-accuracy human and mouse gene annotations based on biological evidence.
RefSeq (https://www.ncbi.nlm.nih.gov/refseq/): a comprehensive, integrated, non-redundant, well-annotated set of reference sequences, including genomic, transcript, and protein sequences.
PolyPhen (http://genetics.bwh.harvard.edu/pph2/): Polymorphism Phenotyping v2 is a tool that predicts the possible impact of an amino acid substitution on the structure and function of a human protein using straightforward physical and comparative considerations.
SIFT (https://sift.bii.a-star.edu.sg/): SIFT predicts whether an amino acid substitution affects protein function based on sequence homology and the physical properties of amino acids. SIFT can be applied to naturally occurring nonsynonymous polymorphisms and laboratory-induced missense mutations.
dbSNP (https://www.ncbi.nlm.nih.gov/snp/): dbSNP contains human single nucleotide variations, microsatellites, and small-scale insertions and deletions, along with publication, population frequency, molecular consequence, and genomic and RefSeq mapping information for both common variations and clinical mutations.
COSMIC (https://cancer.sanger.ac.uk/cosmic): the Catalogue Of Somatic Mutations In Cancer is the world's largest and most comprehensive resource for exploring the impact of somatic mutations in human cancer.
HGMD-PUBLIC (http://www.hgmd.cf.ac.uk/ac/index.php): the Human Gene Mutation Database (HGMD®) represents an attempt to collate all known (published) gene lesions responsible for human inherited disease.
ClinVar (https://www.ncbi.nlm.nih.gov/clinvar/): reports of the relationships among human variations and phenotypes, with supporting evidence.
1000 Genomes: a database covering most genetic variants with frequencies of at least 1% in the populations studied.
NHLBI-ESP
gnomAD: the Genome Aggregation Database (gnomAD) is a resource developed by an international coalition of investigators, with the goal of aggregating and harmonizing both exome and genome sequencing data from a wide variety of large-scale sequencing projects and making summary data available to the wider scientific community.
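As a rough illustration of how these inputs could reach VEP, the sketch below assembles a VEP command line from R. It is not taken from the package source: the helper name `build_vep_args`, the file names, and the choice of flags are assumptions about a typical offline, cache-based VEP run.

```r
# Hypothetical helper (not part of the package): builds the argument
# vector for an offline VEP run. The flag selection is an assumption.
build_vep_args <- function(vcf, cache_dir, ref, out_file) {
  c(
    "--input_file", vcf,        # VCF file to annotate
    "--output_file", out_file,  # where the annotated file is written
    "--vcf",                    # write the output in VCF format
    "--cache", "--offline",     # use a local cache, no database connection
    "--dir_cache", cache_dir,   # directory holding the VEP cache
    "--fasta", ref,             # reference genome in FASTA format
    "--everything",             # shortcut enabling SIFT, PolyPhen, allele frequencies, etc.
    "--force_overwrite"         # replace an existing output file
  )
}

# Placeholder file names, for illustration only.
args <- build_vep_args("sample.vcf", "vep_cache", "genome.fa", "sample_annotated.vcf")

# Running VEP needs a local installation; `vep` would be the path to the executable:
# system2(vep, args = shQuote(args))
```

Building the arguments as a character vector keeps each flag next to its value and makes the command easy to inspect before anything is executed.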
variantAnnotation(vcf, vep, cache_dir, ref, out_path, sample_name)
vcf: File
vep: File
ref: Path to the reference genome used for the alignment (FASTA format), together with the corresponding indexes generated with bwa index and a sequence dictionary file generated by the GATK CreateSequenceDictionary tool.
out_path: Path where the output of the analysis will be saved.
cache_file: Nan
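A call might look like the sketch below, which follows the usage signature shown above. All paths and the sample name are placeholders, and `missing_inputs` is a hypothetical pre-flight helper, not part of the package. The call itself is left commented out because it needs the package, a VEP installation, and real input files.

```r
# Hypothetical helper: returns the names of inputs that do not exist on disk.
missing_inputs <- function(paths) {
  names(paths)[!file.exists(unlist(paths))]
}

# Placeholder paths, for illustration only.
inputs <- list(
  vcf = "results/sample1.vcf",
  vep = "tools/ensembl-vep/vep",
  cache_dir = "vep_cache",
  ref = "reference/genome.fa"
)

absent <- missing_inputs(inputs)
if (length(absent) > 0) {
  message("Missing inputs: ", paste(absent, collapse = ", "))
} else {
  # variantAnnotation(vcf = inputs$vcf, vep = inputs$vep,
  #                   cache_dir = inputs$cache_dir, ref = inputs$ref,
  #                   out_path = "results", sample_name = "sample1")
}
```

Checking the inputs first gives a clearer message than a failure partway through annotation.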