README.md

Build Status R build status Lifecycle: maturing DOI

MORFEE

MORFEE (Mutation on Open Reading FramE annotation) is a tool (R package) that, from a VCF file, detects and annotates single nucleotide variants creating premature ATG codons.

MORFEE algorithm is written in R language and can run on all operating systems which have an R interpreter (including Linux, macOS and Windows). MORFEE starts with a minimal VCF file (i.e. with at least the chr, position, reference allele and alternate allele fields) as an input, but can also work with already ANNOVAR-annotated (Wang et al., 2010) VCF files. MORFEE has some R packages dependencies which are available on CRAN or Bioconductor repositories and uses the GENCODE database.

When using MORFEE, please cite our publication.

Installation

From Bioconductor

Not yet available!

From GitHub

library("devtools")
install_github("daissi/MORFEE")

Usage

In a first step, MORFEE reads the input VCF file and use ANNOVAR (that has then to be beforehand installed) through the wrapper function vcf.annovar.annotation to annotate all variants. This step is skipped if the input file has already been annotated. The minimal ANNOVAR annotations required by MORFEE are: - Func.refGene to extract 5'UTR variants. - GeneDetail.refGene to extract transcript ID.

A. Annotation with ANNOVAR directly

# Annotation with ANNOVAR
table_annovar.pl \
        inst/extdata/VCF_example.vcf \
        ${PATH_ANNOVAR_DB} \
        -buildver hg19 \
        -remove -protocol refGene,gwasCatalog,avsnp150,clinvar_20190305,gnomad211_genome,dbnsfp35a \
        -operation gx,r,f,f,f,f -nastring . -vcfinput \
        -xref ${PATH_ANNOVAR_DB}/morfee.lof_metrics.by_gene.txt

B. Annotation with ANNOVAR through MORFEE

library(VariantAnnotation)
library(MORFEE)

my_raw_vcf <- "inst/extdata/VCF_example.vcf"

# Annotation with ANNOVAR through the wrapper function
vcf.annovar.annotation(my_raw_vcf)

Annotation with MORFEE

library("VariantAnnotation")
library("MORFEE")

# Download database and create an R object used by MORFEE
MORFEE_DATA <- get.morfee.data()

# To avoid too much Internet traffic, save locally this object
# save(MORFEE_DATA, file="MORFEE_DATA.RData")
# load("MORFEE_DATA.RData")

MY_VCF <- "VCF_example.hg19_multianno.vcf"

# Read/Load the VCF into R
my_vcf <- readVcf(MY_VCF)

# Perform the MORFEE annotation
my_vcf_morfee <- morfee.annotation(my_vcf, MORFEE_DATA)

# Keep only variants which create a new upstream ATG sequence
my_vcf_morfee_NEW <- my_vcf_morfee[!is.na(info(my_vcf_morfee)$MORFEE)]

# Write a VCF file with the new MORFEE field/annotation
writeVcf(my_vcf_morfee_NEW, filename="VCF_example.hg19_multianno.morfee_anno_NEW.vcf")

# Write an XLSX file with the new MORFEE field/annotation
vcf_2_xlsx(my_vcf_morfee_NEW, file="VCF_example.hg19_multianno.morfee_anno_NEW.xlsx")

Please see below the description of the different columns of the xlsx file: - GeneDetail.refGene: Sequence Variant Nomenclature (ANNOVAR output). Please note, this nomenclature does not respect the HGVS Recommendations (doi:10.1002/humu.22981). The position of the variant used with a prefix c. should be the position of the variant regarding the ATG position based one the coding sequence. In other words, the position on a sequence WITHOUT introns. ANNOVAR reports the position on the sequence WITH introns. - orfSNVs_type: uATG or uSTOP to determine whether there is a creation of an upstream ATG or a deletion of an upstream STOP. - orfSNVs_frame: in_frame or out_of_frame to determine whether there is a frame shift between the new ATG (or deleted STOP) and the reference ATG. - orfSNVs_position: overlapping, not_overlapping or elongated_CDS to determine the position of the predicted ORF to the reference protein. - Ratio_length_pred_obs: ratio between the predicted vs the observed length of the protein (<1 for a shorter protein and >1 for a longer protein). In case of several transcripts, only one value is given. That corresponding to the predicted length the closest to the observed length, i.e the ratio closest to 1. - NewAALength: predicted length of the new protein in amino acids. - MORFEE_ATG: MORFEE VCF-style annotation for variants creating new uATG. - MORFEE_STOP: MORFEE VCF-style annotation for variants deleting uSTOP.

All fields below come from ANNOVAR annotation:



daissi/MORFEE documentation built on Jan. 9, 2021, 10:52 a.m.