knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-"
)

mtAnalysis

Comprehensive annotation to mtDNA mutations.

Authors: Xianbang Sun (maintainer, sxb3000@bu.edu), Katia Bulekova, Xue Liu, Meng Lai, Chunyu Liu
Date: "r format(Sys.Date(), '%m/%d/%Y')"

mtAnalysis is an R package for identifying and annotating mtDNA sequence variations. This package allows to identify mtDNA sequence variations with user-specified thresholds, and provide several plots to visualize mtDNA sequence variations. It also annotates all mtDNA variations and predicts functionality for mtDNA variants in the coding and tRNA regions.

Note: if you use mtAnalysis in published research, please cite:

Xianbang Sun, Katia Bulekova, Kaiyu Yan, Daniel Levy, Jun Ding, Chunyu Liu, Jessica L. Fetterman (2020) mtAnalysis: an R package for comprehensive annotation of mtDNA sequence variation and association analysis

Installation

You can install the development version of mtAnalysis package from GitHub with:

``` {r install, results="hide"}

install.packages("devtools")

devtools::install_github("mtDNA-BU/mtAnalysis")

library(mtAnalysis)

In addition, our package requires the seqMeta package. The instructions for the 
installation of the seqMeta package can be found at 
https://github.com/DavisBrian/seqMeta

## Standard workflow

Import the original allele, frequency and coverage datasets
```r
input_path = "/path/to/input/directory/"
input_path = "/restricted/projectnb/mtdna-alcohol/Sun_Xianbang/ARIC/"
allele <- read.csv(paste0(input_path, "allele/allele_ANNOmtDNA.csv"), header = T, stringsAsFactors=FALSE,
                   colClasses = c("character"))
freq <- read.csv(paste0(input_path, "freq/freq_ANNOmtDNA.csv"), header=T, stringsAsFactors=FALSE,
                      colClasses = c("character"))
coverage <- read.csv(paste0(input_path, "coverage/coverage_ANNOmtDNA.csv"), header=T)
allele <- as.matrix(allele)
freq <- as.matrix(freq)
coverage <- data.matrix(coverage)

allele: a character matrix (16569 x N) provided by the user. Rows correspond to loci and columns correspond to subjects. This matrix contains the alleles of each subject at each locus. The matrix must contain subject ID as the column names. "/" is used to delimited different allele calls in a locus.

freq: a character matrix (16569 x N) provided by the user. Rows correspond to loci and columns correspond to subjects. This matrix contains the allele fractions of the corresponding allele matrix. The matrix must contain subject ID as the column names. "/" is used to delimited the allele fractions.

coverage: a numeric matrix (16569 x N). Rows correspond to loci and columns correspond to subjects. This matrix contains the reads coverage of the 16569 mtDNA loci for each subject. The matrix must contain the subject ID as the column names.

Compute alternative allele fraction (AAF) by mtAAF function

method is the argument to choose method to compute AAF for the case of multiple alternative alleles. The default method "maxAA" computes AAF as the maximum of frequencies of corresponding alternative alleles; and the alternative method "allAA" computes AAF as 1 minus the frequency of reference allele

AAF <- mtAAF(allele, freq, method = "maxAA")

Scatter plot of output of mtAAF function, each point is AAF of a subject at a locus, the x axis is the mtDNA loci and the y axis is the range of AAF (0-100%)

plot(AAF)

Plot the mean coverage of each locus by plotCover function, the x axis is the mtDNA loci and y axis is the mean coverage

plotCover(coverage, type="mean")

Histogram of the median coverage of each subject by histSampCov function

histSampCov(coverage, ylim=c(0,1200))

Summarize and annotate mtDNA mutations by mtSummary function

Specify the path to output the annotation file and histograms of mtDNA mutation burden of subjects and number of mutations carried by each loci for heteroplasmic and homoplasmic mutations

Users can specify lower bound and upper bound of the threshold by thre.lower and thre.upper arguments. By default, thre.lower=0.03 and thre.upper=0.97. That is, if $0.03\leq AAF\leq 0.97$, it is a heteroplasmic mutation; and if $AAF>0.97$, it is a homoplasmic mutation. Users can also choose different gene regions by loci argument. For example, to choose tRNA region, set loci="tRNA". Users can also choose types of mutations to be annotated by type argument. For example, to choose heteroplasmic mutations, set type="heter", and choose homoplasmic mutations, set type="homo". Users can also specify the types of comprehensive predicted functional scores and categories by annot.select argument. The default is c("Pos", "ref", "Gene", "TypeMutation","MissensMutation", "CodonPosition", "ProteinDomain", "dbSNP_150_id", "PolyPhen2", "PolyPhen2_score", "SIFT", "SIFT_score", "CADD", "CADD_score", "CADD_phred_score")

The types of comprehensive predicted functional scores and categories to choose are: TypeMutation, MissensMutation, CodonPosition, ProteinDomain, mFOLD_dG, mFOLD_Initial, mFOLD_rCRS DG, mFOLD_rCRS Initial, mFOLD_AnticodonAminoAcidChange, mFOLD_Location, PolyPhen2, PolyPhen2_score, SIFT, SIFT_score, PROVEAN, PROVEAN_score, MutationAssessor, MutationAssessor_score, CADD, CADD_score, CADD_phred_score, PANTHER, PANTHER_score, PhD_SNP, PhD_SNP_score, SNAP, SNAP_score, MutationTaster, MutationTaster_score, dbSNP_150_id

Run the mtSummary function, only include loci of coding region, annotate for both of heteroplasmic and homoplasmic mutations.

output_path <- "/output/dir/"
# output_path <-"/restricted/projectnb/mtdna-alcohol/Sun_Xianbang/Annotation/Output/"
output_path <- paste0(tempdir(),"/")
mtSum <- mtSummary(aaf=AAF, allele=allele, freq=freq, coverage=coverage,
         loci="coding", path=output_path, type="both", study="ARIC")

Part of the output of annotated alleles

annotate_df <- read.csv(paste0(output_path, "/ARIC_annotation_variation.csv"))
annotate_df[annotate_df$mtID == 3310, ]

Output of histograms of summarized mutations

options(width = 60)
local({
  hook_output <- knitr::knit_hooks$get('output')
  knitr::knit_hooks$set(output = function(x, options) {
    if (!is.null(options$max.height)) options$attr.output <- c(
      options$attr.output,
      sprintf('style="max-height: %s;"', options$max.height)
    )
    hook_output(x, options)
  })
})
library(grImport2)
library(rsvg)
knitr::include_graphics('/restricted/projectnb/mtdna-alcohol/Sun_Xianbang/Annotation/Output/hist_test.svg')
#img_test<-readPicture("/restricted/projectnb/mtdna-alcohol/Sun_Xianbang/Annotation/Output/hist_test.svg")
#grid.picture(img_test, ext="clipbbox" )
img_test<-rsvg("/restricted/projectnb/mtdna-alcohol/Sun_Xianbang/Annotation/Output/hist_test.svg")
#plot(img_test)
#display(img_test, method = "raster", all=T)
library(pdftools)
pdf_pages <- pdf_info(paste0(output_path, "mtHistograms.pdf"))$pages
filenames<-paste0(output_path, "mtHistograms_", seq_len(pdf_pages), ".png")
pdf_convert(paste0(output_path, "mtHistograms.pdf"), format = "png", pages=seq_len(pdf_pages), filenames=filenames)
library(EBImage)
for(i in seq_along(filenames)){
img <- readImage(filenames[i])
display(img, method = "raster")
}

Summary of the mean coverage of loci

mtSum$coverLoci 

Summary of the mean coverage of subjects

mtSum$coverSubjects  

Display loci of variation

mtSum$loci_var  

Summary of the heteroplasmic burden of subjects

mtSum$heter_burden_sum  

Summary of numbers of heteroplasmic mutations loci carried

mtSum$heter_loci_sum  

Display loci of heteroplasmy

mtSum$loci_heter  

Total number of heteroplasmic mutations

mtSum$heter_total  

Summary of the homoplasmic burden of subjects

mtSum$homo_burden_sum  

Summary of numbers of homoplasmic mutations loci carried

mtSum$homo_loci_sum  

Display loci of homoplasmy

mtSum$loci_homo  

Total number of homoplasmic mutations

mtSum$homo_total  

Annotate alternative alleles by mtAnno function

Generate alternative alleles to be annotated for mtDNA loci. It has two columns: loci positions ("pos") and "alleles" to be annotated.

anno <- as.data.frame(matrix(0, 10, 2))
colnames(anno)<- c("pos", "alleles")
anno$pos <- c(3311:3320)
anno$alleles <- c("A", "A", "T", "C", "A", "T", "G", "A", "C", "T")

Run the mtAnno function

mtAnno(anno=anno, path=output_path)

Part of the output of annotated alleles

annotate_df2 <- read.csv(paste0(output_path, "/Study_mtAnno.csv"))
annotate_df2[annotate_df2$Pos == 3311, ]


mtDNA-BU/mtdnaANNO documentation built on Aug. 11, 2022, 10:57 a.m.