knitr::opts_chunk$set(echo = TRUE, fig.width = 9, fig.height = 10)

Introduction

The mitochondrial genome is the circular genomic DNA contained in mitochondria, organelles located in the cytoplasm of eukaryotic cells.
In humans, the mitochondrial genome is 16569 bp long, and contains 37 genes, 13 of which are protein-coding genes. Most of the others are tRNA-coding genes, while 2 are rRNA-coding.

The recent advances in high-throughput DNA sequencing allow a thorough exploration and analysis of mitochondrial variants, which are positions in which nucleotides differ from the reference human mitochondrial genome.

While there are lots of tools available to analyse human mitochondrial genomes, visualisation of human mitochondrial variants usually relies on researchers' knowledge of general programming languages with visualisation capabilities. In order to fill this gap, I developed the mitovizR package, which offers several visualisation options right out of the box, with a simple user interface using the R programming language.


Installation

mitovizR can be installed using devtools::install_github(), as follows:

library(devtools)
install_github("robertopreste/mitovizR")

This will download and install the latest development version of mitovizR right from its GitHub repository. This means that some of its features can still be in beta, and might not function as expected.

After installation, mitovizR can be loaded into R using:

library(mitovizR)

Human mitochondrial variants can be obtained and explored from several different sources; the most common are dataframes, VCF files and JSON files. mitovizR offers specific functions to plot human mitochondrial variants from each one of these input sources.


Plot variants from a dataframe

Basic plots

Variants contained in an R dataframe can be plotted using the plot_df() function.
Supposing we have a dataframe sample_df with the following content:

load("../tests/testthat/sample_df.RData")
knitr::kable(sample_df)

We can plot its variants with plot_df(sample_df):

plot_df(sample_df)

This works right out of the box, because by default plot_df() looks for variants position in the POS column, variants reference allele in the REF column, variants alternate allele in the ALT column, sample names in the SAMPLE column, genotypes in the GT column and heteroplasmic fractions in the HF column.
Different column names can be specified using respectively pos_col, ref_col, alt_col, sample_col, gt_col and hf_col. In the case of the following dataframe sample_2_df column names are custom:

load("../tests/testthat/sample_2_df.RData")
knitr::kable(sample_2_df)

For the simplest plot, we only need to adjust the pos_col option:

plot_df(sample_2_df, pos_col = "position")

Heteroplasmic fractions

Heteroplasmic fractions (HF) are useful to evaluate the ratio at which a specific variant is found in the given sample; plot_df() can exploit this information to plot variants with HF = 1.0 on the outer border of the mitochondrial circle, those with HF = 0.0 on the inner border and all the others according to their actual HF value. This is done automatically if a column with heteroplasmic fraction values is present (by default HF, if not set differently using the hf_col option); otherwise, variants will simply be shown in the middle of the circular plot (as if they all had HF = 0.5 - see plot above).

plot_df(sample_2_df, pos_col = "position", hf_col = "hf_val")

Variant labels

We may want to show labels with variants position and nucleotide change, to better identify variants on the plot. This can be accomplished by setting the show_var_labels option to TRUE:

plot_df(sample_df, show_var_labels = TRUE)

When using a dataframe with custom column names, however, we need to also set the ref_col and alt_col accordingly:

plot_df(sample_2_df, show_var_labels = TRUE, 
        pos_col = "position", ref_col = "reference", alt_col = "alternate")

Please be aware that when using a dataframe with custom column names, the pos_col option must always be set if that column is not called POS, regardless of any other possible option.

Multiple samples

It may be the case that the dataframe contains variants coming from multiple samples, as in this one:

load("../tests/testthat/sample_multi_df.RData")
knitr::kable(sample_multi_df)

In this case, plot_df() will automatically recognise the two samples listed in the SAMPLE column and create two separate plots:

plot_df(sample_multi_df)

If the column listing samples has a different name, simply use the sample_col option and set it accordingly.

Plot details

By default, plot_df() will show the locus name above each mitochondrial locus, as well as a legend explaining colours used for different locus types (protein-coding, tRNA, rRNA, regulatory regions). Both these options can be disabled by setting respectively show_loci_names and show_loci_legend to FALSE:

plot_df(sample_df, show_loci_names = FALSE)
plot_df(sample_df, show_loci_legend = FALSE)

It is also possible to add a specific title to the resulting plot, using the title option, which accepts a string as its argument:

plot_df(sample_df, title = "My mitochondrial variants")

Saving plots

Plots generated with plot_df() can be saved to a PNG file using the save_plot = TRUE option, and a custom output location can be specified using the save_to option, which accepts a string with the path or filename where the plot will be saved. By default, it will be saved in the current working directory, using the filename mitoviz_plot.png.

# plot will be saved in ./mitoviz_plot.png
plot_df(sample_df, save_plot = TRUE)
# plot will be saved in ../my_dir/my_plot.png
plot_df(sample_df, save_plot = TRUE, save_to = "../my_dir/my_plot.png")

Please be aware that the save_to option only works when save_plot is set to TRUE, otherwise it will be ignored.


Plot variants from a VCF file

Basic plots

Variants contained in a VCF file can be plotted using the plot_vcf() function.
Supposing we have a classic VCF file sample_vcf.vcf with some mitochondrial variants, we can plot them with plot_vcf("sample_vcf.vcf"):

knitr::opts_knit$set(root.dir = '../tests/testthat/')
plot_vcf("sample_vcf.vcf")

By default plot_vcf() looks for standard VCF column names (e.g. variants position in the POS column, variants reference allele in the REF column, variants alternate allele in the ALT column).

Heteroplasmic fractions

Heteroplasmic fractions are useful to evaluate the ratio at which a specific variant is found in the given sample; plot_vcf() can exploit this information to plot variants with HF = 1.0 on the outer border of the mitochondrial circle, those with HF = 0.0 on the inner border and all the others according to their actual HF value. This is done automatically if an HF feature is specified in the FORMAT section of the VCF file; otherwise, variants will simply be shown in the middle of the circular plot (as if they all had HF = 0.5).

Variant labels

We may want to show labels with variants position and nucleotide change, to better identify variants on the plot. This can be accomplished by setting the show_var_labels option to TRUE:

plot_vcf("sample_vcf.vcf", show_var_labels = TRUE)

Multiple samples

If working with a multi-sample VCF file, plot_vcf() will automatically recognise that and create a separate plot for each sample:

plot_vcf("sample_multi.vcf")

Plot details

By default, plot_vcf() will show the locus name above each mitochondrial locus, as well as a legend explaining colours used for different locus types (protein-coding, tRNA, rRNA, regulatory regions). Both these options can be disabled by setting respectively show_loci_names and show_loci_legend to FALSE:

plot_vcf("sample_vcf.vcf", show_loci_names = FALSE)
plot_vcf("sample_vcf.vcf", show_loci_legend = FALSE)

It is also possible to add a specific title to the resulting plot, using the title option, which accepts a string as its argument:

plot_vcf("sample_vcf.vcf", title = "My mitochondrial variants")

Saving plots

Plots generated with plot_vcf() can be saved to a PNG file using the save_plot = TRUE option, and a custom output location can be specified using the save_to option, which accepts a string with the path or filename where the plot will be saved. By default, it will be saved in the current working directory, using the filename mitoviz_plot.png.

# plot will be saved in ./mitoviz_plot.png
plot_vcf("sample_vcf.vcf", save_plot = TRUE)
# plot will be saved in ../my_dir/my_plot.png
plot_vcf("sample_vcf.vcf", save_plot = TRUE, save_to = "../my_dir/my_plot.png")

Plot variants from a JSON file

TBD



robertopreste/mitovizR documentation built on May 22, 2019, 2:46 p.m.