EpiCircos

An implementation of Circos plots for epidemiologists

Full documentation is available on the EpiCircos website.

Citation

If you use the package, please cite it using this citation. Please also cite Circlize and, if you use the legend functions, ComplexHeatmap; both citations can be found here.

Install

Install directly from GitHub using the following code:

# Install devtools
install.packages("devtools")
library(devtools)

# Install EpiCircos directly from GitHub
devtools::install_github("mattlee821/EpiCircos")
library(EpiCircos)

You may be unable to install the package because of an issue installing ComplexHeatmap. An example error:

Skipping 1 packages not available: ComplexHeatmap
Installing 12 packages: circlize, ComplexHeatmap, digest, dplyr, ellipsis, GlobalOptions, pillar, Rcpp, rlang, shape, tibble, vctrs
Error: (converted from warning) package ‘ComplexHeatmap’ is not available (for R version 3.5.3)

To fix this you should install ComplexHeatmap first and then EpiCircos. Install ComplexHeatmap as follows:

# Install devtools
install.packages("devtools")
library(devtools)

# Install ComplexHeatmap directly from Bioconductor
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("ComplexHeatmap")

# Install EpiCircos directly from GitHub
devtools::install_github("mattlee821/EpiCircos")
library(EpiCircos)
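
Before running the install steps above, it can help to see which of the packages are already present. The following is a minimal sketch in base R; missing_packages() is a hypothetical helper written for this example, not part of EpiCircos or devtools. Note that requireNamespace() loads a package's namespace as a side effect when the package is installed.

```r
# Hypothetical helper: return the entries of `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Packages named in the install instructions above.
needed <- c("devtools", "BiocManager", "ComplexHeatmap", "EpiCircos")
to_install <- missing_packages(needed)

# An empty result means everything is already available.
print(to_install)
```

If ComplexHeatmap appears in the result, install it from Bioconductor first, as described above, and then install EpiCircos.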

Description

Epidemiologists using large, complex data are limited in their choice of visualisation tools. Circos plots provide an informative way of examining large, complex data, but are traditionally used in genomics and are not easily adaptable for epidemiology. In genomics work, Circos plots provide an efficient way of visually inspecting and comparing data and results.

EpiCircos simplifies the Circlize package for use with epidemiological data. Data can be displayed in a number of ways, for example as points, lines, bars or a histogram.

How to

Circos plots can be created by calling circos_plot(). Set the number of tracks with the track_number argument. The plot is limited to three tracks for readability.

Example data is stored in the package; you can load it into your environment with data <- EpiCircos_data. The data can be passed directly to circos_plot() with track1_data = EpiCircos_data. This is simulated data based on a Mendelian randomization analysis of body mass index to 123 metabolites. It has 123 rows (outcomes) and 8 columns (variables). The variables include betas, standard errors and p-values. For more information see ?EpiCircos::EpiCircos_data. NOTE: The example data shows the ideal format for your own data frame when using EpiCircos.

head(EpiCircos::EpiCircos_data)
       label outcome_group outcome_subgroup effect_estimate standard_error
1 IBJCX2116O             A  Section label 1    -0.036797778   -0.012265926
2 VVUHQ0448G             B  Section label 2    -0.009396427   -0.003132142
3 XAXHR5573J             C  Section label 3    -0.128358126   -0.042786042
4 AOATO7677O             D  Section label 4    -0.122093654   -0.040697885
5 OMMTE5780R             E  Section label 5     0.095617488    0.031872496
6 VFSCU5692N             F  Section label 6     0.017775970    0.005925323
       Pvalue lower_confidence_interval upper_confidence_interval     bars
1 0.017387080              -0.012756563               -0.06083899 9.547940
2 0.006434143              -0.003257428               -0.01553543 8.661527
3 0.043749939              -0.044497484               -0.21221877 7.245723
4 0.056193359              -0.042325800               -0.20186151 9.627461
5 0.065877771               0.033147396                0.15808758 7.943525
6 0.094817010               0.006162336                0.02938960 7.968774
      lines
1  77.39050
2 112.82122
3  89.58869
4  67.29705
5  90.07912
6  88.89458
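
To prepare your own results in the same layout, something like the following may be a useful starting point. It is a minimal sketch in base R: the values are made up, the column names are copied from the head() output above, and the p-values and 95% confidence intervals are derived with a normal approximation, which is an assumption about your analysis rather than something EpiCircos requires. The bars and lines columns are left out.

```r
# Made-up results for three outcomes; replace with your own estimates.
my_results <- data.frame(
  label            = c("outcome_1", "outcome_2", "outcome_3"),
  outcome_group    = c("A", "A", "B"),
  outcome_subgroup = c("Section label 1", "Section label 1", "Section label 2"),
  effect_estimate  = c(-0.04, 0.10, 0.02),
  standard_error   = c(0.01, 0.03, 0.02),
  stringsAsFactors = FALSE
)

# Two-sided p-value from a normal approximation (assumption).
z <- my_results$effect_estimate / my_results$standard_error
my_results$Pvalue <- 2 * pnorm(-abs(z))

# 95% confidence interval: estimate minus/plus about 1.96 standard errors.
half_width <- qnorm(0.975) * my_results$standard_error
my_results$lower_confidence_interval <- my_results$effect_estimate - half_width
my_results$upper_confidence_interval <- my_results$effect_estimate + half_width

str(my_results)
```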

Example

The simplest Circos plot to make is with 1 track.

circos_plot(track_number = 1, # how many tracks do you want to plot
            track1_data = EpiCircos::EpiCircos_data, # what is the dataframe for your first track
            track1_type = "points", # how do you want to plot your first track
            label_column = 1, # what is the column of your labels
            section_column = 2, # what is the column of your sections
            estimate_column = 4, # what is the column of your estimate (beta, OR etc.)
            pvalue_column = 6, # what is the column of your p-value (Pvalue is column 6 in the example data)
            pvalue_adjustment = 0.05, # what do you want your p-value adjustment to be (multiple testing threshold)
            lower_ci = 7, # what is the column of your lower confidence interval
            upper_ci = 8) # what is the column of your upper confidence interval
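
The column arguments in the call above are column positions. Counting columns by hand is easy to get wrong, so one option is to look positions up by name. The sketch below assumes the column names shown in the head() output earlier; column_index() is a hypothetical helper written for this example, not part of EpiCircos.

```r
# Column names copied from the head() output of the example data.
# With the package installed, names(EpiCircos::EpiCircos_data) could be used instead.
example_columns <- c("label", "outcome_group", "outcome_subgroup",
                     "effect_estimate", "standard_error", "Pvalue",
                     "lower_confidence_interval", "upper_confidence_interval",
                     "bars", "lines")

# Hypothetical helper: position of a named column, with an error if it is absent.
column_index <- function(column_names, name) {
  idx <- match(name, column_names)
  if (is.na(idx)) stop("column not found: ", name)
  idx
}

estimate_col <- column_index(example_columns, "effect_estimate")
pvalue_col   <- column_index(example_columns, "Pvalue")
lower_col    <- column_index(example_columns, "lower_confidence_interval")
upper_col    <- column_index(example_columns, "upper_confidence_interval")

print(c(estimate = estimate_col, pvalue = pvalue_col,
        lower = lower_col, upper = upper_col))
```

The resulting integers can then be passed to estimate_column, pvalue_column, lower_ci and upper_ci.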

You can have multiple tracks, each with a different style. Track styles can be: "points", "lines", "bar", "histogram".

circos_plot(track_number = 3,
            track1_data = EpiCircos::EpiCircos_data,
            track2_data = EpiCircos::EpiCircos_data,
            track3_data = EpiCircos::EpiCircos_data,
            track1_type = "points",
            track2_type = "lines",
            track3_type = "bar",
            label_column = 1,
            section_column = 2,
            estimate_column = 4,
            pvalue_column = 6,
            pvalue_adjustment = 0.05,
            lower_ci = 7,
            upper_ci = 8,
            lines_column = 10,
            lines_type = "o",
            bar_column = 9,
            histogram_column = 4,
            histogram_binsize = 0.01,
            histogram_densityplot = FALSE)
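
The pvalue_adjustment argument in the calls above is described as a multiple testing threshold. One common way to choose such a threshold is a Bonferroni correction, sketched below in base R. Whether this correction suits your analysis is for you to decide; the p-values in the second half are made up for illustration.

```r
# Bonferroni threshold: overall alpha divided by the number of tests.
alpha <- 0.05
n_tests <- 123  # number of outcomes in the example data
bonferroni_threshold <- alpha / n_tests
print(bonferroni_threshold)

# Made-up p-values to show the two equivalent ways of applying the correction.
p_values <- c(0.0001, 0.0004, 0.02, 0.2)

# 1. Compare raw p-values against the corrected threshold.
significant_by_threshold <- p_values <= alpha / length(p_values)

# 2. Adjust the p-values and compare against the original alpha.
significant_by_adjustment <- p.adjust(p_values, method = "bonferroni") <= alpha

print(significant_by_threshold)
print(significant_by_adjustment)
```

A value such as bonferroni_threshold could then be supplied as pvalue_adjustment in place of 0.05.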

Legend

The legend function is taken from ComplexHeatmap. It places a legend at the bottom of the plot, populated with a coloured point and a label for each track, a point for the p-value label, and the section headers.

circos_plot(track_number = 3,
            track1_data = EpiCircos::EpiCircos_data,
            track2_data = EpiCircos::EpiCircos_data,
            track3_data = EpiCircos::EpiCircos_data,
            track1_type = "points",
            track2_type = "lines",
            track3_type = "bar",
            label_column = 1,
            section_column = 2,
            estimate_column = 4,
            pvalue_column = 6,
            pvalue_adjustment = 0.05,
            lower_ci = 7,
            upper_ci = 8,
            lines_column = 10,
            lines_type = "o",
            bar_column = 9,
            legend = TRUE,
            track1_label = "Track 1",
            track2_label = "Track 2",
            track3_label = "Track 3",
            pvalue_label = "<= 0.05",
            circle_size = 25)
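
The legend labels in the call above are plain strings. If the p-value threshold changes, the label has to change with it, so it may be convenient to build the label from the threshold. This is a small sketch in base R; the variable names are illustrative only.

```r
# Build the p-value label from the threshold so the two stay in sync.
threshold <- 0.05
pvalue_label <- paste("<=", format(threshold))

# Build one label per track.
track_labels <- paste("Track", 1:3)

print(pvalue_label)
print(track_labels)
```

These values could then be passed as pvalue_adjustment = threshold, pvalue_label = pvalue_label, track1_label = track_labels[1], and so on.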

Saving plots

For best results, save your plot as PDF or SVG. Both can be converted to other image formats. The following code saves as PDF; for SVG, use svg() in place of pdf(). Adjust the width and height arguments to get the correct sizing for your plot and then adjust the pointsize argument. The following values work for most plots:

pdf("my_plot.pdf",
    width = 30, height = 30, pointsize = 35)
circos_plot(...)
dev.off()
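
The open, draw, close pattern above can be wrapped in a small function so that the device is closed even if plotting fails. The following is a minimal sketch using base R graphics devices; save_plot() is a hypothetical helper, and a simple base plot stands in for the circos_plot() call.

```r
# Hypothetical helper: open a graphics device, draw, and always close the device.
save_plot <- function(file, draw, device = grDevices::pdf,
                      width = 30, height = 30, pointsize = 35) {
  device(file, width = width, height = height, pointsize = pointsize)
  on.exit(grDevices::dev.off(), add = TRUE)  # runs even if draw() errors
  draw()
  invisible(file)
}

# Write to a temporary directory; a placeholder plot stands in for circos_plot(...).
out_file <- file.path(tempdir(), "my_plot.pdf")
save_plot(out_file, function() plot(1:10, main = "placeholder"))

print(file.exists(out_file))
```

To save as SVG instead, pass device = grDevices::svg and a file name ending in .svg; svg() accepts the same width, height and pointsize arguments, provided your R build supports it.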

The RStudio Plots pane will not show the plot as it appears in the saved file. Similarly, saving in a format other than PDF or SVG will not give a good visualisation.

Plot saved as PNG file:

Session info

sessionInfo()
## R version 3.6.2 (2019-12-12)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Mojave 10.14.6
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] compiler_3.6.2  magrittr_1.5    tools_3.6.2     htmltools_0.4.0
##  [5] yaml_2.2.0      Rcpp_1.0.3      stringi_1.4.3   rmarkdown_2.0  
##  [9] knitr_1.26      stringr_1.4.0   xfun_0.11       digest_0.6.23  
## [13] rlang_0.4.2     evaluate_0.14


mattlee821/EpiCircos documentation built on Jan. 6, 2020, 7:13 a.m.