viztools

This package provides tools for visualizing results from top-down proteomics studies of prefractionated biological samples. It is based on novel visualizations developed for evaluation of the PEPPI-MS prefractionation method, and is also suitable for visualizing samples fractionated using GELFrEE or for comparing biological or technical replicates.

A Shiny web application for these tools is also available. See the Shiny app section below for more information.

UpSet plots

UpSet plots are a novel method for visualizing intersecting sets, developed by Lex, Gehlenborg, et al. and implemented here using the excellent UpSetR package. They are easier to read than Euler and Venn diagrams, especially when visualizing large numbers of sets. The PEPPI-MS paper introduced the use of UpSet plots to show the occurrences and intersections of proteoform identifications across multiple molecular weight-based fractions.

Intersection Degree plots

Useful for showing the intersection degrees of proteoform identifications, i.e. the percentage of identifications occurring in one fraction, two fractions, etc.
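
As a rough illustration of what "intersection degree" means, the following base R sketch counts how many fractions each identifier occurs in and converts those counts to percentages. The identifiers and fraction names are hypothetical, and this is not necessarily how make_intersection_degree_plot() computes its values internally.

```r
# Hypothetical example data: proteoform identifiers found in each fraction
fractions <- list(
   Fraction_1 = c("P1", "P2", "P3"),
   Fraction_2 = c("P2", "P3", "P4"),
   Fraction_3 = c("P3", "P5")
)

# Count the number of fractions in which each identifier occurs
# (unique() guards against an identifier repeated within one fraction)
ids_per_fraction <- lapply(fractions, unique)
degree <- as.vector(table(unlist(ids_per_fraction)))

# Percentage of identifiers at each intersection degree
degree_pct <- 100 * prop.table(table(degree))

print(degree_pct)
```

In this toy data set, three of the five identifiers occur in a single fraction (60%), one occurs in two fractions (20%), and one occurs in all three (20%).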

Molecular weight heatmaps

Used to visualize the distribution of proteoform identifications by molecular weight. Can be made in a vertical orientation for comparison to SDS-PAGE gels.

Waffle plots

Used for visualizing quantity and subcellular localization of proteoform identifications by fraction.

Installation

Install from GitHub:

remotes::install_github("davidsbutcher/viztools")

Usage

Formatting input spreadsheet files

Input files for make_UpSet_plot() and make_intersection_degree_plot() should have column names corresponding to fraction/replicate designations and row values corresponding to unique protein/proteoform identifiers, e.g. UniProt accession numbers or CTDP proteoform record numbers.
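
A minimal sketch of this layout, built directly in R rather than read from a spreadsheet. The fraction names are hypothetical and the accessions are only illustrative. Padding shorter columns with NA is an assumption, chosen because that is how spreadsheet columns of unequal length usually arrive in R.

```r
# Illustrative UniProt accessions identified in each fraction
ids <- list(
   Fraction_1 = c("P68871", "P69905", "P02768"),
   Fraction_2 = c("P69905", "P02768"),
   Fraction_3 = c("P02768")
)

# Pad shorter columns with NA so all columns have equal length
# (assumption: mirrors how unequal spreadsheet columns are read into R)
n_rows <- max(lengths(ids))
df_example <- as.data.frame(
   lapply(ids, function(x) c(x, rep(NA_character_, n_rows - length(x)))),
   stringsAsFactors = FALSE
)

print(df_example)

# With viztools installed, the data frame could then be passed on:
# make_UpSet_plot(df_example)
```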

An input file for make_heatmap() should have a column providing molecular weights and a column providing the fraction/replicate number. The default column names are "mass" and "fraction", but other names can be specified in the function arguments.
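
The sketch below builds a small data frame in this layout using the default column names, then tabulates identifications per mass bin and fraction to show the kind of distribution a heatmap summarizes. The values are made up, masses are assumed to be in Da, and the 10 kDa bin width is an arbitrary choice that may differ from how make_heatmap() bins internally.

```r
# Hypothetical proteoform identifications: one row per identification
heatmap_input <- data.frame(
   mass = c(8450.2, 11300.7, 15120.4, 22980.9, 9875.1, 31540.3),
   fraction = c(1, 1, 2, 2, 1, 3)
)

# Group masses into 10 kDa bins (assumes masses in Da; bin width is arbitrary)
mass_bin <- cut(
   heatmap_input$mass,
   breaks = seq(0, 40000, by = 10000),
   dig.lab = 5
)

# Count identifications per mass bin (rows) and fraction (columns)
bin_counts <- table(mass_bin, fraction = heatmap_input$fraction)

print(bin_counts)
```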

An input file for waffle_iron() should have a column providing the fraction/replicate number and columns providing subcellular localization counts. Column names other than “fraction” are used for legend labels, so I recommend naming them “Cytosol”, “Membrane”, etc.
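
If localization annotations start out as one row per identification, they can be tabulated into this layout with base R. The sketch below uses hypothetical data and column names. Only the resulting shape (a "fraction" column plus one count column per localization) comes from the description above.

```r
# Hypothetical per-identification annotations in long form
annotations <- data.frame(
   fraction = c(1, 1, 1, 2, 2, 3),
   localization = c("Cytosol", "Cytosol", "Membrane",
                    "Membrane", "Cytosol", "Membrane")
)

# Tabulate counts per fraction (rows) and localization (columns)
counts <- table(annotations$fraction, annotations$localization)

# One row per fraction, one count column per localization
waffle_input <- data.frame(
   fraction = as.numeric(rownames(counts)),
   as.data.frame.matrix(counts),
   row.names = NULL
)

print(waffle_input)
```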

Example input files for each visualization type can be found in the extdata folder in the package directory.
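
One way to find that folder on your machine is base R's system.file(), which returns an empty string when the package or folder cannot be found. The file names inside the folder are not listed here, so the sketch simply prints whatever is present.

```r
# Locate the extdata folder of the installed package;
# system.file() returns "" when the package or folder is not found
extdata_dir <- system.file("extdata", package = "viztools")

if (nzchar(extdata_dir)) {
   # List the bundled example files with their full paths
   example_files <- list.files(extdata_dir, full.names = TRUE)
   print(example_files)
} else {
   message("viztools is not installed, so no example files were found")
}
```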

Loading and visualizing data

Load an input spreadsheet file as an R object using an appropriate function, e.g. readxl::read_xlsx() for XLSX files or readr::read_csv() for CSV files. Then, pass the object to the appropriate visualization function:

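# Read an XLSX
# (use forward slashes or doubled backslashes in Windows paths;
# a single backslash starts an escape sequence in an R string)

df <- 
   readxl::read_xlsx(
      "C:/Users/YourName/Documents/protein_data.xlsx"
   )

# Read a CSV

df <- 
   readr::read_csv(
      "C:/Users/YourName/Documents/protein_data.csv"
   )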

# Use data frame as argument for a visualization function

make_UpSet_plot(df)
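
If you switch between XLSX and CSV inputs often, the choice of reader can be wrapped in a small helper. read_input() below is a hypothetical convenience function, not part of viztools. It assumes readxl and readr are installed when a matching file is actually read.

```r
# Hypothetical helper: choose a reader based on the file extension
read_input <- function(path) {
   ext <- tolower(tools::file_ext(path))
   switch(
      ext,
      xlsx = readxl::read_xlsx(path),
      csv = readr::read_csv(path),
      stop("Unsupported file extension: '", ext, "'")
   )
}

# Usage (paths are placeholders):
# df <- read_input("C:/Users/YourName/Documents/protein_data.xlsx")
# make_UpSet_plot(df)
```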

Saving plots

Plots created using viztools can be saved by setting the argument savePDF = TRUE:

make_UpSet_plot(df, savePDF = TRUE)

With the exception of UpSet plots, plots can also be saved using the ggplot2::ggsave() function, which by default saves the most recently displayed plot:

make_heatmap(df)

ggplot2::ggsave(
   "heatmap.png",
   dpi = 300,
   height = 5,
   width = 8
)

Shiny

A GUI web application is currently hosted at shinyapps.io. Input spreadsheet files should be formatted as specified above.

Dependencies

viztools utilizes the package UpSetR for generating UpSet plots and waffle for generating Waffle plots. Other visualizations are generated using ggplot2. Additional functions are imported from dplyr, tibble, purrr, glue, tidyr, magrittr, assertthat, and scales.

License and attribution

Package developed by David S. Butcher and licensed under CC BY 4.0. Imported packages are licensed separately.



davidsbutcher/viztools documentation built on Oct. 5, 2020, 3:25 a.m.