bigPint: Make BIG data pint-sized

BIG multivariate data Plotted INTeractively.

Welcome to the bigPint package website! If you would like to try the package hands-on right away, we recommend consulting our example pipeline. This pipeline uses reproducible code and sample data that come with the bigPint package, so you can follow along with each line of example code. Please note that bigPint accepts either data and dataMetrics objects or SummarizedExperiment objects as input for producing graphics. In each article (including the pipeline), example code is first provided for the data and dataMetrics object format, followed by the same example code in the SummarizedExperiment object format.

Whether or not you have already tried the example pipeline above, you can become more familiar with all aspects of the bigPint package by reading the articles under the Get Started tab at the top of this website. It contains nine short vignette articles that we recommend reading in order. These articles use reproducible code and sample data, and they provide:

An introduction to bigPint plots and how to interpret them
A guide to installing the bigPint package
Input formats for bigPint plots (i.e. data and dataMetrics or SummarizedExperiment)
How to produce static bigPint plots
How to produce interactive bigPint plots
How to perform hierarchical clustering and use the clusters in bigPint functions
The recommended RNA-seq visualization pipeline mentioned above

The bigPint software aims to "Make BIG data pint-sized". You can easily create modern and effective plots for your large multivariate datasets. These plots allow you to quickly examine the variability between all samples in your dataset, compare the variability between treatment groups with the variability between replicates, check for normalization issues, and discover outliers in your dataset. They also allow you to superimpose a subset of observations onto your full dataset to better understand how that subset relates to the whole. Both static and interactive plots are available.

The bigPint package can be useful for examining any large multivariate dataset. However, we note that the example datasets and example code in this package consider RNA-sequencing datasets. If you are using this software for RNA-sequencing data, then it can help you confirm that the variability between your treatment groups is larger than that between your replicates and determine how various normalization techniques in popular RNA-sequencing analysis packages (such as edgeR, DESeq2, and limma) affect your dataset. Moreover, you can easily superimpose lists of differentially expressed genes (DEGs) onto your dataset to check that they show the expected patterns (large variability between treatment groups and small variability between replicates).

Large multivariate datasets are common across many disciplines. The best approaches for looking at quantitative multivariate data are scatterplot matrices, parallel coordinate plots, and replicate line plots. Each of these plots enables analysts to assess the association between multiple variables. With effective plotting tools, analysts can improve modeling: they can iterate between visualizations and modeling to enhance the models based on feedback from the visuals.

However, these plots are ineffective with large quantities of data: overplotting can obscure important structure, and the plots can be slow to render if every observation is mapped to a graphical element. In this package, we developed more useful visualization techniques for large multivariate datasets by incorporating appropriate summaries and using interactivity.
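
To make the idea of a summary concrete, here is a minimal, language-agnostic sketch in Python. It is not bigPint code and does not use the bigPint API. It assumes a simple square grid as the summary, and the function name `bin_counts`, the grid size, and the simulated data are all hypothetical choices for illustration. Instead of drawing one graphical element per observation, the points are counted per grid cell, so a plot would only need to draw the non-empty cells.

```python
from collections import Counter
import random


def bin_counts(points, n_bins, lo, hi):
    """Count 2-D points per cell of an n_bins x n_bins grid over [lo, hi].

    Hypothetical helper for illustration only; not part of bigPint.
    """
    width = (hi - lo) / n_bins
    counts = Counter()
    for x, y in points:
        # Clamp indices so values on or outside the edges land in a border cell.
        i = max(0, min(int((x - lo) / width), n_bins - 1))
        j = max(0, min(int((y - lo) / width), n_bins - 1))
        counts[(i, j)] += 1
    return counts


# Simulated data (an assumption): two correlated variables, e.g. two replicates.
random.seed(0)
points = []
for _ in range(100_000):
    x = random.uniform(0.0, 10.0)
    y = x + random.gauss(0.0, 0.5)
    points.append((x, y))

counts = bin_counts(points, n_bins=20, lo=0.0, hi=10.0)

# Every observation is still represented, but by at most 20 * 20 cells.
print(len(points), "observations summarized in", len(counts), "non-empty cells")
```

Each cell count could then be mapped to a color or size, which keeps dense regions readable and keeps the number of drawn elements bounded regardless of how many observations there are.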