
bigvis

The bigvis package provides tools for exploratory data analysis of large datasets (10-100 million observations). The aim is for most operations to take less than 5 seconds on commodity hardware, even for 100,000,000 data points.

Since bigvis is not currently available on CRAN, the easiest way to try it out is to install it from GitHub with devtools:

# install.packages("devtools")
devtools::install_github("hadley/bigvis")

The bigvis package is structured around the following workflow:

Use bin() and condense() to get a compact summary of the data.
If the estimates are rough, you might want to smooth() them. See best_h() and rmse_cvs() to figure out a good starting bandwidth.
If you're working with counts, you might want to standardise() them.
Visualise the results with autoplot() (you'll need to load ggplot2 to use this).
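To make the first step of the workflow concrete, here is a minimal base-R sketch of the idea behind binning and condensing: assign each observation to a fixed-width bin, then keep only one summary row per bin. This is an illustration under stated assumptions, not the bigvis implementation. `bin_condense()` and its arguments are hypothetical names, and the real bin() and condense() are written in C++ and will differ in both interface and speed.

```r
# Base-R sketch of fixed-width binning followed by per-bin summaries.
# bin_condense() is a hypothetical helper for illustration only;
# it is NOT part of bigvis.
bin_condense <- function(x, width, origin = min(x)) {
  # Zero-based integer bin index for each observation
  idx <- floor((x - origin) / width)

  # One summary value per occupied bin
  counts <- tapply(x, idx, length)
  means <- tapply(x, idx, mean)

  data.frame(
    x = origin + as.numeric(names(counts)) * width,  # left edge of each bin
    count = as.vector(counts),
    mean = as.vector(means)
  )
}

set.seed(1)
x <- rnorm(1e6)
summary_df <- bin_condense(x, width = 0.1)

# A million observations collapse to one row per occupied bin
nrow(summary_df)
head(summary_df)
```

The condensed data frame is small enough to smooth and plot quickly, which is the reason the workflow starts with this step rather than with the raw observations.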

bigvis also provides efficient implementations of a number of standard statistics for weighted/binned data: weighted.median, weighted.IQR, weighted.var, weighted.sd, weighted.ecdf and weighted.quantile.
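As a rough illustration of what a statistic on weighted/binned data means, the sketch below computes a weighted median in base R from bin values and their counts. `wmedian()` is a hypothetical helper, not the bigvis weighted.median function. It uses the simple "first value whose cumulative weight reaches one half" definition, and the bigvis implementation may use a different convention (for example, interpolation).

```r
# Base-R sketch of a weighted median over binned data.
# wmedian() is a hypothetical helper for illustration only;
# it is NOT the bigvis weighted.median() implementation.
wmedian <- function(x, w) {
  # Sort values and carry their weights along
  ord <- order(x)
  x <- x[ord]
  w <- w[ord]

  # Cumulative share of the total weight at each value
  cum <- cumsum(w) / sum(w)

  # First value at which at least half of the weight is covered
  x[which(cum >= 0.5)[1]]
}

values <- c(1, 2, 3, 4)      # bin values
counts <- c(10, 20, 30, 40)  # observations per bin

wmedian(values, counts)

# For this example it matches the median of the expanded data,
# without materialising all 100 observations
median(rep(values, counts))
```

Working from the counts directly is what lets such statistics scale: the cost depends on the number of bins, not on the number of original observations.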

This package wouldn't be possible without:

The fantastic Rcpp package, which makes it amazingly easy to integrate R and C++.
JJ Allaire and Carlos Scheidegger, who have indefatigably answered my many C++ questions.
Revolution Analytics, who generously supported the early development.
Yue Hu, who implemented a proof of concept that showed it might be possible to work with this much data in R.