An R package implementing the first algorithm described by Karnin, Lang and Liberty in "Optimal Quantile Approximation in Streams".
It efficiently computes an approximation of the cumulative distribution function (CDF) of numeric values stored in a vector or in a DelayedArray.
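To give a feel for how a compactor-based sketch approximates a CDF, here is a toy illustration in base R. It is not the package's implementation and it simplifies the paper's algorithm: every level uses the same fixed capacity `k`, whereas the paper lets capacities shrink for lower levels. The names `kll_toy` and `toy_cdf` are hypothetical and exist only for this example.

```r
# Toy compactor sketch (illustrative only, NOT the kll package's code).
# Assumption: a fixed, even capacity k at every level.
kll_toy <- function(x, k = 200L) {
  stopifnot(k >= 2, k %% 2 == 0)
  levels <- list(numeric(0))          # levels[[h]] holds items of weight 2^(h - 1)
  for (v in x) {
    levels[[1]] <- c(levels[[1]], v)
    h <- 1L
    # Compact any level that has reached capacity, cascading upwards.
    while (h <= length(levels) && length(levels[[h]]) >= k) {
      buf <- sort(levels[[h]])
      offset <- sample(0:1, 1)        # keep odd or even positions at random
      keep <- buf[seq(1 + offset, length(buf), by = 2)]
      if (h == length(levels)) levels[[h + 1L]] <- numeric(0)
      levels[[h + 1L]] <- c(levels[[h + 1L]], keep)
      levels[[h]] <- numeric(0)
      h <- h + 1L
    }
  }
  levels
}

# Estimate the CDF at points q: weighted count of retained items <= q.
toy_cdf <- function(levels, q) {
  w <- 2^(seq_along(levels) - 1)
  total <- sum(w * lengths(levels))
  sapply(q, function(qq) {
    sum(w * sapply(levels, function(l) sum(l <= qq))) / total
  })
}

set.seed(42)
sketch <- kll_toy(runif(10000), k = 200L)
toy_cdf(sketch, c(0.25, 0.5, 0.75))   # should be near 0.25, 0.5, 0.75
```

Each compaction halves the number of stored items and doubles their weight, so memory stays small while the total weight still equals the number of values seen.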
library(DelayedArray)
library(kll)
d <- DelayedArray(array(runif(1000000), dim = c(1000000, 1)))
approx_cdf(d, 20L)
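When judging an approximation, it can help to have the exact empirical CDF as a baseline. The sketch below uses only base R's `ecdf` on uniform data of the same size; it does not call the package, and how to line its values up against the output of `approx_cdf` depends on that function's return structure, which is not described here.

```r
# Exact empirical CDF with base R, as a baseline for comparison.
set.seed(1)
x <- runif(1000000)

exact_cdf <- ecdf(x)                  # stats::ecdf returns a step function

# Evaluate the exact CDF at a few probe points.
probes <- c(0.1, 0.25, 0.5, 0.75, 0.9)
exact_cdf(probes)

# For Uniform(0, 1) data the true CDF is F(q) = q, so the largest
# deviation from the probes themselves should be small.
max(abs(exact_cdf(probes) - probes))
```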
The library handles blocking transparently. For instance, the code below will process the array in chunks of 100 rows each while producing the same final result as above.
setAutoGridMaker(function(x) rowGrid(x, nrow = 100))
approx_cdf(d, 20L)
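The idea behind block processing can be illustrated without the package: a CDF value is a count divided by the total number of values, and counts can be accumulated block by block. The base R sketch below computes exact (not approximate) CDF values over blocks of 100 rows and checks them against a single pass. It is only an analogy for what the grid setting above controls, not a description of the package's internals.

```r
# Accumulating CDF counts over row blocks gives the same result as one pass.
set.seed(1)
x <- matrix(runif(1000), ncol = 1)
probes <- c(0.25, 0.5, 0.75)

# One pass over the whole matrix.
full <- sapply(probes, function(q) mean(x <= q))

# Same quantity accumulated over blocks of 100 rows.
starts <- seq(1, nrow(x), by = 100)
counts <- numeric(length(probes))
for (s in starts) {
  rows <- s:min(s + 99, nrow(x))
  block <- x[rows, , drop = FALSE]
  counts <- counts + sapply(probes, function(q) sum(block <= q))
}
blocked <- counts / nrow(x)

all.equal(full, blocked)
```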
It is also possible to obtain column-level CDFs:
d <- DelayedArray(array(runif(1000), dim = c(500, 2)))
approx_col_cdf(d, 20L)
The package is still under active development. It should be considered experimental.