This package provides a number of utility functions for manipulating R's native histogram objects. The functions are focused on operations that are particularly useful when dealing with large numbers of histograms with identical buckets, such as those produced from distributed MapReduce computations. This package also provides a ‘HistogramTools.HistogramState’ protocol buffer representation of the default R histogram class to allow histograms to be very concisely serialized and shared with other systems.
See library(help=HistogramTools)
for version number, dates,
dependencies, and a complete list of functions.
Index (possibly out of date):
AddHistograms Aggregate histogram objects that have identical breaks.
MergeBuckets Merge adjacent buckets of a histogram.
ApproxQuantile Approximate the quantiles of the underlying distribution.
ApproxMean Approximate the mean of the underlying distribution.
Count Count of all samples in a histogram.
HistToEcdf Approximate the ECDF of the underlying distribution.
SubsetHistogram Subset a histogram by removing some of the buckets.
TrimHistogram Remove empty buckets from the tails of a histogram.
ScaleHistogram Scale histogram bucket counts by a numeric value.
PreBinnedHistogram Generate a histogram from pre-binned data.
AshFromHist Compute Average Shifted Histogram from a histogram.
KSDCC Compute maximal KS-statistic of CDFs constructed from histogram.
EMDCC Compute maximal Earth Mover's Distance of CDFs constructed from histogram.
PlotKSDCC Plot the KSDCC metric and a CDF from the histogram.
PlotEMDCC Plot the EMDCC metric and a CDF from the histogram.
PlotLog2ByteEcdf Plot the CDF from a histogram with log2 scaled byte boundaries.
PlotLogTimeDurationEcdf Plot the CDF from a histogram with log scaled time duration boundaries.
PlotRelativeFrequency Plot a relative frequency histogram.
ReadHistogramsFromDtraceOutputFile Read a list of Histograms from the output of the DTrace tool.
minkowski.dist Compute the Minkowski distance between two histograms.
intersect.dist Compute the histogram intersection distance between two histograms.
kl.divergence Compute the Kullback-Leibler divergence between two histograms.
jeffrey.divergence Compute the Jeffrey divergence between two histograms.
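As a rough orientation to the index above, the sketch below runs a few of the listed functions on two histograms that share the same breaks. The sample data and variable names are invented for illustration, and the argument names (adj.buckets, p) reflect my reading of the package manual, so check them against ?MergeBuckets and ?minkowski.dist before relying on them.

```r
# Two histograms over identical breaks (illustrative data only).
breaks <- seq(0, 60, by = 10)
h1 <- hist(c(1, 2, 4, 43, 20, 33, 1, 1, 3), breaks = breaks, plot = FALSE)
h2 <- hist(c(5, 12, 18, 27, 41, 55), breaks = breaks, plot = FALSE)

# What aggregating over identical breaks amounts to, in base R:
# a bucket-wise sum of the counts.
expected.counts <- h1$counts + h2$counts

if (require(HistogramTools)) {
  # Aggregate the two histograms; the inputs must have the same breaks.
  h.sum <- AddHistograms(h1, h2)
  print(h.sum$counts)                # compare with expected.counts
  print(Count(h.sum))                # total number of samples

  # Merge each pair of adjacent buckets into one wider bucket.
  h.coarse <- MergeBuckets(h.sum, adj.buckets = 2)
  print(h.coarse$breaks)

  # Approximations of the underlying distribution from binned data.
  print(ApproxMean(h.sum))
  print(ApproxQuantile(h.sum, c(0.5, 0.9)))

  # One of the distance measures between two histograms.
  print(minkowski.dist(h1, h2, p = 2))
}
```

The require() guard mirrors the style of the package's own example, so the snippet degrades to the base-R part when HistogramTools is not installed.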
Murray Stokely <mstokely@google.com>
if(require(RProtoBuf)) {
library(HistogramTools)
tmp.hist <- hist(c(1,2,4,43,20,33,1,1,3), plot=FALSE)
# The default R serialization takes a fair number of bytes
length(serialize(tmp.hist, NULL))
# Convert to a protocol buffer representation.
hist.msg <- as.Message(tmp.hist)
# Which has an ASCII representation like this:
cat(as.character(hist.msg))
# Or can be serialized and shared with other tools much more
# succinctly than R's built-in serialization format.
length(hist.msg$serialize(NULL))
# And since this isn't even compressed, we can reduce it further
# with in-memory compression:
length(memCompress(hist.msg$serialize(NULL)))
# If we read in the raw.bytes from another tool
raw.bytes <- hist.msg$serialize(NULL)
# We can parse the raw bytes as a protocol buffer
new.hist.proto <- P("HistogramTools.HistogramState")$read(raw.bytes)
new.hist.proto
# Then convert back to a native R histogram.
new.hist <- as.histogram(new.hist.proto)
# The new histogram and the old are identical except for xname
}
Loading required package: RProtoBuf
breaks: 0
breaks: 10
breaks: 20
breaks: 30
breaks: 40
breaks: 50
counts: 6
counts: 1
counts: 0
counts: 1
counts: 1
name: "c(1, 2, 4, 43, 20, 33, 1, 1, 3)"
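The example's last comment states that the round-tripped histogram matches the original except for xname, but does not check it. A minimal sketch of such a check follows, reusing only the calls that appear in the example; the helper variable names are mine, and no particular output is asserted here.

```r
# Same input histogram as in the example above.
tmp.hist <- hist(c(1, 2, 4, 43, 20, 33, 1, 1, 3), plot = FALSE)

if (require(RProtoBuf) && require(HistogramTools)) {
  # Serialize to protocol buffer bytes, then parse and convert back.
  raw.bytes <- as.Message(tmp.hist)$serialize(NULL)
  new.hist <- as.histogram(P("HistogramTools.HistogramState")$read(raw.bytes))

  # Compare the fields that define the histogram's shape.
  print(all.equal(tmp.hist$breaks, new.hist$breaks))
  print(all.equal(as.numeric(tmp.hist$counts), as.numeric(new.hist$counts)))

  # Inspect the one field the example says may differ.
  print(tmp.hist$xname)
  print(new.hist$xname)
}
```

Comparing counts after as.numeric() avoids a spurious mismatch if one side stores integers and the other doubles.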