These GLAs compute various univariate statistics separately for each input.
data | A
outputs | The usual way to specify the outputs. If both this and names for the
number.bins | The number of bins to use in the binning algorithm.
sort.threshold | The maximum number of items on which to manually sort.
input | A named list of expressions, with the names being used as the corresponding outputs. These expressions are output in addition to those used to specify the extremities. If no name is given and the corresponding expression is simply an attribute, then that attribute is used as the name. Otherwise an error is thrown, as there is no reason to include an extra input if the corresponding output column cannot be referenced later.
The result of each GLA is a waypoint with one column per input and a single row whose value is the specified univariate statistic for the corresponding expression.
With the exception of finding the median, all of these aggregates are fairly straightforward, require O(k) space, and run in O(n \cdot k) time, where k is the number of inputs and n is the number of tuples.
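As an illustration of why the non-median aggregates need only O(k) space, the following is a minimal Python sketch of a single-pass aggregate over rows of k numeric inputs. It is not the package's implementation: the function name `streaming_stats` and the choice of minimum, maximum, and mean as the statistics are assumptions made for this example.

```python
from typing import Dict, Iterable, List, Sequence


def streaming_stats(rows: Iterable[Sequence[float]]) -> Dict[str, List[float]]:
    """Compute per-column min, max, and mean in one pass (hypothetical helper).

    Only three lists of length k and a counter are kept, so the state is O(k)
    and the work is O(n * k) for n rows.
    """
    count = 0
    mins: List[float] = []
    maxs: List[float] = []
    sums: List[float] = []
    for row in rows:
        if count == 0:
            # Initialise the per-column state from the first row.
            mins = list(row)
            maxs = list(row)
            sums = [0.0] * len(row)
        for j, x in enumerate(row):
            if x < mins[j]:
                mins[j] = x
            if x > maxs[j]:
                maxs[j] = x
            sums[j] += x
        count += 1
    if count == 0:
        raise ValueError("no rows were supplied")
    return {"min": mins, "max": maxs, "mean": [s / count for s in sums]}
```

Each row is inspected once and then discarded, which is what keeps the memory use independent of n.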
The median algorithm relies on an iterative binning algorithm, based on the Tibshirani paper. This algorithm takes two parameters: the number of bins to use (b) and the threshold at which to sort (t). During the first iteration, the range of the input is found. This interval is then split into b equal parts. Each value is then placed into a bin, and the bin that must contain the median is subdivided into b equal parts. This recursive subdivision continues until fewer than t elements are in the bin that contains the median. These elements are then sorted and the median is output. As such, this algorithm requires O(k \cdot b) space and runs in O(k \cdot (n \cdot \log_b n + t \log t)) time.
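The binning procedure described above can be sketched in Python for a single input as follows. This is an illustrative sketch under stated assumptions, not the GLA's actual code: the names `binned_median` and `_select_by_binning` are hypothetical, the parameters `number_bins` and `sort_threshold` mirror the arguments b and t, and the default values are arbitrary. For simplicity the sketch keeps the contents of each bin in memory (O(n) space) and re-derives the range from the chosen bin; a version that stores only per-bin counts and rescans the data would be needed to reach the O(b) space stated above. For an even number of values it averages the two middle order statistics, which is also an assumption.

```python
from typing import List, Sequence


def _select_by_binning(data: Sequence[float], rank: int,
                       number_bins: int, sort_threshold: int) -> float:
    """Return the element with 0-based `rank` in sorted order (hypothetical helper)."""
    candidates: List[float] = list(data)
    while len(candidates) > sort_threshold:
        low, high = min(candidates), max(candidates)
        width = (high - low) / number_bins
        if low == high or width == 0:
            break  # all remaining values are (numerically) identical
        # Split [low, high] into equal-width bins and distribute the values.
        bins: List[List[float]] = [[] for _ in range(number_bins)]
        for x in candidates:
            idx = min(int((x - low) / width), number_bins - 1)
            bins[idx].append(x)
        # Walk the bins in order to find the one holding the wanted rank.
        for bucket in bins:
            if rank < len(bucket):
                candidates = bucket
                break
            rank -= len(bucket)
    # Few enough elements remain: sort them directly.
    return sorted(candidates)[rank]


def binned_median(values: Sequence[float], number_bins: int = 100,
                  sort_threshold: int = 1000) -> float:
    """Median of `values` via repeated binning, then a final small sort."""
    n = len(values)
    if n == 0:
        raise ValueError("median of an empty input is undefined")
    if number_bins < 2:
        raise ValueError("number_bins must be at least 2")
    lower = _select_by_binning(values, (n - 1) // 2, number_bins, sort_threshold)
    upper = _select_by_binning(values, n // 2, number_bins, sort_threshold)
    return (lower + upper) / 2
```

Because the smallest value always lands in the first bin and the largest in the last, every round strictly shrinks the candidate set, so the loop terminates whenever at least two bins are used.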
A waypoint with a single row. See ‘details’ for more information.
Jon Claus, <jonterainsights@gmail.com>, Tera Insights, LLC.
Tibshirani (http://www.stat.cmu.edu/~ryantibs/papers/median.pdf) for details regarding the binning algorithm.