segment: Segmenter

Description

Splits the input of a GLA into segments and runs the GLA on each segment separately, combining the per-segment results afterwards. This trades performance for lower memory use.

Usage

Segmenter(data, segment = AUTO, passes = 1, num.segments = 64)

Arguments

data

an object of class "GLA".

segment

an expression to produce the segments on.

passes

the number of passes to make over the input data. See ‘Details’ for more information.

num.segments

the number of segments to split the input GLA into. This should be at least the number of real CPUs times the number of passes, preferably twice that amount.
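
The sizing rule for num.segments can be written out as a small calculation. The helper below is a hypothetical illustration in base R and is not part of gtBase; the `factor` argument is an assumption used to express the "preferably twice that amount" recommendation.

```r
# Hypothetical helper, not part of gtBase.
# Minimum recommended value: cores * passes (factor = 1).
# Preferred value: twice that amount (factor = 2).
suggest_num_segments <- function(cores, passes = 1, factor = 2) {
  stopifnot(cores >= 1, passes >= 1, factor >= 1)
  cores * passes * factor
}

# 16 physical cores and 2 passes: at least 32 segments, preferably 64.
minimum   <- suggest_num_segments(cores = 16, passes = 2, factor = 1)
preferred <- suggest_num_segments(cores = 16, passes = 2)
```

Under this rule the default of 64 segments matches, for example, a machine with 16 real CPUs running 2 passes.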

Details

This GLA can only be placed on top of other GLAs, not on arbitrary waypoints like most GLAs, and it alters the way in which the input GLA is computed. Rather than running the input GLA on all of the data at once, the data is split into num.segments separate pieces and the input GLA processes each segment separately, combining the results afterwards. For GLAs that use O(n) space or worse, where n is the number of input tuples, this can reduce the memory needed by a factor of up to num.segments * passes, at the cost of performance.
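
The split-process-combine idea described above can be sketched in plain base R. This is only a conceptual illustration under stated assumptions: `aggregate_segment` and `segmented_aggregate` are hypothetical names, the per-key sum stands in for the input GLA, and the segment assignment by key index is a stand-in for whatever partitioning the real system performs. It does not model passes.

```r
# Hypothetical illustration in base R; none of these names are part of gtBase.

# Stand-in for the input GLA: sum of `value` for each distinct `key`.
aggregate_segment <- function(df) {
  vapply(split(df$value, as.character(df$key)), sum, numeric(1))
}

segmented_aggregate <- function(df, num_segments = 4) {
  # Assign every row to a segment based on its key, so that all rows
  # sharing a key land in the same segment.
  seg_id <- as.integer(factor(df$key)) %% num_segments
  pieces <- split(df, seg_id)

  # Run the aggregate on each segment separately; only one segment's
  # state needs to be held at a time in a real system.
  partial <- lapply(pieces, aggregate_segment)

  # Combine the per-segment results and order them by key.
  combined <- do.call(c, unname(partial))
  combined[order(names(combined))]
}

df <- data.frame(key = c("a", "b", "a", "c", "b", "a"),
                 value = c(1, 2, 3, 4, 5, 6),
                 stringsAsFactors = FALSE)
res <- segmented_aggregate(df, num_segments = 2)
```

Because every key is confined to one segment, the combined result equals the result of aggregating all of the data at once, while each segment only has to track its own subset of keys.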

Value

The result of the input GLA.

AUTO

If segment = AUTO, the first input expression of the input GLA is used. If the input GLA contains no inputs, such as Count, an error is thrown. This should be used with caution, as typically segment should be resolved as a single attribute.

Author(s)

Jon Claus, <jonterainsights@gmail.com>, Tera Insights LLC

Examples

## Adapted from TPCH Query 1
library(gtBase)

data <- Read(lineitem10g)

agg <- GroupBy(data, group = c(l_returnflag, l_linestatus),
               sum_disc_price = Sum(l_extendedprice * (1 - l_discount)))

segmented <- Segmenter(agg)

View(segmented)

tera-insights/gtBase documentation built on May 31, 2019, 8:35 a.m.