groupby: Group By

Description

Given a set of grouping attributes, the data is aggregated for each unique combination of grouping attributes.

Usage

GroupBy(data, groupAtts, ...)

Arguments

data

an object of class "data".

groupAtts

a set of expressions to be used for grouping.

...

a list of sub-aggregates. See ‘details’ for more information.

Details

Each sub-aggregate provided should resemble a typical call to an aggregate, except for the following:

  1. data should not be provided as an argument to any of these inner calls, because it has already been specified by the argument to GroupBy.

  2. AUTO is not supported anywhere, regardless of whether it is supported for that sub-aggregate outside of GroupBy.

Each sub-aggregate should be provided a list of inputs and outputs of appropriate length. These inputs and outputs are then combined and used for the overall aggregate, with each output appearing in the final result.

groupAtts can either be a list of expressions wrapped in a call to c() or a single expression. In the former case, the expressions can be given names, as in c(name1 = expr1).

Each grouping expression is included in the result. If a name is included in the argument list within c(), then the corresponding column is given that name. Otherwise, if the grouping expression is merely an attribute of the data, the column is named after that attribute. If not, then a name is generated for the column, which is hidden from the user and guaranteed not to create naming conflicts with other columns.
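The two grouping forms and the resulting column names can be sketched as follows. This is an illustration only, assuming the gtBase package is attached and the lineitem10g relation from the Examples section is available; the output name n is chosen here for illustration.

```r
# Sketch only: requires gtBase and the lineitem10g relation used in Examples.
data <- Read(lineitem10g)

# Single grouping expression that is a plain attribute:
# the grouping column in the result is named l_returnflag.
by_flag <- GroupBy(
  data,
  groupAtts = l_returnflag,
  n = Count(1)
)

# Several grouping expressions wrapped in c():
# the first is explicitly named rf, the second keeps its attribute name.
by_flag_status <- GroupBy(
  data,
  groupAtts = c(rf = l_returnflag, l_linestatus),
  n = Count(1)
)
```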

Value

An object of class "data". It will have a column for each grouping expression and output specified by a sub-aggregate. For each unique combination of group attributes witnessed by the data, the given sub-aggregates will be calculated using only tuples whose attributes match that unique combination.
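As a rough analogue of these semantics in base R (not gtBase), the sketch below sums a quantity for each unique combination of two grouping columns. The small data frame is made up for illustration, and aggregate() merely stands in for a GroupBy with a single Sum.

```r
# Base R analogue, not gtBase: one output row per unique (flag, status) pair.
lineitem <- data.frame(
  l_returnflag = c("A", "A", "N", "N", "R"),
  l_linestatus = c("F", "F", "O", "F", "F"),
  l_quantity   = c(10, 20, 5, 7, 3)
)

# Each sum uses only the rows whose grouping columns match that combination.
sum_qty <- aggregate(
  l_quantity ~ l_returnflag + l_linestatus,
  data = lineitem,
  FUN = sum
)
print(sum_qty)
```

The two rows with flag "A" and status "F" collapse into a single row whose l_quantity is their sum, while every other combination appears once with its own total.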

If the sub-aggregates produce results with differing numbers of rows for some group, the results with fewer rows are repeated as needed to match the number of rows of the longest result. This is similar to the recycling that R performs in many contexts.

For example, if an OrderBy and a Sum are performed in which the OrderBy produces four rows for some group, then the value produced by Sum will be repeated four times for that group.
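The same repetition can be seen in base R, which recycles a length-one value when it is combined with a longer column. The sketch below is an analogue only, not gtBase code; the vectors ordered and total are made-up stand-ins for the OrderBy and Sum results of a single group.

```r
# Base R analogue, not gtBase: a four-row result combined with a one-row result.
ordered <- c(5, 3, 2, 1)   # stands in for the four rows produced by OrderBy
total   <- sum(ordered)    # stands in for the single value produced by Sum

# data.frame() recycles the length-one total to match the four-row column.
combined <- data.frame(ordered = ordered, total = total)
print(combined)
```

Every row of combined carries the same total, mirroring how the Sum value is repeated for each row that OrderBy produces for the group.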

AUTO

AUTO is supported neither for grouping attributes nor for the inner GLAs, regardless of whether or not it is supported for the associated functions outside of GroupBy.

Author(s)

Jon Claus, <jonterainsights@gmail.com>, Tera Insights LLC

Examples

## TPC-H Query 1
data <- Read(lineitem10g)
filter <- data[l_shipdate <= .(as.Date("1998-12-01")) - 90]
agg <- GroupBy(
  filter,
  groupAtts = c(rf = l_returnflag, ls = l_linestatus),
  sum_disc_price = Sum(l_extendedprice * (1 - l_discount)),
  sum_charge = Sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)),
  avg_qty = Average(l_quantity),
  count_order = Count(1),
  sum_qty = Sum(l_quantity),
  avg_price = Average(l_extendedprice),
  sum_base_price = Sum(l_extendedprice),
  avg_disc = Average(l_discount)
)
agg <- OrderBy(
  agg,
  asc(rf),
  dsc(ls),
  rank = rank
)
result <- as.data.frame(agg)

tera-insights/gtBase documentation built on May 31, 2019, 8:35 a.m.