Distinct: Remove Duplicate Combinations

Description

Filters out repeated combinations of the inputs.

Usage

1

Arguments

inputs

A named list of expressions, with the names used as the names of the corresponding outputs. These expressions are output in addition to those used to specify the extremities.

If no name is given and the corresponding expression is simply an attribute, that attribute is used as the name. Otherwise an error is thrown, as there is no reason to include an extra input if the corresponding output column cannot be referenced later.

outputs

The usual way to specify the outputs. If both this argument and names for the inputs are given, an error is thrown.
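
The naming rules described for inputs and outputs can be illustrated with a small conceptual sketch. It is written in Python rather than R and is not gtBase code: the helper `resolve_output_names`, the representation of inputs as (name, expression) pairs, the regular expression used to recognize a plain attribute, and the length check on `outputs` are all assumptions made for illustration.

```python
import re

# Hypothetical pattern for "the expression is simply an attribute" (an assumption).
_SIMPLE_ATTRIBUTE = re.compile(r"[A-Za-z_][A-Za-z0-9_.]*")


def resolve_output_names(inputs, outputs=None):
    """Return one output name per input.

    inputs  -- list of (name_or_None, expression_string) pairs
    outputs -- optional list of explicit output names
    """
    has_names = any(name is not None for name, _ in inputs)

    if outputs is not None:
        # Names on the inputs and an explicit outputs list are mutually exclusive.
        if has_names:
            raise ValueError("give either names for the inputs or outputs, not both")
        # Assumed check: one explicit output name per input.
        if len(outputs) != len(inputs):
            raise ValueError("outputs must have one entry per input")
        return list(outputs)

    resolved = []
    for name, expr in inputs:
        if name is not None:
            resolved.append(name)   # an explicit name wins
        elif _SIMPLE_ATTRIBUTE.fullmatch(expr):
            resolved.append(expr)   # a bare attribute names itself
        else:
            # An unnamed compound expression could not be referenced later.
            raise ValueError(f"cannot infer an output name for expression {expr!r}")
    return resolved
```

Under these assumptions, a named expression paired with a bare attribute resolves to the given name and the attribute name, while an unnamed compound expression such as `price * qty` raises an error.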

Details

This GLA returns the distinct combinations of the given inputs by fully hashing the distinct combinations. As such, it requires O(k) space, where k is the number of distinct combinations. The run time is O(n + k), where n is the number of rows in data. The second term comes from merging the hashes held by different states. Because of this merging, a large number of distinct values leads to a significant slowdown.
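
The accumulate-then-merge pattern behind these bounds can be sketched as follows. This is a conceptual Python illustration, not the GLA's actual implementation: the function names `accumulate`, `merge`, and `distinct`, and the use of Python sets as the hash structure, are assumptions.

```python
def accumulate(rows):
    """Build one partial state: the set of distinct tuples seen in a chunk."""
    state = set()
    for row in rows:
        state.add(tuple(row))  # hashing each row is O(1) on average
    return state


def merge(left, right):
    """Merge two partial states by inserting the smaller set into the larger."""
    if len(left) < len(right):
        left, right = right, left
    left.update(right)  # iterates over the smaller state only
    return left


def distinct(chunks):
    """Combine the per-chunk states into the final set of distinct tuples."""
    result = set()
    for chunk in chunks:
        result = merge(result, accumulate(chunk))
    return result


chunks = [
    [(1, "a"), (1, "a"), (2, "b")],
    [(2, "b"), (3, "c")],
]
unique_rows = distinct(chunks)
```

Each row is hashed once during accumulation, which gives the O(n) term. Every merge must then re-insert the entries of one state into another, so the merge work grows with the number of distinct combinations held in the states.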

Value

A waypoint.

Author(s)

Jon Claus, <jonterainsights@gmail.com>, Tera Insights, LLC.


tera-insights/gtBase documentation built on May 31, 2019, 8:35 a.m.