Counts the number of distinct combinations for the given expressions.
CountDistinct(data, inputs, outputs = count)
data | A waypoint.
inputs | The expressions whose distinct combinations are counted.
outputs | The column name of the result.
This GLA counts the number of distinct combinations of the given inputs
by fully hashing every distinct combination. As such, it requires
O(k) space, where k is the number of distinct combinations. The
run time is O(n + k), where n is the number of rows in data. The second
term is a result of having to merge hashes between different states.
Because of this merge cost, a large number of distinct values leads to
a significant slowdown; the BloomFilter GLA is recommended for such
queries.
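The space and merge costs described above can be illustrated with a minimal base R sketch. It is not the GLA's actual implementation: `make_state` and `merge_states` are hypothetical helper names, and base R's `unique()` stands in for the hash table. Each partial state keeps one entry per distinct combination it has seen, and merging states touches every entry they hold, which is where the k term comes from.

```r
# Hypothetical helpers for illustration only; not part of the grokit API.

# Build one partial state: the distinct combinations seen in a chunk.
# unique() is a stand-in for the hash table used by the real GLA.
make_state <- function(chunk) {
  unique(chunk)
}

# Merge two partial states by taking the union of their distinct rows.
merge_states <- function(a, b) {
  unique(rbind(a, b))
}

# Toy data: 6 rows holding 4 distinct (tax, quantity) combinations.
rows <- data.frame(
  tax      = c(0.1, 0.1, 0.2, 0.2, 0.1, 0.3),
  quantity = c(1,   1,   2,   3,   1,   2)
)

# Split the rows into two chunks, as if two workers processed them.
chunks <- split(rows, rep(1:2, each = 3))
states <- lapply(chunks, make_state)

# Merging does work proportional to the distinct combinations held.
merged <- Reduce(merge_states, states)
count <- nrow(merged)
print(count)
```

With many distinct combinations, each state grows toward the size of its chunk and the merge step dominates, which is the slowdown the paragraph above warns about.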
A waypoint containing a single row and column whose name is given by
outputs.
Jon Claus, <jonterainsights@gmail.com>, Tera Insights, LLC.
BloomFilter for a similar GLA.
## result is equal to the total number of tuples, no repetitions
data <- Read(lineitem100g)
agg <- CountDistinct(data, inputs = c(l_tax, l_quantity, l_partkey))
result <- as.data.frame(agg)
## result is equal to the number of possible values for l_partkey as
## given in the specifications of TPC-H
data <- Read(lineitem100g)
agg <- CountDistinct(data, inputs = l_partkey)
result <- as.data.frame(agg)