Counts the number of distinct combinations for the given attributes.
CountDistinct(data, inputs = AUTO, outputs = count)
data | an object of class
inputs | which attributes of
outputs | the desired column name of the result.
This GLA counts the number of distinct combinations of the given
inputs using a full hashing of the distinct combinations. As such, it
requires O(k) space, where k is the number of distinct
combinations. The run time is O(n + k), where n is the
number of rows in data. The second term is a result of having
to merge hashes between different states. Having a large number of
distinct values leads to significant slowdown because of this; the
BloomFilter is recommended for these queries.
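The hashing and merging described above can be sketched in base R. This is only an illustration under stated assumptions, not the GLA's actual implementation: the helper names (new_state, add_rows, merge_states, count_distinct) and the toy data frame are hypothetical, and an R environment stands in for the hash table.

```r
# Illustrative sketch only: hypothetical helpers, not part of the package.
# Each state is a hash set of row keys, stored in an R environment.
new_state <- function() new.env(hash = TRUE)

# Insert one key per row; each insert is a hash operation, so this is O(n).
add_rows <- function(state, df) {
  # Build a single string key per row from all columns.
  keys <- do.call(paste, c(df, sep = "\r"))
  for (key in keys) assign(key, TRUE, envir = state)
  state
}

# Copy the keys of state b into state a. Environments are modified in place,
# so a itself grows. The cost is proportional to the number of distinct keys
# in b, which is where the O(k) merge term comes from.
merge_states <- function(a, b) {
  for (key in ls(b, all.names = TRUE)) assign(key, TRUE, envir = a)
  a
}

# The number of stored keys is the number of distinct combinations.
count_distinct <- function(state) length(ls(state, all.names = TRUE))

# Toy data: rows 1 and 2 are the same combination.
rows <- data.frame(tax = c(0.1, 0.1, 0.2, 0.2), qty = c(1, 1, 1, 2))

# Two states built independently, as parallel workers might do.
left <- add_rows(new_state(), rows[1:2, ])
right <- add_rows(new_state(), rows[3:4, ])

merged <- merge_states(left, right)
count_distinct(merged)
```

Each state holds every distinct key it has seen, so memory grows with k, and every merge has to walk all the keys of one state. That is why a large number of distinct values slows the query down.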
An object of class "data" with exactly one row element. Upon
conversion to a data frame, it will contain a single row.
In the case of inputs = AUTO, all attributes of the data are
used.
Jon Claus, <jonterainsights@gmail.com>, Tera Insights LLC
BloomFilter for a similarly functioning GLA.
## result is equal to the total number of tuples; no repetitions
data <- Read(lineitem100g)
agg <- CountDistinct(data, inputs = c(l_tax, l_quantity, l_partkey))
result <- as.data.frame(agg)
## result is equal to the number of possible values for l_partkey
## as given in the TPC-H specification
data <- Read(lineitem100g)
agg <- CountDistinct(data, inputs = l_partkey)
result <- as.data.frame(agg)