freqItems: Finding frequent items for columns, possibly with false...

View source: R/stats.R

Description

Finds frequent items for columns, possibly with false positives, using the frequent element count algorithm described in https://doi.org/10.1145/762471.762473, proposed by Karp, Schenker, and Papadimitriou.

Usage

freqItems(x, cols, support = 0.01)

Arguments

x

A spark_tbl.

cols

A vector of column names in which to search for frequent items.

support

(Optional) The minimum frequency for an item to be considered frequent. Should be greater than 1e-4. Default support = 0.01.
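To illustrate how support relates to the result and why false positives can occur, here is a minimal plain-R sketch of a one-pass counter-based candidate search in the spirit of the cited algorithm. It is not the code freqItems runs: frequent_candidates is a hypothetical helper that works on an ordinary R vector, and the choice of ceiling(1 / support) counters is an assumption made for the illustration.

```r
# Sketch only: `frequent_candidates` is a hypothetical helper, not part of tidyspark.
# It keeps at most ceiling(1 / support) counters while scanning the values once.
frequent_candidates <- function(values, support = 0.01) {
  stopifnot(support > 0, support <= 1)
  values <- values[!is.na(values)]       # the sketch ignores missing values
  max_counters <- ceiling(1 / support)   # assumed counter budget
  counts <- integer(0)                   # named integer vector: item -> counter

  for (v in as.character(values)) {
    if (v %in% names(counts)) {
      # Item already tracked: bump its counter.
      counts[[v]] <- counts[[v]] + 1L
    } else if (length(counts) < max_counters) {
      # Free slot available: start tracking the item.
      counts[[v]] <- 1L
    } else {
      # No free slot: decrement every counter and drop those that reach zero.
      counts <- counts - 1L
      counts <- counts[counts > 0L]
    }
  }

  # Surviving items are candidates; some may be less frequent than `support`.
  as.character(names(counts))
}

values <- c("a", "b", "a", "c", "a", "a", "d", "a")
candidates <- frequent_candidates(values, support = 0.5)
print(candidates)
```

In this sketch, every item occurring in more than a support fraction of the values survives the scan, but rarer items that arrive late can survive too. In the sample vector, "a" makes up 5 of 8 values and is retained, while "d" occurs once and is also returned, which is the kind of false positive the description refers to. A second pass that counts the candidates exactly would remove them.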

Value

A local R data.frame with the frequent items in each column.

Note

freqItems since 1.6.0

See Also

Other stat functions: approxQuantile(), corr(), covariance(), crosstab(), sampleBy()

Examples

## Not run: 
df <- read.json("/path/to/file.json")
fi <- freqItems(df, c("title", "gender"))

## End(Not run)

danzafar/tidyspark documentation built on Sept. 30, 2020, 12:19 p.m.