freqItems: Finding frequent items for columns, possibly with false...


Description

Finds frequent items for columns, possibly with false positives. It uses the frequent element count algorithm described in https://dl.acm.org/doi/10.1145/762471.762473, proposed by Karp, Schenker, and Papadimitriou.
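To illustrate why false positives can occur, here is a minimal base R sketch of a counter-based frequent-items pass in the style of Karp, Schenker, and Papadimitriou. It is not SparkR's implementation: the helper name freq_items_sketch and its handling of NA values are assumptions made for illustration only. It keeps at most 1 / support candidate counters and, whenever that budget is exceeded, decrements every counter and drops those that reach zero.

```r
# Hypothetical helper (not part of SparkR): single-pass frequent-items sketch.
freq_items_sketch <- function(values, support = 0.01) {
  stopifnot(support > 0, support < 1)
  values <- as.character(values[!is.na(values)])  # assumption: NAs are ignored
  max_counters <- floor(1 / support)              # counter budget
  counts <- integer(0)                            # named vector: item -> counter

  for (v in values) {
    if (v %in% names(counts)) {
      # Known candidate: bump its counter.
      counts[v] <- counts[v] + 1L
    } else {
      # New candidate: start counting it.
      counts[v] <- 1L
      if (length(counts) > max_counters) {
        # Over budget: decrement every counter and drop the ones at zero.
        counts <- counts - 1L
        counts <- counts[counts > 0L]
      }
    }
  }
  # Surviving candidates: every item whose frequency exceeds `support` is
  # among them, but some infrequent items may survive as well.
  as.character(names(counts))
}

# Usage: the returned set is a superset of the truly frequent items.
candidates <- freq_items_sketch(c("x", "y", "x", "z", "x", "y"), support = 0.5)
print(candidates)
```

Because the sketch only guarantees a superset, a second exact count over the surviving candidates would be needed to remove false positives.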

Usage

## S4 method for signature 'SparkDataFrame,character'
freqItems(x, cols, support = 0.01)

Arguments

x

A SparkDataFrame.

cols

A vector of column names in which to search for frequent items.

support

(Optional) The minimum frequency, expressed as a fraction of rows, for an item to be considered frequent. Should be greater than 1e-4. Defaults to 0.01.

Value

A local R data.frame containing the frequent items found in each column.

Note

freqItems since 1.6.0

See Also

Other stat functions: approxQuantile(), corr(), cov(), crosstab(), sampleBy()

Examples

## Not run: 
df <- read.json("/path/to/file.json")
fi <- freqItems(df, c("title", "gender"))

## End(Not run)

SparkR documentation built on June 3, 2021, 5:05 p.m.