spark.prefixSpan: PrefixSpan


Description

A parallel PrefixSpan algorithm to mine frequent sequential patterns. spark.findFrequentSequentialPatterns returns a complete set of frequent sequential patterns. For more details, see PrefixSpan.

Usage

spark.findFrequentSequentialPatterns(data, ...)

## S4 method for signature 'SparkDataFrame'
spark.findFrequentSequentialPatterns(
  data,
  minSupport = 0.1,
  maxPatternLength = 10L,
  maxLocalProjDBSize = 32000000L,
  sequenceCol = "sequence"
)

Arguments

data

A SparkDataFrame.

...

Additional argument(s) passed to the method.

minSupport

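Minimal support level of a sequential pattern, given as a fraction of the number of input sequences. Patterns that occur in fewer sequences than this level requires are not returned.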

maxPatternLength

Maximal length of a sequential pattern.

maxLocalProjDBSize

Maximum number of items (including delimiters used in the internal storage format) allowed in a projected database before local processing. If a projected database exceeds this size, another iteration of distributed prefix growth is run.

sequenceCol

Name of the sequence column in the dataset.

Value

The complete set of frequent sequential patterns in the input sequences of itemsets, returned as a SparkDataFrame with one row per pattern and its frequency. Its schema is sequence: ArrayType(ArrayType(T)), freq: integer, where T is the item type.

Note

spark.findFrequentSequentialPatterns(SparkDataFrame) since 3.0.0

Examples

## Not run: 
df <- createDataFrame(list(list(list(list(1L, 2L), list(3L))),
                           list(list(list(1L), list(3L, 2L), list(1L, 2L))),
                           list(list(list(1L, 2L), list(5L))),
                           list(list(list(6L)))),
                      schema = c("sequence"))
frequency <- spark.findFrequentSequentialPatterns(df, minSupport = 0.5, maxPatternLength = 5L,
                                                  maxLocalProjDBSize = 32000000L)
showDF(frequency)

## End(Not run)
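
To make minSupport = 0.5 concrete, the following is a minimal base R sketch (no Spark session needed) that counts how many of the four example sequences contain a given pattern. It is not the PrefixSpan algorithm itself, only a brute-force check of support. The helpers contains_pattern and support_count are hypothetical names introduced here, and the threshold is assumed to be the ceiling of minSupport times the number of sequences.

```r
# The four input sequences from the example, as plain R lists of itemsets.
sequences <- list(
  list(c(1L, 2L), c(3L)),
  list(c(1L), c(3L, 2L), c(1L, 2L)),
  list(c(1L, 2L), c(5L)),
  list(c(6L))
)

# Hypothetical helper: TRUE if `pattern` (a list of itemsets) occurs in
# `sequence`, i.e. each pattern itemset is a subset of a distinct itemset
# of the sequence, with order preserved.
contains_pattern <- function(sequence, pattern) {
  pos <- 1L
  for (itemset in pattern) {
    # Advance to the first remaining itemset that contains all pattern items.
    while (pos <= length(sequence) && !all(itemset %in% sequence[[pos]])) {
      pos <- pos + 1L
    }
    if (pos > length(sequence)) return(FALSE)
    pos <- pos + 1L
  }
  TRUE
}

# Hypothetical helper: number of input sequences that contain the pattern.
support_count <- function(pattern) {
  sum(vapply(sequences, contains_pattern, logical(1), pattern = pattern))
}

min_support <- 0.5
# Assumption: a pattern is frequent when its count reaches this threshold.
min_count <- ceiling(min_support * length(sequences))

# Compare the support of three candidate patterns with the threshold.
candidates <- list(
  list(c(1L), c(3L)),  # item 1, later followed by item 3
  list(c(1L, 2L)),     # items 1 and 2 in the same itemset
  list(c(5L))          # item 5 alone
)
counts <- vapply(candidates, support_count, numeric(1))
print(data.frame(count = counts, frequent = counts >= min_count))
```

Under these assumptions the first two candidates meet the threshold and the third does not, so only the first two would be expected in the freq output for this data.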

SparkR documentation built on June 3, 2021, 5:05 p.m.