powerIterationClustering: PowerIterationClustering


Description

Power Iteration Clustering (PIC) is a scalable graph clustering algorithm. ml_assign_clusters runs PIC on the graph described by data and returns a cluster assignment for each input vertex.
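For intuition about what PIC computes, here is a minimal base-R sketch of power iteration clustering on a small in-memory affinity matrix: repeatedly multiply a starting vector by the row-normalised affinity matrix, stop early while the vector still separates the groups, then run k-means on that one-dimensional embedding. This is not the Spark or tidyspark implementation. pic_sketch() is a hypothetical helper, its stopping rule is simplified, and it assumes a symmetric, non-negative matrix in which every vertex has at least one edge. Its init_mode and max_iter arguments loosely mirror initMode and maxIter.

```r
# Hypothetical helper (not part of tidyspark or Spark): a simplified sketch of
# power iteration clustering. A is assumed to be a symmetric, non-negative
# affinity matrix with no isolated vertices (every row sum > 0).
pic_sketch <- function(A, k = 2L, init_mode = c("random", "degree"),
                       max_iter = 20L, tol = 1e-5) {
  init_mode <- match.arg(init_mode)
  d <- rowSums(A)
  W <- A / d  # row-normalised affinity: row i is divided by d[i]
  # starting vector: vertex degrees or uniform random values
  v <- if (init_mode == "degree") d else runif(nrow(A))
  v <- v / sum(v)
  delta <- Inf
  for (iter in seq_len(max_iter)) {
    v_new <- as.vector(W %*% v)
    v_new <- v_new / sum(abs(v_new))  # L1-normalise each iterate
    delta_new <- max(abs(v_new - v))
    v <- v_new
    # simplified stopping rule: stop once the step size stops changing
    if (abs(delta - delta_new) < tol) break
    delta <- delta_new
  }
  # cluster the 1-D embedding with k-means and return 0-based labels
  km <- kmeans(v, centers = k, nstart = 10)
  km$cluster - 1L
}
```

For example, pic_sketch(A, k = 2L, init_mode = "degree") returns one 0-based cluster label per row of A.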

Usage

ml_assign_clusters(
  data,
  k = 2L,
  initMode = c("random", "degree"),
  maxIter = 20L,
  sourceCol = "src",
  destinationCol = "dst",
  weightCol = NULL
)

Arguments

data

a spark_tbl containing the graph's edges, one row per edge.

k

the number of clusters to create.

initMode

the initialization algorithm: "random" or "degree".

maxIter

the maximum number of iterations.

sourceCol

the name of the input column for source vertex IDs.

destinationCol

the name of the input column for destination vertex IDs.

weightCol

the name of the input column for edge weights. If NULL (the default), all edge weights are treated as 1.0.

...

additional argument(s) passed to the method.
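To make the sourceCol, destinationCol and weightCol behaviour concrete, here is a hedged base-R sketch that turns a local edge list into the symmetric affinity matrix PIC works on, using a weight of 1.0 for every edge when no weight column is given. edge_affinity() and its arguments are hypothetical and not part of tidyspark; the sketch also assumes undirected edges with no duplicates, so each edge fills both (i, j) and (j, i).

```r
# Hypothetical helper (not part of tidyspark): builds a symmetric affinity
# matrix from a local edge list. Weights default to 1.0 when weight_col is NULL.
edge_affinity <- function(edges, src_col = "src", dst_col = "dst",
                          weight_col = NULL) {
  src <- edges[[src_col]]
  dst <- edges[[dst_col]]
  w <- if (is.null(weight_col)) rep(1.0, nrow(edges)) else edges[[weight_col]]
  ids <- sort(unique(c(src, dst)))
  i <- match(src, ids)
  j <- match(dst, ids)
  A <- matrix(0, length(ids), length(ids), dimnames = list(ids, ids))
  # undirected edges: each weight is written to (i, j) and (j, i);
  # assumes no duplicate edges (a later duplicate would overwrite an earlier one)
  A[cbind(i, j)] <- w
  A[cbind(j, i)] <- w
  A
}
```

With the edge list from the example below held in a local data.frame, edge_affinity(edges) gives every edge weight 1, while edge_affinity(edges, weight_col = "weight") keeps the 0.1 weight on the edge between vertices 4 and 0.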

Value

A dataset with one row per input vertex, containing the vertex id and the cluster assigned to it. Its schema is: id: integer, cluster: integer.

Note

ml_assign_clusters(spark_tbl) since 3.0.0

Examples

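## Not run: 
library(tidyspark)
library(tibble)  # for tribble()

# assumes a Spark session is already running
df <- spark_tbl(
  tribble(~src, ~dst, ~weight,
          0L, 1L, 1.0,
          0L, 2L, 1.0,
          1L, 2L, 1.0,
          3L, 4L, 1.0,
          4L, 0L, 0.1))

# split the graph into k = 2 clusters (the default), initialising from vertex degrees
clusters <- ml_assign_clusters(df, initMode = "degree", weightCol = "weight")
show(clusters)

## End(Not run)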

danzafar/tidyspark documentation built on Sept. 30, 2020, 12:19 p.m.