initializeState: Initialize the model state
In cleanzr/dblinkR: R interface for dblink


Initialize the model state

initializeState(
  sc,
  data,
  attributeSpecs,
  recIdColname,
  partitioner,
  populationSize,
  fileIdColname = NULL,
  randomSeed = 1L,
  maxClusterSize = 10L
)

`sc`	A `spark_connection`
`data`	A Spark DataFrame or an R object that can be cast to a Spark DataFrame
`attributeSpecs`	A named list of `Attribute` objects. Each entry specifies the model parameters for one entity attribute, and its name should match one of the column names (attributes) in `data`.
`recIdColname`	Column name in `data` that contains unique record identifiers.
`partitioner`	A `Partitioner` object which specifies how to partition the space of entities (optional). If NULL, the entities are not partitioned, which can severely hinder scalability.
`populationSize`	An integer specifying the size of the latent entity population (optional). If NULL, the population size is set equal to the number of records in `data`.
`fileIdColname`	Column name in `data` that contains file/source identifiers for the records. If NULL, the records are assumed to be from a single file/source.
`randomSeed`	An integer random seed.
`maxClusterSize`	A guess at the maximum cluster size in `data`.
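
The arguments above might be put together as in the following minimal sketch. It is not taken from this page. The toy `records` data frame is made up for illustration, and the helper constructors `BetaShapeParams`, `LevenshteinSimFn`, `Attribute`, `CategoricalAttribute` and `KDTreePartitioner` are assumed to be exported by dblinkR with the signatures shown, so check them against the installed version. It also assumes a local Spark installation that is compatible with dblink.

```r
library(sparklyr)
library(dblinkR)

# Assumption: a local Spark installation compatible with dblink is available.
sc <- spark_connect(master = "local")

# Toy records, made up for illustration. All columns are stored as strings.
records <- data.frame(
  rec_id = as.character(1:4),
  fname  = c("anna", "ana", "john", "jon"),
  byear  = c("1980", "1980", "1975", "1975"),
  stringsAsFactors = FALSE
)

# Assumed helper constructors (verify names and arguments in the package docs).
distortionPrior <- BetaShapeParams(alpha = 1, beta = 50)
levSim <- LevenshteinSimFn(threshold = 7, maxSimilarity = 10)

# The names of the list entries match column names in `records`.
attributeSpecs <- list(
  fname = Attribute(levSim, distortionPrior),
  byear = CategoricalAttribute(distortionPrior)
)

# Assumed signature: number of tree levels, then the attributes to split on.
partitioner <- KDTreePartitioner(0, c("fname"))

state <- initializeState(
  sc,
  data = records,
  attributeSpecs = attributeSpecs,
  recIdColname = "rec_id",
  partitioner = partitioner,
  populationSize = NULL,   # defaults to the number of records in `data`
  fileIdColname = NULL,    # all records come from a single file/source
  randomSeed = 1L,
  maxClusterSize = 10L
)

spark_disconnect(sc)
```

Note that `partitioner` and `populationSize` have no default in the usage signature, so they are passed explicitly here even when `NULL` would be acceptable.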