initializeState: Initialize the model state


View source: R/state.R

Description

Initialize the model state

Usage

initializeState(
  sc,
  data,
  attributeSpecs,
  recIdColname,
  partitioner,
  populationSize,
  fileIdColname = NULL,
  randomSeed = 1L,
  maxClusterSize = 10L
)

Arguments

sc

A spark_connection

data

A Spark DataFrame or an R object that can be cast to a Spark DataFrame

attributeSpecs

A named list of Attribute objects. Each entry specifies the model parameters for one entity attribute, and its name should match one of the column names (attributes) in data.
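The naming requirement can be checked in plain R before calling initializeState. The sketch below is a minimal illustration, not part of dblinkR: checkAttributeNames is a hypothetical helper, the data are a small local data.frame rather than a Spark DataFrame, and placeholder strings stand in for real Attribute objects.

```r
# Hypothetical helper (not part of dblinkR): returns the names of entries in
# attributeSpecs that do not match any column of data.
checkAttributeNames <- function(attributeSpecs, data) {
  specNames <- names(attributeSpecs)
  # Every entry must carry a name for the matching to make sense
  if (is.null(specNames) || any(specNames == "")) {
    stop("attributeSpecs must be a fully named list")
  }
  # Names with no matching column in data
  setdiff(specNames, colnames(data))
}

records <- data.frame(
  rec_id = 1:3,
  fname = c("anna", "ana", "bob"),
  by = c(1980L, 1980L, 1975L)
)
# Placeholder strings stand in for Attribute objects in this sketch
specs <- list(fname = "placeholder", by = "placeholder", lname = "placeholder")
checkAttributeNames(specs, records)  # returns "lname"
```

An empty result (character(0)) means every entry in attributeSpecs has a matching column.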

recIdColname

Column name in data that contains unique record identifiers.
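A quick pre-flight check that recIdColname names a column of unique, non-missing identifiers might look like the following. This is a hypothetical helper written for illustration on a local data.frame; dblinkR itself may validate identifiers differently or not at all.

```r
# Hypothetical helper (not part of dblinkR): TRUE if the record-ID column
# exists and contains no missing or duplicated values.
hasUniqueRecIds <- function(data, recIdColname) {
  # The column must exist before its values can be checked
  if (!recIdColname %in% colnames(data)) {
    return(FALSE)
  }
  ids <- data[[recIdColname]]
  # Identifiers must be present and must not repeat
  !anyNA(ids) && anyDuplicated(ids) == 0
}

records <- data.frame(rec_id = c(1L, 2L, 3L), fname = c("anna", "ana", "bob"))
hasUniqueRecIds(records, "rec_id")  # returns TRUE
```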

partitioner

A Partitioner object which specifies how to partition the space of entities (optional). If NULL, the entities are not partitioned at all; however, this can severely hinder scalability.

populationSize

An integer specifying the size of the latent entity population (optional). If NULL, the population size is set equal to the number of records in data.
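The documented default for populationSize amounts to "one latent entity per record". The sketch below restates that rule with a hypothetical helper, resolvePopulationSize, applied to a local data.frame; it only mirrors the documented behaviour and is not dblinkR's internal code.

```r
# Hypothetical helper (not part of dblinkR): mirrors the documented default,
# under which a NULL populationSize becomes the number of records in data.
resolvePopulationSize <- function(populationSize, data) {
  # NULL falls back to one latent entity per record
  if (is.null(populationSize)) {
    return(nrow(data))
  }
  as.integer(populationSize)
}

records <- data.frame(rec_id = 1:500)
resolvePopulationSize(NULL, records)  # returns 500
resolvePopulationSize(300L, records)  # returns 300
```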

fileIdColname

Column name in data that contains file/source identifiers for the records. If NULL, the records are assumed to be from a single file/source.
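To make the NULL case concrete, the hypothetical helper below counts how many files/sources the records would be treated as coming from. It is an illustration on a local data.frame (the column name file_id is an assumption), not dblinkR's implementation.

```r
# Hypothetical helper (not part of dblinkR): number of distinct files/sources
# implied by fileIdColname.
countSources <- function(data, fileIdColname = NULL) {
  # NULL means every record is treated as coming from one file/source
  if (is.null(fileIdColname)) {
    return(1L)
  }
  length(unique(data[[fileIdColname]]))
}

records <- data.frame(
  rec_id = 1:4,
  file_id = c("A", "A", "B", "C")
)
countSources(records)             # returns 1
countSources(records, "file_id")  # returns 3
```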

randomSeed

An integer random seed.

maxClusterSize

A guess at the maximum cluster size in data, that is, the largest number of records expected to refer to a single entity.

Value

A state_jobj object

See Also

loadState


cleanzr/dblinkR documentation built on June 13, 2021, 4:17 a.m.