Wrapper for MADlib's Latent Dirichlet Allocation


Description

This function is a wrapper for MADlib's Latent Dirichlet Allocation (LDA). MADlib parallelizes the computation when the connected database is distributed.

Usage

madlib.lda(data, docid, words, topic_num, alpha, beta, iter_num = 20, ...)

Arguments

data

An object of db.obj class. This is the database table containing the documents on which the algorithm will train. The text of each document should be tokenized into 'words'.

docid

Name (character string) of the column containing the document IDs.

words

Column name of the input data table containing the vector of words/tokens in the documents.

topic_num

Number of topics.

alpha

Dirichlet parameter for the per-document topic multinomial.

beta

Dirichlet parameter for the per-topic word multinomial.

iter_num

Number of iterations. Defaults to 20.

...

Other optional parameters. Not implemented.
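
Example

The following is a minimal sketch of how the arguments above might fit together, not the package's own example. The first part uses base R only, to illustrate what "tokenized into words" could look like for a per-document token vector. The second part defines a hypothetical helper, run_lda_sketch, that assumes a table with columns named docid and words already exists in a MADlib-enabled database; the table name, connection settings, and the values chosen for topic_num, alpha, and beta are all placeholders.

```r
# Part 1 (base R, local illustration only): split raw text into word tokens.
docs <- data.frame(
  docid = 1:2,
  text  = c("Topic models find themes", "Themes emerge from word counts"),
  stringsAsFactors = FALSE
)

# Hypothetical tokenizer: lowercase, split on non-letters, drop empty strings.
tokenize <- function(text) {
  tokens <- strsplit(tolower(text), "[^a-z]+")[[1]]
  tokens[nzchar(tokens)]
}

# One token vector per document, stored as a list column.
docs$words <- lapply(docs$text, tokenize)

# Part 2 (requires PivotalR and a reachable database; defined but not run here).
# Assumes `table_name` already holds the tokenized documents with columns
# "docid" and "words". All argument values are illustrative placeholders.
run_lda_sketch <- function(table_name, dbname, host = "localhost", port = 5432) {
  cid <- PivotalR::db.connect(host = host, dbname = dbname, port = port,
                              verbose = FALSE)
  on.exit(PivotalR::db.disconnect(cid, verbose = FALSE))

  # Wrap the existing table as a db.obj, as required by the `data` argument.
  dat <- PivotalR::db.data.frame(table_name, conn.id = cid, verbose = FALSE)

  PivotalR::madlib.lda(dat, docid = "docid", words = "words",
                       topic_num = 10, alpha = 0.1, beta = 0.1,
                       iter_num = 20)
}
```

A call such as run_lda_sketch("my_documents", dbname = "mydb") would then train the model on that table, where both names are placeholders for your own setup.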
