madlib.lda: Wrapper for MADlib's Latent Dirichlet Allocation
In PivotalR: A Fast, Easy-to-Use Tool for Manipulating Tables in Databases and a Wrapper of MADlib


Description

This function is a wrapper for MADlib's Latent Dirichlet Allocation (LDA). If the connected database is distributed, MADlib parallelizes the computation. See the MADlib documentation for details of the algorithm implementation [1].

Usage

madlib.lda(data, topic_num, alpha, beta, iter_num = 20, nstart = 1, best = TRUE, ...)

Arguments

`data`	An object of `db.obj` class: the database table containing the documents to train on. The text of each document must already be tokenized into words.
`topic_num`	Number of topics.
`alpha`	Dirichlet parameter for the per-document topic multinomial.
`beta`	Dirichlet parameter for the per-topic word multinomial.
`iter_num`	Number of iterations. Default is 20.
`nstart`	Number of repeated random starts. Default is 1.
`best`	If TRUE, only the model with the minimum perplexity among the `nstart` runs is returned; if FALSE, all fitted models are returned. Default is TRUE.
`...`	Other optional parameters. Not implemented.
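To make the roles of `topic_num`, `alpha`, and `beta` concrete, the following base-R sketch simulates the generative process that LDA assumes. It is an illustration only, not PivotalR or MADlib code; the helper names `rdirichlet_sym` and `simulate_lda` and the sizes `num_docs`, `doc_len`, and `vocab_size` are hypothetical. With a symmetric Dirichlet, values below 1 favor sparse draws, so a small `alpha` tends to concentrate each document on few topics and a small `beta` tends to concentrate each topic on few words.

```r
# Toy simulation of the LDA generative process (base R; not PivotalR code).
# topic_num, alpha, beta mirror madlib.lda() arguments; num_docs, doc_len,
# and vocab_size are hypothetical sizes chosen for illustration.

# One draw from a symmetric Dirichlet(conc) of dimension k,
# via the normalized-Gamma construction.
rdirichlet_sym <- function(k, conc) {
  g <- rgamma(k, shape = conc, rate = 1)
  g / sum(g)
}

simulate_lda <- function(num_docs, doc_len, vocab_size, topic_num, alpha, beta) {
  # phi[topic, ]: per-topic word multinomial, drawn from Dirichlet(beta)
  phi <- t(replicate(topic_num, rdirichlet_sym(vocab_size, beta)))
  docs <- vector("list", num_docs)
  for (d in seq_len(num_docs)) {
    # theta: per-document topic multinomial, drawn from Dirichlet(alpha)
    theta <- rdirichlet_sym(topic_num, alpha)
    # each token: pick a topic from theta, then a word id from that topic's row of phi
    z <- sample.int(topic_num, doc_len, replace = TRUE, prob = theta)
    docs[[d]] <- vapply(z, function(topic) {
      sample.int(vocab_size, 1, prob = phi[topic, ])
    }, integer(1))
  }
  list(phi = phi, docs = docs)
}

set.seed(1)
sim <- simulate_lda(num_docs = 5, doc_len = 20, vocab_size = 10,
                    topic_num = 2, alpha = 0.1, beta = 0.1)
str(sim$docs[[1]])
```

Each simulated document is a vector of word ids, which is the kind of tokenized input that the `data` argument expects once it is stored in a database table.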

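Value

An `lda.madlib` object, or a list of such objects when several models are returned (for example, `nstart = 2` with `best = FALSE`). Each `lda.madlib` object is a list containing the following items: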

`assignments`	The per-document topic assignments.
`document_sums`	The per-document topic counts.
`model_table`	The `db.table` object for accessing the model table in the database.
`output_table`	The `db.table` object for accessing the output table in the database.
`tf_table`	The `db.table` object for accessing the term frequency table in the database.
`topic_sums`	The per-topic sum of assignments.
`topics`	The per-word association with topics.
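The count-style items are related in a simple way. This base-R sketch derives per-document topic counts, per-topic totals, and per-word topic counts from a hypothetical table of per-token topic assignments. The column names and data are invented for illustration, and the actual layout of the objects returned by `madlib.lda` (row/column orientation, indexing) may differ.

```r
# Hypothetical per-token topic assignments: one row per token occurrence.
# Column names and values are invented; this is not PivotalR output.
tokens <- data.frame(
  docid  = c(1, 1, 1, 2, 2, 3),
  wordid = c(4, 7, 4, 2, 7, 2),
  topic  = c(1, 2, 1, 2, 2, 1)
)
topic_num <- 2
# keep every topic as a level so empty topics still get a zero column
topic_f <- factor(tokens$topic, levels = seq_len(topic_num))

# Analogue of document_sums: tokens per (document, topic)
document_sums <- table(docid = tokens$docid, topic = topic_f)

# Analogue of topic_sums: total tokens assigned to each topic
topic_sums <- colSums(document_sums)

# Analogue of topics: tokens per (word, topic)
word_topic <- table(wordid = tokens$wordid, topic = topic_f)

print(document_sums)
print(topic_sums)
```

Summing either count table over its rows gives the same per-topic totals, since both tables count the same tokens.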

Author(s)

Author: Predictive Analytics Team at Pivotal Inc.

Maintainer: Frank McQuillan, Pivotal Inc. fmcquillan@pivotal.io

References

[1] Documentation of LDA in the latest MADlib release, https://madlib.apache.org/docs/latest/group__grp__lda.html

See Also

`predict.lda.madlib` labels test documents with topics using a learned `lda.madlib` model.

`perplexity.lda.madlib` computes the perplexity of a learned `lda.madlib` model.
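Perplexity is the criterion that `best = TRUE` uses to choose among the `nstart` runs; lower values indicate a better fit. Below is a minimal base-R sketch of the standard definition, exp(-(total log-likelihood) / (number of tokens)). The names `lda_perplexity`, `theta` (document-topic proportions), and `phi` (topic-word probabilities) are hypothetical, and this is not the implementation behind `perplexity.lda.madlib`; MADlib's own computation may differ in details.

```r
# Perplexity of a tokenized corpus under fixed LDA estimates (standard definition):
#   exp( -sum_{d,n} log p(w_dn | d) / N ),  p(w | d) = sum_t theta[d, t] * phi[t, w]
# theta: docs x topics, phi: topics x vocabulary (hypothetical inputs).
lda_perplexity <- function(docs, theta, phi) {
  loglik <- 0
  n_tokens <- 0
  for (d in seq_along(docs)) {
    # p(w | d) for every vocabulary word (1 x V row, flattened to a vector)
    p_w <- as.vector(theta[d, ] %*% phi)
    # a word with zero probability gives log(0) = -Inf, i.e. infinite perplexity
    loglik <- loglik + sum(log(p_w[docs[[d]]]))
    n_tokens <- n_tokens + length(docs[[d]])
  }
  exp(-loglik / n_tokens)
}

theta <- matrix(c(0.9, 0.1,
                  0.2, 0.8), nrow = 2, byrow = TRUE)
phi <- matrix(c(0.5, 0.5, 0.0,
                0.0, 0.5, 0.5), nrow = 2, byrow = TRUE)
docs <- list(c(1, 2, 1), c(3, 3, 2))  # word ids per document

pp <- lda_perplexity(docs, theta, phi)
print(pp)  # equals 0.09^(-1/3), roughly 2.23
```

A model that spreads probability uniformly over a vocabulary of V words has perplexity equal to V, which makes a quick sanity check.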

Examples

## Not run: 

## set up the database connection
## Assume that .port is the port number and .dbname is the database name
cid <- db.connect(port = .port, dbname = .dbname, verbose = FALSE)

## wrap an existing table of tokenized documents
dat <- db.data.frame("__madlib_pivotalr_lda_data__", conn.id = cid,
  verbose = FALSE)

## fit a 2-topic model with alpha = beta = 0.1 and 50 iterations
output.db <- madlib.lda(dat, topic_num = 2, alpha = 0.1, beta = 0.1,
  iter_num = 50)

perplexity.db <- perplexity.lda.madlib(output.db)
print(perplexity.db)

## Run LDA multiple times and keep the best model
output.db <- madlib.lda(dat, topic_num = 2, alpha = 0.1, beta = 0.1,
  iter_num = 50, nstart = 2)
perplexity.db <- perplexity.lda.madlib(output.db)
print(perplexity.db)

## Run LDA multiple times and keep all models
output.db <- madlib.lda(dat, topic_num = 2, alpha = 0.1, beta = 0.1,
  iter_num = 50, nstart = 2, best = FALSE)

## output.db is now a list with one lda.madlib object per start
perplexity.db <- perplexity.lda.madlib(output.db[[1]])
print(perplexity.db)

perplexity.db <- perplexity.lda.madlib(output.db[[2]])
print(perplexity.db)

db.disconnect(cid)

## End(Not run)

PivotalR documentation built on March 13, 2021, 1:06 a.m.
