getProbs: Compute topic-word and document-topic probability...
In kshirley/LDAtools: Tools to fit a topic model using Latent Dirichlet Allocation (LDA)


This function assumes that the ordering of word.id, doc.id, and topic.id matters. The first element of word.id corresponds to the first element of doc.id, which corresponds to the first element of topic.id. Similarly, the second element of word.id corresponds to the second element of doc.id, which corresponds to the second element of topic.id, and so on. The elements of vocab are also assumed to be ordered to match word.id: the first element of vocab is the token with word.id equal to 1, the second element of vocab is the token with word.id equal to 2, etc.
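The alignment described above can be illustrated with a small sketch. The data here are hypothetical toy values, not taken from the package; the point is only that the three vectors are parallel (one entry per token occurrence) and that word.id indexes into vocab.

```r
# Toy corpus (hypothetical values, for illustration only)
vocab    <- c("apple", "banana", "cherry")
word.id  <- c(1, 2, 2, 3, 1)   # position in vocab of each token occurrence
doc.id   <- c(1, 1, 2, 2, 2)   # document containing each token occurrence
topic.id <- c(1, 2, 2, 1, 1)   # topic assigned to each token occurrence

# The three vectors are parallel: one entry per token occurrence.
stopifnot(length(word.id) == length(doc.id),
          length(doc.id) == length(topic.id))

# vocab should cover every id that appears in word.id.
stopifnot(max(word.id) <= length(vocab))

# Viewing the vectors side by side makes the element-wise correspondence explicit.
tokens <- data.frame(doc = doc.id, word = vocab[word.id], topic = topic.id,
                     stringsAsFactors = FALSE)
print(tokens)
```

Reordering any one of the three vectors without reordering the others in the same way would silently change which document and topic each token is attributed to.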


getProbs(word.id = numeric(), doc.id = numeric(), topic.id = numeric(),
  vocab = character(), alpha = 0.01, beta = 0.01,
  sort.topics = c("None", "byDocs", "byTerms"), K = integer())

`word.id`	a numeric vector with the token id of each token occurrence in the data.
`doc.id`	a numeric vector containing the document id number of each token occurrence in the data.
`topic.id`	a numeric vector containing the topic id assigned to each token occurrence in the data.
`vocab`	a character vector of the unique words included in the corpus. The length of this vector should match the max value of `word.id`.
`alpha`	Dirichlet hyperparameter. See fitLDA.
`beta`	Dirichlet hyperparameter. See fitLDA.
`sort.topics`	Sorting criterion for topics. Options are "None" to leave the topics unsorted, "byDocs" to sort topics by the number of documents for which they are the most probable, or "byTerms" to sort topics by the number of terms within each topic.
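As a rough illustration of what re-labeling topics by frequency can look like, the sketch below renumbers topics so that topic 1 is the one with the most token assignments. This is an assumption about the general idea behind "byTerms", not a copy of the package's internal code; the variable names `old.labels` and `new.topic.id` are hypothetical.

```r
# Hypothetical topic assignments, one per token occurrence
topic.id <- c(3, 3, 1, 2, 3, 2)

# Number of token occurrences assigned to each topic
counts <- table(topic.id)

# Original topic labels, ordered from most to least frequent
old.labels <- as.integer(names(counts))[order(counts, decreasing = TRUE)]

# Re-label: the most frequent topic becomes 1, the next becomes 2, and so on
new.topic.id <- match(topic.id, old.labels)
print(new.topic.id)
```

In this toy input, the original topic 3 is the most frequent, so it becomes topic 1 after re-labeling.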

A list of two matrices and one vector. The first matrix, phi.hat, contains the distribution over tokens for each topic, where the rows correspond to topics. The second matrix, theta.hat, contains the distribution over topics for each document, where the rows correspond to documents. The vector, topic.id, contains the sampled topics from the LDA fit, with topic indices re-labeled in decreasing order of frequency according to the sort.topics argument.
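To make the shape of these outputs concrete, here is a minimal sketch of one common way to estimate such matrices from token-level assignments: count co-occurrences and add the Dirichlet hyperparameters as smoothing. This is the usual smoothed-count estimate for LDA and is an assumption here; the source does not confirm that getProbs computes its output in exactly this way. The names `n.kw`, `n.dk`, `phi.sketch`, and `theta.sketch` are hypothetical.

```r
# Hypothetical toy inputs (one entry per token occurrence)
word.id  <- c(1, 2, 2, 3, 1)
doc.id   <- c(1, 1, 2, 2, 2)
topic.id <- c(1, 2, 2, 1, 1)
W <- 3   # vocabulary size
D <- 2   # number of documents
K <- 2   # number of topics
alpha <- 0.01
beta  <- 0.01

# Count matrices: topics x words, and documents x topics
n.kw <- unclass(table(factor(topic.id, levels = 1:K),
                      factor(word.id,  levels = 1:W)))
n.dk <- unclass(table(factor(doc.id,   levels = 1:D),
                      factor(topic.id, levels = 1:K)))

# Smoothed estimates: each row is a probability distribution
phi.sketch   <- (n.kw + beta)  / (rowSums(n.kw) + W * beta)   # K x W
theta.sketch <- (n.dk + alpha) / (rowSums(n.dk) + K * alpha)  # D x K

print(round(phi.sketch, 3))
print(round(theta.sketch, 3))
```

Under this sketch, each row of the topic-by-word matrix and each row of the document-by-topic matrix sums to 1, which matches the description of the rows as distributions.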

data(APinput)
#takes a while
## Not run: o <- fitLDA(APinput$word.id, APinput$doc.id)
data(APtopics) #load output instead for demonstration purposes
probs <- getProbs(word.id=APinput$word.id, doc.id=APinput$doc.id, topic.id=APtopics$topics,
                   vocab=APinput$vocab)
head(probs$phi.hat[,1:5])
head(probs$theta.hat)

kshirley/LDAtools documentation built on May 20, 2019, 7:03 p.m.
