This function assumes that the ordering of word.id, doc.id, and topic.id matters. That is, the first element of word.id corresponds to the first element of doc.id, which corresponds to the first element of topic.id. Similarly, the second element of word.id corresponds to the second element of doc.id, which corresponds to the second element of topic.id (and so on). The ordering of the elements of vocab is also assumed to correspond to the values of word.id, so that the first element of vocab is the token with word.id equal to 1, the second element of vocab is the token with word.id equal to 2, etc.
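The alignment described above can be illustrated with a small toy corpus. This is only a sketch: the data values are invented for illustration, and nothing here calls the package itself.

```r
# Toy corpus: 2 documents, 3-word vocabulary (all values are illustrative).
vocab    <- c("apple", "banana", "cherry")  # vocab[k] is the token with word.id == k
word.id  <- c(1, 2, 2, 3, 1, 3)             # one entry per token occurrence
doc.id   <- c(1, 1, 1, 2, 2, 2)             # document of each occurrence
topic.id <- c(1, 1, 2, 2, 1, 2)             # topic assigned to each occurrence

# The three id vectors must be parallel: same length, same ordering.
stopifnot(length(word.id) == length(doc.id),
          length(doc.id) == length(topic.id))

# vocab is indexed by word.id, so its length should cover the largest id.
stopifnot(length(vocab) == max(word.id))

# Recover the token string for each occurrence by indexing vocab with word.id.
tokens  <- vocab[word.id]
aligned <- data.frame(token = tokens, doc.id = doc.id, topic.id = topic.id)
print(aligned)
```

Reordering any one of the three id vectors without reordering the others would silently pair tokens with the wrong documents or topics, which is why the ordering matters.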
word.id | a numeric vector with the token id of each token occurrence in the data.
doc.id | a numeric vector containing the document id number of each token occurrence in the data.
topic.id | a numeric vector containing the topic id assigned to each token occurrence in the data, with a unique value for each topic.
vocab | a character vector of the unique words included in the corpus. The length of this vector should match the max value of word.id.
alpha | Dirichlet hyperparameter. See fitLDA.
beta | Dirichlet hyperparameter. See fitLDA.
sort.topics | Sorting criterion for topics. Supported methods include: "byDocs" to sort topics by the number of documents for which they are the most probable, or "byTerms" to sort topics by the number of terms within each topic.
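As a rough illustration of how the two sort.topics criteria can produce different orderings, the base R sketch below ranks topics on a toy data set. It is an assumption about what the criteria mean, based only on the descriptions above; the package's actual implementation (including how ties are broken) may differ, and the names by.terms and by.docs are hypothetical.

```r
# Toy data (illustrative): 3 documents, 3 topics.
doc.id   <- c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3)
topic.id <- c(1, 1, 1, 1, 1, 2, 2, 1, 2, 2, 3)

# "byTerms"-style ordering: count token occurrences assigned to each topic.
term.counts <- table(topic.id)
by.terms <- as.integer(names(sort(term.counts, decreasing = TRUE)))

# "byDocs"-style ordering: find the most frequent topic within each document,
# then count how many documents each topic dominates.
doc.topic  <- table(doc.id, topic.id)  # documents x topics count matrix
top.topic  <- colnames(doc.topic)[apply(doc.topic, 1, which.max)]
doc.counts <- table(factor(top.topic, levels = colnames(doc.topic)))
by.docs <- as.integer(names(sort(doc.counts, decreasing = TRUE)))

print(by.terms)
print(by.docs)
```

Here topic 1 has the most tokens overall, but topic 2 is the dominant topic in more documents, so the two criteria rank them differently.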
A list of two matrices and one vector. The first matrix, phi.hat, contains the distribution over tokens for each topic, where the rows correspond to topics. The second matrix, theta.hat, contains the distribution over topics for each document, where the rows correspond to documents. The vector returned by the function, topic.id, is the vector of sampled topics from the LDA fit, with topic indices re-labeled in decreasing order of frequency according to the sort.topics argument.
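To make the shapes of phi.hat and theta.hat concrete, the sketch below computes smoothed-count estimates from toy data in base R. It assumes the standard point estimates for LDA with symmetric Dirichlet priors (counts plus the hyperparameter, normalized by row); this page does not state the exact formula the function uses, and the alpha and beta values here are arbitrary.

```r
# Toy inputs (illustrative values only).
vocab    <- c("apple", "banana", "cherry")
word.id  <- c(1, 2, 2, 3, 1, 3)
doc.id   <- c(1, 1, 1, 2, 2, 2)
topic.id <- c(1, 1, 2, 2, 1, 2)
alpha <- 0.1   # assumed value, not a package default
beta  <- 0.01  # assumed value, not a package default

K <- max(topic.id)   # number of topics
W <- length(vocab)   # vocabulary size
D <- max(doc.id)     # number of documents

# Count matrices; explicit factor levels keep zero-count rows and columns.
n.kw <- unclass(table(factor(topic.id, levels = seq_len(K)),
                      factor(word.id,  levels = seq_len(W))))
n.dk <- unclass(table(factor(doc.id,   levels = seq_len(D)),
                      factor(topic.id, levels = seq_len(K))))
colnames(n.kw) <- vocab

# Smoothed estimates: add the hyperparameter, then normalize each row.
phi.hat   <- (n.kw + beta)  / rowSums(n.kw + beta)   # K x W, topics in rows
theta.hat <- (n.dk + alpha) / rowSums(n.dk + alpha)  # D x K, documents in rows

print(round(phi.hat, 3))
print(round(theta.hat, 3))
```

Each row of phi.hat and theta.hat sums to 1, matching the description of rows as probability distributions.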
data(APinput)
# Fitting the model takes a while
## Not run: o <- fitLDA(APinput$word.id, APinput$doc.id)
data(APtopics)  # load precomputed output instead, for demonstration purposes
probs <- getProbs(word.id = APinput$word.id, doc.id = APinput$doc.id,
                  topic.id = APtopics$topics, vocab = APinput$vocab)
head(probs$phi.hat[, 1:5])
head(probs$theta.hat)
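One common way to inspect a topic-by-token matrix such as phi.hat is to list the highest-probability tokens in each topic. The sketch below does this on a hand-built toy matrix; it assumes the columns are named by token, which is not confirmed here for the matrix returned by getProbs, and the name top.terms is hypothetical.

```r
# Hand-built toy matrix (illustrative): 2 topics x 3 tokens, rows sum to 1.
phi.hat <- matrix(c(0.6, 0.3, 0.1,
                    0.1, 0.2, 0.7),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("topic1", "topic2"),
                                  c("apple", "banana", "cherry")))

n.top <- 2  # number of tokens to keep per topic

# For each row (topic), sort tokens by probability and keep the top n.top names.
top.terms <- t(apply(phi.hat, 1, function(p) {
  names(sort(p, decreasing = TRUE))[seq_len(n.top)]
}))

print(top.terms)
```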