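topicSCORE

Description

topicSCORE estimates the topics of a text corpus using a singular value decomposition (SVD) approach to optimal topic estimation (Ke and Wang, 2017).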
Usage

topicSCORE(D, K)
Arguments

D: the p by n text corpus matrix, where p is the vocabulary size (the number of words retained after preprocessing) and n is the number of documents. The (i, j) entry is the observed fraction of word i in document j.
K: an integer giving the number of topics.
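Because D holds word fractions rather than raw counts, a word-count matrix would be column-normalized before it is passed to topicSCORE. A minimal base-R sketch, assuming a small made-up count matrix (the names `counts` and `D` are illustrative only):

```r
# Made-up word-count matrix: rows are words, columns are documents.
counts = matrix(c(3, 0, 1,
                  1, 2, 0,
                  0, 2, 3,
                  4, 0, 1), nrow = 4, byrow = TRUE)

# Divide each column by its total so D[i, j] is the fraction of word i in document j.
D = sweep(counts, 2, colSums(counts), "/")
print(colSums(D))  # every column now sums to 1
```

A document with no retained words has a zero column sum and would produce a NaN column, so such documents would need to be dropped first.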
Value

An object of class tSCORE, which is a list with the following components:
topic: the estimated p by K topic matrix. Column j gives the expected frequencies of words in a document that discusses topic j; that is, the (i, j) entry is the expected frequency of word i in such a document.
weight: the estimated p by K weight matrix Pi, whose i-th row is the weight vector of word i.
R: the p by K-1 matrix, one row per word, whose columns are entrywise ratios of singular vectors of the text corpus matrix.
vertice: the matrix whose columns are the vertices estimated in the vertex hunting step.
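The components above correspond to stages of the method: an SVD, ratios of singular vectors (R), vertex hunting (vertice), and per-word weights (weight). The base-R sketch below runs these stages on a small synthetic, noise-free corpus. It is an illustration under stated assumptions, not the package's code: the toy sizes, the anchor-word construction of the topic matrix A, the model D0 = A W for the expected corpus (the pLSI-style model of Ke and Wang, 2017), and successive projection as the vertex-hunting method are all choices made here. topicSCORE itself may pre-normalize D, use a different vertex-hunting routine, and store vertices as columns rather than rows.

```r
# Illustrative sketch only; not the topicSCORE implementation.
set.seed(1)
p = 30; n = 200; K = 3

# Assumed topic matrix A (p x K): columns are word PMFs; words 1..K are anchor words.
A = matrix(runif(p * K), p, K)
A[1:K, ] = diag(K)
A = sweep(A, 2, colSums(A), "/")

# Assumed topic weights W (K x n): column j mixes the K topics for document j.
W = matrix(runif(K * n), K, n)
W = sweep(W, 2, colSums(W), "/")

# Expected (noise-free) text corpus matrix; each column sums to 1.
D0 = A %*% W

# Stage 1: first K left singular vectors and their entrywise ratios (p x (K-1)).
Xi = svd(D0, nu = K, nv = 0)$u
Xi[, 1] = abs(Xi[, 1])  # leading singular vector is one-signed; fix its sign
R = Xi[, 2:K, drop = FALSE] / Xi[, 1]

# Stage 2: vertex hunting by successive projection on lifted rows [1, r_i].
spa_vertex_hunting = function(R, K) {
  Y = cbind(1, R)
  picked = integer(K)
  for (k in seq_len(K)) {
    norms = rowSums(Y^2)
    i = which.max(norms)          # the largest-norm row is a vertex of the hull
    picked[k] = i
    u = Y[i, ] / sqrt(norms[i])
    Y = Y - (Y %*% u) %*% t(u)    # remove the component along u from every row
  }
  list(index = picked, vertices = R[picked, , drop = FALSE])
}
vh = spa_vertex_hunting(R, K)

# Stage 3: barycentric weights of each row of R with respect to the vertices.
Pi = cbind(1, R) %*% solve(cbind(1, vh$vertices))
Pi = pmax(Pi, 0)        # clip negatives (these only appear with noisy data)
Pi = Pi / rowSums(Pi)   # each row sums to 1

print(vh$index)          # the anchor words 1..K, in some order
print(round(Pi[1:5, ], 3))
```

With noise-free data, the rows of R lie in a simplex whose vertices are the rows of the anchor words, so successive projection returns words 1 to K and their weight rows are unit vectors. With a real, noisy D, the estimated weights can be slightly negative, which is why the sketch clips and renormalizes them.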
Author(s)

Tracy Ke, Lijia Zhou and Qi Zhu.

Maintainer: Lijia Zhou <zlj@uchicago.edu>, Qi Zhu <qizhu@uchicago.edu>.
References

Ke and Wang (2017). A New SVD Approach to Optimal Topic Estimation.
Examples

# AP (Harman, 1993) consists of 2246 news articles with a vocabulary of
# 10473 words. After preprocessing, approximately 8000 words are kept.
# See reference for more details.
library(topicSCORE)
library(R.matlab)  # provides readMat()
data(AP) # the AP file is stored in .mat format
obj = readMat(AP)
D = obj$D # text corpus matrix
vocab = obj$volc # vocabulary (list of words)
fit = topicSCORE(D, 3)
summary(fit)
fit$topic
fit$weight
# The 15 words with the largest estimated frequency in topic 1, most frequent first
word1 = fit$topic[, 1]
idx = order(word1, decreasing = TRUE)[1:15]
vocab[idx]
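The example above ranks the words of topic 1 only. A small helper, hypothetical and not part of topicSCORE, applies the same ranking to every column of a topic matrix; the toy `topic` matrix and `vocab` vector below are made up:

```r
# Hypothetical helper (not part of topicSCORE): for each topic (column),
# return the n_top words with the largest estimated frequency, most frequent first.
top_words = function(topic, vocab, n_top = 15) {
  n_top = min(n_top, nrow(topic))
  apply(topic, 2, function(col) vocab[order(col, decreasing = TRUE)[seq_len(n_top)]])
}

# Toy 3-word, 2-topic matrix; each column sums to 1.
topic = cbind(c(0.5, 0.3, 0.2), c(0.1, 0.2, 0.7))
vocab = c("market", "stock", "election")
tw = top_words(topic, vocab, n_top = 2)
print(tw)  # one column of words per topic
```

With the fitted object the call would be top_words(fit$topic, vocab); if readMat() returns the vocabulary as a list rather than a character vector, it may need unlist() first.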