topicSCORE: Optimal Topic Estimation
In zhoulijia/SCORE: Spectral Clustering On Ratios-of-Eigenvectors



topicSCORE performs optimal topic estimation using an SVD-based approach.

topicSCORE(D, K)

`D`	the p by n text corpus matrix, where p is the number of common words and n is the number of documents. The (i,j) entry corresponds to the observed fraction of word i in document j.
`K`	an integer indicating the number of topics
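As a rough illustration of the expected input, the sketch below builds a matrix of the form described for `D` from raw word counts by dividing each column by its total, so that entry (i,j) is the observed fraction of word i in document j. The toy `counts` matrix and its dimensions are made up for illustration and are not part of the package.

```r
# Hypothetical toy data: 4 words (rows) by 3 documents (columns) of raw counts.
counts <- matrix(c(2, 0, 1,
                   1, 3, 0,
                   0, 1, 4,
                   1, 1, 1),
                 nrow = 4, byrow = TRUE)

# Divide every column by its column total to obtain word fractions per document.
D <- sweep(counts, 2, colSums(counts), "/")

dim(D)       # p by n: words in rows, documents in columns
colSums(D)   # each document's fractions sum to 1
```

Each column of the resulting `D` sums to 1, which is a quick sanity check before passing the matrix to `topicSCORE`.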

It returns an object of class tSCORE, which is a list containing the following components:

`topic`	the estimate of the p by K topic matrix, whose columns correspond to the expected frequencies of words in a document that discusses a certain topic. In particular, the (i,j) entry is the expected frequency of word i in a document that discusses topic j.
`weight`	the p by K weight matrix Pi, whose rows correspond to the weight vector of a certain word.
`R`	the n by K-1 matrix whose columns represent the ratio of singular vectors of the text corpus matrix.
`vertice`	the matrix whose columns are the vertices estimated in the vertex hunting step
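To give a feel for what a matrix of singular-vector ratios looks like, here is a minimal sketch using base R's `svd()`. It assumes the ratios are taken entrywise against the first singular vector, and that right singular vectors are used so that the result has n rows as stated for `R` above. The package may normalize `D` first or differ in other details, so this is not a reproduction of what `topicSCORE` computes internally. The toy data are made up.

```r
# Toy sketch only; not the package's implementation.
set.seed(1)
p <- 6; n <- 5; K <- 3

# Hypothetical strictly positive counts, converted to per-document word fractions.
counts <- matrix(rpois(p * n, lambda = 5) + 1, nrow = p, ncol = n)
D <- sweep(counts, 2, colSums(counts), "/")

# First K right singular vectors of D (an n by K matrix).
V <- svd(D, nu = 0, nv = K)$v

# Divide singular vectors 2..K entrywise by the first one.
# The length-n vector V[, 1] is recycled down each column, so row i is divided by V[i, 1].
R <- V[, 2:K, drop = FALSE] / V[, 1]

dim(R)   # n by K-1
```

Because every entry of the toy `D` is positive, the first singular vector has no zero entries, so the division is well defined here.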

Tracy Ke, Lijia Zhou and Qi Zhu.

Maintainer: Lijia Zhou <zlj@uchicago.edu>, Qi Zhu <qizhu@uchicago.edu>.

Ke and Wang (2017) "A New SVD Approach To Optimal Topic Estimation".

# AP (Harman, 1993) consists of 2246 news articles with a vocabulary of 
# 10473 words. After preprocessing, approximately 8000 words are kept.
# See reference for more details.

library(R.matlab)  # provides readMat()

data(AP)           # the AP file is stored in .mat format
obj = readMat(AP)
D = obj$D          # text corpus matrix
vocab = obj$volc   # list of vocabulary

fit = topicSCORE(D,3)
summary(fit)

fit$topic
fit$weight

# indices of the 15 words with the highest frequency in topic 1, in decreasing order
word1 = fit$topic[,1]
idx = order(word1, decreasing = TRUE)[1:15]

vocab[idx]