topicSCORE: Optimal Topic Estimation

View source: R/main.R

Description

topicSCORE estimates the topic matrix of a text corpus using the SVD-based approach of Ke and Wang (2017).

Usage

topicSCORE(D, K)

Arguments

D

the p by n text corpus matrix, where p is the number of common words and n is the number of documents. The (i,j) entry corresponds to the observed fraction of word i in document j.
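If the corpus is only available as raw word counts, a matrix of the form described above can be obtained by dividing each column by its document length. The following is a minimal sketch; the count matrix `counts`, its toy values, and the word and document labels are hypothetical and not part of the package.

```r
# Hypothetical toy counts: 4 words (rows) by 3 documents (columns).
# matrix() fills column by column, so each row of numbers below is one document.
counts <- matrix(c(2, 0, 1, 1,
                   0, 3, 1, 0,
                   1, 1, 1, 1),
                 nrow = 4, ncol = 3,
                 dimnames = list(c("market", "court", "police", "bank"),
                                 c("doc1", "doc2", "doc3")))

# Divide every column by its total so that entry (i, j) is the observed
# fraction of word i in document j.
D <- sweep(counts, 2, colSums(counts), "/")

colSums(D)  # every document's fractions sum to 1
```

The resulting `D` has one row per word and one column per document, which is the orientation the D argument expects.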

K

an integer giving the number of topics to estimate.

Value

It returns an object of class tSCORE, which is a list containing the following components:

topic

the estimate of the p by K topic matrix, whose columns give the expected word frequencies in a document that discusses a single topic. In particular, the (i,j) entry is the expected frequency of word i in a document that discusses topic j.

weight

the p by K weight matrix Pi, in which each row is the weight vector of one word.

R

the n by K-1 matrix whose columns represent the ratio of singular vectors of the text corpus matrix.
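To illustrate what a matrix of singular-vector ratios looks like, here is a rough sketch in base R. It is not the package's implementation: the toy matrix is random, no pre-normalization of D is applied, and the use of right singular vectors is an assumption made only so that the result has the n by K-1 shape described above.

```r
set.seed(1)
p <- 6; n <- 5; K <- 3

# Hypothetical toy corpus matrix with strictly positive entries,
# columns rescaled to sum to 1.
D <- matrix(runif(p * n, min = 0.1, max = 1), nrow = p, ncol = n)
D <- sweep(D, 2, colSums(D), "/")

# First K right singular vectors (an n by K matrix). Using the right
# vectors is an assumption, chosen to match the documented shape.
V <- svd(D, nu = 0, nv = K)$v

# Divide singular vectors 2..K entrywise by the first one.
# V[, 1] is recycled down each column, so row i is divided by V[i, 1].
R <- V[, 2:K, drop = FALSE] / V[, 1]

dim(R)  # n rows and K - 1 columns
```

Because the toy matrix has strictly positive entries, the first singular vector has no zero entries and the division is well defined; real corpora with many zeros may need extra care.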

vertice

the matrix whose columns are the vertices estimated in the vertex hunting step.

Author(s)

Tracy Ke, Lijia Zhou and Qi Zhu.

Maintainer: Lijia Zhou <zlj@uchicago.edu>, Qi Zhu <qizhu@uchicago.edu>.

References

Ke and Wang (2017) "A New SVD Approach To Optimal Topic Estimation".

Examples

# AP (Harman, 1993) consists of 2246 news articles with a vocabulary of 
# 10473 words. After preprocessing, approximately 8000 words are kept.
# See reference for more details.

library(R.matlab)  # provides readMat()

data(AP)           # the AP file is stored in .mat format
obj = readMat(AP)
D = obj$D          # text corpus matrix
vocab = obj$volc   # list of vocabulary

fit = topicSCORE(D,3)
summary(fit)

fit$topic
fit$weight

# the 15 most frequent words in topic 1, most frequent first
word1 = fit$topic[,1]
idx = order(word1, decreasing = TRUE)[1:15]

vocab[idx]
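The same lookup can be repeated for every topic at once. The sketch below uses hypothetical stand-ins for `fit$topic` and `vocab` so that it runs without the AP data; the helper name `top_words` is illustrative and not part of the package, and `vocab` is assumed to be a character vector.

```r
# Hypothetical stand-ins for fit$topic (4 words by 2 topics) and vocab.
topic <- matrix(c(0.5, 0.3, 0.1, 0.1,
                  0.1, 0.1, 0.6, 0.2),
                nrow = 4, ncol = 2)
vocab <- c("market", "bank", "court", "police")

# For one column of word frequencies, return the m most frequent words,
# most frequent first.
top_words <- function(freq, vocab, m) {
  keep <- seq_len(min(m, length(freq)))
  vocab[order(freq, decreasing = TRUE)[keep]]
}

# Apply the helper to every column: the result has one column per topic.
apply(topic, 2, top_words, vocab = vocab, m = 2)
```

With the fitted object from the example, the analogous call would pass `fit$topic` and `m = 15`, provided `vocab` has first been converted to a character vector.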

zhoulijia/SCORE documentation built on May 18, 2019, 9:15 p.m.