topdocs: Find representative documents for each topic

Description

Find representative documents for each topic

Usage

topdocs(theta, docs, n = 30)

Arguments

theta

Matrix of document-topic probabilities, with one row per document. It can be taken from the output of getProbs (for example, its theta.hat element).

docs

A character vector in which each element is a document from the corpus. Its length should equal the first dimension (the number of rows) of theta.

n

The number of documents to return within each topic.
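
To make the relationship between the three arguments concrete, here is a minimal sketch of one way such a selection could work: for each topic, rank the documents by their probability on that topic and keep the top n. The helper top_docs_sketch, the toy data, and the "Topic1", "Topic2", ... column names are assumptions made for illustration; this is not the LDAtools source, and the actual return format of topdocs may differ.

```r
# Hypothetical helper, NOT the LDAtools implementation: for each topic
# (column of theta), return the n documents with the highest probability.
top_docs_sketch <- function(theta, docs, n = 30) {
  stopifnot(is.matrix(theta), length(docs) == nrow(theta))
  n <- min(n, nrow(theta))  # cannot return more documents than exist
  out <- sapply(seq_len(ncol(theta)), function(k) {
    # indices of the n largest entries in column k, largest first
    docs[order(theta[, k], decreasing = TRUE)[seq_len(n)]]
  })
  out <- matrix(out, nrow = n)  # keep a matrix even when n == 1
  colnames(out) <- paste0("Topic", seq_len(ncol(theta)))  # assumed naming
  out
}

# Toy input: 4 documents, 2 topics; each row sums to 1.
theta <- matrix(c(0.9, 0.1,
                  0.2, 0.8,
                  0.6, 0.4,
                  0.3, 0.7), ncol = 2, byrow = TRUE)
docs <- c("doc A", "doc B", "doc C", "doc D")
res <- top_docs_sketch(theta, docs, n = 2)
print(res)
```

In this sketch each column of the result holds the documents for one topic, ordered from most to least representative. Ties are broken by document order, which is simply how order() behaves and not a documented property of topdocs.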

Examples

## Not run: 
 data(APinput)
 data(APtopics)
 data(APcorpus)
 probs <- getProbs(APinput$word.id, APinput$doc.id, APtopics$topic, APinput$vocab, sort.topics="byTerms")
 top.docs <- topdocs(probs$theta.hat, APcorpus[APinput$category == 0], n=5)
 # write the file for uploading to shiny
 write.table(top.docs, file=file.path(getwd(), "top5docs.txt"), sep="\t", row.names=FALSE)
 #save(top.docs, file="~/LDAtool/data")
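 # sanity check: the lines plotted below should all peak on the same topic
 corpus <- APcorpus[APinput$category == 0]
 # rows of theta.hat for the documents returned for Topic1
 idx <- which(corpus %in% top.docs[,"Topic1"])
 plot(probs$theta.hat[idx[1],], type="l")
 # seq_along(idx)[-1] is empty when only one document matched, unlike 2:length(idx)
 for (i in seq_along(idx)[-1]) {
   lines(probs$theta.hat[idx[i],], col=i)
 }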

## End(Not run)

kshirley/LDAtools documentation built on May 20, 2019, 7:03 p.m.