intruderWords: Function to validate the fit of the LDA model


View source: R/intruderWords.R

Description

This function validates an LDA result by presenting a mix of words from a topic and intruder words to a human user, who has to identify the intruders.

Usage

intruderWords(
  beta = NULL,
  byScore = TRUE,
  numTopwords = 30L,
  numIntruder = 1L,
  numOutwords = 5L,
  noTopic = TRUE,
  printSolution = FALSE,
  oldResult = NULL,
  test = FALSE,
  testinput = NULL
)

Arguments

beta

A matrix of word-probabilities or frequency table for the topics (e.g. the topics matrix from the LDAgen result). Each row is a topic, each column a word. The rows will be divided by the row sums, if they are not 1.

byScore

Logical: Should the score of top.topic.words from the lda package be used?

numTopwords

The number of topwords to be used for the intruder words

numIntruder

Intended number of intruder words. If numIntruder is an integer vector, the number is sampled from it for each topic.

numOutwords

Integer: Number of words per topic, including the intruder words.

noTopic

Logical: Is the input x allowed to mark nonsense topics?

printSolution

tba

oldResult

Result object from an unfinished run of intruderWords. If oldResult is used, all other parameters will be ignored.

test

Logical: Enables test mode

testinput

Input for function tests
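The row normalization described for beta, and the way numIntruder and numOutwords interact, can be sketched in a few lines of base R. This is only an illustration and not the code used inside intruderWords: the small matrix, the seed and the helper variables (betaNorm, idx, nIntruder, nTopicWords) are assumptions made for the example.

```r
# Hypothetical frequency table: 2 topics (rows), 4 words (columns).
beta <- matrix(c(4, 3, 2, 1,
                 1, 1, 1, 7),
               nrow = 2, byrow = TRUE,
               dimnames = list(NULL, c("tax", "vote", "party", "goal")))

# Divide every row by its row sum so that each topic sums to 1.
betaNorm <- beta / rowSums(beta)

# Assumed reading of numIntruder as a vector: one count is drawn per topic.
# Sampling indices avoids the special behaviour of sample() for length-1 input.
numIntruder <- c(0L, 1L, 2L)
numOutwords <- 5L
set.seed(1)
idx <- sample.int(length(numIntruder), size = nrow(beta), replace = TRUE)
nIntruder <- numIntruder[idx]

# numOutwords includes the intruders, so the remainder are true topic words.
nTopicWords <- numOutwords - nIntruder
```

With numOutwords = 5L and a single intruder, a topic would be shown as four of its own top words plus one intruder word.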

Value

Object of class IntruderWords. List of 7

result

Matrix of 3 columns. Each row represents one topic. All values are 0 if the topic has not been run yet. numIntruder (first column) gives the number of intruder words inserted into this topic, missIntruder (second column) the number of intruder words that the coder did not find, and falseIntruder (third column) the number of words chosen by the coder that were not intruders.

beta

Parameter of the function call

byScore

Parameter of the function call

numTopwords

Parameter of the function call

numIntruder

Parameter of the function call

numOutwords

Parameter of the function call

noTopic

Parameter of the function call
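As a rough illustration of how the three columns of result relate to a single coding round, the counts can be derived with set operations. The words and the helper scoreTopic below are hypothetical and only mirror the column definitions given above; they are not taken from the package source.

```r
# Hypothetical round for one topic: words shown, true intruders, coder's picks.
shown     <- c("tax", "vote", "goal", "party", "budget")
intruders <- c("goal")
chosen    <- c("goal", "budget")

# The coder can only pick words that were actually shown.
stopifnot(all(chosen %in% shown))

# Hypothetical helper returning one row of the result matrix.
scoreTopic <- function(intruders, chosen) {
  c(numIntruder   = length(intruders),
    missIntruder  = length(setdiff(intruders, chosen)),   # intruders not found
    falseIntruder = length(setdiff(chosen, intruders)))   # wrong picks
}

scoreTopic(intruders, chosen)
```

In this example the coder finds the only intruder but also marks one regular topic word, so the row reads numIntruder = 1, missIntruder = 0, falseIntruder = 1.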

References

Chang, Jonathan, Sean Gerrish, Chong Wang, Jordan L. Boyd-Graber and David M. Blei. Reading Tea Leaves: How Humans Interpret Topic Models. Advances in Neural Information Processing Systems, 2009.

Examples

## Not run: 
data(politics)
poliClean <- cleanTexts(politics)
words10 <- makeWordlist(text=poliClean$text)
words10 <- words10$words[words10$wordtable > 10]
poliLDA <- LDAprep(text=poliClean$text, vocab=words10)
LDAresult <- LDAgen(documents=poliLDA, K=10, vocab=words10)
intruder <- intruderWords(beta=LDAresult$topics)
## End(Not run)

tosca documentation built on Oct. 28, 2021, 5:07 p.m.