topTexts: Get The IDs Of The Most Representative Texts


View source: R/topTexts.R

Description

For each topic, the function extracts the IDs of the texts with the highest relative or absolute number of words assigned to that topic.

Usage

topTexts(
  ldaresult,
  ldaID,
  limit = 20L,
  rel = TRUE,
  select = 1:nrow(ldaresult$document_sums),
  tnames,
  minlength = 30L
)

Arguments

ldaresult

LDA result (for example, the output of LDAgen), containing the element document_sums

ldaID

Vector of text IDs

limit

Integer: Number of text IDs per topic.

rel

Logical: Should the relative frequency be used?

select

Which topics should be returned?

tnames

Names of the selected topics

minlength

Minimal total number of words a text must have to be included

Value

Matrix of text IDs.

Examples

texts <- list(A="Give a Man a Fish, and You Feed Him for a Day.
Teach a Man To Fish, and You Feed Him for a Lifetime",
B="So Long, and Thanks for All the Fish",
C="A very able manipulative mathematician, Fisher enjoys a real mastery
in evaluating complicated multiple integrals.")

corpus <- textmeta(meta=data.frame(id=c("A", "B", "C", "D"),
title=c("Fishing", "Don't panic!", "Sir Ronald", "Berlin"),
date=c("1885-01-02", "1979-03-04", "1951-05-06", "1967-06-02"),
additionalVariable=1:4, stringsAsFactors=FALSE), text=texts)

corpus <- cleanTexts(corpus)
wordlist <- makeWordlist(corpus$text)
ldaPrep <- LDAprep(text=corpus$text, vocab=wordlist$words)

LDA <- LDAgen(documents=ldaPrep, K = 3L, vocab=wordlist$words, num.words=3)
topTexts(ldaresult=LDA, ldaID=c("A","B","C"), limit = 1L, minlength=2)
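
To make the roles of rel, limit and minlength concrete without fitting a model, the following base R sketch reproduces the kind of ranking described above on a hand-made count matrix. It is an illustration under stated assumptions, not the package's implementation: the helper top_ids, the numbers in document_sums, the column names "T1" to "T3" and the exact handling of the minlength threshold (texts with at least minlength words are kept) are all hypothetical.

```r
# Hypothetical topic-by-document word counts (rows = topics, columns = texts),
# laid out like ldaresult$document_sums. The numbers are made up.
document_sums <- matrix(
  c(40,  5, 12,
     5, 30,  1,
     5,  6,  0),
  nrow = 3, byrow = TRUE
)
ids <- c("A", "B", "C")  # text totals: A = 50, B = 41, C = 13 words

# Hypothetical helper, NOT tosca's code: for each topic, return the IDs of the
# `limit` texts with the highest share (rel = TRUE) or count (rel = FALSE).
top_ids <- function(document_sums, ids, limit = 1L, rel = TRUE, minlength = 30L) {
  # Assumption: a text is kept if its total word count is at least minlength.
  keep <- colSums(document_sums) >= minlength
  counts <- document_sums[, keep, drop = FALSE]
  ids <- ids[keep]

  # Relative frequency: divide each text's counts by that text's total.
  scores <- if (rel) sweep(counts, 2, colSums(counts), "/") else counts

  # Rank the texts within each topic and keep the best `limit` of them.
  limit <- min(limit, length(ids))
  res <- apply(scores, 1, function(x) {
    ids[order(x, decreasing = TRUE)[seq_len(limit)]]
  })

  # One column per topic, one row per rank.
  res <- matrix(res, nrow = limit)
  colnames(res) <- paste0("T", seq_len(nrow(document_sums)))
  res
}

top_long  <- top_ids(document_sums, ids, limit = 1L, rel = TRUE,  minlength = 30L)
top_short <- top_ids(document_sums, ids, limit = 1L, rel = TRUE,  minlength = 10L)
top_abs   <- top_ids(document_sums, ids, limit = 1L, rel = FALSE, minlength = 10L)

print(top_long)
print(top_short)
print(top_abs)
```

In this toy data the short text C devotes 12 of its 13 words to the first topic, so it leads that topic under relative frequencies once minlength is lowered to 10, while the longer text A leads under absolute counts or when C is filtered out. This is the kind of effect a minimum text length guards against.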

tosca documentation built on Oct. 28, 2021, 5:07 p.m.