plotWordSub: Plotting Counts/Proportion of Words/Docs in LDA-generated...

View source: R/plotWordSub.R

Description

Creates a plot of the counts or proportions of words or documents in subcorpora derived from an ldaresult. An article is allocated to a topic, and thus to that topic's corpus, if enough of its words are allocated to that topic (see limit and alloc). The corpora are then further reduced by filterWord using the search argument. The plot shows the counts of the subcorpora or, if rel = TRUE, the proportion of each subcorpus relative to its corresponding whole corpus.
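The allocation step described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: document_sums here is a small made-up matrix standing in for ldaresult$document_sums (topics in rows, documents in columns), and the proportion variant assumes that a limit between 0 and 1 is compared against each topic's share of the document's assigned words. The "unique" option is left out because its exact rule is not spelled out here.

```r
# Toy stand-in for ldaresult$document_sums (assumed layout: topics x documents);
# each cell is the number of words in that document assigned to that topic.
document_sums <- matrix(
  c(12,  3,  0,
     2, 15, 11,
     1,  4, 10),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("T", 1:3), c("docA", "docB", "docC"))
)

limit <- 10

# alloc = "multi": a document joins every topic with at least `limit` assigned words.
multi <- document_sums >= limit

# 0 < limit < 1 (assumed reading): compare each topic's share of the document's
# assigned words against the limit instead of the raw count.
shares <- sweep(document_sums, 2, colSums(document_sums), "/")
multi_prop <- shares >= 0.5

# alloc = "best": each document joins exactly one topic, the one with the most
# assigned words; `limit` plays no role.
best <- rownames(document_sums)[apply(document_sums, 2, which.max)]
names(best) <- colnames(document_sums)
```

In this toy matrix, docC reaches the limit for both T2 and T3 under "multi", while "best" assigns it to T2 only.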

Usage

plotWordSub(
  object,
  ldaresult,
  ldaID,
  limit = 10,
  alloc = c("multi", "unique", "best"),
  select = 1:nrow(ldaresult$document_sums),
  tnames,
  search,
  ignore.case = TRUE,
  type = c("docs", "words"),
  rel = TRUE,
  mark = TRUE,
  unit = "month",
  curves = c("exact", "smooth", "both"),
  smooth = 0.05,
  main,
  xlab,
  ylab,
  ylim,
  both.lwd,
  both.lty,
  col,
  legend = "topright",
  natozero = TRUE,
  file,
  ...
)

Arguments

object

textmeta object with a strictly tokenized text component (character vectors), such as the result of cleanTexts

ldaresult

The result of a call to LDAgen

ldaID

Character vector of IDs of the documents in ldaresult

limit

Integer/numeric: How many words of an article must be allocated to a topic for the article to count as belonging to that topic (default: 10)? If 0 < limit < 1, a proportion is used instead.

alloc

Character: Should every article be allocated to multiple topics ("multi"), to at most one topic ("unique"), or to exactly one topic, the most representative one ("best") (default: "multi")? If alloc = "best", limit has no effect.

select

Integer vector: Which topics of ldaresult should be plotted (default: all topics)?

tnames

Character vector of the same length as select: labels for the topics (default: the first words returned by top.topic.words from the lda package for each topic)

search

See filterWord

ignore.case

See filterWord

type

Character: Should counts/proportions of documents ("docs") or of words ("words") be plotted (default: "docs")?

rel

Logical: Should counts (FALSE) or proportions (TRUE) be plotted (default: TRUE)?

mark

Logical: Should years be marked by vertical lines (default: TRUE)?

unit

Character: To which unit should dates be floored (default: "month")? Other possible units are "bimonth", "quarter", "season", "halfyear" and "year"; for more units see round_date.

curves

Character: Should the "exact" curve, the "smooth" curve, or "both" be plotted (default: "exact")?

smooth

Numeric: Smoothing parameter that is passed to lowess as f (default: 0.05)

main

Character: Graphical parameter

xlab

Character: Graphical parameter

ylab

Character: Graphical parameter

ylim

Graphical parameter (default if rel = TRUE: c(0, 1))

both.lwd

Graphical parameter for smoothed values if curves = "both"

both.lty

Graphical parameter for smoothed values if curves = "both"

col

Graphical parameter; may be a vector. If curves = "both", the function plots, for every word group, first the exact and then the smoothed curve. Keep this order in mind when ordering the entries of col.

legend

Character: Value(s) to specify the legend coordinates (default: "topright"). If "none", no legend is plotted.

natozero

Logical: Should NAs be coerced to zeros (default: TRUE)? Only has an effect if rel = TRUE.

file

Character: File path if a PDF should be created

...

Additional graphical parameters
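How unit, rel and natozero interact can be illustrated with a small base R sketch. It is a simplified stand-in, not the function's actual code: the dates are invented, floor_month and count_by_month are hypothetical helpers, and flooring is done with format() rather than the lubridate functions that round_date refers to, so that the snippet has no package dependencies.

```r
# Hypothetical publication dates: the whole corpus and the subset allocated to a topic.
all_dates <- as.Date(c("2009-01-05", "2009-01-20", "2009-02-11", "2009-04-02"))
sub_dates <- as.Date(c("2009-01-20", "2009-04-02"))

# Floor to the first day of the month (base R stand-in for unit = "month").
floor_month <- function(d) as.Date(format(d, "%Y-%m-01"))

# One entry per month between the first and last date, including empty months.
months <- seq(min(floor_month(all_dates)), max(floor_month(all_dates)), by = "month")
count_by_month <- function(d) {
  as.vector(table(factor(format(floor_month(d)), levels = format(months))))
}

whole <- count_by_month(all_dates)  # documents per month in the whole corpus
sub <- count_by_month(sub_dates)    # documents per month in the subcorpus

# rel = TRUE: proportion of the subcorpus relative to the whole corpus.
prop <- sub / whole                 # a month without any documents yields 0/0 = NaN

# natozero = TRUE: treat those undefined proportions as zero.
prop[is.na(prop)] <- 0
```

March 2009 has no documents at all in this toy data, so its proportion is undefined until it is coerced to zero. With counts (rel = FALSE) the issue does not arise, which is consistent with natozero only having an effect if rel = TRUE.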

Value

A plot. Invisibly returns a data frame with the column date and one column per entry of tnames, containing the counts/proportions of the selected topics.
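As a rough illustration of working with a result of this shape, the following sketch builds a mock data frame and smooths one topic column with lowess, the base R smoother that the smooth argument refers to. The data are random and the column name economy is made up; a real call would yield one column per topic label.

```r
# Mock of the returned data frame: a date column plus one column per topic label.
set.seed(1)
res <- data.frame(
  date = seq(as.Date("2005-01-01"), by = "month", length.out = 120),
  economy = runif(120)  # hypothetical topic column with proportions in [0, 1]
)

# lowess() returns a list with components x and y; f is the smoother span,
# i.e. the fraction of points that influences each smoothed value.
sm <- lowess(as.numeric(res$date), res$economy, f = 0.05)

# Exact curve as a solid line, smoothed curve as a dashed line on top.
plot(res$date, res$economy, type = "l", ylim = c(0, 1),
     xlab = "date", ylab = "proportion")
lines(res$date, sm$y, lty = 2)
```

Because the dates are already sorted, sm$y lines up with the rows of res. A larger f gives a smoother curve at the cost of flattening short-lived peaks.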

Examples

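## Not run: 
# Load the example corpus and tokenize it
data(politics)
poliClean <- cleanTexts(politics)
# Reduce the corpus using the search terms "bush" and "obama"
poliPraesidents <- filterWord(object = poliClean, search = c("bush", "obama"))
# Vocabulary: words with a count above 10
words10 <- makeWordlist(text = poliPraesidents$text)
words10 <- words10$words[words10$wordtable > 10]
# Prepare the documents and fit an LDA with K = 5 topics
poliLDA <- LDAprep(text = poliPraesidents$text, vocab = words10)
LDAresult <- LDAgen(documents = poliLDA, K = 5, vocab = words10)
# Plot the topic subcorpora, reduced by the search term "obama"
plotWordSub(object = poliClean, ldaresult = LDAresult, ldaID = names(poliLDA), search = "obama")

## End(Not run)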

tosca documentation built on Oct. 28, 2021, 5:07 p.m.