plotWordpt: Plots Counts of Topics-Words-Combination over Time (Relative...


View source: R/plotWordpt.R

Description

Creates a plot of the counts/proportions of specified combinations of topics and words. The plot shows how often a word appears in a topic. Keep in mind that the baseline for the proportions is the sum of the topic counts, not the sum of the word counts. See also plotTopicWord. All curves can be drawn in one plot, or one plot can be created for every curve (see pages). In addition, the plots can be written to a PDF file by setting file.
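
To illustrate the baseline described above, the following base-R sketch counts, per month, how many tokens of a word are assigned to a given topic, and divides by the total number of tokens assigned to that topic in the same month. The token table tokens, its columns date, topic and word, and the helper topic_word_counts are hypothetical toy constructs, not tosca data structures, and this is not the package's own implementation.

```r
# Hypothetical toy data: one row per token with its document date,
# its LDA topic assignment and the word itself (not a tosca object).
tokens <- data.frame(
  date  = as.Date(c("2020-01-05", "2020-01-20", "2020-01-28",
                    "2020-02-03", "2020-02-14", "2020-02-25")),
  topic = c(1, 1, 2, 1, 1, 1),
  word  = c("bush", "war", "bush", "bush", "bush", "iraq"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: counts and proportions of `words` within one topic
topic_word_counts <- function(tokens, topic, words) {
  # Floor every date to the first day of its month
  month <- as.Date(format(tokens$date, "%Y-%m-01"))
  in_topic <- tokens$topic == topic
  # Words of the word list are linked by "or"
  hit <- in_topic & tokens$word %in% words
  counts <- tapply(hit, month, sum)
  topic_sums <- tapply(in_topic, month, sum)
  data.frame(
    date  = as.Date(names(counts)),
    count = as.vector(counts),
    # Baseline: all tokens of the topic in that month,
    # not all occurrences of the word
    prop  = as.vector(counts / topic_sums),
    row.names = NULL
  )
}

res <- topic_word_counts(tokens, topic = 1, words = "bush")
res
```

The occurrence of "bush" assigned to topic 2 is neither counted nor part of the denominator, which is exactly why proportions computed this way differ from proportions relative to word sums.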

Usage

plotWordpt(
  object,
  docs,
  ldaresult,
  ldaID,
  select = 1:nrow(ldaresult$document_sums),
  link = c("and", "or"),
  wordlist = lda::top.topic.words(ldaresult$topics, 1),
  tnames,
  wnames,
  rel = FALSE,
  mark = TRUE,
  unit = "month",
  curves = c("exact", "smooth", "both"),
  smooth = 0.05,
  legend = ifelse(pages, "onlyLast:topright", "topright"),
  pages = FALSE,
  natozero = TRUE,
  file,
  main,
  xlab,
  ylab,
  ylim,
  both.lwd,
  both.lty,
  col,
  ...
)

Arguments

object

textmeta object with a strictly tokenized text component (character vectors), e.g. a result of cleanTexts

docs

Object returned by LDAprep that was passed to LDAgen

ldaresult

The result of a call to LDAgen with docs as argument

ldaID

Character vector of IDs of the documents in ldaresult

select

List of integer vectors of topic indices. The list elements are treated separately ('or' link), and each list element represents one curve in the resulting plot; the integers within one vector are linked by the argument link. If select is a plain integer vector, it is coerced to a list of the same length as the vector (see as.list), so that the argument link has no effect.
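
A minimal illustration of this coercion rule in base R: a plain integer vector becomes a list with one topic per element, so every topic gets its own curve, whereas an explicit list can group several topics into one curve. The helper as_select_list is hypothetical and only mirrors the behaviour described above; it is not part of tosca.

```r
# Hypothetical helper mirroring the documented coercion of `select`
as_select_list <- function(select) {
  if (is.list(select)) select else as.list(select)
}

# Plain integer vector: one curve per topic, `link` has no effect
curves_a <- as_select_list(c(1, 3, 8))
# Explicit list: topics 1 and 3 are combined via `link` in one curve,
# topic 8 forms a second curve
curves_b <- as_select_list(list(c(1, 3), 8))

length(curves_a)  # 3 curves
length(curves_b)  # 2 curves
```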

link

Character: Should the (inner) integer vectors of each list element be linked by an "and" or an "or" (default: "and")?

wordlist

List of character vectors: which words (always linked by an "or") should be taken into account for plotting the topic counts/proportions (default: the first of top.topic.words for each topic, as a simple character vector)?

tnames

Character vector of the same length as select: labels for the topics (default: the first returned words of top.topic.words from the lda package for each topic)

wnames

Character vector of the same length as wordlist: labels for every group of 'and' linked words

rel

Logical: Should counts (FALSE) or proportion (TRUE) be plotted (default: FALSE)?

mark

Logical: Should years be marked by vertical lines (default: TRUE)?

unit

Character: To which unit should dates be floored (default: "month")? Other possible units are "bimonth", "quarter", "season", "halfyear" and "year"; for more units see round_date.
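
Assuming the flooring relies on lubridate (the package that provides round_date), the effect of a given unit on a single date can be previewed with lubridate::floor_date. This only illustrates what the units mean; it is not a look inside plotWordpt.

```r
library(lubridate)

d <- as.Date("2021-10-28")
# Each date is mapped to the first day of the period it falls into
floor_date(d, "month")     # 2021-10-01
floor_date(d, "quarter")   # 2021-10-01
floor_date(d, "halfyear")  # 2021-07-01
floor_date(d, "year")      # 2021-01-01
```

Coarser units aggregate more documents per point, which produces fewer but less noisy values per curve.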

curves

Character: Should "exact", "smooth" curve or "both" be plotted (default: "exact")?

smooth

Numeric: Smoothing parameter which is handed over to lowess as f (default: 0.05)
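
A short sketch of what the smoothing parameter controls, using stats::lowess directly on hypothetical monthly counts (simulated with rpois, not real tosca output). Since f is the fraction of points used for each local fit, a small value such as the default 0.05 keeps the smoothed curve close to the raw counts, while larger values give a smoother curve.

```r
# Hypothetical monthly counts for 24 months (simulated, not tosca output)
set.seed(1)
months <- seq(as.Date("2019-01-01"), by = "month", length.out = 24)
counts <- rpois(24, lambda = 10)

# lowess() needs a numeric x, so the Dates are converted to day numbers
smoothed_narrow <- lowess(as.numeric(months), counts, f = 0.05)
smoothed_wide   <- lowess(as.numeric(months), counts, f = 0.5)

# lowess returns a list with components x (sorted) and y (fitted values)
str(smoothed_narrow)
```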

legend

Character: Value(s) specifying the legend position (default: "topright", or "onlyLast:topright" if pages = TRUE). If "none", no legend is plotted.

pages

Logical: Should every curve be drawn in its own plot instead of all curves in a single plot (default: FALSE)? In addition, you can set legend = "onlyLast:<argument>", with <argument> a character legend position, to draw the legend only on the last plot of the set.

natozero

Logical: Should NAs be coerced to zeros (default: TRUE)?

file

Character: File path if a pdf should be created

main

Character: Graphical parameter

xlab

Character: Graphical parameter

ylab

Character: Graphical parameter

ylim

Graphical parameter

both.lwd

Graphical parameter for smoothed values if curves = "both"

both.lty

Graphical parameter for smoothed values if curves = "both"

col

Graphical parameter; may be a vector. If curves = "both", the function plots, for every word group, first the exact curve and then the smoothed curve, so order the colors in col accordingly.
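
Assuming the interleaved order described above (exact, then smoothed, for each word group), giving both curves of a group the same color only requires repeating each color twice. The color names and the number of groups are arbitrary choices for illustration.

```r
# Two word groups plotted with curves = "both": col is consumed as
# exact (group 1), smooth (group 1), exact (group 2), smooth (group 2)
group_cols <- c("darkblue", "darkred")
col <- rep(group_cols, each = 2)
col
# The vector could then be passed on as, e.g.,
# plotWordpt(..., curves = "both", col = col)
```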

...

Additional graphical parameters

Value

A plot. Invisibly returns a data frame with the column date and one column per curve named "tnames: wnames", containing the counts/proportions of the selected combinations of topics and words.

Examples

## Not run: 
data(politics)
poliClean <- cleanTexts(politics)
words10 <- makeWordlist(text=poliClean$text)
words10 <- words10$words[words10$wordtable > 10]
poliLDA <- LDAprep(text=poliClean$text, vocab=words10)
LDAresult <- LDAgen(documents=poliLDA, K=10, vocab=words10)
plotWordpt(object=poliClean, docs=poliLDA, ldaresult=LDAresult, ldaID=names(poliLDA))
plotWordpt(object=poliClean, docs=poliLDA, ldaresult=LDAresult, ldaID=names(poliLDA), rel=TRUE)

# Differences between plotTopicWord and plotWordpt
par(mfrow=c(2,2))
plotTopicWord(object=poliClean, docs=poliLDA, ldaresult=LDAresult, ldaID=names(poliLDA),
              select=c(1,3,8), wordlist=c("bush"), rel=FALSE)
plotWordpt(object=poliClean, docs=poliLDA, ldaresult=LDAresult, ldaID=names(poliLDA),
           select=c(1,3,8), wordlist=c("bush"), rel=FALSE)
plotTopicWord(object=poliClean, docs=poliLDA, ldaresult=LDAresult, ldaID=names(poliLDA),
              select=c(1,3,8), wordlist=c("bush"), rel=TRUE)
plotWordpt(object=poliClean, docs=poliLDA, ldaresult=LDAresult, ldaID=names(poliLDA),
           select=c(1,3,8), wordlist=c("bush"), rel=TRUE)

## End(Not run)

tosca documentation built on Oct. 28, 2021, 5:07 p.m.