plotWordpt: Plots Counts of Topics-Words-Combination over Time (Relative...
In tosca: Tools for Statistical Content Analysis

plotWordpt

R Documentation

Plots Counts of Topics-Words-Combination over Time (Relative to Topics)

Description

Creates a plot of the counts or proportions of specified combinations of topics and words. The plot shows how often a word appears in a topic. Keep in mind that the baseline for proportions is the topic sums, not the word sums. See also plotTopicWord. All curves can be drawn in one plot, or one plot can be created for every curve (see pages). In addition, the plots can be written to a PDF by setting file.
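The difference between the two baselines can be illustrated with a toy count table. This is only a sketch in base R with made-up numbers; the matrix `counts` and the names `rel_to_topics` and `rel_to_words` are assumptions for illustration and are not part of tosca.

```r
# Toy counts: rows are topics, columns are words (made-up numbers).
counts <- matrix(c(30, 10,
                   20, 40),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(topic = c("T1", "T2"),
                                 word  = c("bush", "tax")))

# Baseline = topic sums: share of each word within its topic.
rel_to_topics <- counts / rowSums(counts)

# Baseline = word sums: share of each topic within a word.
rel_to_words <- sweep(counts, 2, colSums(counts), "/")

rel_to_topics["T1", "bush"]  # 30 / (30 + 10)
rel_to_words["T1", "bush"]   # 30 / (30 + 20)
```

The same count therefore yields different proportions depending on the baseline, which is why the relative curves of plotWordpt and plotTopicWord can differ even for identical arguments.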

Usage

plotWordpt(
  object,
  docs,
  ldaresult,
  ldaID,
  select = 1:nrow(ldaresult$document_sums),
  link = c("and", "or"),
  wordlist = lda::top.topic.words(ldaresult$topics, 1),
  tnames,
  wnames,
  rel = FALSE,
  mark = TRUE,
  unit = "month",
  curves = c("exact", "smooth", "both"),
  smooth = 0.05,
  legend = ifelse(pages, "onlyLast:topright", "topright"),
  pages = FALSE,
  natozero = TRUE,
  file,
  main,
  xlab,
  ylab,
  ylim,
  both.lwd,
  both.lty,
  col,
  ...
)

Arguments

`object`	`textmeta` object with strictly tokenized `text` component (character vectors) - e.g. a result of `cleanTexts`
`docs`	Object as a result of `LDAprep` which was handed over to `LDAgen`
`ldaresult`	The result of a function call `LDAgen` with `docs` as argument
`ldaID`	Character vector of IDs of the documents in `ldaresult`
`select`	List of integer vectors. Every list element is an 'or' link, every integer in a vector is linked by the argument `link`. If `select` is only an `integer` vector, it is coerced to a list of the same length as the vector (see `as.list`), so that the argument `link` has no effect. Each integer vector as a list element represents one curve in the resulting plot
`link`	Character: Should the (inner) integer vectors of each list element be linked by an `"and"` or an `"or"` (default: `"and"`)?
`wordlist`	List of character vectors: Which words - always linked by an "or" - should be taken into account for plotting the topic counts/proportion (default: the first `top.topic.words` per topic as simple character vector)?
`tnames`	Character vector of same length as `select` - labels for the topics (default are the first returned words of `top.topic.words` from the `lda` package for each topic)
`wnames`	Character vector of same length as `wordlist` - labels for every group of 'and' linked words
`rel`	Logical: Should counts (`FALSE`) or proportion (`TRUE`) be plotted (default: `FALSE`)?
`mark`	Logical: Should years be marked by vertical lines (default: `TRUE`)?
`unit`	Character: To which unit should dates be floored (default: `"month"`)? Other possible units are `"bimonth"`, `"quarter"`, `"season"`, `"halfyear"`, `"year"`, for more units see `round_date`
`curves`	Character: Should the `"exact"` curve, the `"smooth"` curve or `"both"` be plotted (default: `"exact"`)?
`smooth`	Numeric: Smoothing parameter which is handed over to `lowess` as `f` (default: `0.05`)
`legend`	Character: Value(s) to specify the legend coordinates (default: `"topright"`, `"onlyLast:topright"` for `pages = TRUE` respectively). If "none" no legend is plotted.
`pages`	Logical: Should every curve be plotted on its own page instead of all curves in a single plot (default: `FALSE`)? In addition, you can set `legend = "onlyLast:<argument>"`, with `<argument>` as a character `legend` argument, to plot a legend only on the last plot of the set.
`natozero`	Logical: Should NAs be coerced to zeros (default: `TRUE`)?
`file`	Character: File path if a pdf should be created
`main`	Character: Graphical parameter
`xlab`	Character: Graphical parameter
`ylab`	Character: Graphical parameter
`ylim`	Graphical parameter
`both.lwd`	Graphical parameter for smoothed values if `curves = "both"`
`both.lty`	Graphical parameter for smoothed values if `curves = "both"`
`col`	Graphical parameter, could be a vector. If `curves = "both"` the function plots, for every word group, the exact curve first and then the smoothed curve - this is important for the order of your `col` values.
`...`	Additional graphical parameters
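To get a feeling for the `smooth` argument, the following sketch applies `lowess` with `f = 0.05` to a made-up monthly series and draws the exact curve first and the smoothed curve second, as described for `curves = "both"`. It uses base R only; the simulated data and the colours are assumptions for illustration, and the code is not taken from the tosca sources.

```r
# Made-up monthly counts for one curve (not produced by tosca).
set.seed(1)
dates  <- seq(as.Date("2000-01-01"), by = "month", length.out = 120)
counts <- rpois(length(dates), lambda = 20)

# Smoother span f, as handed over by the `smooth` argument. With f = 0.05
# roughly 5% of the points influence the fit at each date, so the smoothed
# curve stays close to the exact one.
fit <- stats::lowess(as.numeric(dates), counts, f = 0.05)

# Exact curve first, smoothed curve second (the order that matters for `col`).
plot(dates, counts, type = "l", col = "grey60", xlab = "date", ylab = "counts")
lines(dates, fit$y, col = "red", lwd = 2)
```

Larger values of `f` average over more neighbouring months and give a flatter curve.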

Value

A plot. Invisibly, a data frame with the column date and one column per curve (tnames: wnames) containing the counts/proportions of the selected combinations of topics and words.
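Because the data frame is returned invisibly, it has to be assigned to be used further, e.g. `res <- plotWordpt(...)`. The sketch below works on a hypothetical stand-in for such a result and sums the monthly values per year. The object `res`, its column names and its values are invented for illustration; the real column names depend on `tnames` and `wnames`.

```r
# Hypothetical stand-in for the invisibly returned data frame: a `date`
# column plus one column per curve (names and values are invented).
res <- data.frame(
  date = seq(as.Date("2001-01-01"), by = "month", length.out = 24),
  `T1: bush` = rep(c(3, 5), 12),
  `T3: bush` = rep(c(0, 2), 12),
  check.names = FALSE
)

# All columns except `date` are treated as curves.
curve_cols <- setdiff(names(res), "date")

# Sum the monthly counts per calendar year.
yearly <- aggregate(res[curve_cols],
                    by = list(year = format(res$date, "%Y")),
                    FUN = sum)
yearly
```

Summing is only meaningful for counts (`rel = FALSE`); proportions would need to be recomputed from the underlying counts instead of being added up.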

Examples

## Not run: 
data(politics)
poliClean <- cleanTexts(politics)
words10 <- makeWordlist(text=poliClean$text)
words10 <- words10$words[words10$wordtable > 10]
poliLDA <- LDAprep(text=poliClean$text, vocab=words10)
LDAresult <- LDAgen(documents=poliLDA, K=10, vocab=words10)
plotWordpt(object=poliClean, docs=poliLDA, ldaresult=LDAresult, ldaID=names(poliLDA))
plotWordpt(object=poliClean, docs=poliLDA, ldaresult=LDAresult, ldaID=names(poliLDA), rel=TRUE)

# Differences between plotTopicWord and plotWordpt
par(mfrow=c(2,2))
plotTopicWord(object=poliClean, docs=poliLDA, ldaresult=LDAresult, ldaID=names(poliLDA),
              select=c(1,3,8), wordlist=c("bush"), rel=FALSE)
plotWordpt(object=poliClean, docs=poliLDA, ldaresult=LDAresult, ldaID=names(poliLDA),
           select=c(1,3,8), wordlist=c("bush"), rel=FALSE)
plotTopicWord(object=poliClean, docs=poliLDA, ldaresult=LDAresult, ldaID=names(poliLDA),
              select=c(1,3,8), wordlist=c("bush"), rel=TRUE)
plotWordpt(object=poliClean, docs=poliLDA, ldaresult=LDAresult, ldaID=names(poliLDA),
           select=c(1,3,8), wordlist=c("bush"), rel=TRUE)

## End(Not run)

tosca documentation built on June 8, 2025, 11:21 a.m.