In tosca: Tools for Statistical Content Analysis

plotWordSub

R Documentation

Plotting Counts/Proportion of Words/Docs in LDA-generated Topic-Subcorpora over Time

Description

Creates a plot of the counts or proportions of words or documents in subcorpora that are derived from an LDA result. An article is allocated to a topic, and thus to that topic's subcorpus, if enough of its words are allocated to the topic (see limit and alloc). The subcorpora are additionally reduced by filterWord using the search argument. The plot shows the counts for each subcorpus or, if rel = TRUE, the proportion of each subcorpus relative to its corresponding whole corpus.
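
The allocation rule described above can be sketched on a toy topics-by-documents count matrix shaped like `ldaresult$document_sums`. This is an illustration under stated assumptions, not tosca's implementation: the helper `allocate_docs` is hypothetical, and the reading of `"unique"` (keep only the strongest topic, and only if it reaches `limit`) is an assumption.

```r
# Hypothetical helper, not part of tosca: allocate documents to topics
# from a topics x documents matrix of word-to-topic assignment counts.
allocate_docs <- function(document_sums, limit = 10,
                          alloc = c("multi", "unique", "best")) {
  alloc <- match.arg(alloc)
  # If 0 < limit < 1, compare per-document proportions instead of counts.
  scores <- document_sums
  if (limit > 0 && limit < 1) {
    scores <- sweep(document_sums, 2, colSums(document_sums), "/")
  }
  # Logical matrix marking the strongest topic of every document.
  best <- apply(document_sums, 2, which.max)
  is_best <- row(document_sums) ==
    matrix(best, nrow(document_sums), ncol(document_sums), byrow = TRUE)
  switch(alloc,
    multi  = scores >= limit,            # every topic that reaches the limit
    unique = is_best & scores >= limit,  # assumption: strongest topic, if it reaches the limit
    best   = is_best                     # exactly one topic, limit is ignored
  )
}

# Toy data: 3 topics (rows) x 4 documents (columns), filled column by column.
ds <- matrix(c(12, 11, 2,
                3,  4, 5,
               20,  0, 1,
                9, 10, 0), nrow = 3,
             dimnames = list(paste0("T", 1:3), paste0("doc", 1:4)))
multi <- allocate_docs(ds, limit = 10, alloc = "multi")
best  <- allocate_docs(ds, alloc = "best")
colSums(multi)  # number of topics each document is allocated to
```

With `alloc = "multi"` a document can appear in several topic subcorpora or in none, whereas `alloc = "best"` places every document in exactly one.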

Usage

plotWordSub(
  object,
  ldaresult,
  ldaID,
  limit = 10,
  alloc = c("multi", "unique", "best"),
  select = 1:nrow(ldaresult$document_sums),
  tnames,
  search,
  ignore.case = TRUE,
  type = c("docs", "words"),
  rel = TRUE,
  mark = TRUE,
  unit = "month",
  curves = c("exact", "smooth", "both"),
  smooth = 0.05,
  main,
  xlab,
  ylab,
  ylim,
  both.lwd,
  both.lty,
  col,
  legend = "topright",
  natozero = TRUE,
  file,
  ...
)

Arguments

`object`	`textmeta` object with strictly tokenized `text` component (character vectors) - such as a result of `cleanTexts`
`ldaresult`	The result of a function call `LDAgen`
`ldaID`	Character vector of IDs of the documents in `ldaresult`
`limit`	Integer/numeric: How many word allocations to a topic are required for an article to count as belonging to this topic (default: `10`)? If `0 < limit < 1`, the value is used as a proportion.
`alloc`	Character: Should every article be allocated to multiple topics (`"multi"`), to at most one topic (`"unique"`), or to the most representative - exactly one - topic (`"best"`) (default: `"multi"`)? If `alloc = "best"`, `limit` has no effect.
`select`	Integer vector: Which topics of `ldaresult` should be plotted (default: all topics)?
`tnames`	Character vector of the same length as `select` - labels for the topics (default: the first word returned by `top.topic.words` from the `lda` package for each topic)
`search`	See `filterWord`
`ignore.case`	See `filterWord`
`type`	Character: Should counts/proportions of documents (`"docs"`) or of words (`"words"`) be plotted (default: `"docs"`)?
`rel`	Logical. Should counts (`FALSE`) or proportion (`TRUE`) be plotted (default: `TRUE`)?
`mark`	Logical: Should years be marked by vertical lines (default: `TRUE`)?
`unit`	Character: To which unit should dates be floored (default: `"month"`)? Other possible units are `"bimonth"`, `"quarter"`, `"season"`, `"halfyear"` and `"year"`; for more units see `round_date`.
`curves`	Character: Should `"exact"`, `"smooth"` curve or `"both"` be plotted (default: `"exact"`)?
`smooth`	Numeric: Smoothing parameter which is handed over to `lowess` as `f` (default: `0.05`)
`main`	Character: Graphical parameter
`xlab`	Character: Graphical parameter
`ylab`	Character: Graphical parameter
`ylim`	Graphical parameter (default if `rel = TRUE`: `c(0, 1)`)
`both.lwd`	Graphical parameter for smoothed values if `curves = "both"`
`both.lty`	Graphical parameter for smoothed values if `curves = "both"`
`col`	Graphical parameter, can be a vector. If `curves = "both"`, the function plots for every topic first the exact and then the smoothed curve - this matters for the order of the colors in `col`.
`legend`	Character: Value(s) to specify the legend coordinates (default: `"topright"`). If `"none"`, no legend is plotted.
`natozero`	Logical. Should NAs be coerced to zeros (default: `TRUE`)? Only has effect if `rel = TRUE`.
`file`	Character: File path if a pdf should be created
`...`	Additional graphical parameters
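
How `unit`, `rel`, `natozero` and `smooth` interact can be illustrated in base R on toy data. The sketch below is an assumption about the general approach, not tosca's code: it floors dates to months with `format()` rather than `round_date`, and all variable names and values are invented for illustration.

```r
# Toy data (invented): document dates of a whole corpus and a flag
# marking which documents ended up in a topic subcorpus.
dates <- as.Date(c("2020-01-03", "2020-01-20", "2020-02-11",
                   "2020-02-25", "2020-04-02", "2020-04-30"))
in_sub <- c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE)

# Floor every date to the first day of its month (as with unit = "month").
month <- as.Date(format(dates, "%Y-%m-01"))

# Monthly grid that also contains months without any document.
grid  <- seq(min(month), max(month), by = "month")
whole <- as.vector(table(factor(format(month), levels = format(grid))))
sub   <- as.vector(table(factor(format(month[in_sub]), levels = format(grid))))

# rel = TRUE: proportion of the subcorpus relative to the whole corpus.
# Months without documents yield 0/0 = NaN, which is.na() also catches.
prop <- sub / whole
prop[is.na(prop)] <- 0  # as with natozero = TRUE

# Smoothed curve: the smoothing parameter is passed to lowess() as f.
sm <- lowess(as.numeric(grid), prop, f = 0.05)

data.frame(date = grid, exact = prop, smooth = sm$y)
```

Setting empty months to zero keeps the curve continuous; leaving them as `NA` would instead produce gaps in the plotted line.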

Value

A plot. Invisibly returns a data frame with the column date and one column per entry of tnames, containing the counts/proportions of the selected topics.

Examples

## Not run: 
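# Load the example corpus and clean/tokenize its texts
data(politics)
poliClean <- cleanTexts(politics)
# Subcorpus of documents matching the search terms
poliPraesidents <- filterWord(object=poliClean, search=c("bush", "obama"))
# Vocabulary: words that occur more than 10 times
words10 <- makeWordlist(text=poliPraesidents$text)
words10 <- words10$words[words10$wordtable > 10]
# Prepare the documents and fit an LDA with 5 topics
poliLDA <- LDAprep(text=poliPraesidents$text, vocab=words10)
LDAresult <- LDAgen(documents=poliLDA, K=5, vocab=words10)
# Plot the topic subcorpora, reduced to documents matching "obama"
plotWordSub(object=poliClean, ldaresult=LDAresult, ldaID=names(poliLDA), search="obama")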

## End(Not run)

tosca documentation built on June 8, 2025, 11:21 a.m.