``` {r}
knitr::opts_chunk$set(
  echo = TRUE,
  comment = "#",
  collapse = TRUE,
  fig.path = "man/figures/README-",
  fig.width = 8,
  fig.height = 5
)
```

# sentopics


## Installation

A stable version of `sentopics` is available on CRAN:

``` {r eval = FALSE}
install.packages("sentopics")
```

The latest development version can be installed from GitHub:

``` {r eval = FALSE}
devtools::install_github("odelmarcelle/sentopics")
```

The development version requires the appropriate tools to compile C++ and Fortran source code.
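Before installing from GitHub, it can help to check that a compiler toolchain is reachable from R. The following is a minimal sketch, assuming the optional `pkgbuild` package is installed (it is not mentioned in this README). Note that `pkgbuild::has_build_tools()` tests whether a small C file can be compiled, so it does not by itself confirm that a Fortran compiler is present.

```r
# Assumption: pkgbuild is an optional helper package, not a sentopics dependency.
# has_build_tools() returns TRUE when a working compiler toolchain is found;
# debug = TRUE prints details about what was tried.
has_tools <- if (requireNamespace("pkgbuild", quietly = TRUE)) {
  pkgbuild::has_build_tools(debug = TRUE)
} else {
  NA  # pkgbuild is not installed, so the check is skipped
}
has_tools
```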

## Basic usage

Using a sample of press conferences from the European Central Bank, an LDA model is easily created from a list of tokenized texts. See https://quanteda.io for details on `tokens` input objects and pre-processing functions.

``` {r}
library("sentopics")
print(ECB_press_conferences_tokens, 2)
set.seed(123)
lda <- LDA(ECB_press_conferences_tokens, K = 3, alpha = .1)
lda <- fit(lda, 100)
lda
```

There are various ways to extract results from the model: you can either access the estimated mixtures directly from the `lda` object or use helper functions.

``` {r}
# The document-topic distributions
head(lda$theta)
# The document-topic distributions in a 'long' format, optionally with metadata
head(melt(lda, include_docvars = FALSE))
# The most probable words per topic
topWords(lda, output = "matrix")
```
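Because `lda$theta` holds the document-topic distributions, plain base R is enough to summarise it. The following is a minimal sketch, assuming `theta` is a matrix with one row per document and one column per topic (as the `head()` output suggests). It refits the model from the earlier example so that it runs on its own, then looks up the most probable topic for each document.

```r
library("sentopics")

# Refit the model from the example above so this block is self-contained
set.seed(123)
lda <- LDA(ECB_press_conferences_tokens, K = 3, alpha = .1)
lda <- fit(lda, 100)

# Assumption: theta is a documents-by-topics matrix of probabilities
theta <- lda$theta

# Each row is a distribution over topics, so row sums should be close to 1
range(rowSums(theta))

# Column index of the most probable topic for each document
dominant_topic <- apply(theta, 1, which.max)

# Number of documents dominated by each topic
table(dominant_topic)
```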

Two visualizations are also implemented: `plot_topWords()` displays the most probable words, and `plot()` summarizes the topic proportions and their top words.

``` {r}
plot(lda)
plot(lda) |> plotly::layout(width = 500, height = 500)
```
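The calls above only exercise `plot()`. The following is a minimal sketch of the other visualization, `plot_topWords()`, called with its default arguments on the same model. It assumes the plotting dependencies of the package are installed, and it refits the model so that the block runs on its own.

```r
library("sentopics")

# Refit the model from the example above so this block is self-contained
set.seed(123)
lda <- LDA(ECB_press_conferences_tokens, K = 3, alpha = .1)
lda <- fit(lda, 100)

# Display the most probable words of each topic, using default settings
p <- plot_topWords(lda)
p
```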

Once date and sentiment metadata have been incorporated (if they are not already present in the `tokens` input), the time series functions let you study the evolution of topic proportions and the related sentiment.

``` {r}
sentopics_date(lda) |> head(2)
sentopics_sentiment(lda) |> head(2)
proportion_topics(lda, period = "month") |> head(2)
plot_sentiment_breakdown(lda, period = "quarter", rolling_window = 3)
```

## Advanced usage

Refer to the package vignettes for a more extensive introduction to its features. If you installed the development version from GitHub, you may have to build the vignettes locally.

``` {r eval = FALSE}
vignette("Basic_usage", package = "sentopics")
vignette("Topical_time_series", package = "sentopics")
```
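One way to build the vignettes when installing from GitHub is the `build_vignettes` argument of `devtools::install_github()`. The following is a minimal sketch, assuming `devtools` and the tools needed to render the vignettes (for instance `knitr`, `rmarkdown`, and pandoc) are available on your machine.

```r
# Reinstall the development version and build its vignettes
# (assumes devtools, knitr, rmarkdown and pandoc are installed)
devtools::install_github("odelmarcelle/sentopics", build_vignettes = TRUE)

# List the vignettes that were installed with the package
browseVignettes(package = "sentopics")
```

If the same commit is already installed, the installer may skip the reinstall. In that case, adding `force = TRUE` to the call should trigger a fresh build.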


odelmarcelle/sentopics documentation built on Jan. 10, 2025, 2:58 p.m.