top2vecr is an R implementation of top2vec, a topic modelling technique relying on jointly learned document and word embeddings.
The main idea is that documents found close to each other in the joint document-word vector space can be interpreted as topics. The words closest to each document cluster in that space are used as topic descriptors. UMAP is used to reduce the dimensionality of the original vector space (as produced by doc2vec), and HDBSCAN is then used to identify document clusters in the reduced space.
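The last step, turning document clusters into topic descriptors, can be illustrated with a minimal base R sketch. It is not the top2vecr API: the two-dimensional vectors, the word labels, and the cluster assignments are made-up toy values standing in for doc2vec and HDBSCAN output, and taking the cluster centroid as the topic vector is an assumption made for illustration.

```r
# Toy joint embedding (made-up values): documents and words share one space.
doc_vecs <- rbind(
  d1 = c(0.9, 0.1),
  d2 = c(1.0, 0.2),
  d3 = c(0.1, 0.9),
  d4 = c(0.2, 1.0)
)
word_vecs <- rbind(
  court  = c(1.0, 0.0),
  judge  = c(0.9, 0.2),
  trade  = c(0.0, 1.0),
  tariff = c(0.2, 0.9)
)

# Hypothetical cluster labels, as a clustering step might return them.
cluster <- c(1, 1, 2, 2)

# Assumption: one topic vector per cluster, the mean of its document vectors.
topic_vecs <- rowsum(doc_vecs, cluster) / as.vector(table(cluster))

# Cosine similarity between every row of a and every row of b.
cosine_sim <- function(a, b) {
  a <- a / sqrt(rowSums(a^2))
  b <- b / sqrt(rowSums(b^2))
  a %*% t(b)
}

sims <- cosine_sim(topic_vecs, word_vecs)  # rows: topics, columns: words

# The n_words most similar words describe each topic.
n_words <- 2
topic_words <- lapply(seq_len(nrow(sims)), function(i) {
  names(sort(sims[i, ], decreasing = TRUE))[seq_len(n_words)]
})
names(topic_words) <- rownames(sims)
print(topic_words)
```

In a real model the vectors would have far more dimensions, and clustering would run on the UMAP-reduced document vectors rather than on hand-assigned labels.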
Unlike the original Python implementation, this package does not yet support pre-trained sentence encoders or transformers.
You can install the development version from GitHub with:
# install.packages("remotes")
remotes::install_github("michalovadek/top2vecr")