In michalovadek/top2vecr: Distributed Representations of Topics

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)

top2vecr

top2vecr is an R implementation of top2vec, a topic modelling technique relying on jointly learned document and word embeddings.

The main idea is that documents lying close to each other in the joint document-word vector space can be interpreted as topics. The words most similar to each document cluster serve as that topic's descriptors. UMAP reduces the dimensionality of the original vector space produced by doc2vec, and HDBSCAN then identifies the document clusters.
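The descriptor step can be illustrated with a minimal base-R sketch. It does not use the top2vecr API. The toy vectors, the cluster labels, and the helper `cosine_sim` are all assumptions made up for illustration; in practice the vectors would come from doc2vec and the labels from HDBSCAN run on the UMAP-reduced space. Taking the cluster centroid as the topic vector is also an assumption of this sketch.

```r
# Toy joint embedding space: rows are vectors, columns are dimensions.
# All values are invented for illustration only.
doc_vecs <- rbind(
  d1 = c(0.9, 0.1),
  d2 = c(0.8, 0.2),
  d3 = c(0.1, 0.9),
  d4 = c(0.2, 0.8)
)
word_vecs <- rbind(
  court  = c(1.0, 0.0),
  judge  = c(0.9, 0.2),
  market = c(0.0, 1.0),
  price  = c(0.2, 0.9)
)

# Hypothetical cluster labels, standing in for a clustering result.
clusters <- c(1, 1, 2, 2)

# Cosine similarity between every row of `a` and every row of `b`.
cosine_sim <- function(a, b) {
  a_norm <- a / sqrt(rowSums(a^2))
  b_norm <- b / sqrt(rowSums(b^2))
  a_norm %*% t(b_norm)
}

# Topic vector: centroid of the document vectors in each cluster.
topic_vecs <- rowsum(doc_vecs, clusters) / as.vector(table(clusters))

# Topic descriptors: the two words closest to each topic vector.
sims <- cosine_sim(topic_vecs, word_vecs)
top_words <- apply(sims, 1, function(s) names(sort(s, decreasing = TRUE))[1:2])
print(top_words)
```

Each column of `top_words` holds the descriptors for one cluster, ordered from most to least similar.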

Unlike the original Python implementation, this package does not yet support pre-trained sentence encoders or transformers.

Installation

You can install the development version from GitHub with:

# install.packages("remotes")
remotes::install_github("michalovadek/top2vecr")

michalovadek/top2vecr documentation built on Dec. 21, 2021, 5:59 p.m.
