CAVA is an R package that helps you work with dictionaries (keyword-based or lexical text analysis) in a valid way. It uses an embeddings model to expand or augment a dictionary and to check its coherence. Support for validation and analysis is planned for a future release.
For a longer description, see our ICA Tool Demo abstract.
You can install CAVA from GitHub:
remotes::install_github("vanatteveldt/CAVA")
Before starting, you need an embeddings model. Currently, only FastText .bin models are supported. For example, you can download the cc.en.300.bin.gz model and decompress it to cc.en.300.bin.
The main functions exposed by CAVA are shown below. For a more elaborate example, please see the example usage file.
Loading the FastText model, using the State of the Union speeches as the target corpus:
library(CAVA)
corpus = quanteda::corpus(sotu::sotu_text, docvars = sotu::sotu_meta)
vectors = load_fasttext("cc.en.300.bin", corpus)
Expanding a dictionary using wildcards and similarity:
dictionary = c("fin*", "eco*")
dictionary = expand_wildcards(dictionary, vectors)
candidates = similar_words(dictionary, vectors)
dictionary = c(dictionary, candidates$word[candidates$similarity > .4])
head(candidates)
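To make the wildcard step concrete, here is a minimal base-R sketch of how glob patterns can be expanded against a vocabulary. The `vocabulary` vector and the `expand_glob` helper are hypothetical and exist only for illustration. This is not CAVA's implementation of `expand_wildcards`, which works on the loaded vectors.

```r
# Hypothetical toy vocabulary; in practice the terms would come from the corpus
vocabulary <- c("finance", "financial", "fiscal",
                "economy", "economic", "ecology", "tax")

# Hypothetical helper (not part of CAVA): expand glob patterns against a vocabulary
expand_glob <- function(patterns, vocabulary) {
  # glob2rx() converts a glob such as "fin*" into an anchored regular expression
  regexes <- utils::glob2rx(patterns)
  hits <- lapply(regexes, function(rx) grep(rx, vocabulary, value = TRUE))
  unique(unlist(hits))
}

expanded <- expand_glob(c("fin*", "eco*"), vocabulary)
print(expanded)
```

Note that "eco*" also picks up "ecology" in this toy vocabulary, which is the kind of unintended match that a coherence check on the expanded dictionary is meant to surface.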
Expanding a dictionary using antonyms:
positive = c("good", "nice", "best", "happy")
negative = c("evil", "nasty", "worst", "bad", "unhappy")
candidates = similar_words(positive, vectors, antonyms = negative)
head(candidates)
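One plausible way to use antonyms when ranking candidates is to reward similarity to the seed words and penalise similarity to the antonyms. The sketch below illustrates that idea with made-up two-dimensional vectors. The scoring rule, the `contrast_score` helper, and the numbers are all assumptions for illustration, and CAVA's `similar_words` may compute its scores differently.

```r
# Toy 2-dimensional embeddings (made-up numbers, for illustration only)
emb <- rbind(
  good  = c( 1.0, 0.1),
  nice  = c( 0.9, 0.2),
  bad   = c(-1.0, 0.1),
  evil  = c(-0.9, 0.2),
  great = c( 0.8, 0.3),
  awful = c(-0.8, 0.3)
)

# Cosine similarity between two numeric vectors
cosine <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

# Hypothetical scoring rule (an assumption, not CAVA's method):
# similarity to the seed centroid minus similarity to the antonym centroid
contrast_score <- function(word, seeds, antonyms, emb) {
  seed_centroid    <- colMeans(emb[seeds, , drop = FALSE])
  antonym_centroid <- colMeans(emb[antonyms, , drop = FALSE])
  cosine(emb[word, ], seed_centroid) - cosine(emb[word, ], antonym_centroid)
}

scores <- sapply(c("great", "awful"), contrast_score,
                 seeds = c("good", "nice"), antonyms = c("bad", "evil"), emb = emb)
print(sort(scores, decreasing = TRUE))
```

With this rule, a word close to the positive seeds and far from the antonyms gets a positive score, while a word on the antonym side gets a negative one.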
Computing and plotting pairwise similarities:
similarities = pairwise_similarities(dictionary, vectors)
similarities |>
  similarity_graph(max_edges = 100) |>
  plot()
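For readers unfamiliar with pairwise similarities, the following base-R sketch computes a cosine similarity matrix from a small, made-up embedding matrix. It only illustrates the underlying calculation. The toy vectors are assumptions, and the sketch does not reproduce the output format of `pairwise_similarities`.

```r
# Toy embeddings with one row per word (made-up numbers, for illustration only)
emb <- rbind(
  economy = c(1, 0, 1),
  finance = c(1, 1, 0),
  tax     = c(0, 1, 1)
)

# Scale every row to unit length; the cross-product of unit vectors
# is then the matrix of cosine similarities between all word pairs
unit <- emb / sqrt(rowSums(emb^2))
sim <- unit %*% t(unit)

print(round(sim, 2))
```

Each off-diagonal cell holds the similarity of one word pair, so a graph like the one plotted above can be thought of as keeping only the strongest of these cells as edges.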
Computing similarity to the dictionary centroid (sorted with the most distant words on top):
similarity_to_centroid(dictionary, vectors) |>
  head()
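The idea behind a centroid check can be sketched in base R as follows: average the word vectors of the dictionary, then rank each word by its cosine similarity to that average. The toy vectors are made up, and this is an illustration of the concept under those assumptions, not the code behind `similarity_to_centroid`.

```r
# Toy 2-dimensional embeddings (made-up numbers, for illustration only)
emb <- rbind(
  economy  = c( 1.0, 0.9),
  finance  = c( 0.9, 1.0),
  economic = c( 1.0, 1.0),
  ecology  = c(-0.2, 1.0)
)

# Cosine similarity between two numeric vectors
cosine <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

# The centroid is the element-wise mean of all word vectors in the dictionary
centroid <- colMeans(emb)

# Similarity of every word to the centroid, sorted ascending
# so that the most distant words come first
sims <- sort(apply(emb, 1, cosine, b = centroid))
print(round(sims, 3))
```

Words at the top of such a ranking are the ones that fit the rest of the dictionary least well and are natural candidates for manual review.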