knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE, fig.path = "img/README_")
library(printr)

Ça va, CAVA? Dictionary Coherence, Augmentation, (Validation and Analysis)

CAVA is an R package that helps you work with dictionaries (keyword-based or lexical text analysis) in a valid way. It lets you use an embeddings model to expand (augment) a dictionary and check its coherence; validation and analysis are planned for a future release.

For a longer description, see our ICA Tool Demo abstract.

Installing and obtaining an embeddings model

You can install CAVA from GitHub:

remotes::install_github("vanatteveldt/CAVA")

Before starting, you need an embeddings model. Currently, we only support FastText .bin models. For example, you can download the cc.en.300.bin.gz model and unpack it to cc.en.300.bin.

Using CAVA

The main functions exposed by CAVA are shown below. For a more elaborate example, please see the example usage file.

Loading the FastText model, using the State of the Union speeches as the target corpus:

library(CAVA)
corpus = quanteda::corpus(sotu::sotu_text, docvars = sotu::sotu_meta)
vectors = load_fasttext("cc.en.300.bin", corpus)

Augmentation

Expanding a dictionary using wildcards and similarity:

dictionary = c("fin*", "eco*")
dictionary = expand_wildcards(dictionary, vectors)
candidates = similar_words(dictionary, vectors)
dictionary = c(dictionary, candidates$word[candidates$similarity>.4])
head(candidates)
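
To make the two steps above concrete, here is a minimal base R sketch of the general idea, using a toy vocabulary and made-up 2-dimensional vectors. It is not CAVA's implementation: the helper `expand_globs`, the toy data, and the choice to rank candidates by cosine similarity to the mean of the seed vectors are all assumptions made for illustration.

```r
# Toy vocabulary and 2-d "embeddings" (hypothetical data, not a real model).
emb = rbind(
  finance   = c(0.9, 0.1),
  financial = c(0.8, 0.2),
  economy   = c(0.7, 0.3),
  economic  = c(0.8, 0.3),
  tax       = c(0.6, 0.4),
  garden    = c(0.0, 1.0)
)
vocab = rownames(emb)

# Step 1: expand glob-style wildcards against the vocabulary.
# expand_globs is a hypothetical helper, not a CAVA function.
expand_globs = function(patterns, vocab) {
  regexes = utils::glob2rx(patterns)
  hits = lapply(regexes, function(rx) grep(rx, vocab, value = TRUE))
  unique(unlist(hits))
}
seed = expand_globs(c("fin*", "eco*"), vocab)

# Step 2: rank the remaining words by cosine similarity to the seed centroid.
unit = emb / sqrt(rowSums(emb^2))                 # scale every row to length 1
centroid = colMeans(unit[seed, , drop = FALSE])   # mean of the seed vectors
sims = setNames(as.vector(unit %*% centroid) / sqrt(sum(centroid^2)), vocab)

others = setdiff(vocab, seed)
candidates = data.frame(word = others, similarity = unname(sims[others]),
                        stringsAsFactors = FALSE)
candidates = candidates[order(-candidates$similarity), ]

# Step 3: keep only candidates above the similarity threshold.
kept = candidates$word[candidates$similarity > .4]
print(candidates)
```

The threshold plays the same role as the `.4` cutoff above: a lower value adds more words at the cost of more noise, so the candidate list is worth inspecting by hand before it is merged into the dictionary.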

Expanding a dictionary using antonyms:

positive = c("good", "nice", "best", "happy")
negative = c("evil", "nasty", "worst", "bad", "unhappy")
candidates = similar_words(positive, vectors, antonyms = negative)
head(candidates)
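
One plausible way to use antonyms is to prefer words that are close to the positive seeds and far from the negative ones. The base R sketch below scores each candidate as its cosine similarity to the positive centroid minus its similarity to the negative centroid. This scoring rule, the helper `cosine`, and the toy vectors are assumptions for illustration; CAVA's `similar_words` may combine the two lists differently.

```r
# Cosine similarity between two numeric vectors (illustrative helper).
cosine = function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

# Toy 2-d vectors (hypothetical data): the first axis loosely encodes sentiment.
emb = rbind(
  good  = c( 1.00, 0.20),
  nice  = c( 0.90, 0.30),
  bad   = c(-1.00, 0.20),
  nasty = c(-0.90, 0.30),
  great = c( 0.95, 0.25),
  awful = c(-0.95, 0.25),
  table = c( 0.00, 1.00)
)
positive = c("good", "nice")
negative = c("bad", "nasty")

pos_centroid = colMeans(emb[positive, , drop = FALSE])
neg_centroid = colMeans(emb[negative, , drop = FALSE])

# Score every non-seed word: similar to the positives, dissimilar to the negatives.
others = setdiff(rownames(emb), c(positive, negative))
score = sapply(others, function(w) {
  cosine(emb[w, ], pos_centroid) - cosine(emb[w, ], neg_centroid)
})
ranked = sort(score, decreasing = TRUE)
print(ranked)
```

In this toy example a neutral word such as "table" scores near zero, because it is equally similar to both centroids, while "awful" ends up at the bottom of the ranking.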

Coherence

Computing and plotting pairwise similarities:

similarities = pairwise_similarities(dictionary, vectors)
similarities |> similarity_graph(max_edges=100) |> plot()
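
As a rough illustration of what a pairwise similarity computation involves, the base R sketch below builds a cosine similarity matrix for a few toy vectors and turns it into an edge list capped at the strongest pairs. The data, the variable names, and the reading of `max_edges` as "keep the most similar pairs" are assumptions, not a description of CAVA's internals.

```r
# Toy 2-d vectors for four dictionary terms (hypothetical data).
emb = rbind(
  finance = c(0.9, 0.1),
  economy = c(0.7, 0.3),
  tax     = c(0.6, 0.4),
  garden  = c(0.0, 1.0)
)

# Cosine similarity matrix: normalise the rows, then take all dot products.
unit = emb / sqrt(rowSums(emb^2))
sim = unit %*% t(unit)

# Turn the upper triangle into an edge list, one row per word pair.
idx = which(upper.tri(sim), arr.ind = TRUE)
edges = data.frame(
  word1 = rownames(sim)[idx[, "row"]],
  word2 = colnames(sim)[idx[, "col"]],
  similarity = sim[idx],
  stringsAsFactors = FALSE
)

# Keep only the strongest pairs, analogous to capping the edges in a graph.
max_edges = 3
edges = edges[order(-edges$similarity), ]
top_edges = head(edges, max_edges)
print(top_edges)
```

Words that appear in few or none of the strongest pairs, such as "garden" here, are the ones a similarity graph would show as weakly connected to the rest of the dictionary.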

Computing similarity to the dictionary centroid (sorted with the most distant words on top):

similarity_to_centroid(dictionary, vectors) |> head()
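
The idea behind a centroid check can be sketched in a few lines of base R: average the normalised vectors of all dictionary terms, then list the terms in ascending order of cosine similarity to that average, so likely outliers come first. The toy vectors are hypothetical, and the sketch assumes the centroid is a plain unweighted mean, which may differ from what CAVA does.

```r
# Toy 2-d vectors for the dictionary terms (hypothetical data).
emb = rbind(
  finance = c(0.9, 0.1),
  economy = c(0.7, 0.3),
  tax     = c(0.6, 0.4),
  garden  = c(0.0, 1.0)
)

# Centroid of the unit-length term vectors (assumed: unweighted mean).
unit = emb / sqrt(rowSums(emb^2))
centroid = colMeans(unit)

# Cosine similarity of every term to the centroid, least similar first.
result = data.frame(
  word = rownames(unit),
  similarity = as.vector(unit %*% centroid) / sqrt(sum(centroid^2)),
  stringsAsFactors = FALSE
)
result = result[order(result$similarity), ]
print(head(result))
```

Terms at the top of this list are candidates for removal, although a low score can also mean the dictionary deliberately covers more than one sense or topic.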

vanatteveldt/CAVA documentation built on June 4, 2022, 1:20 p.m.
