tidyvader

A fast, clear, and tidy implementation of the rule-based sentiment analysis algorithm VADER (Valence Aware Dictionary and Sentiment Reasoner).

Under Development

Please note that this package (and this documentation) is under active development. At present it’s pretty well tested and functional, but there are known limitations and there may yet be bugs. This is a development package not yet on CRAN and things may change. Expect more/better documentation and development as soon as time allows.

Installation

You can install the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("chris31415926535/tidyvader")

What is VADER?

VADER’s authors describe it on their GitHub page as “a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media”. It was originally written in Python.

Let’s break this definition down:

This is notably different from two other common approaches to sentiment analysis: pure bag-of-words scoring and unsupervised methods.

VADER has advantages over both of these approaches. First, it’s more nuanced than a pure bag-of-words approach and so it should be more accurate. Second, it’s more surveyable than an unsupervised approach and so users can make informed decisions about when and how it’s appropriate to use it.

Why tidyvader?

Example

This example shows how to send sentences in a dataframe through vader(). It also shows how punctuation, capitalization, modifiers, and negations all work together to affect a sentence’s compound score.

library(tidyvader)
library(tibble)
library(magrittr)
library(knitr)

# set up a tibble with some sentences
texts <- tibble(sentences = c("I feel happy today.",
                              "I feel happy today!",
                              "I feel HAPPY today!",
                              "I feel NOT HAPPY today!",
                              "I feel REALLY NOT HAPPY today!"))

# pipe the data to tidyvader::vader() and specify the column with text 
texts %>%
  tidyvader::vader(sentences) %>%
  knitr::kable()

| sentences                      | compound |    pos |    neu |    neg |
| :----------------------------- | -------: | -----: | -----: | -----: |
| I feel happy today.            |   0.5719 | 0.5522 | 0.4478 | 0.0000 |
| I feel happy today!            |   0.6114 | 0.5709 | 0.4291 | 0.0000 |
| I feel HAPPY today!            |   0.6932 | 0.6117 | 0.3883 | 0.0000 |
| I feel NOT HAPPY today!        |  -0.5903 | 0.0000 | 0.5107 | 0.4893 |
| I feel REALLY NOT HAPPY today! |  -0.6761 | 0.0000 | 0.5234 | 0.4766 |
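To make the interaction of these rules concrete, here is a minimal base-R sketch that re-derives the first four compound scores in the table above by hand. It is an illustration only, not tidyvader's implementation: the helper `toy_compound()` is hypothetical, and every constant (the lexicon score for "happy", the capitalization and exclamation increments, the negation multiplier, and the normalization constant) is an assumption based on the reference Python VADER code. The sketch handles a sentence with a single sentiment word and does not cover intensifiers such as "REALLY", so it says nothing about the fifth row.

```r
# Toy re-derivation of compound scores for sentences whose only sentiment
# word is "happy". All constants are ASSUMPTIONS based on the reference
# Python VADER code, not values read from tidyvader's source.
happy_valence  <- 2.7    # assumed lexicon score for "happy"
caps_increment <- 0.733  # assumed boost for an ALL-CAPS word in mixed-case text
excl_increment <- 0.292  # assumed boost per "!" (capped at four here)
negation_scale <- -0.74  # assumed multiplier when a negation precedes the word
alpha          <- 15     # assumed normalization constant

# Hypothetical helper, for illustration only.
toy_compound <- function(all_caps = FALSE, negated = FALSE, n_excl = 0) {
  valence <- happy_valence
  if (all_caps) valence <- valence + caps_increment
  if (negated)  valence <- valence * negation_scale
  # punctuation pushes the total further from zero, whatever its sign
  emphasis <- min(n_excl, 4) * excl_increment
  total <- valence + sign(valence) * emphasis
  # squash the unbounded total into the open interval (-1, 1)
  round(total / sqrt(total^2 + alpha), 4)
}

toy_compound()                                             # "I feel happy today."
toy_compound(n_excl = 1)                                   # "I feel happy today!"
toy_compound(all_caps = TRUE, n_excl = 1)                  # "I feel HAPPY today!"
toy_compound(all_caps = TRUE, negated = TRUE, n_excl = 1)  # "I feel NOT HAPPY today!"
```

Under these assumptions the four calls give 0.5719, 0.6114, 0.6932, and -0.5903, matching the corresponding rows of the table. For real analysis use vader(), which applies the full rule set.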

If you want to score a single sentence in a length-1 character vector you can use vader_chr(). This is good for quickly checking things, but it’s much slower than vader() so I don’t recommend it for analysis at scale. The results will come in a one-row tibble, like so:

tidyvader::vader_chr("I feel HAPPY today!") %>%
  knitr::kable()

| compound |    pos |    neu | neg |
| -------: | -----: | -----: | --: |
|   0.6932 | 0.6117 | 0.3883 |   0 |

You can also easily pull the VADER dictionaries and some test sentences in a nested tibble using get_vader_dictionaries(). It’s easy to take a look through RStudio’s viewer, and you can also pull them out and inspect them as regular tibbles.

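library(dplyr)

# fetch the VADER dictionaries and test sentences as a nested tibble
vader_dicts <- tidyvader::get_vader_dictionaries()

# pull out the sentiment dictionary as a regular tibble
vader_sentiments <- vader_dicts %>%
  filter(name == "dict_sent_sorted") %>%
  pull(dictionary) %>%
  `[[`(1)

# inspect a few consecutive entries
vader_sentiments[2968:2973, ] %>%
  knitr::kable()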

| word        | sentiment |
| :---------- | --------: |
| friendship  |       1.9 |
| friendships |       1.6 |
| fright      |      -1.6 |
| frighted    |      -1.4 |
| frighten    |      -1.4 |
| frightened  |      -1.9 |

Known Limitations

Resources and References



chris31415926535/tidyvader documentation built on June 9, 2025, 1:50 p.m.