A fast, clear, and tidy implementation of the rule-based sentiment analysis algorithm VADER (Valence Aware Dictionary and Sentiment Reasoner).
Please note that this package (and this documentation) is under active development. At present it’s pretty well tested and functional, but there are known limitations and there may yet be bugs. This is a development package not yet on CRAN and things may change. Expect more/better documentation and development as soon as time allows.
You can install the development version from GitHub with:
# install.packages("devtools")
devtools::install_github("chris31415926535/tidyvader")
VADER was originally written in Python. Its authors describe it on their GitHub page as “a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media.”
Let’s break this definition down:
This is notably different from two other common approaches to sentiment analysis:
VADER has advantages over both of these approaches. First, it’s more nuanced than a pure bag-of-words approach and so it should be more accurate. Second, it’s more surveyable than an unsupervised approach and so users can make informed decisions about when and how it’s appropriate to use it.
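To make the first point concrete, here is a minimal sketch of pure bag-of-words scoring in base R. The lexicon and its values are made up for illustration (they are not VADER's), and `bag_of_words_score()` is a hypothetical helper, not part of tidyvader. Because it only sums word scores, it cannot tell a negated sentence from a plain one.

```r
# Toy lexicon with made-up valence scores (hypothetical values, not VADER's).
toy_lexicon <- c(happy = 2, sad = -2, great = 3)

# Pure bag-of-words scoring: sum the valence of every known word,
# ignoring word order, negation, capitalization, and punctuation.
bag_of_words_score <- function(sentence, lexicon = toy_lexicon) {
  words <- strsplit(tolower(sentence), "[^a-z']+")[[1]]
  sum(lexicon[words[words %in% names(lexicon)]])
}

# Both sentences get the same score, because "not" is simply ignored.
bag_of_words_score("I feel happy today.")
bag_of_words_score("I feel NOT HAPPY today!")
```

VADER's rules for negation, capitalization, and punctuation are what let it score these two sentences differently.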
This example shows how to send sentences in a dataframe through `vader()`. It also shows how punctuation, capitalization, modifiers, and negations all work together to affect a sentence’s compound score.
library(tidyvader)
library(tibble)
library(magrittr)
library(knitr)
# set up a tibble with some sentences
texts <- tibble(sentences = c("I feel happy today.",
                              "I feel happy today!",
                              "I feel HAPPY today!",
                              "I feel NOT HAPPY today!",
                              "I feel REALLY NOT HAPPY today!"))
# pipe the data to tidyvader::vader() and specify the column with text
texts %>%
  tidyvader::vader(sentences) %>%
  knitr::kable()
| sentences                      | compound |    pos |    neu |    neg |
| :----------------------------- | -------: | -----: | -----: | -----: |
| I feel happy today.            |   0.5719 | 0.5522 | 0.4478 | 0.0000 |
| I feel happy today!            |   0.6114 | 0.5709 | 0.4291 | 0.0000 |
| I feel HAPPY today!            |   0.6932 | 0.6117 | 0.3883 | 0.0000 |
| I feel NOT HAPPY today!        |  -0.5903 | 0.0000 | 0.5107 | 0.4893 |
| I feel REALLY NOT HAPPY today! |  -0.6761 | 0.0000 | 0.5234 | 0.4766 |
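If you need a categorical label instead of a number, one option is to bin the compound score. The sketch below hard-codes the compound values from the table above so it runs without tidyvader. The ±0.05 cutoffs follow the convention suggested in the VADER Python project's documentation; they are an assumption here, not something tidyvader enforces, so adjust them to suit your data.

```r
library(tibble)
library(dplyr)

# Compound scores copied from the results table above.
scored <- tibble(
  sentences = c("I feel happy today.",
                "I feel happy today!",
                "I feel HAPPY today!",
                "I feel NOT HAPPY today!",
                "I feel REALLY NOT HAPPY today!"),
  compound  = c(0.5719, 0.6114, 0.6932, -0.5903, -0.6761)
)

# Bin the compound score into three labels (cutoffs are an assumption).
labelled <- scored %>%
  mutate(label = case_when(
    compound >=  0.05 ~ "positive",
    compound <= -0.05 ~ "negative",
    TRUE              ~ "neutral"
  ))

labelled
```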
If you want to score a single sentence in a length-1 character vector you can use `vader_chr()`. This is good for quickly checking things, but it’s much slower than `vader()` so I don’t recommend it for analysis at scale. The results will come in a one-row tibble, like so:
tidyvader::vader_chr("I feel HAPPY today!") %>%
  knitr::kable()
| compound |    pos |    neu | neg |
| -------: | -----: | -----: | --: |
|   0.6932 | 0.6117 | 0.3883 |   0 |
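If your texts are in a plain character vector with more than one element, it is probably better to wrap the vector in a tibble and call `vader()` once than to loop over `vader_chr()`. A minimal sketch, assuming tidyvader is installed; the names `my_texts` and `text` are arbitrary choices for this example.

```r
library(tibble)
library(magrittr)

# A plain character vector of texts to score.
my_texts <- c("I feel HAPPY today!", "I feel NOT HAPPY today!")

# Wrap the vector in a one-column tibble and score every row in one call.
scores <- tibble(text = my_texts) %>%
  tidyvader::vader(text)

scores
```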
You can also easily pull the VADER dictionaries and some test sentences in a nested tibble using `get_vader_dictionaries()`. It’s easy to take a look through RStudio’s viewer, and you can also pull them out and inspect them as regular tibbles.
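library(dplyr)
# get the nested tibble of dictionaries
vader_dicts <- tidyvader::get_vader_dictionaries()
# extract the sentiment dictionary as a regular tibble
vader_sentiments <- vader_dicts %>%
  filter(name == "dict_sent_sorted") %>%
  pull(dictionary) %>%
  `[[`(1)
# look at a few rows
vader_sentiments[2968:2973, ] %>%
  knitr::kable()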
| word        | sentiment |
| :---------- | --------: |
| friendship  |       1.9 |
| friendships |       1.6 |
| fright      |      -1.6 |
| frighted    |      -1.4 |
| frighten    |      -1.4 |
| frightened  |      -1.9 |
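Because the dictionary is a regular tibble with `word` and `sentiment` columns, the usual dplyr verbs work for looking things up. The sketch below uses a stand-in tibble built from the six rows shown above so that it runs without tidyvader; the same calls should work on the full `vader_sentiments` tibble.

```r
library(tibble)
library(dplyr)

# Stand-in for vader_sentiments: the six rows shown in the table above.
sentiment_excerpt <- tribble(
  ~word,         ~sentiment,
  "friendship",   1.9,
  "friendships",  1.6,
  "fright",      -1.6,
  "frighted",    -1.4,
  "frighten",    -1.4,
  "frightened",  -1.9
)

# Look up the scores for specific words.
looked_up <- sentiment_excerpt %>%
  filter(word %in% c("friendship", "fright"))

# Sort from most negative to most positive.
sorted <- sentiment_excerpt %>%
  arrange(sentiment)

looked_up
sorted
```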
VADER’s Python GitHub page: https://github.com/cjhutto/vaderSentiment
Citation for conference proceedings introducing VADER: