meanr is an R package for sentiment analysis. Its main function, score(), computes sentiment as a simple sum of the counts of positive (+1) and negative (-1) sentiment words in a piece of text. More sophisticated techniques are available in R, for example the qdap package's polarity() function. Like many sentiment analysis tools, meanr uses the Hu and Liu sentiment dictionary.
meanr is significantly faster than every alternative I tried (speed was in fact the motivation for its creation), though I make no claim to have tried everything. The method is merely a dictionary lookup, so unlike more sophisticated methods it ignores word context; on the other hand, those more sophisticated tools are very slow. If you have a large volume of text, there is real value in getting a quick "first glance" at the data, and meanr lets you do so very quickly.
The stable version is available on CRAN:
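The CRAN release can be installed with the standard mechanism:

```r
install.packages("meanr")
```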
The development version is maintained on GitHub:
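A typical way to install a development version from GitHub is via the remotes package; the repository path below is an assumption:

```r
# assumed repository location; adjust if the package is hosted elsewhere
remotes::install_github("wrathematics/meanr")
```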
I have a dataset that, for legal reasons, I cannot describe, much less provide. You can think of it as a collection of tweets (they are not tweets), but take my word for it that it is real, English-language text. The data is a vector of strings, which we'll call x:
```r
x = readRDS("x.rds")
length(x)
## [1] 655760
sum(nchar(x))
## [1] 162663972

library(meanr)
system.time(s <- score(x))
##    user  system elapsed
##   1.072   0.000   0.285
head(s)
##   positive negative score  wc
## 1        2        0     2  32
## 2        5        0     5  29
## 3        4        2     2  67
## 4       12        3     9 203
## 5        8        2     6 101
## 6        4        3     1  99
```
The score() function receives a vector of strings and operates on each one as follows:
All of this is done in four passes over each string, one pass per enumerated item above. The hash tables use perfect hash functions generated by gperf.
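To make the dictionary-lookup idea concrete, here is a toy sketch in plain R. It is not meanr's implementation (which is compiled C with gperf-generated hash tables) and the word lists are tiny stand-ins for the Hu and Liu dictionary, but the scoring logic is the same: tokenize, count positive and negative hits, and difference the counts.

```r
# Toy word lists standing in for the Hu and Liu dictionary (illustration only)
positive_words <- c("good", "great", "love")
negative_words <- c("bad", "terrible", "hate")

# Hypothetical helper mimicking the shape of score()'s output:
# one row per input string with positive/negative counts, score, and word count
naive_score <- function(texts) {
  tokens_list <- strsplit(tolower(texts), "\\s+")
  t(sapply(tokens_list, function(tokens) {
    p <- sum(tokens %in% positive_words)
    n <- sum(tokens %in% negative_words)
    c(positive = p, negative = n, score = p - n, wc = length(tokens))
  }))
}

naive_score(c("what a great day", "terrible bad awful"))
##      positive negative score wc
## [1,]        1        0     1  4
## [2,]        0        2    -2  3
```

Looping over every token in R like this is exactly what makes pure-R approaches slow on large inputs, which is why meanr pushes the lookup into compiled code.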