sentiment_score: Simple Sentiment Scores

sentiment_score    R Documentation

Simple Sentiment Scores

Description

Uses a simple model (xgboost or glm) on text embeddings to return a sentiment score between -1 and 1: scores closer to 1 are more positive, and scores closer to -1 are more negative. The score can be used to determine whether the sentiment of a text is positive or negative.

Usage

sentiment_score(
  x = NULL,
  model = names(default_models),
  scoring = c("xgb", "glm"),
  scoring_version = "1.0",
  batch_size = 100,
  ...
)

Arguments

x

A plain text vector, or a column name if data is supplied. If you know what you're doing, you can also pass in a 512-D numeric embedding.

model

An embedding name from tensorflow-hub. Options include "en" (English, standard or large, e.g. "en.large") and "multi" (multilingual, standard or large).

scoring

Model to use for scoring the embedding matrix (currently either "xgb" or "glm").

scoring_version

The scoring version to use. Currently only "1.0" is available, but other versions might be supported in the future.

batch_size

Size of batches to use. Larger batches are faster than smaller ones, but make sure the batch size does not exhaust your system memory.
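
As a rough illustration of what batching means here, the sketch below splits a text vector into consecutive chunks of at most batch_size elements. The helper make_batches() is hypothetical and is not part of sentiment.ai; how the package batches its input internally is not documented on this page.

```r
# Hypothetical helper (not part of sentiment.ai): split a vector into
# consecutive batches of at most `batch_size` elements.
make_batches <- function(x, batch_size = 100) {
  stopifnot(batch_size >= 1)
  if (length(x) == 0) return(list())
  # Elements 1..batch_size get id 1, the next batch_size get id 2, and so on.
  batch_id <- ceiling(seq_along(x) / batch_size)
  unname(split(x, batch_id))
}

texts   <- paste("review", 1:250)
batches <- make_batches(texts, batch_size = 100)
lengths(batches)  # 100 100 50
```

Each batch is held in memory while it is processed, which is why a very large batch_size can become a problem on long inputs.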

...

Additional arguments passed to conda_install() or virtualenv_install().

Details

Uses simple predictive models on embeddings to estimate the probability of a positive score, which is then rescaled to the range -1 to 1 for consistency with other packages.
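
The rescaling step can be sketched as follows. This assumes a plain linear map from a probability in [0, 1] to a score in [-1, 1]; the exact transformation used by the package is not stated here, and rescale_prob() is a hypothetical name.

```r
# Assumption: a linear map from probability [0, 1] to score [-1, 1].
# rescale_prob() is illustrative only and not part of sentiment.ai.
rescale_prob <- function(p) {
  stopifnot(is.numeric(p), all(p >= 0 & p <= 1, na.rm = TRUE))
  2 * p - 1
}

rescale_prob(c(0, 0.25, 0.5, 1))  # -1.0 -0.5  0.0  1.0
```

Under this assumption, a probability of 0.5 (no lean either way) maps to a score of 0.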

Value

A numeric vector of length(x) containing the rescaled sentiment probabilities.
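
If discrete labels are needed, the returned scores can be binned. The sketch below is one possible approach using base R; the function label_sentiment() and the cutoff of 0.25 are assumptions chosen for illustration, not values recommended by the package.

```r
# Hypothetical helper: bin scores in [-1, 1] into three labels.
# The cutoff of 0.25 is an arbitrary illustrative choice.
label_sentiment <- function(score, cutoff = 0.25) {
  cut(score,
      breaks = c(-Inf, -cutoff, cutoff, Inf),
      labels = c("negative", "neutral", "positive"))
}

scores <- c(-0.9, -0.1, 0.2, 0.75)  # made-up scores
as.character(label_sentiment(scores))
# "negative" "neutral" "neutral" "positive"
```

A suitable cutoff depends on the data, so it is worth tuning against labelled examples rather than relying on a fixed value.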

Examples

## Not run: 
envname <- "r-sentiment-ai"

# make sure to install sentiment ai (install_sentiment.ai)
# install_sentiment.ai(envname = envname,
#                      method  = "conda")

# running the model
mod_xgb <- sentiment_score(x       = airline_tweets$text,
                           model   = "en.large",
                           scoring = "xgb",
                           envname = envname)
mod_glm <- sentiment_score(x       = airline_tweets$text,
                           model   = "en.large",
                           scoring = "glm",
                           envname = envname)

# checking performance
pos_neg <- factor(airline_tweets$airline_sentiment,
                  levels = c("negative", "neutral", "positive"))
# map negative/neutral/positive to -1/0/1 so labels share the score range
pos_neg <- as.numeric(pos_neg) - 2
cosine(mod_xgb, pos_neg)
cosine(mod_glm, pos_neg)

# you could also calculate accuracy/kappa

## End(Not run)
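
The accuracy and kappa check mentioned in the example comments could look roughly like the sketch below. It uses base R only and a small invented set of labels, so it runs without a sentiment.ai environment; with real output you would first convert the scores to labels.

```r
# Toy labels, invented for illustration only.
lv        <- c("negative", "neutral", "positive")
truth     <- factor(c("negative", "negative", "neutral",
                      "positive", "positive", "positive"), levels = lv)
predicted <- factor(c("negative", "neutral", "neutral",
                      "positive", "positive", "negative"), levels = lv)

# Confusion matrix: rows are true labels, columns are predicted labels.
confusion <- table(truth, predicted)
n         <- sum(confusion)

# Accuracy: share of cases on the diagonal.
accuracy <- sum(diag(confusion)) / n

# Cohen's kappa: agreement corrected for chance agreement.
expected    <- sum(rowSums(confusion) * colSums(confusion)) / n^2
cohen_kappa <- (accuracy - expected) / (1 - expected)

round(c(accuracy = accuracy, kappa = cohen_kappa), 3)
```

Kappa is often more informative than accuracy when the classes are imbalanced, because it discounts the agreement expected by chance.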


sentiment.ai documentation built on March 19, 2022, 2:15 a.m.