transformer_scores: Sentiment Analysis Scores

View source: R/transformer_scores.R

Sentiment Analysis Scores

Description

Uses zero-shot sentiment analysis pipelines from HuggingFace to compute probabilities that the text corresponds to the specified classes

Usage

transformer_scores(
  text,
  classes,
  multiple_classes = FALSE,
  transformer = c("cross-encoder-roberta", "cross-encoder-distilroberta",
    "facebook-bart"),
  device = c("auto", "cpu", "cuda"),
  preprocess = FALSE,
  keep_in_env = TRUE,
  envir = 1,
  local_model_path = NULL
)

Arguments

text

Character vector or list. Text to be classified, in a vector or list format

classes

Character vector. Classes to score the text against

multiple_classes

Boolean. Whether the text can belong to more than one true class. Defaults to FALSE, in which case class probabilities sum to 1 for each text. Set to TRUE to score each class independently (see the multiple classes example in Examples)

transformer

Character. Specific zero-shot sentiment analysis transformer to be used. Default options:

"cross-encoder-roberta"

Uses Cross-Encoder's Natural Language Inference RoBERTa Base zero-shot classification model trained on the Stanford Natural Language Inference (SNLI) and Multi-Genre Natural Language Inference (MultiNLI) corpora

"cross-encoder-distilroberta"

Uses Cross-Encoder's Natural Language Inference DistilRoBERTa Base zero-shot classification model trained on the Stanford Natural Language Inference (SNLI) and Multi-Genre Natural Language Inference (MultiNLI) corpora. DistilRoBERTa is a smaller, more lightweight version of "cross-encoder-roberta" that sacrifices some accuracy for much faster inference (see https://www.sbert.net/docs/cross_encoder/pretrained_models.html#nli)

"facebook-bart"

Uses Facebook's BART Large zero-shot classification model trained on the Multi-Genre Natural Language Inference (MultiNLI) dataset

Defaults to "cross-encoder-distilroberta"

Also allows any zero-shot classification model with a pipeline on HuggingFace to be used by supplying its model ID (e.g., "typeform/distilbert-base-uncased-mnli"; see Examples)

Note: Using custom HuggingFace model IDs beyond the recommended models is done at your own risk. Large models may cause memory issues or crashes, especially on systems with limited resources. The package has been optimized and tested with the recommended models listed above.

device

Character. Whether to use CPU or GPU for inference. Defaults to "auto", which uses a CUDA-capable GPU when one is set up and falls back to CPU otherwise. Set to "cpu" to force inference on the CPU (see the CPU example in Examples)

preprocess

Boolean. Should basic preprocessing be applied? Includes lowercasing, keeping only alphanumeric characters, removing escape characters, removing repeated characters, and removing extra white space. Defaults to FALSE. Transformers generally work well without preprocessing and handle many of these steps internally, so setting this to TRUE is unlikely to change performance much

keep_in_env

Boolean. Whether the classifier should be kept in your global environment. Defaults to TRUE. Keeping the classifier in your environment lets you skip re-loading it every time you run this function, so TRUE is recommended (a reuse sketch follows the argument list)

envir

Numeric. Position on the search path of the environment where the classifier is saved for repeated use. Defaults to 1, the global environment

local_model_path

Optional. Path to a local directory containing a pre-downloaded HuggingFace model. If provided, the model will be loaded from this directory instead of being downloaded from HuggingFace. This is useful for offline usage or for custom fine-tuned models (see Examples)

On Linux/Mac, look in the ~/.cache/huggingface/hub/ folder for downloaded models. Navigate to the snapshots folder for the relevant model and point to the directory that contains the config.json file. For example: "/home/username/.cache/huggingface/hub/models--cross-encoder--nli-distilroberta-base/snapshots/b5b020e8117e1ddc6a0c7ed0fd22c0e679edf0fa/"

On Windows, the base path is C:\Users\USERNAME\.cache\huggingface\transformers\

Warning: Using very large models from local paths may cause memory issues or crashes depending on your system's resources.
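
As noted for keep_in_env above, the classifier is cached after the first call. A minimal sketch of the intended reuse pattern (the text and classes here are illustrative, and the name under which the classifier is cached is internal to the package):

# First call loads the classifier and keeps it in the global environment
transformer_scores(
 text = "What a delightful surprise",
 classes = c("joy", "anger")
)

# With keep_in_env = TRUE (the default), this second call reuses the
# cached classifier and skips the loading step
transformer_scores(
 text = "This is taking forever",
 classes = c("joy", "anger")
)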

Value

Returns probabilities that each text belongs to each of the specified classes
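
A hedged sketch of inspecting the output, assuming the return value is a list with one named numeric vector of class probabilities per text (use str() to confirm the structure of your own results):

scores <- transformer_scores(
 text = c("I love this", "I hate this"),
 classes = c("positive", "negative")
)

# Confirm the structure of the returned object
str(scores)

# If each element is a named numeric vector of class probabilities,
# the results can be bound into a texts-by-classes matrix
do.call(rbind, scores)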

Data Privacy

All processing is done locally with the downloaded model; your text is never sent to any remote server or third party.

Author(s)

Alexander P. Christensen <alexpaulchristensen@gmail.com>

References

# BART
Lewis, M., Liu, Y., Goyal, N., Ghazvininejad, M., Mohamed, A., Levy, O., ... & Zettlemoyer, L. (2019). BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv preprint arXiv:1910.13461.

# RoBERTa
Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., ... & Stoyanov, V. (2019). RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.

# Zero-shot classification
Yin, W., Hay, J., & Roth, D. (2019). Benchmarking zero-shot text classification: Datasets, evaluation and entailment approach. arXiv preprint arXiv:1909.00161.

# MultiNLI dataset
Williams, A., Nangia, N., & Bowman, S. R. (2017). A broad-coverage challenge corpus for sentence understanding through inference. arXiv preprint arXiv:1704.05426.

Examples

# Load data
data(neo_ipip_extraversion)

# Example text
text <- neo_ipip_extraversion$friendliness[1:5]

## Not run: 
# Cross-Encoder DistilRoBERTa
transformer_scores(
 text = text,
 classes = c(
   "friendly", "gregarious", "assertive",
   "active", "excitement", "cheerful"
 )
)
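
# Scoring each class independently (multiple_classes = TRUE); a sketch:
# independent scores are not forced to sum to 1 across classes
transformer_scores(
 text = text,
 classes = c(
   "friendly", "gregarious", "assertive",
   "active", "excitement", "cheerful"
 ),
 multiple_classes = TRUE
)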

# Facebook BART Large
transformer_scores(
 text = text,
 classes = c(
   "friendly", "gregarious", "assertive",
   "active", "excitement", "cheerful"
 ),
 transformer = "facebook-bart"
)
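
# Forcing CPU inference (device = "cpu"); useful when no CUDA-capable
# GPU is set up or GPU memory is limited
transformer_scores(
 text = text,
 classes = c(
   "friendly", "gregarious", "assertive",
   "active", "excitement", "cheerful"
 ),
 device = "cpu"
)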

# Directly from huggingface: typeform/distilbert-base-uncased-mnli
transformer_scores(
 text = text,
 classes = c(
   "friendly", "gregarious", "assertive",
   "active", "excitement", "cheerful"
 ),
 transformer = "typeform/distilbert-base-uncased-mnli"
)
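
# Loading a pre-downloaded model from a local directory; the snapshot
# path below is hypothetical and must be replaced with the directory
# that contains the model's config.json on your machine
transformer_scores(
 text = text,
 classes = c(
   "friendly", "gregarious", "assertive",
   "active", "excitement", "cheerful"
 ),
 local_model_path = "~/.cache/huggingface/hub/models--cross-encoder--nli-distilroberta-base/snapshots/<snapshot-hash>/"
)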

## End(Not run)

