View source: R/transformer_scores.R
transformer_scores    R Documentation

Description
Uses sentiment analysis pipelines from HuggingFace to compute probabilities
that the text corresponds to the specified classes.

Usage
transformer_scores(
  text,
  classes,
  multiple_classes = FALSE,
  transformer = c("cross-encoder-roberta", "cross-encoder-distilroberta",
                  "facebook-bart"),
  device = c("auto", "cpu", "cuda"),
  preprocess = FALSE,
  keep_in_env = TRUE,
  envir = 1,
  local_model_path = NULL
)
Arguments

text
    Character vector or list. Text in a vector or list data format.

classes
    Character vector. Classes to score the text against.

multiple_classes
    Boolean. Whether the text can belong to multiple true classes.
    Defaults to FALSE.
transformer
    Character. Specific zero-shot sentiment analysis transformer to be used.
    Default options: "cross-encoder-roberta", "cross-encoder-distilroberta",
    and "facebook-bart". Defaults to "cross-encoder-distilroberta".
    Also allows any zero-shot classification model with a pipeline on
    HuggingFace to be used by passing its model ID
    (e.g., "typeform/distilbert-base-uncased-mnli").
    Note: Using custom HuggingFace model IDs beyond the recommended models
    is done at your own risk. Large models may cause memory issues or
    crashes, especially on systems with limited resources. The package has
    been optimized and tested with the recommended models listed above.
device
    Character. Whether to use CPU or GPU for inference.
    Defaults to "auto".
preprocess
    Boolean. Should basic preprocessing be applied? Includes making
    lowercase, keeping only alphanumeric characters, removing escape
    characters, removing repeated characters, and removing white space.
    Defaults to FALSE.
keep_in_env
    Boolean. Whether the classifier should be kept in your global
    environment. Defaults to TRUE.
envir
    Numeric. Environment for the classifier to be saved in for repeated
    use. Defaults to the global environment.
local_model_path
    Optional. Path to a local directory containing a pre-downloaded
    HuggingFace model. If provided, the model will be loaded from this
    directory instead of being downloaded from HuggingFace. This is useful
    for offline usage or for using custom fine-tuned models. On Linux/Mac,
    look in the ~/.cache/huggingface/hub/ folder for downloaded models.
    Navigate to the snapshots folder for the relevant model and point to
    the directory that contains the config.json file. For example:
    "/home/username/.cache/huggingface/hub/models--cross-encoder--nli-distilroberta-base/snapshots/b5b020e8117e1ddc6a0c7ed0fd22c0e679edf0fa/"
    On Windows, the base path is
    C:\Users\USERNAME\.cache\huggingface\transformers\
    Warning: Using very large models from local paths may cause memory
    issues or crashes depending on your system's resources. A worked
    sketch follows this list.
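A minimal sketch of calling transformer_scores() with a local model and an
explicit device. The snapshot directory below follows the cache pattern
described above but is hypothetical; replace it with the path on your
machine that contains the model's config.json:

# Hypothetical local snapshot directory (replace with your own)
model_dir <- file.path(
  Sys.getenv("HOME"), ".cache/huggingface/hub",
  "models--cross-encoder--nli-distilroberta-base",
  "snapshots/b5b020e8117e1ddc6a0c7ed0fd22c0e679edf0fa"
)

transformer_scores(
  text = "I make friends easily.",
  classes = c("friendly", "reserved"),
  local_model_path = model_dir, # load from disk instead of downloading
  device = "cpu"                # force CPU; "auto" is the default
)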
Value

Returns probabilities for the text classes.

All processing is done locally with the downloaded model, and your text is
never sent to any remote server or third party.
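Assuming the return value is a list with one element per input text, each a
named numeric vector of class probabilities (the structure implied by the
description above), the scores can be collected into a matrix for
inspection:

scores <- transformer_scores(
  text = c("I make friends easily.", "I keep to myself."),
  classes = c("friendly", "reserved")
)
# Stack the per-text probability vectors into a texts-by-classes matrix
do.call(rbind, scores)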
Author(s)

Alexander P. Christensen <alexpaulchristensen@gmail.com>

References
# BART
Lewis, M., Liu, Y., Goyal, N., Ghazvininejad, M., Mohamed, A., Levy, O., ... & Zettlemoyer, L. (2019).
BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension.
arXiv preprint arXiv:1910.13461.
# RoBERTa
Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., ... & Stoyanov, V. (2019).
RoBERTa: A robustly optimized BERT pretraining approach.
arXiv preprint arXiv:1907.11692.
# Zero-shot classification
Yin, W., Hay, J., & Roth, D. (2019).
Benchmarking zero-shot text classification: Datasets, evaluation and entailment approach.
arXiv preprint arXiv:1909.00161.
# MultiNLI dataset
Williams, A., Nangia, N., & Bowman, S. R. (2017).
A broad-coverage challenge corpus for sentence understanding through inference.
arXiv preprint arXiv:1704.05426.
Examples

# Load data
data(neo_ipip_extraversion)
# Example text
text <- neo_ipip_extraversion$friendliness[1:5]
## Not run:
# Cross-Encoder DistilRoBERTa
transformer_scores(
text = text,
classes = c(
"friendly", "gregarious", "assertive",
"active", "excitement", "cheerful"
)
)
# Facebook BART Large
transformer_scores(
text = text,
classes = c(
"friendly", "gregarious", "assertive",
"active", "excitement", "cheerful"
),
transformer = "facebook-bart"
)
# Directly from huggingface: typeform/distilbert-base-uncased-mnli
transformer_scores(
text = text,
classes = c(
"friendly", "gregarious", "assertive",
"active", "excitement", "cheerful"
),
transformer = "typeform/distilbert-base-uncased-mnli"
)
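# A further sketch (not among the original examples): combining the
# multiple_classes and preprocess arguments from the Usage signature
transformer_scores(
  text = text,
  classes = c(
    "friendly", "gregarious", "assertive",
    "active", "excitement", "cheerful"
  ),
  multiple_classes = TRUE, # allow more than one true class per text
  preprocess = TRUE        # basic text cleaning before scoring
)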
## End(Not run)