The text package gives users access to HuggingFace Transformers in R, using the R package reticulate as an interface to Python and the Python packages torch and transformers. It is therefore important to install both the text package and a Python environment containing the Python packages that text requires. The recommended way is to use textrpp_install() to install a conda environment with the required Python packages, and textrpp_initialize() to initialize it.
library(text)
library(reticulate)

# Install text required python packages in a conda environment (with defaults).
text::textrpp_install()

# Show available conda environments.
reticulate::conda_list()

# Initialize the installed conda environment.
# save_profile = TRUE saves the settings so that you don't have to run textrpp_initialize() after restarting R.
text::textrpp_initialize(save_profile = TRUE)

# Test that the text package works.
textEmbed("hello")
Recently, some text users (mainly on Mac) have experienced OMP errors that cause RStudio and R to crash. When this happens, we have so far found the following solutions:
# Limit the number of threads to prevent conflicts.
Sys.setenv(OMP_NUM_THREADS = "1")
Sys.setenv(OMP_MAX_ACTIVE_LEVELS = "1")

# You might also have to restart R.
.rs.restartR()

# If the above does not work, you can also try this, although this solution might have some risks associated with it
# (for more information see https://github.com/dmlc/xgboost/issues/1715).
# Temporarily allows execution despite duplicate OpenMP libraries.
Sys.setenv(KMP_DUPLICATE_LIB_OK = "TRUE")

# This is how you can unset the settings.
Sys.unsetenv("OMP_NUM_THREADS")
Sys.unsetenv("OMP_MAX_ACTIVE_LEVELS")
Sys.unsetenv("KMP_DUPLICATE_LIB_OK")

# This is how you can verify the settings.
print(Sys.getenv("DYLD_LIBRARY_PATH"))

# Please let us know if you find any other solutions.
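The set and unset steps above can also be paired so that earlier values are restored rather than simply removed. The following is a minimal sketch in base R. The helper names `set_omp_limits()` and `restore_env()` are hypothetical and not part of the text package. Because variables of this kind are generally read when the OpenMP runtime is loaded, the sketch assumes it is run before the Python environment is initialized.

```r
# Hypothetical helpers (not part of the text package).

# Apply the thread limits from the workaround above and return the previous
# values so that they can be restored later.
set_omp_limits <- function(threads = 1L) {
  vars <- c("OMP_NUM_THREADS", "OMP_MAX_ACTIVE_LEVELS")
  # unset = NA marks variables that were not set before.
  previous <- Sys.getenv(vars, unset = NA, names = TRUE)
  Sys.setenv(
    OMP_NUM_THREADS = as.character(threads),
    OMP_MAX_ACTIVE_LEVELS = "1"
  )
  invisible(previous)
}

# Restore the values returned by set_omp_limits().
restore_env <- function(previous) {
  for (name in names(previous)) {
    if (is.na(previous[[name]])) {
      # The variable did not exist before, so remove it again.
      Sys.unsetenv(name)
    } else {
      # The variable had a value before, so put that value back.
      do.call(Sys.setenv, as.list(previous[name]))
    }
  }
  invisible(NULL)
}

old <- set_omp_limits()
print(Sys.getenv(c("OMP_NUM_THREADS", "OMP_MAX_ACTIVE_LEVELS")))
restore_env(old)
```

Keeping the previous values means that a setting made elsewhere (for example in a shell profile) is not lost when the workaround is undone.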
If running textrpp_install() results in this error:
Failed to build tokenizers ERROR: Could not build wheels for tokenizers, which is required to install pyproject.toml-based projects
In the terminal run:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
If the error is:
"Error: Error installing package(s): ..." including: "error: can't find Rust compiler"
In the terminal run:
brew install rust
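Both errors above point to a missing Rust toolchain, so it can help to check from within R whether the compiler is visible before re-running textrpp_install(). This is a minimal sketch using only base R's `Sys.which()`. The helper name `rust_available()` is hypothetical, and the check only inspects the PATH of the current R session, so a session started before Rust was installed may need to be restarted first.

```r
# Hypothetical helper (not part of the text package): report whether the
# Rust compiler and its package manager are visible on this session's PATH.
rust_available <- function() {
  tools <- Sys.which(c("rustc", "cargo"))
  # Sys.which() returns "" for programs that cannot be found.
  found <- nzchar(tools)
  names(found) <- names(tools)
  found
}

status <- rust_available()
if (all(status)) {
  message("rustc and cargo were found on the PATH.")
} else {
  message("Not found on the PATH: ", paste(names(status)[!status], collapse = ", "))
}
```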
The success of the installation depends on using conda, Python, and package versions that work together. Installation of the text package together with its required Python packages is tested on Linux, Mac OS, and Windows using GitHub Actions. The installation procedure and details can be seen in GitHub Actions (look at the workflow runs called System specific installation NoPy).
The table below shows various combinations of Python and package versions that have been tested, most of which worked (it is not an exhaustive list).
library(magrittr)

os <- c("'Mac OS'", "'Linux'", "'Windows'", "'Windows'", "'Mac OS'", "'Linux'", "'Windows'",
        "'Mac OS'", "'Linux'", "'Windows'", "'Mac OS'", "'Linux'", "'Windows'")
mini_conda <- c("'-'", "'-'", "'-'", "'4.10.1'", "'4.10.3'", "'4.10.3'", "'4.10.3'",
                "'4.10.3'", "'4.10.3'", "'4.10.3'", "'4.10.3'", "'4.10.3'", "'4.10.3'")
python <- c("'3.9.0'", "'3.9.0'", "'3.9.0'", "'3.9.0'", "'3.9.0'", "'3.9.0'", "'3.9.0'",
            "'3.8.10'", "'3.8.10'", "'3.8.10'", "'3.7.0'", "'3.7.0'", "'3.6.13'")
torch <- c("'torch==1.11.0'", "'torch==1.11.0'", "'torch==1.11.0'", "'torch==1.7.1'",
           "'torch==1.7.1'", "'torch==1.7.1'", "'torch==1.7.1'", "'torch==1.7.1'",
           "'torch==1.7.1'", "'torch==1.7.1'", "'torch==0.4.1'", "'torch==0.4.1'", "'torch==1.10'")
transformers <- c("'transformers==4.19.2'", "'transformers==4.19.2'", "'transformers==4.19.2'",
                  "'transformers==4.12.5'", "'transformers==4.12.5'", "'transformers==4.12.5'",
                  "'transformers==4.12.5'", "'transformers==4.12.5'", "'transformers==4.12.5'",
                  "'transformers==4.12.5'", "'transformers==3.3.1'", "'transformers==3.3.1'",
                  "'transformers==3.3.1'")
success <- c("Pass", "Pass", "Pass", "FAIL", "Pass", "Pass", "Pass",
             "Pass", "Pass", "Pass", "Pass", "Pass", "Pass")

mini_conda_table <- tibble::tibble(os, mini_conda, python, torch, transformers, success)
knitr::kable(mini_conda_table, caption = "", bootstrap_options = c("hover"), full_width = TRUE)
It is also possible to use virtual environments (although this is currently only tested on Mac OS).
# Create a virtual environment with text required python packages.
# Note that you have to provide a python path.
text::textrpp_install_virtualenv(
  rpp_version = c("torch==1.7.1", "transformers==4.12.5", "numpy", "nltk"),
  python_path = "/usr/local/bin/python3.9",
  envname = "textrpp_virtualenv"
)

# Initialize the virtual environment.
text::textrpp_initialize(
  virtualenv = "textrpp_virtualenv",
  condaenv = NULL,
  save_profile = TRUE
)
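The `rpp_version` argument above takes pip-style requirement strings, where a pinned package is written as `name==version` and an unpinned one as just its name. When trying several version combinations, it can be convenient to build that vector from a named character vector. The sketch below is only an illustration in base R. The helper name `make_pins()` is hypothetical and not part of the text package.

```r
# Hypothetical helper (not part of the text package): turn a named character
# vector of versions into pip-style requirement strings. An empty version
# string means that the package is left unpinned.
make_pins <- function(versions) {
  pkgs <- names(versions)
  pinned <- nzchar(unname(versions))
  ifelse(pinned, paste0(pkgs, "==", unname(versions)), pkgs)
}

pins <- make_pins(c(torch = "1.7.1", transformers = "4.12.5", numpy = "", nltk = ""))
print(pins)
```

The resulting vector has the same form as the one passed to `rpp_version` in the call above.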
Virtual environments work on Mac OS, whereas the GitHub Actions tests do not currently work for Linux and Windows. For more information, look in GitHub Actions for the workflow run called Virtual environment.
library(magrittr)

OS <- c("'Mac OS'", "'Linux'", "'Mac OS'", "'Linux'", "'Windows'")
Python_version <- c("'3.9.8'", "'3.9.8'", "'3.9.8'", "-", "-")
torch <- c("'torch==1.11.0'", "'torch==1.11.0'", "'torch==1.7.1'", "-", "-")
transformers <- c("'transformers==4.19.2'", "'transformers==4.19.2'", "'transformers==4.12.5'", "-", "-")
Success <- c("Pass", "Pass", "Pass", "-", "-")

venv_conda_table <- tibble::tibble(OS, Python_version, torch, transformers, Success)
knitr::kable(venv_conda_table, caption = "", bootstrap_options = c("hover"), full_width = TRUE)
Below are the instructions for installing earlier versions of text (0.9.10 and before); these should work for newer versions of text as long as correct versions of Python and the required packages are used.
library(text)

# To install the python packages torch, transformers, numpy and nltk through R, run:
library(reticulate)
install_miniconda()
conda_install(envname = 'r-reticulate', c('torch==0.4.1', 'transformers==3.3.1', 'numpy', 'nltk'), pip = TRUE)

# Windows 10
conda_install(envname = 'r-reticulate', c('torch==0.4.1', 'transformers==3.3.1', 'numpy', 'nltk'))
If something isn't working right, a good start is to examine what is installed and running on your system, for example to make sure that your R and Python versions are up to date.
# First, check the R version and which packages are attached and loaded.
sessionInfo()

# Second, check your Python version, and make sure you have at least version 3.6.10.
library(reticulate)
py_config()
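The minimum-version check mentioned above is easy to get wrong when versions are compared as plain strings, because "3.6.9" sorts after "3.6.10" alphabetically. A small sketch using base R's `numeric_version()` compares the components numerically instead. The helper name `python_version_ok()` is hypothetical, and the version strings below are examples; in practice the string would be read off the output of py_config().

```r
# Hypothetical helper (not part of the text package): check a Python version
# string against the minimum version mentioned above.
python_version_ok <- function(version, minimum = "3.6.10") {
  # numeric_version() compares component by component, so 9 < 10 here.
  numeric_version(version) >= numeric_version(minimum)
}

print(python_version_ok("3.9.0"))  # TRUE
print(python_version_ok("3.6.9"))  # FALSE
```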
After a new install/update of text, RStudio may crash (Abort session) when running functions that fetch word embeddings (i.e., textEmbedLayersOutput or textEmbed).
To solve the issue, re-install reticulate (development version), then uninstall and re-install r-miniconda.
Uninstall r-miniconda by removing its entire folder (which by default [on Mac] is at Users/YOUR_USER_NAME/Library/r-miniconda).
(Note that [on Mac] the Library folder is hidden; to make it visible, go to Finder, open the path Users/YOUR_USER_NAME/, and press the three keys COMMAND + SHIFT + . together. The Library folder should then appear, and you can find and remove r-miniconda.)
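As an alternative to Finder, the folder can be located from R. The sketch below only reports whether the folder exists; it assumes the default Mac location given above, and the helper name `describe_miniconda_dir()` is hypothetical. The removal step is left commented out because it permanently deletes the folder.

```r
# Hypothetical helper (not part of the text package): report whether the
# r-miniconda folder exists. The default path assumes the Mac location
# described above; pass another path if your installation differs.
describe_miniconda_dir <- function(
    path = file.path(path.expand("~"), "Library", "r-miniconda")) {
  list(path = path, exists = dir.exists(path))
}

info <- describe_miniconda_dir()
print(info)

# To remove the folder from R instead of Finder (this cannot be undone):
# unlink(info$path, recursive = TRUE)
```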
library(text)

# To re-install packages, start with a fresh session by restarting R and RStudio.

# Install the development version of reticulate (might not be necessary).
devtools::install_github("rstudio/reticulate")

# After having manually removed the r-miniconda folder, install it again:
library(reticulate)
install_miniconda()

# Subsequently re-install torch, transformers, numpy and nltk by running:
conda_install(envname = 'r-reticulate', c('torch==0.4.1', 'transformers==3.3.1', 'numpy', 'nltk'), pip = TRUE)
The exact way to install these packages may differ across systems. Please see the installation documentation for Python, torch, and transformers.
If you find a good solution, please feel free to email oscar [ d_o t] kjell [a_t] psy [DOT] lu [d_o_t]se so that we can update the instructions above.