An R wrapper for Python's `pxtextmining` library, a pipeline to classify text-based patient experience data.
Function documentation: https://nhs-r-community.github.io/pxtextmineR/.
Package `pxtextmineR` does not wrap everything from `pxtextmining`, only selected functions that offer R users new opportunities for modelling. For example, the whole Scikit-learn (Pedregosa et al., 2011) text classification pipeline is wrapped, as well as helper functions for e.g. sentiment analysis with Python's `textBlob` and `vaderSentiment`.
How does the wrapper work? It uses R package `reticulate`, which provides tools for interoperability between Python and R.
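The wrapping pattern can be illustrated with a minimal sketch. It assumes `reticulate` and a Python installation are available; `py_sqrt` is a hypothetical helper that wraps Python's standard-library `math.sqrt` and is not part of `pxtextmineR`.

```r
# Hypothetical wrapper, for illustration only (not a pxtextmineR function).
# The `math` argument defaults to Python's standard-library math module,
# which reticulate imports lazily when the function is called.
py_sqrt <- function(x, math = reticulate::import("math")) {
  # reticulate converts the R numeric to a Python float, calls math.sqrt(),
  # and converts the result back to an R numeric.
  math$sqrt(x)
}
```

Calling `py_sqrt(16)` from R runs the computation in Python and returns an ordinary R number. The wrapped functions in `pxtextmineR` follow the same general idea, with `pxtextmining` in place of `math`.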
There are a few things that need to be done to install and set up `pxtextmineR`.
1. Install `pxtextmineR` by running `devtools::install_github("nhs-r-community/pxtextmineR")` in the R console.
1. `reticulate` has functions to create a Python virtual environment via the R console. Refer to `reticulate::conda_create` and `reticulate::virtualenv_create`. For example, if using Conda, run `reticulate::conda_create("r-reticulate")`, where `r-reticulate` is the name of `reticulate`'s default virtual environment.
Using this default virtual environment for `pxtextmineR` is strongly recommended because it makes the setup much easier. In the `reticulate` authors' own words, "[i]t’s much more straightforward for users if there is a common environment used by R packages [...]".
1. Tell `reticulate` to use the `r-reticulate` virtual environment:
reticulate::use_condaenv("r-reticulate", required = TRUE)
1. Install Python package `pxtextmining` in `r-reticulate`:
reticulate::py_install(envname = "r-reticulate", packages = "pxtextmining", pip = TRUE)
1. We also need to install a couple of `spaCy` models in `r-reticulate`. These are obtained from URL links and thus need to be installed separately. In the R console run:
```
system("pip install wheel")
system("pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.3.1/en_core_web_sm-2.3.1.tar.gz")
system("pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.3.1/en_core_web_lg-2.3.1.tar.gz")
```
All steps in one go:
devtools::install_github("nhs-r-community/pxtextmineR")
# If not using Conda, comment out the next two lines and uncomment the two lines
# following them.
reticulate::conda_create("r-reticulate")
reticulate::use_condaenv("r-reticulate", required = TRUE)
# reticulate::virtualenv_create("r-reticulate")
# reticulate::use_virtualenv("r-reticulate", required = TRUE)
reticulate::py_install(envname = "r-reticulate", packages = "pxtextmining", pip = TRUE)
system("pip install wheel")
system("pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.3.1/en_core_web_sm-2.3.1.tar.gz")
system("pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.3.1/en_core_web_lg-2.3.1.tar.gz")
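Once the steps above have run, a quick check along these lines may help confirm that the Python side is reachable from R. This is a sketch, not part of `pxtextmineR`: `check_python_modules` is a hypothetical helper, and the module names are assumed to match the package names installed above.

```r
# Hypothetical helper, for illustration only (not a pxtextmineR function).
# Reports, for each Python module name, whether it can be imported from the
# Python environment that reticulate is bound to.
check_python_modules <- function(modules,
                                 is_available = reticulate::py_module_available) {
  status <- vapply(modules, is_available, logical(1))
  not_found <- names(status)[!status]
  if (length(not_found) > 0) {
    warning("Python modules not found: ", paste(not_found, collapse = ", "))
  }
  status
}

# Assumption: the importable module names are identical to the pip package
# names used in the installation steps.
required_modules <- c("pxtextmining", "spacy", "en_core_web_sm", "en_core_web_lg")
```

Running `check_python_modules(required_modules)` after `reticulate::use_condaenv()` or `reticulate::use_virtualenv()` returns a named logical vector, and `reticulate::py_config()` shows which Python binary is in use. Note that `system("pip install ...")` calls whichever `pip` is first on the system path, so a `FALSE` here may mean the models were installed into a different environment.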
The installation instructions above did not work on all machines on which the installation process was tested. There were two problems:
1. `reticulate` would simply refuse to install in virtual environment `r-reticulate` the version of Scikit-learn that `pxtextmining` uses (v 0.23.2).
1. When using a virtual environment other than `r-reticulate` (i.e. `reticulate::use_condaenv("<some_other_virtual_environment>", required = TRUE)`), the behaviour of `reticulate` was confusing. On the one hand, it would run `pxtextmineR` functions using the user-specified virtual environment. On the other hand, when running commands to build e.g. function documentation with R package `pkgdown`, `reticulate` would automatically set `r-reticulate` as the default environment, causing the code to break.

We have opted for a more "invasive" approach to fix this problem so that users can use any virtual environment with no issues. This requires the following steps:
1. Install `pxtextmining` and the `spaCy` models:
```
pip install pxtextmining
pip install wheel
pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.3.1/en_core_web_sm-2.3.1.tar.gz
pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-2.3.1/en_core_web_lg-2.3.1.tar.gz
```
1. Use a text editor to open your `.Renviron` file, normally located in `~/.Renviron`, and add the following lines:
```
PXTEXTMINER_PYTHON_VENV_MANAGER=name_or_path_to_venv_manager
PXTEXTMINER_PYTHON_VENV=name_of_venv
```
where "name_of_venv" should be replaced by the name of the virtual
environment (unquoted) and "name_or_path_to_venv_manager" should be replaced
by the name of the virtual environment manager or the path to the virtual
environment (unquoted). In more detail:
- If using Conda or Miniconda, replace "name_or_path_to_venv_manager" with
"conda" or "miniconda" (unquoted) respectively.
- If using a Virtual Python Environment, replace
"name_or_path_to_venv_manager" with the path to the virtual environment,
e.g. `/home/user/venvs/myvenv`.
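After restarting R so that `.Renviron` is re-read, the two variables can be inspected from the R console. The sketch below shows one way such variables might be read and mapped to a `reticulate` call. `read_pxtextminer_env` and `activate_pxtextminer_env` are hypothetical helpers, and this is not necessarily how `pxtextmineR` itself consumes the variables.

```r
# Hypothetical helpers, for illustration only (not pxtextmineR functions).

# Read the two variables defined in .Renviron and classify the manager.
read_pxtextminer_env <- function() {
  manager <- Sys.getenv("PXTEXTMINER_PYTHON_VENV_MANAGER", unset = "")
  venv <- Sys.getenv("PXTEXTMINER_PYTHON_VENV", unset = "")
  if (!nzchar(manager) || !nzchar(venv)) {
    stop("Set PXTEXTMINER_PYTHON_VENV_MANAGER and PXTEXTMINER_PYTHON_VENV ",
         "in .Renviron, then restart R.")
  }
  list(
    manager = manager,
    venv = venv,
    is_conda = manager %in% c("conda", "miniconda")
  )
}

# Point reticulate at the configured environment.
activate_pxtextminer_env <- function(cfg = read_pxtextminer_env()) {
  if (cfg$is_conda) {
    reticulate::use_condaenv(cfg$venv, required = TRUE)
  } else {
    # For a Virtual Python Environment the manager variable holds the path.
    reticulate::use_virtualenv(cfg$manager, required = TRUE)
  }
}
```

Calling `read_pxtextminer_env()` on its own is a cheap way to confirm that both variables are visible to R before any Python code is run.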
1. Install `pxtextmineR` by running `devtools::install_github("nhs-r-community/pxtextmineR")` in the R console.

Pedregosa F., Varoquaux G., Gramfort A., Michel V., Thirion B., Grisel O., Blondel M., Prettenhofer P., Weiss R., Dubourg V., Vanderplas J., Passos A., Cournapeau D., Brucher M., Perrot M. & Duchesnay E. (2011), Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12:2825-2830.