If you're writing an R package that uses reticulate
as an interface to
a Python session, you likely also need one or more Python packages
installed on the user's machine for your package to work properly. In
addition, you'd likely prefer to spare users as much as possible from
details around how Python + reticulate
are configured. This vignette
documents a few approaches for accomplishing these goals.
Overall, the goal of an R package author using reticulate is to create a default experience that works reliably and doesn't require users to intervene or to have a sophisticated understanding of Python installation management. At the same time, it should also be easy to adjust the default behavior. There are two key questions to keep in mind:

- Which Python environment should the R package's Python dependencies be installed into?
- Which Python environment should reticulate use when the R package is loaded?
Packages like tensorflow approach this task by providing a helper function, tensorflow::install_tensorflow(), and documenting that users can call this function to prepare the environment. For example:

```r
library(tensorflow)
install_tensorflow()
# use tensorflow
```
As a best practice, an R package's Python dependencies should default to installing in an isolated virtual environment specifically designated for the R package. This minimizes the risk of inadvertently disrupting another Python installation on the user's system.
As an example, install_tensorflow() takes an argument envname with a default value of "r-tensorflow". This default ensures that install_tensorflow() installs into an environment named "r-tensorflow", creating it if needed.
The counterpart to the default behavior of install_tensorflow() is the work that happens in tensorflow::.onLoad(), where the R package expresses a preference, on behalf of the user, to use the "r-tensorflow" environment if it exists. Inside the package, these two parts work together to create a "pit of success":
```r
install_tensorflow <- function(..., envname = "r-tensorflow") {
  reticulate::py_install("tensorflow", envname = envname, ...)
}

.onLoad <- function(...) {
  # A soft preference: only used if the environment exists
  reticulate::use_virtualenv("r-tensorflow", required = FALSE)
}
```
The R package:

- in .onLoad(), expresses to reticulate a soft preference for an environment named "r-tensorflow", and
- with install_tensorflow(), provides a convenient way to make the optional hint in .onLoad() actionable, by actually creating the "r-tensorflow" environment.
With this setup, the default experience is for the user to call install_tensorflow() once (creating an "r-tensorflow" environment). Subsequently, calls to library(tensorflow) will cause reticulate to use the "r-tensorflow" environment, and everything "just works".
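To make the intended workflow concrete, here is a sketch of the end-user experience (the tf$constant() call is simply a stand-in for normal use of the tensorflow package):

```r
## First R session: one-time setup, creates the "r-tensorflow" virtualenv
library(tensorflow)
install_tensorflow()

## Later R sessions: reticulate discovers and uses "r-tensorflow" automatically
library(tensorflow)
tf$constant("Hello, TensorFlow")
```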
The risk of disrupting another Python environment, or of this one being disrupted, is minimal, since the environment is designated for the R package. At the same time, if the environment is disrupted at some later time (perhaps because something with conflicting Python dependencies was manually installed), the user can easily revert to a working state by calling install_tensorflow() again.
Python environments can occasionally get into a broken state when conflicting package versions are installed, and the most reliable way to get back to a working state is to delete the environment and start over with a fresh one. For this reason, install_tensorflow() first removes any pre-existing "r-tensorflow" Python environment. Deleting a Python environment, however, is not something to be done lightly, so by default only the default "r-tensorflow" environment is removed. Here is an example of the helper install_tensorflow() with this "reset" behavior:
```r
#' @importFrom reticulate py_install virtualenv_exists virtualenv_remove
install_tensorflow <- function(...,
                               envname = "r-tensorflow",
                               new_env = identical(envname, "r-tensorflow")) {
  # Only reset the environment when installing into the package default
  if (new_env && virtualenv_exists(envname))
    virtualenv_remove(envname)

  py_install(packages = "tensorflow", envname = envname, ...)
}
```
One drawback of the isolated-package-environments approach is that if multiple R packages using reticulate are in use, then those packages won't all be able to use their preferred Python environment in the same R session (since there can only be one active Python environment at a time within an R session). To resolve this, users will have to take a slightly more active role in managing their Python environments. However, this can be as simple as supplying a unique environment name.
The most straightforward approach is for users to create a dedicated Python environment for a specific project. For example, a user can create a virtual environment in the project directory, like this:
```r
envname <- "./venv"
tensorflow::install_tensorflow(envname = envname)
pysparklyr::install_pyspark(envname = envname)
```
As described in the Order of Python Discovery guide, reticulate will automatically discover and use a Python virtual environment in the current working directory like this. Alternatively, if the environment exists outside the project directory, the user could place an .Renviron or .Rprofile file in the project directory, ensuring that reticulate will always use the Python environment configured for that project. For example, an .Renviron file in the project directory could contain:

```
RETICULATE_PYTHON_ENV=~/my/project/venv
```
Or an .Rprofile file in the project directory could contain:

```r
Sys.setenv("RETICULATE_PYTHON_ENV" = "~/my/project/venv")
```
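Alternatively, the same effect can usually be achieved by selecting the environment explicitly, for example from the project's .Rprofile, before Python is initialized (a small illustrative sketch; the path is hypothetical):

```r
# Explicitly select the project environment; errors if it does not exist
reticulate::use_virtualenv("~/my/project/venv", required = TRUE)
```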
This approach minimizes the risk that an existing, already working Python environment will accidentally be broken by installing packages, for example by inadvertently upgrading or downgrading other Python packages already installed in the environment.
Another approach is for users to install your R package's Python dependencies into another Python environment that is already on the search path. For example, users can opt in to installing into the default r-reticulate venv:

```r
tensorflow::install_tensorflow(envname = "r-reticulate")
```
Or they can install one package's dependencies into another package's default environment. For example, installing pyspark into the default "r-tensorflow" environment:

```r
tensorflow::install_tensorflow()  # creates an "r-tensorflow" env
pysparklyr::install_pyspark(envname = "r-tensorflow")
```
This approach, where the package exports an installation helper function that defaults to a particular environment and registers a hint in .onLoad() to use that environment, is one way to create a "pit of success". It encourages a default workflow that is robust and reliable, especially for users not yet familiar with the mechanics of Python installation management. At the same time, the installation helper empowers users to manage Python environments simply by providing an environment name. It makes it easy to combine the dependencies of multiple R packages, and, should anything go wrong due to conflicting Python dependencies, it also provides a straightforward way to revert to a working state at any time: calling the helper function without arguments.
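Putting the pieces together, the full pattern for a hypothetical R package wrapping a Python package might look roughly like this (the names mypkg, install_mypkg(), somepackage, and "r-mypkg" are all placeholders, not an existing implementation):

```r
# R/install.R
#' @importFrom reticulate py_install virtualenv_exists virtualenv_remove
install_mypkg <- function(...,
                          envname = "r-mypkg",
                          new_env = identical(envname, "r-mypkg")) {
  # Reset only the package's own default environment
  if (new_env && virtualenv_exists(envname))
    virtualenv_remove(envname)
  py_install(packages = "somepackage", envname = envname, ...)
}

# R/zzz.R
.onLoad <- function(libname, pkgname) {
  # Soft preference: used only if the environment already exists
  reticulate::use_virtualenv("r-mypkg", required = FALSE)
}
```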
An alternative approach to the one described above is automatic configuration. It's possible for client packages to declare their Python dependencies in such a way that they are automatically installed into the currently activated Python environment. This is a maximally convenient approach; when it works it can feel a little bit magical, but it is also potentially dangerous and can result in frustration if something goes wrong. You can opt in to this behavior as a package author through your package's DESCRIPTION file, with the use of the Config/reticulate field.
With automatic configuration, reticulate
envisions a world wherein
different R packages wrapping Python packages can live together in the
same Python environment / R session. This approach only works when the
Python packages being wrapped don't have conflicting dependencies.
You must be the judge of your R package's Python dependencies: is automatically bootstrapping an installation of the Python packages into the user's active Python environment, whatever it may contain, a safe action to perform by default? For example, this is most likely a safe action for a Python package like requests, but perhaps not a safe choice for a frequently updated package with many dependencies, like torch or tensorflow (e.g., it's not uncommon for torch and tensorflow to have conflicting version requirements for dependencies like numpy or cuda). Keep in mind that, unlike CRAN, PyPI does not perform any compatibility or consistency checks across the package repository.
Config/reticulate
As a package author, you can opt in to automatic configuration like
this. For example, if we had a package rscipy
that acted as an
interface to the SciPy Python package, we might use
the following DESCRIPTION
file:
```
Package: rscipy
Title: An R Interface to scipy
Version: 1.0.0
Description: Provides an R interface to the Python package scipy.
Config/reticulate:
  list(
    packages = list(
      list(package = "scipy")
    )
  )
< ... other fields ... >
```
With this, reticulate will take care of automatically configuring a Python environment for the user when the rscipy package is loaded and used (i.e., it's no longer necessary to provide the user with a special install_tensorflow()-type function, though it's still recommended to do so).
Specifically, after the rscipy package is loaded, the following will occur:

1. Unless the user has explicitly instructed reticulate to use an existing Python environment, reticulate will prompt the user to download and install Miniconda (if necessary).

2. After this, when the Python session is initialized by reticulate, all declared dependencies of loaded packages in Config/reticulate will be discovered.

3. These dependencies will then be installed into an appropriate Conda environment, as provided by the Miniconda installation.
In this case, the end user workflow will be exactly as with an R package that has no Python dependencies:

```r
library(rscipy)
# use the package
```
If the user has no compatible version of Python available on their system, they will be prompted to install Miniconda. If they do have Python already, then the required Python packages (in this case scipy) will be installed in the standard shared environment for R sessions (typically a virtual environment, or a Conda environment named "r-reticulate").
In effect, users have to pay a one-time, mostly automated initialization
cost in order to use your package, and then things will work as any other
R package would. In particular, users are otherwise spared from details
about how reticulate
works.
.onLoad Configuration

In some cases, a user may try to load your package after Python has already been initialized. To ensure that reticulate can still configure the active Python environment, you can include the following code:
```r
.onLoad <- function(libname, pkgname) {
  reticulate::configure_environment(pkgname)
}
```
This will instruct reticulate
to immediately try to configure the
active Python environment, installing any required Python packages as
necessary.
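If your package also follows the dedicated-environment pattern described earlier, the two hooks can live in the same .onLoad(); a sketch, with "r-mypkg" as a placeholder environment name:

```r
.onLoad <- function(libname, pkgname) {
  # Prefer the package's dedicated environment when it exists
  reticulate::use_virtualenv("r-mypkg", required = FALSE)

  # If Python is already initialized, install any declared
  # Config/reticulate dependencies into the active environment
  reticulate::configure_environment(pkgname)
}
```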
The goal of these mechanisms is to allow easy interoperability between R
packages that have Python dependencies, as well as to minimize
specialized version/configuration steps for end users. To that end,
reticulate
will (by default) track an older version of Python than the
current release, giving Python packages time to adapt. Python 2 will not
be supported.
Tools for breaking these rules are not yet implemented, but will be provided as the need arises.
Declared Python package dependencies should have the following format:

- package: The name of the Python package.

- version: The version of the package that should be installed. When left unspecified, the latest available version will be installed. This should only be set in exceptional cases, for example, if the most recently released version of a Python package breaks compatibility with your package (or other Python packages) in a fundamental way. If multiple R packages request different versions of a particular Python package, reticulate will signal a warning.

- pip: Whether this package should be retrieved from PyPI using pip. If FALSE, it will be downloaded from the Anaconda repositories instead.
For example, we could change the Config/reticulate directive from above to specify that scipy 1.3.0 be installed from PyPI (with pip):
```
Config/reticulate:
  list(
    packages = list(
      list(package = "scipy", version = "1.3.0", pip = TRUE)
    )
  )
```
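Multiple Python packages can be declared in the same field. For example (the additional package names here are purely illustrative):

```
Config/reticulate:
  list(
    packages = list(
      list(package = "scipy"),
      list(package = "pandas"),
      list(package = "requests", pip = TRUE)
    )
  )
```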