In jtilly/r-conda-env: Deploy Python Models via R-Packages

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  out.width = "100%"
)

r-conda-env

R-Package = conda environment + wrapped model

This package allows you to ship and deploy machine learning models built in Python using an R package.

Say you have a Python model that works in one specific conda environment and you want to make it accessible to R users via reticulate. How do you go about doing that?

This proof of concept R-package comes with a fully specified conda environment that will be created when the R package is installed. All Python code inside this package will then be run in this conda environment. We can ship several models in the same R package as long as they share their conda environment. If two models do not share their conda environment, we ship them in separate R packages.
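To make the "environment created at install time" step concrete, here is a minimal sketch of how a conda environment could be built from a requirements file. It is not the package's actual implementation, which lives in its R code and is not shown here. The function name `conda_create_command` and the environment name `my-model-env` are hypothetical; only the `conda create --name ... --file ... --yes` command line is standard conda usage.

```python
def conda_create_command(env_name, requirements_file, conda_binary="conda"):
    """Build the argument list for creating a conda environment from a spec file.

    Hypothetical helper for illustration: it only assembles the command and
    does not run conda.
    """
    return [
        conda_binary,
        "create",
        "--yes",                      # do not prompt for confirmation
        "--name", env_name,           # name of the environment to create
        "--file", requirements_file,  # one package spec per line
    ]


# Assumed names: the environment name is made up for this example.
command = conda_create_command("my-model-env", "inst/conda-requirements.txt")
print(" ".join(command))
```

Passing the resulting list to something like `subprocess.run(command, check=True)` would create the environment, provided conda is on the search path.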

Install

You need to have conda installed on your system and reticulate must be able to find it.

# install.packages("remotes")
remotes::install_github("jtilly/r-conda-env")

Usage

library(rcondaenv)
create_package_env()
df <- tibble::tribble(
  ~x, ~y, ~z,
  "a", 2, 3.6,
  "b", 1, 8.5
)
python_model_predict(df)
check_pandas_version()

Details

- The conda requirements are defined in `inst/conda-requirements.txt` and installed with the R package:

      python=3.8.2=he5300dc_5_cpython
      pandas=1.0.3=py38hcb8c335_0
      numpy=1.18.1=py38h8854b6b_1

  Package versions are currently pinned. There's an unpinned version for non-Linux systems.
- Arbitrary Python code can be shipped with the package. Currently, there's only one file, `inst/model.py`:

  ```python
  import pandas as pd


  def predict(df):
      """Trivial predict function that returns a sequence 0, 1, ..., n-1."""
      return df.reset_index(drop=True).index.astype(float)


  def check_pandas_version():
      return(f"The installed Pandas version is {pd.__version__}")
  ```
- The reticulate calls are in `R/predict.R`.
- We overcome the problem that you cannot use reticulate to interface with different Python executables within the same R session (see [this comment](https://github.com/rstudio/reticulate/issues/27#issuecomment-512256949)) by running the reticulate call on a different worker (via the `parallel` package - both `PSOCK` and `FORK` work here). This comes with overhead, both for setting up the cluster and for serializing the data and communicating with the worker, which may or may not be tolerable depending on your use case.
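The worker idea in the last point can be illustrated outside of R. The sketch below is a Python analogue, not the package's code: it sends data to a separate interpreter process, runs a trivial predict step there, and reads the result back. The names `predict_in_worker` and `WORKER_CODE` are hypothetical, JSON stands in for whatever serialization the R `parallel` workers use, and `sys.executable` stands in for the Python executable of a dedicated conda environment.

```python
import json
import subprocess
import sys

# Code executed inside the worker process. It mirrors the trivial predict
# function: one float per input row, counting up from 0.
WORKER_CODE = """
import json, sys
rows = json.load(sys.stdin)
json.dump([float(i) for i in range(len(rows))], sys.stdout)
"""


def predict_in_worker(rows, python_executable=sys.executable):
    """Run the predict step in a separate interpreter and return its result.

    The caller's process never imports the model's dependencies; it only
    pays for starting the worker and for serializing data in both directions.
    """
    completed = subprocess.run(
        [python_executable, "-c", WORKER_CODE],
        input=json.dumps(rows),   # serialize the input for the worker
        capture_output=True,
        text=True,
        check=True,               # raise if the worker exits with an error
    )
    return json.loads(completed.stdout)


rows = [{"x": "a", "y": 2, "z": 3.6}, {"x": "b", "y": 1, "z": 8.5}]
print(predict_in_worker(rows))
```

Because the interpreter is chosen per call, two models with incompatible environments could each be served by their own worker. The cost is the process start-up and the round trip through serialization, which is the overhead described above.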

Performance

A benchmark is provided for a data set with 10 numerical columns, 10 string columns, and 10 date columns. encapsulate runs the reticulate call on a separate worker, the workaround described under Details that lets us use reticulate with different Python executables in the same R session. do_not_encapsulate goes straight from the user's R session to reticulate.

set_cluster_type("FORK")
results <- bench(n = 1e6)
knitr::kable(results[c("expression", "n", "median")])
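The kind of comparison this benchmark makes can be sketched in Python as well. The example below is only an analogue under stated assumptions: it is not the package's `bench()` function, the helper names `median_seconds`, `direct_call`, and `worker_call` are hypothetical, and it starts a fresh interpreter for every call, whereas a `parallel` cluster in R keeps its worker alive between calls. The absolute numbers therefore say nothing about the package itself.

```python
import statistics
import subprocess
import sys
import time


def median_seconds(fn, repeats=5):
    """Return the median wall-clock time, in seconds, of calling fn()."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def direct_call():
    # Stand-in for the in-process path: compute the result right here.
    return [float(i) for i in range(1000)]


def worker_call():
    # Stand-in for the encapsulated path: a separate interpreter computes
    # the same sequence and sends it back as text over stdout.
    out = subprocess.run(
        [sys.executable, "-c", "print(*map(float, range(1000)))"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return [float(x) for x in out.split()]


results = {
    "do_not_encapsulate": median_seconds(direct_call),
    "encapsulate": median_seconds(worker_call),
}
for name, seconds in results.items():
    print(f"{name}: {seconds:.6f} s")
```

Reporting the median rather than the mean keeps a single slow run, for example the first process start, from dominating the comparison.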

jtilly/r-conda-env documentation built on April 19, 2020, 10:23 p.m.
