r-conda-env

R-Package = conda environment wrapped model

This package allows you to ship and deploy machine learning models built in Python using an R package.

Say you have a Python model that works in one specific conda environment and you want to make it accessible to R users via reticulate. How do you go about doing that?

This proof-of-concept R package ships with a fully specified conda environment that is created when the R package is installed. All Python code inside the package then runs in that conda environment. Several models can ship in the same R package as long as they share a conda environment; models that need different environments go in separate R packages.
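As a rough illustration of the idea, the sketch below creates a pinned conda environment and binds reticulate to it. The helper names (`ensure_env`, `pandas_version`), the environment name, and the package pin are assumptions made for this example and are not taken from this package's source; only the `reticulate::` calls are real API. Unlike the behaviour shown in the usage output, this sketch reuses an existing environment rather than removing it first.

```r
# Hypothetical names and pins, for illustration only.
env_name <- "my-model-env"
env_packages <- c("pandas=1.0.3")

# Create the environment if it is missing, then bind reticulate to it.
ensure_env <- function(envname = env_name, packages = env_packages) {
  existing <- reticulate::conda_list()$name
  if (!envname %in% existing) {
    reticulate::conda_create(envname, packages = packages)
  }
  # required = TRUE makes reticulate fail loudly instead of silently
  # falling back to a different Python.
  reticulate::use_condaenv(envname, required = TRUE)
  invisible(envname)
}

# Report the pandas version seen from inside the bound environment.
pandas_version <- function() {
  pd <- reticulate::import("pandas")
  pd[["__version__"]]
}
```

Calling `ensure_env()` requires a working conda installation; the functions are only defined here, not run.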

Install

You need to have conda installed on your system and reticulate must be able to find it.
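One way to check this requirement before installing is to ask reticulate for the conda binary. The helper name `conda_available` is made up for this sketch; `reticulate::conda_binary()` is the real function and raises an error when no conda installation can be located.

```r
# Hypothetical helper: TRUE if reticulate is installed and can locate conda.
conda_available <- function() {
  if (!requireNamespace("reticulate", quietly = TRUE)) {
    return(FALSE)
  }
  # conda_binary() errors when conda cannot be found, so trap that case.
  path <- tryCatch(reticulate::conda_binary(), error = function(e) NULL)
  !is.null(path) && file.exists(path)
}

conda_available()
```

If this returns `FALSE`, installing conda (or pointing reticulate at an existing installation) is likely needed before the package can create its environment.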

# install.packages("remotes")
remotes::install_github("jtilly/r-conda-env")

Usage

library(rcondaenv)
create_package_env()
#> Creating conda environment now.
#> Environment 2f0409c2f60c564607d28c44c8edc52c already exists. Removing it first...
#> Created conda environment 2f0409c2f60c564607d28c44c8edc52c
df <- tibble::tribble(
  ~x, ~y, ~z,
  "a", 2, 3.6,
  "b", 1, 8.5
)
python_model_predict(df)
#> [1] 0 1
check_pandas_version()
#> [1] "The installed Pandas version is 1.0.3"

Details

Performance

The benchmark uses a data set with 10 numeric columns, 10 string columns, and 10 date columns. encapsulate uses a workaround that lets reticulate run with different Python executables in the same R session. do_not_encapsulate calls reticulate directly from the user’s R session.
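reticulate binds to a single Python per R process, so one plausible way to encapsulate a model is to run it in a separate worker process. The sketch below shows only that process boundary, using the base `parallel` package. It is an assumption about the general approach, not this package's implementation, and `worker_task` is a hypothetical stand-in for the real prediction code.

```r
library(parallel)

# One worker process. "PSOCK" works on every platform; "FORK" is Unix-only.
cl <- makeCluster(1, type = "PSOCK")

# Hypothetical stand-in for the model call. A real worker would bind
# reticulate to its own conda environment here before predicting.
worker_task <- function(d) {
  list(pid = Sys.getpid(), n_rows = nrow(d))
}

df <- data.frame(x = c("a", "b"), y = c(2, 1), z = c(3.6, 8.5))

# The data frame is serialized, sent to the worker, and the result sent back.
res <- clusterCall(cl, worker_task, df)[[1]]
stopCluster(cl)

res$n_rows                 # rows seen by the worker
res$pid != Sys.getpid()    # the work ran in a different process
```

Sending the data to the worker and the result back is the kind of cost that would show up as the gap between the two benchmark variants.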

set_cluster_type("FORK")
results <- bench(n = 1e6)
#> Running with:
#>         n
#> 1       1
#> 2      10
#> 3     100
#> 4    1000
#> 5   10000
#> 6  100000
#> 7 1000000
knitr::kable(results[c("expression", "n", "median")])

| expression             |     n |   median |
| :--------------------- | ----: | -------: |
| encapsulate(df)        | 1e+00 | 542.29ms |
| do_not_encapsulate(df) | 1e+00 | 459.25ms |
| encapsulate(df)        | 1e+01 | 510.14ms |
| do_not_encapsulate(df) | 1e+01 | 465.09ms |
| encapsulate(df)        | 1e+02 | 500.33ms |
| do_not_encapsulate(df) | 1e+02 | 464.31ms |
| encapsulate(df)        | 1e+03 | 566.11ms |
| do_not_encapsulate(df) | 1e+03 | 470.98ms |
| encapsulate(df)        | 1e+04 | 614.53ms |
| do_not_encapsulate(df) | 1e+04 | 532.51ms |
| encapsulate(df)        | 1e+05 |    1.91s |
| do_not_encapsulate(df) | 1e+05 |    1.17s |
| encapsulate(df)        | 1e+06 |   12.01s |
| do_not_encapsulate(df) | 1e+06 |    7.52s |
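To make the cost of encapsulation easier to read off, the medians from the table can be turned into an absolute and a relative overhead. The snippet below is a small post-processing sketch in base R. The numbers are copied by hand from the table and converted to seconds, and the variable names are chosen for this example.

```r
# Medians copied from the results table, in seconds.
n            <- 10^(0:6)
encapsulated <- c(0.54229, 0.51014, 0.50033, 0.56611, 0.61453, 1.91, 12.01)
direct       <- c(0.45925, 0.46509, 0.46431, 0.47098, 0.53251, 1.17, 7.52)

overhead <- data.frame(
  n             = n,
  extra_seconds = encapsulated - direct,  # absolute cost of encapsulation
  ratio         = encapsulated / direct   # relative cost of encapsulation
)

print(round(overhead, 3))
```

With these figures, encapsulation costs between roughly 0.04 and 0.1 seconds for n up to 1e4, and the encapsulated call takes about 1.6 times as long as the direct call at n = 1e5 and n = 1e6.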



jtilly/r-conda-env documentation built on April 19, 2020, 10:23 p.m.