Data pipelines in {rixpress} often require controlling how objects are stored
and restored, especially when dealing with formats other than RDS (CSV,
{qs} compressed files, etc.).

This vignette focuses on encoding and decoding in R, and on transferring
data between R and Python using rxp_py2r() and rxp_r2py().
By default, {rixpress} uses saveRDS() and readRDS(). You can override this
to handle different formats or complex objects:
library(rixpress)

# Encode output as CSV instead of RDS
d2 <- rxp_r(
  mtcars_head,
  my_head(mtcars_am, 100),
  user_functions = "my_head.R",
  nix_env = "default.nix",
  encoder = write.csv
)

# Encode as qs, decode input from CSV
d3 <- rxp_r(
  mtcars_tail,
  my_tail(mtcars_head),
  user_functions = "my_tail.R",
  nix_env = "default2.nix",
  encoder = qs::qsave,
  decoder = read.csv
)

# Decode multiple upstream objects with different decoders
d4 <- rxp_r(
  mtcars_mpg,
  full_join(mtcars_tail, mtcars_head),
  nix_env = "default2.nix",
  decoder = c(
    mtcars_tail = "qs::qread",
    mtcars_head = "read.csv"
  )
)
Key points:
- encoder controls how this step's output is stored.
- decoder specifies how to read inputs from upstream derivations.
- As shown in the examples above, you can pass a function or a string
  representation of the function to encoder and decoder.
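To make the encoder/decoder pairing concrete, here is a minimal base-R sketch of a CSV round trip outside of any pipeline. It assumes an encoder is called with the object and a file path, and a decoder with the file path only, which is the shape of the saveRDS() and readRDS() defaults. The helper name write_csv_plain is hypothetical and not part of {rixpress}.

```r
# Hypothetical encoder with the same (object, path) shape as saveRDS(),
# but writing CSV without the row-names column.
write_csv_plain <- function(object, path) {
  write.csv(object, path, row.names = FALSE)
}

df <- head(mtcars[, c("mpg", "cyl", "am")], 5)
path <- tempfile(fileext = ".csv")

# Encode, then decode with the matching reader.
write_csv_plain(df, path)
restored <- read.csv(path)

# Row names are not written, so drop them before comparing values.
rownames(df) <- NULL
print(all.equal(df, restored))
```

Plain write.csv() also writes row names as an extra column, and CSV does not record column types, so the decoder may return slightly different types than the encoder received. That is one reason to pick the encoder and the downstream decoder together.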
By encoding an object in a cross-language format, you can pass it to another language. For example, read a CSV file using Julia, encode it to Arrow, and read it back in R:
library(rixpress)

list(
  rxp_jl_file(
    mtcars,
    # Assume here that mtcars.csv is separated by "|" instead of ","
    path = "data/mtcars.csv",
    read_function = "read_csv",
    user_functions = "functions.jl",
    encoder = "write_arrow"
    # read_csv and write_arrow are both defined in the
    # functions.jl script and look like this:
    #
    # function write_arrow(df::DataFrame, filename::String)
    #     Arrow.write(filename, df)
    # end
    #
    # function read_csv(path::String)
    #     df = CSV.read(path, DataFrame; delim="|")
    #     return df
    # end
  ),
  rxp_r(
    mtcars2,
    select(mtcars, am, cyl, mpg),
    decoder = "read_feather"
  )
) |>
  rxp_populate()
You can find this example here. The same approach works for transferring data to Python, or between any of the three supported languages.
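For the opposite direction, an R step can write the same kind of Arrow file with the {arrow} package. The sketch below is an R counterpart of the Julia write_arrow helper shown in the comments above. It assumes {arrow} is installed, and the helper name write_arrow_r is hypothetical.

```r
# Hypothetical R encoder mirroring the Julia write_arrow helper:
# takes the object and a path, writes an Arrow (Feather V2) file.
write_arrow_r <- function(object, path) {
  arrow::write_feather(object, path)
}

# Only run the round trip when {arrow} is available.
if (requireNamespace("arrow", quietly = TRUE)) {
  df <- data.frame(
    am = c(1, 0, 1),
    cyl = c(6, 4, 8),
    mpg = c(21, 22.8, 18.7)
  )
  path <- tempfile(fileext = ".arrow")

  write_arrow_r(df, path)

  # read_feather() is the decoder used in the pipeline above.
  restored <- as.data.frame(arrow::read_feather(path))
  print(all.equal(df, restored))
}
```

Unlike CSV, the Arrow format stores column types alongside the data, which makes it a reasonable default when a data frame has to cross a language boundary.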
In the specific case of transferring objects (data, lists, vectors, arrays, etc.)
between R and Python, it is also possible to use {reticulate}'s built-in
conversion through rxp_py2r() and rxp_r2py(). These functions move
objects between R and Python without an intermediate file format of your own:
library(rixpress)

# Python step producing pandas DataFrame
d1 <- rxp_py(
  name = mtcars_pl_am,
  expr = "mtcars_pl.filter(polars.col('am') == 1).to_pandas()"
)

# Transfer Python -> R
d2 <- rxp_py2r(
  name = mtcars_am,
  expr = mtcars_pl_am
)

# R step processing the data
d3 <- rxp_r(
  name = mtcars_head,
  expr = my_head(mtcars_am),
  user_functions = "functions.R"
)

# Transfer R -> Python
d3_1 <- rxp_r2py(
  name = mtcars_head_py,
  expr = mtcars_head
)
For this to work, you need to add {reticulate} to the pipeline's execution environment.
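As an illustration of that requirement, an environment definition generated with {rix} might look like the sketch below. This is an assumption-laden example rather than the vignette's own setup: the R and Python package lists, the Python version, and the choice of r_ver = "latest-upstream" are all placeholders to adapt to your pipeline.

```r
library(rix)

# Sketch only: package lists and versions are illustrative assumptions.
rix(
  r_ver = "latest-upstream",
  # {reticulate} is what rxp_py2r() and rxp_r2py() rely on
  r_pkgs = c("reticulate", "dplyr"),
  # Python interpreter and packages for the rxp_py() steps;
  # polars' to_pandas() needs pandas and pyarrow
  py_conf = list(
    py_version = "3.12",
    py_pkgs = c("pandas", "polars", "pyarrow")
  ),
  ide = "none",
  project_path = ".",
  overwrite = TRUE # replaces any existing default.nix in project_path
)
```

The generated default.nix is the file referred to by the nix_env argument in the earlier examples.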
In summary:

- Use encoder/decoder for non-RDS objects (CSV, {qs}, Keras models) and to pass data
  to and from different languages.
- Use rxp_py2r() and rxp_r2py() if you want to reuse {reticulate}'s built-in conversion
  (useful for more complex objects).