Sparklyr2PMML

R library for converting Apache Spark ML pipelines to PMML.

Features

This package is a thin R wrapper for the JPMML-SparkML library.

Prerequisites

Installation

Install from GitHub using the devtools package:

library("devtools")

install_github("jpmml/sparklyr2pmml")

Configuration and usage

Sparklyr2PMML must be paired with JPMML-SparkML based on the following compatibility matrix:

Active development branches:

| Apache Spark version | JPMML-SparkML branch | Latest JPMML-SparkML version |
|----------------------|----------------------|------------------------------|
| 3.4.X | 3.0.X | 3.0.0 |
| 3.5.X | master | 3.1.0 |

Stale development branches:

| Apache Spark version | JPMML-SparkML branch | Latest JPMML-SparkML version |
|----------------------|----------------------|------------------------------|
| 3.0.X | 2.0.X | 2.0.6 |
| 3.1.X | 2.1.X | 2.1.6 |
| 3.2.X | 2.2.X | 2.2.6 |
| 3.3.X | 2.3.X | 2.3.5 |
| 3.4.X | 2.4.X | 2.4.4 |
| 3.5.X | 2.5.X | 2.5.3 |

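Launch Sparklyr, using the sparklyr.connect.packages configuration option to specify the Maven coordinates of the relevant JPMML-SparkML modules. Replace the ${version} placeholder with the JPMML-SparkML version that matches your Apache Spark version in the compatibility matrix: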

Launching with the core JPMML-SparkML module:

library("sparklyr")

config = spark_config()
config[["sparklyr.connect.packages"]] = "org.jpmml:pmml-sparkml:${version}"

sc = spark_connect(master = "local", config = config)
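The `${version}` placeholder has to be filled in by hand. As a minimal sketch, the lookup against the "Active development branches" table could be automated as follows. The function name `jpmml_sparkml_coordinates` is hypothetical and is not part of Sparklyr2PMML; the version mapping is copied from the table and will go stale as new releases appear.

```r
# Hypothetical helper (not part of Sparklyr2PMML): maps an Apache Spark
# version string to JPMML-SparkML Maven coordinates, using the versions
# listed in the "Active development branches" table.
jpmml_sparkml_coordinates = function(spark_version){
    # Keep only the "major.minor" part, e.g. "3.5.1" -> "3.5"
    parts = strsplit(spark_version, ".", fixed = TRUE)[[1]]
    if(length(parts) < 2){
        stop("Expected a version such as '3.5.1', got: ", spark_version)
    }
    major_minor = paste(parts[1:2], collapse = ".")

    # Apache Spark "major.minor" -> latest JPMML-SparkML version
    versions = c("3.4" = "3.0.0", "3.5" = "3.1.0")
    if(!(major_minor %in% names(versions))){
        stop("No active JPMML-SparkML branch listed for Apache Spark ", major_minor)
    }

    paste0("org.jpmml:pmml-sparkml:", versions[[major_minor]])
}

coordinates = jpmml_sparkml_coordinates("3.5.1")
print(coordinates)
```

The resulting string can be assigned to `config[["sparklyr.connect.packages"]]` in place of the templated value.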

Fitting a Spark ML pipeline:

library("dplyr")
library("sparklyr")

data(iris)

iris_df = copy_to(sc, iris)

iris_pipeline = ml_pipeline(sc) %>%
    ft_r_formula(Species ~ .) %>%
    ml_decision_tree_classifier()

iris_pipeline_model = ml_fit(iris_pipeline, iris_df)
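One detail worth knowing before export: sparklyr's `copy_to` typically sanitizes column names by replacing dots with underscores, so the Spark table is expected to use `Sepal_Length` rather than `Sepal.Length`, and the exported PMML fields will presumably follow the Spark names. The sketch below only mimics that renaming in base R for illustration. It is an approximation and does not call sparklyr's own sanitization code.

```r
# Illustration only: approximates how the column names of the local iris
# data frame are expected to look after copy_to() (dots replaced by
# underscores). This does not require a Spark connection.
data(iris)

local_names = names(iris)
spark_names = gsub(".", "_", local_names, fixed = TRUE)

print(data.frame(local = local_names, spark = spark_names))
```

To see the actual names, inspect `colnames(iris_df)` on the live connection.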

Exporting the fitted Spark ML pipeline to a PMML file:

library("sparklyr2pmml")

pmmlBuilder = PMMLBuilder(sc, iris_df, iris_pipeline_model)

buildFile(pmmlBuilder, "DecisionTreeIris.pmml")
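After the export it can be useful to confirm that a PMML file was actually written. The following is a minimal sketch in base R. The helper `looks_like_pmml` is hypothetical and not part of Sparklyr2PMML, and the demonstration uses a hand-written stand-in file so that the code runs without Spark.

```r
# Hypothetical helper (not part of Sparklyr2PMML): checks that a file exists
# and that its first lines contain a <PMML> root element.
looks_like_pmml = function(path){
    if(!file.exists(path)){
        return(FALSE)
    }
    # Read only the beginning of the file; the root element follows
    # the optional XML declaration.
    head_lines = readLines(path, n = 5, warn = FALSE)
    any(grepl("<PMML", head_lines, fixed = TRUE))
}

# Stand-in file for demonstration purposes (not real converter output)
demo_path = tempfile(fileext = ".pmml")
writeLines(c(
    "<?xml version=\"1.0\" encoding=\"UTF-8\"?>",
    "<PMML xmlns=\"http://www.dmg.org/PMML-4_4\" version=\"4.4\"/>"
), demo_path)

print(looks_like_pmml(demo_path))   # existing stand-in file
print(looks_like_pmml(tempfile()))  # path that does not exist
```

With a real export in place, calling `looks_like_pmml("DecisionTreeIris.pmml")` performs the same check. This is a shallow sanity check only and does not validate the document against the PMML schema.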

License

Sparklyr2PMML is licensed under the terms and conditions of the GNU Affero General Public License, Version 3.0.

If you would like to use Sparklyr2PMML in a proprietary software project, then it is possible to enter into a licensing agreement which makes Sparklyr2PMML available under the terms and conditions of the BSD 3-Clause License instead.

Additional information

Sparklyr2PMML is developed and maintained by Openscoring Ltd, Estonia.

Interested in using Java PMML API software in your company? Please contact info@openscoring.io



jpmml/sparklyr2pmml documentation built on Feb. 25, 2025, 4:25 a.m.