Introduction to xform_function
In pmml: Generate PMML for Various Models

knitr::opts_chunk$set(collapse = TRUE, comment = "#>")

Introduction

This vignette provides examples of how to use the xform_function transformation to create new data features for PMML models.

Given a xform_wrap object and a transformation expression, xform_function calculates data for a new feature and creates a new xform_wrap object. When PMML is produced with pmml::pmml(), the transformation is inserted into the LocalTransformations node as a DerivedField.

Multiple data fields and functions can be combined to produce a new feature.

The code below uses knitr::kable() to make tables more readable.

library(pmml)
library(knitr)

Single numeric input

Using the iris dataset as an example, let's construct a new feature by transforming one variable. Load the dataset and show the first few lines:

data(iris)
kable(head(iris,3))

Create the iris_box object with xform_wrap:

iris_box <- xform_wrap(iris)

iris_box contains the data and transform information that will be used to produce PMML later. The original data is in iris_box$data. Any new features created with a transformation are added as columns to this data frame.

kable(head(iris_box$data,3))

Transform and field information is in iris_box$field_data. The field_data data frame contains information on every field in the dataset, as well as every transform used. The xform_function column contains expressions used in the xform_function transform.

kable(iris_box$field_data)

Now add a new feature, Sepal.Length.Sqrt, using xform_function:

iris_box <- xform_function(iris_box,orig_field_name="Sepal.Length",
                         new_field_name="Sepal.Length.Sqrt",
                         expression="sqrt(Sepal.Length)")

The new feature is calculated and added as a column to the iris_box$data data frame:

kable(head(iris_box$data,3))

iris_box$field_data now contains a new row with the transformation expression:

kable(iris_box$field_data[6,c(1:3,14)])

Construct a linear model for Petal.Width using this new feature, and convert it to PMML:

fit <- lm(Petal.Width ~ Sepal.Length.Sqrt, data=iris_box$data)
fit_pmml <- pmml(fit, transform=iris_box)

Since the model predicts Petal.Width using a variable based on Sepal.Length, the PMML will contain these two fields in the DataDictionary and MiningSchema:

fit_pmml[[2]] #Data Dictionary node
fit_pmml[[3]][[1]] #Mining Schema node

The LocalTransformations node contains Sepal.Length.Sqrt as a derived field:

fit_pmml[[3]][[3]]

Single categorical input

xform_function can also operate on categorical data. In this example, let's create a numeric feature that equals 1 when Species is setosa, and 0 otherwise:

iris_box <- xform_wrap(iris)
iris_box <- xform_function(iris_box,orig_field_name="Species",
                         new_field_name="Species.Setosa",
                         expression="if (Species == 'setosa') {1} else {0}")
kable(head(iris_box$data,3))

Create a linear model and check the LocalTransformations node:

fit <- lm(Petal.Width ~ Species.Setosa, data=iris_box$data)
fit_pmml <- pmml(fit, transform=iris_box)
fit_pmml[[3]][[3]]

Multiple input fields

Several fields can be combined to create new features. Let's make a new field from the ratio of sepal and petal lengths:

iris_box <- xform_wrap(iris)
iris_box <- xform_function(iris_box,orig_field_name="Sepal.Length,Petal.Length",
                         new_field_name="Length.Ratio",
                         expression="Sepal.Length / Petal.Length")

As before, the new field is added as a column to the iris_box$data data frame:

kable(head(iris_box$data,3))

Fit a linear model using this new feature, and convert it to pmml:

fit <- lm(Petal.Width ~ Length.Ratio, data=iris_box$data)
fit_pmml <- pmml(fit, transform=iris_box)

The pmml will contain Sepal.Length and Petal.Length in the DataDictionary and MiningSchema:

fit_pmml[[2]] #Data Dictionary node
fit_pmml[[3]][[1]] #Mining Schema node

The Local.Transformations node contains Length.Ratio as a derived field:

fit_pmml[[3]][[3]]

Using a previously derived feature

It is possible to pass a feature derived with xform_function to another xform_function call. To do this, the second call to xform_function must use the original data field names (instead of the derived field) in the orig_field_name argument.

iris_box <- xform_wrap(iris)
iris_box <- xform_function(iris_box,orig_field_name="Sepal.Length,Petal.Length",
                         new_field_name="Length.Ratio",
                         expression="Sepal.Length / Petal.Length")

iris_box <- xform_function(iris_box,orig_field_name="Sepal.Length,Petal.Length,Sepal.Width",
                         new_field_name="Length.R.Times.S.Width",
                         expression="Length.Ratio * Sepal.Width")
kable(iris_box$field_data[6:7,c(1:3,14)])

fit <- lm(Petal.Width ~ Length.R.Times.S.Width, data=iris_box$data)
fit_pmml <- pmml(fit, transform=iris_box)

The pmml will contain Sepal.Length, Petal.Length, and Sepal.Width in the DataDictionary and MiningSchema:

fit_pmml[[2]] #Data Dictionary node
fit_pmml[[3]][[1]] #Mining Schema node

The Local.Transformations node contains Length.Ratio and Length.R.Times.S.Width as derived fields:

fit_pmml[[3]][[3]]

Factor output

The resulting field can be numeric or factor. Note that factors are exported with dataType = "string" and optype = "categorical" in PMML. The following code creates a factor with 3 levels from Sepal.Length:

iris_box <- xform_wrap(iris)

iris_box <- xform_function(wrap_object = iris_box,
                              orig_field_name = "Sepal.Length",
                              new_field_name = "SL_factor",
                              new_field_data_type = "factor",
                              expression = "if(Sepal.Length<5.1) {'level_A'} else if (Sepal.Length>6.6) {'level_B'} else {'level_C'}")

kable(head(iris_box$data, 3))

The feature can then be used to create a model as usual:

fit <- lm(Petal.Width ~ SL_factor, data=iris_box$data)
fit_pmml <- pmml(fit, transform=iris_box)

PMML functions supported by `xform_function`

The following R functions and operators are directly supported by xform_function. Their PMML equivalents are listed in the second column:

R <- c("+","-","/","*","^","<","<=",">",">=","&&","&","|","||","==","!=","!","ceiling","prod","log")
PMML <- c("+","-","/","*","pow","lessThan","lessOrEqual","greaterThan","greaterOrEqual","and","and","or","or","equal","notEqual","not","ceil","product","ln")

funcs_df <- data.frame(R, PMML)
knitr::kable(funcs_df)

For these functions, no extra code is required for translation.

The R function prod can be used as long as only numeric arguments are specified. That is, prod can take an na.rm argument, but specifying this in xform_function directly will not produce PMML equivalent to the R expression.

Similarly, the R function log can be used directly as long as the second argument (the base) is not specified.

PMML functions not supported by `xform_function`

There are built-in functions defined in PMML that cannot be directly translated to PMML using xform_function as described above.

In this case, an error will be thrown when R tries to calculate a new feature using the function passed to xform_function, but does not see that function in the environment.

It is still possible to make xform_function work, but the PMML function must be defined in the R environment first.

Let's use isIn, a PMML function, as an example. The function returns a boolean indicating whether the first argument is contained in a list of values. Detailed specification for this function is available on this DMG page.

One way to implement this in R is by using %in%, with the list of values being represented by ...:

isIn <- function(x, ...) {
  dots <- c(...)
  if (x %in% dots) {
    return(TRUE)
  } else {
    return(FALSE)
  }
}

isIn(1,2,1,4)

This function can now be passed to xform_function. The following code creates a feature that indicates whether Species is either setosa or versicolor:

iris_box <- xform_wrap(iris)
iris_box <- xform_function(iris_box,orig_field_name="Species",
                         new_field_name="Species.Setosa.or.Versicolor",
                         expression="isIn(Species,'setosa','versicolor')")

The data data frame now contains the new feature:

kable(head(iris_box$data,3))

Create a linear model and view the corresponding PMML for the function:

fit <- lm(Petal.Width ~ Species.Setosa.or.Versicolor, data=iris_box$data)
fit_pmml <- pmml(fit, transform=iris_box)
fit_pmml[[3]][[3]]

PMML function not supported by `xform_function` - another example

As another example, let's use R's mean function to create a new feature. PMML has a built-in avg, so we will define an R function with this name.

avg <- function(...) {
  dots <- c(...)
  return(mean(dots))
}

Now use this function to take an average of several other features and combine with another field:

iris_box <- xform_wrap(iris)
iris_box <- xform_function(iris_box,orig_field_name="Sepal.Length,Petal.Length,Sepal.Width",
                         new_field_name="Length.Average.Ratio",
                         expression="avg(Sepal.Length,Petal.Length)/Sepal.Width")

The data data frame now contains the new feature:

kable(head(iris_box$data,3))

Create a simple linear model and view the corresponding PMML for the function:

fit <- lm(Petal.Width ~ Length.Average.Ratio, data=iris_box$data)
fit_pmml <- pmml(fit, transform=iris_box)
fit_pmml[[3]][[3]]

In the PMML, avg will be recognized as a valid function.

PMML for arbitrary functions

The function function_to_pmml (part of the pmml package) makes it possible to convert an R expression into PMML directly, without creating a model or calculating values.

As long as the expression passed to the function is a valid R expression (e.g., no unbalanced parentheses), it can contain arbitrary function names not defined in R. Variables in the expression passed to xform_function are always assumed to be field names, and not substituted. That is, even if x has a value in the R environment, the resulting expression will still use x.

function_to_pmml("1 + 2")

x <- 3
function_to_pmml("foo(bar(x * y))")

More notes on functions

There are several limitations to parsing expressions in xform_function.

Each transformation operates on one data row at a time. For example, it is not possible to compute the mean of an entire feature column in xform_function.

An expression such as foo(x) is treated as a function foo with argument x. Consequently, passing in an R vector c(1,2,3) will produce PMML where c is a function and 1,2,3 are the arguments:

function_to_pmml("c(1,2,3)")

We can also see what happens when passing an na.rm argument to prod, as mentioned in an above example:

function_to_pmml("prod(1,2,na.rm=FALSE)") #produces incorrect PMML
function_to_pmml("prod(1,2)") #produces correct PMML

Additionally, passing in a vector to prod produces incorrect PMML:

prod(c(1,2,3))
function_to_pmml("prod(c(1,2,3))")

More examples of functions

The following are additional examples of pmml produced from R expressions.

Extra parentheses:

function_to_pmml("pmmlT(((1+2))*(x))")

If-else expressions:

function_to_pmml("if(a<2) {x+3} else if (a>4) {4} else {5}")

References

DMG PMML 4.4 specification

Any scripts or data that you put into this service are public.

pmml documentation built on March 18, 2022, 5:49 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

pmml
Generate PMML for Various Models

Introduction to xform_function
In pmml: Generate PMML for Various Models

Introduction

Single numeric input

Single categorical input

Multiple input fields

Using a previously derived feature

Factor output

PMML functions supported by `xform_function`

PMML functions not supported by `xform_function`

PMML function not supported by `xform_function` - another example

PMML for arbitrary functions

More notes on functions

More examples of functions

References

Try the pmml package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

pmml Generate PMML for Various Models

Introduction to xform_function In pmml: Generate PMML for Various Models

Introduction

Single numeric input

Single categorical input

Multiple input fields

Using a previously derived feature

Factor output

PMML functions supported by xform_function

PMML functions not supported by xform_function

PMML function not supported by xform_function - another example

PMML for arbitrary functions

More notes on functions

More examples of functions

References

Try the pmml package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

pmml
Generate PMML for Various Models

Introduction to xform_function
In pmml: Generate PMML for Various Models

PMML functions supported by `xform_function`

PMML functions not supported by `xform_function`

PMML function not supported by `xform_function` - another example