pmml.svm    R Documentation

Generate the PMML representation of an svm object from the e1071 package.

Description

Generate the PMML representation of an svm object from the e1071 package.

Usage

## S3 method for class 'svm'
pmml(
  model,
  model_name = "LIBSVM_Model",
  app_name = "SoftwareAG PMML Generator",
  description = "Support Vector Machine Model",
  copyright = NULL,
  model_version = NULL,
  transforms = NULL,
  missing_value_replacement = NULL,
  dataset = NULL,
  detect_anomaly = TRUE,
  ...
)

Arguments

model

An svm object from package e1071.

model_name

A name to be given to the PMML model.

app_name

The name of the application that generated the PMML.

description

A descriptive text for the Header element of the PMML.

copyright

The copyright notice for the model.

model_version

A string specifying the model version.

transforms

Data transformations.

missing_value_replacement

Value to be used as the 'missingValueReplacement' attribute for all MiningFields.

dataset

Required for one-classification only; data used to train the one-class SVM model.

detect_anomaly

Used for one-classification only; boolean indicating whether the exported model should flag anomalies (TRUE, the default) or inliers (FALSE).

...

Further arguments passed to or from other methods.

Details

Classification and regression models are represented in the PMML SupportVectorMachineModel format. One-classification models are represented in the PMML AnomalyDetectionModel format. See the sections below for details on the differences.

Value

PMML representation of the svm object.

Classification and Regression Models

Note that the sign of the coefficient of each support vector flips between the R object and the exported PMML file for classification and regression models. This is due to a minor difference in the training/scoring formula between the LIBSVM algorithm and the DMG specification. As a result, the output value of each support vector machine under the DMG definition has the opposite sign to the value computed by the svm prediction function.

In a classification model, the sign flip in the output of the support vector machine does not affect the final predicted category. This is because the DMG definition places the winning category on the left side of threshold 0, while LIBSVM places it on the right side of threshold 0.
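The argument above can be checked with a toy calculation. The following base R sketch uses made-up decision values and hypothetical class labels "A" and "B" (none of them come from a fitted model), and it assumes that the PMML-side decision value is simply the negation of the LIBSVM one. It only illustrates that negating the value while also switching the winning side of the threshold leaves the predicted category unchanged.

```r
# Made-up decision values, as LIBSVM might compute them (illustrative only).
f_libsvm <- c(-1.3, 0.2, 2.1)

# Assumption for this sketch: the DMG-side value is the negated LIBSVM value.
f_dmg <- -f_libsvm

# LIBSVM convention: the winning category lies to the right of threshold 0.
label_libsvm <- ifelse(f_libsvm > 0, "A", "B")

# DMG convention: the winning category lies to the left of threshold 0.
label_dmg <- ifelse(f_dmg < 0, "A", "B")

# Both conventions assign the same category to every observation.
identical(label_libsvm, label_dmg)
```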

For a regression model, the exported PMML code has two OutputField elements. The OutputField predictedValue shows the support vector machine output per the DMG definition. The OutputField svm_predict_function gives the value corresponding to the R predict function for the svm model. Use svm_predict_function when making model predictions.
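One way to see these two fields is to list the OutputField names in the exported object. The sketch below is an assumption-laden illustration rather than a documented workflow: it assumes that pmml() returns an XML package node whose SupportVectorMachineModel child contains an Output element, and it only runs when the e1071, pmml and XML packages are installed.

```r
# Run only if the required packages are available.
if (requireNamespace("e1071", quietly = TRUE) &&
    requireNamespace("pmml", quietly = TRUE) &&
    requireNamespace("XML", quietly = TRUE)) {
  # Fit a regression SVM on the iris data, as in the Examples section.
  fit <- e1071::svm(
    Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width,
    data = iris
  )
  fit_pmml <- pmml::pmml(fit)

  # Assumed layout: PMML > SupportVectorMachineModel > Output > OutputField.
  output <- fit_pmml[["SupportVectorMachineModel"]][["Output"]]

  # Collect the "name" attribute of each child of the Output element.
  field_names <- unname(XML::xmlSApply(output, XML::xmlGetAttr, "name"))
  print(field_names)
}
```

If the assumed layout holds, the printed names should be the two fields described above.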

One-Classification SVM Models

For a one-classification svm (OCSVM) model, the PMML has two OutputField elements: anomalyScore and one of anomaly or inlier.

The OutputField anomalyScore is the signed distance to the separating boundary; anomalyScore corresponds to the decision.values attribute of the output of the svm predict function in R.

The second OutputField depends on the value of detect_anomaly. By default, detect_anomaly is TRUE, which results in the second OutputField being anomaly. The anomaly OutputField is TRUE when an anomaly is detected. This field conforms to the DMG definition of an anomaly detection model. Its value is the opposite of the prediction made by the e1071::svm object in R.

Setting detect_anomaly to FALSE results in the second field instead being inlier. This OutputField is TRUE when an inlier is detected and FALSE when an anomaly is detected. It conforms to the e1071 definition of one-class SVMs, in which the R svm model predicts whether an observation belongs to the class. When comparing the predictions from R and PMML, this field should be used, since it will match R's output.

For example, say that for an observation, the R OCSVM model predicts a positive decision value of 0.4 and a label of TRUE. According to the R object, this means that the observation is an inlier. By default, the PMML export of this model will give the following for the same input: anomalyScore = 0.4, anomaly = "false". According to the PMML, the observation is not an anomaly. If the same R object is instead exported with detect_anomaly = FALSE, the PMML will then give: anomalyScore = 0.4, inlier = "true", and this result agrees with R.
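The worked example can be written out as a few lines of base R. This is only a sketch of the mapping described in the text, not code from the exporter: the variable names are hypothetical, and it assumes that the R label is TRUE exactly when the decision value is positive.

```r
# Decision value from the example above.
decision_value <- 0.4

# Assumed R label for a one-class SVM: TRUE means the observation is an inlier.
r_label <- decision_value > 0

# Fields in the exported PMML, following the description in the text.
anomaly_score <- decision_value  # same sign as the R decision value
anomaly <- !r_label              # reported when detect_anomaly = TRUE (default)
inlier <- r_label                # reported when detect_anomaly = FALSE

# Show the values side by side.
data.frame(
  anomalyScore = anomaly_score,
  anomaly = anomaly,
  inlier = inlier
)
```

Only one of anomaly or inlier appears in a given export; both are computed here purely for comparison.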

Note that there is no sign flip for anomalyScore between R and PMML for OCSVM models.

To export an OCSVM model, an additional argument, dataset, is required by the function. This argument expects a data frame containing the data that was used to train the model. This is necessary because, for a one-class svm, the R svm object does not contain information about the data types of the features used to train the model. The exporter does not yet support the formula interface for one-classification models, so the default S3 method must be used to train the SVM. The data used to train the one-class SVM must be numeric and not of integer class.
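Because integer columns are not accepted, it can help to check the column types before training. The following base R sketch uses a small hypothetical data frame (the column names count and width are invented for illustration) and converts any integer columns to double; it is one possible way to prepare the data, not a step performed by the exporter.

```r
# Hypothetical training data with one integer column and one double column.
train <- data.frame(count = 1:3, width = c(0.5, 1.5, 2.5))

# Find integer columns; is.integer() is FALSE for factors, so they are skipped.
is_int <- vapply(train, is.integer, logical(1))

# Convert the integer columns to double, keeping their values.
train[is_int] <- lapply(train[is_int], as.numeric)

# Every column should now be stored as double.
stopifnot(all(vapply(train, is.double, logical(1))))
```

The converted data frame would then be used both to train the model with svm() and as the dataset argument of pmml().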

References

* R project CRAN package: e1071: Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien https://CRAN.R-project.org/package=e1071

* Chang, Chih-Chung and Lin, Chih-Jen, LIBSVM: a library for Support Vector Machines https://www.csie.ntu.edu.tw/~cjlin/libsvm/

See Also

pmml, PMML SVM specification

Examples

## Not run: 
library(e1071)
library(pmml)
data(iris)

# Classification with a polynomial kernel
fit <- svm(Species ~ ., data = iris, kernel = "polynomial")
fit_pmml <- pmml(fit)

# Regression
fit <- svm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width, data = iris)
fit_pmml <- pmml(fit)

# Anomaly detection with one-classification
fit <- svm(iris[, 1:4],
  y = NULL,
  type = "one-classification"
)
fit_pmml <- pmml(fit, dataset = iris[, 1:4])

# Inlier detection with one-classification
fit <- svm(iris[, 1:4],
  y = NULL,
  type = "one-classification"
)
# detect_anomaly is an argument of pmml(), not of svm()
fit_pmml <- pmml(fit, dataset = iris[, 1:4], detect_anomaly = FALSE)

## End(Not run)


pmml documentation built on March 18, 2022, 5:49 p.m.