predict_pmml_batch: Get predictions for multiple input records from PMML model


View source: R/predict_pmml_batch.R

Description

predict_pmml_batch() returns the predictions for multiple input records that are sent to Zementis Server. The values returned depend on the type of prediction model being executed on the server.

Usage

predict_pmml_batch(
  data,
  model_name,
  path = NULL,
  max_threads = NULL,
  max_records_per_thread = 5000,
  ...
)

Arguments

data

Either a data frame or the path to a file that contains multiple data records to be sent to Zementis Server for prediction. Files must be .csv or .json files. Alternatively, .csv and .json files can be sent in compressed form (.zip or .gzip). For compressed files you also need to set the path argument.

model_name

The name of the deployed PMML model that gets predictions on the new data records contained in data.

path

Path to a file to which the response from Zementis Server is written. Mandatory only if a compressed input file (.zip) is passed to data.

max_threads

Maximum number of concurrent threads used to process the data that is sent. The default is twice the number of processor cores.

max_records_per_thread

Maximum number of records processed by a single thread. The default is 5000.

...

Additional arguments passed on to the underlying HTTP method. This might be necessary if you need to set some curl options explicitly via config.
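Because the dots are forwarded to the underlying HTTP call, request-level options can be supplied directly. A minimal sketch, assuming the package uses httr under the hood (as the reference to config above suggests); the model name is a placeholder and a running Zementis Server with that model deployed is required:

```r
# Raise the request timeout to 120 seconds and print verbose curl
# output for debugging. Both are httr request options passed on
# through `...`.
predict_pmml_batch(
  iris,
  "iris_model",        # placeholder model name
  httr::timeout(120),
  httr::verbose()
)
```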

Details

When calling predict_pmml_batch(), data is sent to Zementis Server as an octet stream. Batch data is therefore transferred in streaming mode, and processing/scoring starts as soon as the first chunk reaches the server. By default, the server processes records in batches of 5000 records per thread, using at most 2*n threads to process the entire batch, where n is the number of available cores on the machine.

Using the two function arguments max_threads and max_records_per_thread you can adapt the compute resources on the server to your data processing needs. max_threads lets you reserve additional threads (CPU resources) for your request. max_records_per_thread lets you modify the number of records processed by a single thread (memory resources).
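The trade-off above can be sketched as follows (the data frame and model name are placeholders; a running Zementis Server with the model deployed is required):

```r
# Reserve more threads and larger per-thread batches for a big
# scoring job: up to 8 concurrent threads (CPU resources), each
# handling at most 10,000 records (memory resources).
predict_pmml_batch(
  big_data,                       # placeholder data frame
  "my_model",                     # placeholder model name
  max_threads = 8,
  max_records_per_thread = 10000
)
```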

Value

If data is a data frame, a .csv file or a .json file, a list with the following components:

For regression models, outputs will include a one-column data frame with the predicted values.

For binary classification models, outputs will include a three-column data frame containing the probability of class 0, the probability of class 1, and the class label assigned based on a 50% threshold.

If data is a compressed file (.zip), a compressed .json file saved to path and an invisible 200 HTTP status code. When uncompressed and read into R, the file saved to path will be a list with the two components described above.
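Since the returned list includes an outputs component holding the prediction data frame, results can be inspected directly. A sketch, assuming a model named "iris_model" is deployed on a running Zementis Server:

```r
# Score the data and inspect the prediction data frame stored in
# the `outputs` component of the returned list.
result <- predict_pmml_batch(iris, "iris_model")
head(result$outputs)
```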

See Also

upload_model, predict_pmml

Examples

## Not run: 
# Predict the entire iris data set
predict_pmml_batch(iris, "iris_model")

# Predict the entire iris data set previously saved to a .json file
jsonlite::write_json(iris, "iris.json")
predict_pmml_batch("iris.json", "iris_model")

# Predict the entire iris data set previously saved to a .csv file
write.csv(iris, "iris.csv", row.names = FALSE)
predict_pmml_batch("iris.csv", "iris_model")

# Predict the entire iris data set previously saved and compressed
predict_pmml_batch("iris.csv.zip", "iris_model", path = "iris_predictions.zip")
unzipped_predictions <- unzip("iris_predictions.zip")
jsonlite::fromJSON(unzipped_predictions)

## End(Not run)

alex23lemm/zementisr documentation built on Jan. 9, 2020, 1:49 a.m.