predict_pmml_batch: Get predictions for multiple input records from PMML model


View source: R/predict_pmml_batch.R

Description

predict_pmml_batch() returns the predictions for multiple input records that are sent to Zementis Server. The values returned depend on the type of prediction model being executed on the server.

Usage

predict_pmml_batch(
  data,
  model_name,
  path = NULL,
  max_threads = NULL,
  max_records_per_thread = 5000,
  ...
)

Arguments

data

Either a data frame or the path to a file that contains multiple data records to be sent to Zementis Server for prediction. Files must be .csv or .json files. Alternatively, .csv and .json files can be sent in compressed form (.zip or .gzip). For compressed files you also need to set the path argument.

model_name

The name of the deployed PMML model that gets predictions on the new data records contained in data.

path

Path to a file to which the response from Zementis Server is written. Mandatory only if a compressed input file (.zip) is passed to data.

max_threads

Maximum number of concurrent threads used to process the data that is sent. The default is twice the number of processor cores.

max_records_per_thread

Maximum number of records processed by a single thread. The default is 5000.

...

Additional arguments passed on to the underlying HTTP method. This might be necessary if you need to set some curl options explicitly via config.
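Because the dots are forwarded to the underlying HTTP call, request-level options can be supplied directly. A minimal sketch, assuming the package uses httr under the hood (as the reference to config above suggests); the model name is a placeholder and a running Zementis Server with that model deployed is required:

```r
# Raise the request timeout to 120 seconds and print verbose curl
# output for debugging. Both are httr request options passed on
# through `...`.
predict_pmml_batch(
  iris,
  "iris_model",        # placeholder model name
  httr::timeout(120),
  httr::verbose()
)
```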

Details

When calling predict_pmml_batch(), data is sent to Zementis Server as an octet stream. Batch data is therefore transferred in streaming mode, and processing/scoring starts as soon as the first chunk reaches the server. By default, the server processes records in batches of 5000 records per thread, using at most 2*n threads to process the entire batch, where n is the number of available cores on the machine.

Using the two function arguments max_threads and max_records_per_thread you can adapt the compute resources on the server to your data processing needs. max_threads lets you reserve additional threads (CPU resources) for your request. max_records_per_thread lets you modify the number of records processed by a single thread (memory resources).
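The trade-off above can be sketched as follows (the data frame and model name are placeholders; a running Zementis Server with the model deployed is required):

```r
# Reserve more threads and larger per-thread batches for a big
# scoring job: up to 8 concurrent threads (CPU resources), each
# handling at most 10,000 records (memory resources).
predict_pmml_batch(
  big_data,                       # placeholder data frame
  "my_model",                     # placeholder model name
  max_threads = 8,
  max_records_per_thread = 10000
)
```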

Value

If data is a data frame, a .csv file or a .json file, a list with the following components:

For regression models, outputs will include a one-column data frame with the predicted values.

For binary classification models, outputs will include a three-column data frame containing the probability of class 0, the probability of class 1, and the class label assigned based on a 50% threshold.

If data is a compressed file (.zip), a compressed .json file saved to path and an invisible 200 HTTP status code. When uncompressed and read into R, the file saved to path will be a list with the two components described above.
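Since the returned list includes an outputs component holding the prediction data frame, results can be inspected directly. A sketch, assuming a model named "iris_model" is deployed on a running Zementis Server:

```r
# Score the data and inspect the prediction data frame stored in
# the `outputs` component of the returned list.
result <- predict_pmml_batch(iris, "iris_model")
head(result$outputs)
```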

See Also

upload_model, predict_pmml

Examples

## Not run: 
# Predict the entire iris data set
predict_pmml_batch(iris, "iris_model")

# Predict the entire iris data set previously saved to a .json file
jsonlite::write_json(iris, "iris.json")
predict_pmml_batch("iris.json", "iris_model")

# Predict the entire iris data set previously saved to a .csv file
write.csv(iris, "iris.csv", row.names = FALSE)
predict_pmml_batch("iris.csv", "iris_model")

# Predict the entire iris data set previously saved and compressed
predict_pmml_batch("iris.csv.zip", "iris_model", path = "iris_predictions.zip")
unzipped_predictions <- unzip("iris_predictions.zip")
jsonlite::fromJSON(unzipped_predictions)

## End(Not run)

alex23lemm/zementisr documentation built on Jan. 9, 2020, 1:49 a.m.