Training predictions are the out-of-fold predictions that a model makes on its own training data. For example, DataRobot can run 5-fold cross validation, in which each of five models trains on 80% of the training data and predicts for the remaining 20%. After this has been done for every fold, the five 20% segments can be recombined into a single file that holds a prediction for each row of the training data, none of which was made by a model that trained on that row. This matters because predictions for rows the model has trained on (in-fold predictions) are almost always overly optimistic and do not reflect how the model will perform on new data. Training predictions are therefore useful for further model validation and for blending the model with other models. You can generate and retrieve them via the DataRobot API.
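To make the out-of-fold idea concrete, here is a minimal base R sketch that does not use DataRobot at all. It assumes a plain linear model on the built-in mtcars data and a random 5-fold split; the variable names (fold, oofPrediction, inFoldPrediction) are illustrative only, and DataRobot's own partitioning may work differently.

```r
# Out-of-fold (OOF) predictions with 5-fold cross validation, base R only.
# Illustrative sketch: lm() on mtcars stands in for a real DataRobot model.
set.seed(42)                               # reproducible fold assignment
data <- mtcars
k <- 5

# Assign every row to exactly one of the k folds, in shuffled order.
fold <- sample(rep(seq_len(k), length.out = nrow(data)))

oofPrediction <- rep(NA_real_, nrow(data))
for (i in seq_len(k)) {
  heldOut <- fold == i
  # Fit on the other four folds (roughly 80% of the rows) ...
  fit <- lm(mpg ~ wt + hp, data = data[!heldOut, ])
  # ... and predict only for the held-out fold (roughly 20% of the rows).
  oofPrediction[heldOut] <- predict(fit, newdata = data[heldOut, ])
}

# In-fold predictions come from a model that has seen every row.
inFoldPrediction <- predict(lm(mpg ~ wt + hp, data = data))

rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
oofRmse <- rmse(data$mpg, oofPrediction)
inFoldRmse <- rmse(data$mpg, inFoldPrediction)
```

Comparing oofRmse with inFoldRmse shows the optimism of in-fold predictions: for this least-squares model the in-fold error is never larger than the out-of-fold error.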
Before you can retrieve training predictions, you must first request their creation. This is done on the model object you want training predictions for.
The dataSubset argument specifies the subset of training data you want training predictions for: DataSubset$All returns predictions for all training data (note that this retrains your model on 100% of the data), DataSubset$ValidationAndHoldout returns predictions only for data in the validation and holdout sets, and DataSubset$Holdout returns predictions only for the holdout set.
library(datarobot)
library(knitr)
models <- ListModels(projectId)
model <- models[[1]]
# request training predictions and wait for them to be computed
trainingPredictions <- GetTrainingPredictionsForModel(model, dataSubset = DataSubset$All)
kable(head(trainingPredictions), longtable = TRUE, booktabs = TRUE, row.names = TRUE)
You may also find it useful to split the call into a separate request and retrieval, like this:
models <- ListModels(projectId)
model <- models[[1]]
jobId <- RequestTrainingPredictions(model, dataSubset = DataSubset$All)
# can run computations here while training predictions compute in the background
trainingPredictions <- GetTrainingPredictionsFromJobId(projectId, jobId) # blocks until job complete
kable(head(trainingPredictions), longtable = TRUE, booktabs = TRUE, row.names = TRUE)
Alternatively, you can retrieve existing training predictions by their ID.
trainingPredictions <- ListTrainingPredictions(projectId)
trainingPredictionId <- trainingPredictions[[1]]$id
trainingPrediction <- GetTrainingPredictions(projectId, trainingPredictionId)
kable(head(trainingPrediction), longtable = TRUE, booktabs = TRUE, row.names = TRUE)
You can also download training predictions to a CSV.
DownloadTrainingPredictions(projectId, trainingPredictionId, "trainingPredictions.csv")
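Once the predictions are on disk, they can be joined back to the training data for further validation. The sketch below is only an illustration under stated assumptions: it writes a small made-up CSV to a temporary file in place of the real download, and it assumes the file has rowId and prediction columns with 0-based row IDs. Check the columns of your own file before relying on these names.

```r
# Hypothetical stand-in for the file written by DownloadTrainingPredictions().
# The column names (rowId, prediction) and all values are assumptions.
csvPath <- tempfile(fileext = ".csv")
write.csv(data.frame(rowId = c(0, 1, 2, 3),
                     prediction = c(10.5, 20.0, 31.0, 39.5)),
          csvPath, row.names = FALSE)

# Hypothetical training data holding the actual target values.
trainingData <- data.frame(target = c(10, 22, 30, 40))
trainingData$rowId <- seq_len(nrow(trainingData)) - 1  # assumed 0-based IDs

# Read the predictions back and align them with the actuals by row ID.
predictions <- read.csv(csvPath)
scored <- merge(trainingData, predictions, by = "rowId")

# Out-of-fold residuals and RMSE as a simple validation metric.
scored$residual <- scored$target - scored$prediction
oofRmse <- sqrt(mean(scored$residual^2))
```

Joining on the row ID rather than relying on row order keeps the comparison correct even when the file covers only a subset of the training data, such as the holdout set.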