AutoCatBoostScoring: AutoCatBoostScoring
In AdrianAntico/RemixAutoML: AutoQuant

Description

AutoCatBoostScoring is an automated scoring function that complements the AutoCatBoost model training functions. You supply the features to score, and the function runs ModelDataPrep() to prepare them for CatBoost data conversion and scoring.

Usage

AutoCatBoostScoring(
  TargetType = NULL,
  ScoringData = NULL,
  FeatureColumnNames = NULL,
  FactorLevelsList = NULL,
  IDcols = NULL,
  OneHot = FALSE,
  ReturnShapValues = FALSE,
  ModelObject = NULL,
  ModelPath = NULL,
  ModelID = NULL,
  ReturnFeatures = TRUE,
  MultiClassTargetLevels = NULL,
  TransformNumeric = FALSE,
  BackTransNumeric = FALSE,
  TargetColumnName = NULL,
  TransformationObject = NULL,
  TransID = NULL,
  TransPath = NULL,
  MDP_Impute = FALSE,
  MDP_CharToFactor = FALSE,
  MDP_RemoveDates = FALSE,
  MDP_MissFactor = "0",
  MDP_MissNum = -1,
  RemoveModel = FALSE,
  Debug = FALSE
)

Arguments

`TargetType`	Set this value to 'regression', 'classification', 'multiclass', or 'multiregression' to score models built using AutoCatBoostRegression(), AutoCatBoostClassifier() or AutoCatBoostMultiClass().
`ScoringData`	This is your data.table of features for scoring. Can be a single row or batch.
`FeatureColumnNames`	Supply either column names or column numbers used in the AutoCatBoostRegression() function
`FactorLevelsList`	List of factor levels for CharacterEncode()
`IDcols`	Supply ID column numbers for any metadata you want returned with your predicted values
`OneHot`	Passed to DummifyD
`ReturnShapValues`	Set to TRUE to return a data.table of feature contributions to all predicted values generated
`ModelObject`	Supply the model object directly for scoring instead of loading it from file. If you supply this, ModelID and ModelPath will be ignored.
`ModelPath`	Supply the file path used in the AutoCatBoost__() function
`ModelID`	Supply the model ID used in the AutoCatBoost__() function
`ReturnFeatures`	Set to TRUE to return your features with the predicted values.
`MultiClassTargetLevels`	For use with AutoCatBoostMultiClass(). If you saved model objects then this scoring function will locate the target levels file. If you did not save model objects, you can supply the target levels returned from AutoCatBoostMultiClass().
`TransformNumeric`	Set to TRUE if you have features that were transformed automatically from an Auto__Regression() model AND you haven't already transformed them.
`BackTransNumeric`	Set to TRUE to generate back-transformed predicted values. Also, if you return features, those will also be back-transformed.
`TargetColumnName`	Input your target column name used in training if you are utilizing the transformation service
`TransformationObject`	Set to NULL if you didn't use transformations or if you want the function to pull from the file output from the Auto__Regression() function. You can also supply the transformation data.table object with the transformation details versus having it pulled from file.
`TransID`	Set to the ID used for saving the transformation data.table object or set it to the ModelID if you are pulling from file from a build with Auto__Regression().
`TransPath`	Set the file path to the folder where your transformation data.table detail object is stored. If you used Auto__Regression() to build the model, set it to the same path as ModelPath.
`MDP_Impute`	Set to TRUE if you did so for modeling and didn't do so before supplying ScoringData in this function
`MDP_CharToFactor`	Set to TRUE to turn your character columns to factors if you didn't do so to your ScoringData that you are supplying to this function
`MDP_RemoveDates`	Set to TRUE if you have date or timestamp columns in your ScoringData
`MDP_MissFactor`	If you set MDP_Impute to TRUE, supply the character values to replace missing values with
`MDP_MissNum`	If you set MDP_Impute to TRUE, supply a numeric value to replace missing values with
`RemoveModel`	Set to TRUE if you want the model removed immediately after scoring
`Debug`	Defaults to FALSE
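
The `MDP_Impute`, `MDP_MissFactor`, and `MDP_MissNum` arguments above describe a simple fill-in of missing values before scoring. The base R sketch below illustrates that idea only. It is not the code that ModelDataPrep() runs, and the helper `impute_sketch()` and the sample columns are hypothetical.

```r
# Hypothetical helper illustrating the idea behind MDP_Impute,
# MDP_MissFactor and MDP_MissNum. Not AutoQuant's implementation.
impute_sketch <- function(df, miss_factor = "0", miss_num = -1) {
  for (col in names(df)) {
    x <- df[[col]]
    if (is.numeric(x)) {
      # Numeric columns: replace NA with the numeric fill value
      x[is.na(x)] <- miss_num
    } else if (is.character(x) || is.factor(x)) {
      # Character / factor columns: replace NA with the character fill value
      was_factor <- is.factor(x)
      x <- as.character(x)
      x[is.na(x)] <- miss_factor
      if (was_factor) x <- factor(x)
    }
    df[[col]] <- x
  }
  df
}

# Made-up scoring data with one missing value per column
scoring_data <- data.frame(
  Price  = c(10.5, NA, 7.25),
  Region = c("east", NA, "west"),
  stringsAsFactors = FALSE
)

imputed <- impute_sketch(scoring_data)
print(imputed)
```

If the training data was imputed with particular fill values, the same values would presumably need to be used at scoring time so that the features match what the model saw.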

Value

A data.table of predicted values with the option to return model features as well.
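
When `BackTransNumeric = TRUE`, the predicted values are reported on the original scale of the target. As a sketch of what back-transforming means, the base R snippet below applies three transformations and their inverses and checks that the round trip recovers the input. The method names are borrowed from the `Methods` argument in the training example; pairing them with these base R functions is an assumption for illustration, not AutoQuant's internal code.

```r
# Transform / inverse pairs keyed by illustrative method names.
# Assumption: these base R functions stand in for the package's transformations.
transforms <- list(
  Asinh    = list(forward = asinh, inverse = sinh),
  LogPlus1 = list(forward = log1p, inverse = expm1),
  Sqrt     = list(forward = sqrt,  inverse = function(z) z^2)
)

# Made-up target values on the original scale
target <- c(0, 1.5, 20, 300)

# Largest absolute round-trip error for each method
round_trip <- vapply(
  transforms,
  function(tf) max(abs(tf$inverse(tf$forward(target)) - target)),
  numeric(1)
)
print(round_trip)
```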

Author(s)

Adrian Antico

Examples

## Not run: 

# CatBoost Regression Example

# Create some dummy correlated data
data <- AutoQuant::FakeDataGenerator(
  Correlation = 0.85,
  N = 10000,
  ID = 2,
  ZIP = 0,
  AddDate = FALSE,
  Classification = FALSE,
  MultiClass = FALSE)

# Copy data
data1 <- data.table::copy(data)

# Run function
TestModel <- AutoQuant::AutoCatBoostRegression(

  # GPU or CPU and the number of available GPUs
  TrainOnFull = FALSE,
  task_type = 'CPU',
  NumGPUs = 1,
  DebugMode = FALSE,

  # Metadata args
  OutputSelection = c('Importances','EvalPlots','EvalMetrics','Score_TrainData'),
  ModelID = 'Test_Model_1',
  model_path = getwd(),
  metadata_path = getwd(),
  SaveModelObjects = FALSE,
  SaveInfoToPDF = FALSE,
  ReturnModelObjects = TRUE,

  # Data args
  data = data1,
  ValidationData = NULL,
  TestData = NULL,
  TargetColumnName = 'Adrian',
  FeatureColNames = names(data1)[!names(data1) %in% c('IDcol_1', 'IDcol_2','Adrian')],
  PrimaryDateColumn = NULL,
  WeightsColumnName = NULL,
  IDcols = c('IDcol_1','IDcol_2'),
  TransformNumericColumns = 'Adrian',
  Methods = c('Asinh','Asin','Log','LogPlus1','Sqrt','Logit'),

  # Model evaluation
  eval_metric = 'RMSE',
  eval_metric_value = 1.5,
  loss_function = 'RMSE',
  loss_function_value = 1.5,
  MetricPeriods = 10L,
  NumOfParDepPlots = ncol(data1)-1L-2L,

  # Grid tuning args
  PassInGrid = NULL,
  GridTune = FALSE,
  MaxModelsInGrid = 30L,
  MaxRunsWithoutNewWinner = 20L,
  MaxRunMinutes = 60*60,
  BaselineComparison = 'default',

  # ML args
  langevin = FALSE,
  diffusion_temperature = 10000,
  Trees = 1000,
  Depth = 9,
  L2_Leaf_Reg = NULL,
  RandomStrength = 1,
  BorderCount = 128,
  LearningRate = NULL,
  RSM = 1,
  BootStrapType = NULL,
  GrowPolicy = 'SymmetricTree',
  model_size_reg = 0.5,
  feature_border_type = 'GreedyLogSum',
  sampling_unit = 'Object',
  subsample = NULL,
  score_function = 'Cosine',
  min_data_in_leaf = 1)

# Trained Model Object
TestModel$Model

# Train Data (includes validation data) and Test Data with predictions and shap values
TestModel$TrainData
TestModel$TestData

# Calibration Plots
TestModel$PlotList$Train_EvaluationPlot
TestModel$PlotList$Test_EvaluationPlot

# Calibration Box Plots
TestModel$PlotList$Train_EvaluationBoxPlot
TestModel$PlotList$Test_EvaluationBoxPlot

# Residual Analysis Plots
TestModel$PlotList$Train_ResidualsHistogram
TestModel$PlotList$Test_ResidualsHistogram

# Preds vs Actuals Scatterplots
TestModel$PlotList$Train_ScatterPlot
TestModel$PlotList$Test_ScatterPlot

# Preds vs Actuals Copula Plot
TestModel$PlotList$Train_CopulaPlot
TestModel$PlotList$Test_CopulaPlot

# Variable Importance Plots
TestModel$PlotList$Train_VariableImportance
TestModel$PlotList$Validation_VariableImportance
TestModel$PlotList$Test_VariableImportance

# Evaluation Metrics
TestModel$EvaluationMetrics$TrainData
TestModel$EvaluationMetrics$TestData

# Variable Importance Tables
TestModel$VariableImportance$Train_Importance
TestModel$VariableImportance$Validation_Importance
TestModel$VariableImportance$Test_Importance

# Interaction Importance
TestModel$InteractionImportance$Train_Interaction
TestModel$InteractionImportance$Validation_Interaction
TestModel$InteractionImportance$Test_Interaction

# Meta Data
TestModel$ColNames
TestModel$TransformationResults
TestModel$GridList

# Score data
Preds <- AutoQuant::AutoCatBoostScoring(
  TargetType = 'regression',
  ScoringData = data,
  FeatureColumnNames = names(data)[!names(data) %in% c('IDcol_1', 'IDcol_2','Adrian')],
  FactorLevelsList = TestModel$FactorLevelsList,
  IDcols = c('IDcol_1','IDcol_2'),
  OneHot = FALSE,
  ReturnShapValues = TRUE,
  ModelObject = TestModel$Model,
  ModelPath = NULL,
  ModelID = 'Test_Model_1',
  ReturnFeatures = TRUE,
  MultiClassTargetLevels = NULL,
  TransformNumeric = FALSE,
  BackTransNumeric = FALSE,
  TargetColumnName = NULL,
  TransformationObject = NULL,
  TransID = NULL,
  TransPath = NULL,
  MDP_Impute = TRUE,
  MDP_CharToFactor = TRUE,
  MDP_RemoveDates = TRUE,
  MDP_MissFactor = '0',
  MDP_MissNum = -1,
  RemoveModel = FALSE)

  # Step through scoring function
  library(AutoQuant)
  library(data.table)
  TargetType = 'regression'
  ScoringData = data
  FeatureColumnNames = names(data)[!names(data) %in% c('IDcol_1', 'IDcol_2','Adrian')]
  FactorLevelsList = TestModel$FactorLevelsList
  IDcols = c('IDcol_1','IDcol_2')
  OneHot = FALSE
  ReturnShapValues = TRUE
  ModelObject = TestModel$Model
  ModelPath = NULL
  ModelID = 'Test_Model_1'
  ReturnFeatures = TRUE
  MultiClassTargetLevels = NULL
  TransformNumeric = FALSE
  BackTransNumeric = FALSE
  TargetColumnName = NULL
  TransformationObject = NULL
  TransID = NULL
  TransPath = NULL
  MDP_Impute = TRUE
  MDP_CharToFactor = TRUE
  MDP_RemoveDates = TRUE
  MDP_MissFactor = '0'
  MDP_MissNum = -1
  RemoveModel = FALSE
  Debug = TRUE

## End(Not run)
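
The example above selects feature columns by excluding the ID and target columns by name. Since `FeatureColumnNames` is documented as accepting either column names or column numbers, the base R sketch below shows one way to derive both forms. The small data frame and its `Feature_A` and `Feature_B` columns are made-up stand-ins, not the output of FakeDataGenerator().

```r
# Made-up stand-in for the scoring data (ID and target names mirror the example)
scoring_data <- data.frame(
  IDcol_1   = 1:3,
  IDcol_2   = 4:6,
  Adrian    = c(0.2, 0.5, 0.9),
  Feature_A = c(1.1, 2.2, 3.3),
  Feature_B = c(9, 8, 7)
)

excluded <- c("IDcol_1", "IDcol_2", "Adrian")

# Same result as names(x)[!names(x) %in% excluded], preserving column order
feature_names <- setdiff(names(scoring_data), excluded)

# Column numbers for the same features
feature_numbers <- match(feature_names, names(scoring_data))

print(feature_names)
print(feature_numbers)
```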

AdrianAntico/RemixAutoML documentation built on June 12, 2025, 5:35 p.m.