AutoLightGBMCARMA: AutoLightGBMCARMA
In AdrianAntico/ModelingTools: AutoQuant

R Documentation

Description

AutoLightGBMCARMA performs multivariate forecasting with calendar variables, holiday counts, holiday lags, holiday moving averages, differencing, transformations, and interaction-based categorical encoding. It uses the target variable and features to generate time-based aggregated lags, moving averages, moving standard deviations, moving skewness, moving kurtosis, and moving quantiles, along with parallelized interaction-based Fourier pairs by grouping variables and trend variables.
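
To make the feature types named above concrete, here is a minimal sketch of grouped lags, a moving average, and a first difference built with data.table. It is not the package's implementation; the toy data and the column names (`Group`, `Target`, `Lag1`, `MA3_Lag1`, `Diff1`) are assumptions chosen for illustration.

```r
library(data.table)

# Toy panel: two groups, six weekly periods each (hypothetical data)
dt <- data.table(
  Group  = rep(c("A", "B"), each = 6L),
  Date   = rep(seq(as.Date("2024-01-01"), by = "week", length.out = 6L), times = 2L),
  Target = c(1, 2, 3, 4, 5, 6, 10, 20, 30, 40, 50, 60)
)
setorder(dt, Group, Date)

# Lag 1 of the target, computed within each group
dt[, Lag1 := shift(Target, n = 1L, type = "lag"), by = Group]

# 3-period moving average of the lagged target, so only past values are used
dt[, MA3_Lag1 := frollmean(Lag1, n = 3L), by = Group]

# First difference within each group (the "I" in ARIMA)
dt[, Diff1 := Target - shift(Target, n = 1L, type = "lag"), by = Group]

print(dt)
```

Computing the moving average on the lagged column rather than on the raw target is one common way to keep the current period's value out of its own feature.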

Usage

AutoLightGBMCARMA(
  data = NULL,
  XREGS = NULL,
  TimeWeights = NULL,
  NonNegativePred = FALSE,
  RoundPreds = FALSE,
  TrainOnFull = FALSE,
  TargetColumnName = NULL,
  DateColumnName = NULL,
  HierarchGroups = NULL,
  GroupVariables = NULL,
  FC_Periods = 1,
  NThreads = max(1, parallel::detectCores() - 2L),
  SaveDataPath = NULL,
  TimeUnit = NULL,
  TimeGroups = NULL,
  TargetTransformation = FALSE,
  Methods = c("Asinh", "Log", "LogPlus1", "Sqrt"),
  EncodingMethod = "target_encoding",
  AnomalyDetection = NULL,
  Lags = NULL,
  MA_Periods = NULL,
  SD_Periods = NULL,
  Skew_Periods = NULL,
  Kurt_Periods = NULL,
  Quantile_Periods = NULL,
  Quantiles_Selected = c("q5", "q95"),
  Difference = TRUE,
  FourierTerms = 0,
  CalendarVariables = NULL,
  HolidayVariable = NULL,
  HolidayLookback = NULL,
  HolidayLags = 1L,
  HolidayMovingAverages = 3L,
  TimeTrendVariable = FALSE,
  DataTruncate = FALSE,
  ZeroPadSeries = "maxmax",
  SplitRatios = c(0.95, 0.05),
  PartitionType = "random",
  Timer = TRUE,
  SaveModel = FALSE,
  ArgsList = NULL,
  DebugMode = FALSE,
  ModelID = "FC001",
  GridTune = FALSE,
  GridEvalMetric = "mae",
  ModelCount = 30L,
  MaxRunsWithoutNewWinner = 20L,
  MaxRunMinutes = 24L * 60L,
  Device_Type = "cpu",
  LossFunction = "regression",
  EvalMetric = "mae",
  Input_Model = NULL,
  Task = "train",
  Boosting = "gbdt",
  LinearTree = FALSE,
  Trees = 500L,
  ETA = 0.5,
  Num_Leaves = 31,
  Deterministic = TRUE,
  Force_Col_Wise = FALSE,
  Force_Row_Wise = FALSE,
  Max_Depth = 6,
  Min_Data_In_Leaf = 20,
  Min_Sum_Hessian_In_Leaf = 0.001,
  Bagging_Freq = 1,
  Bagging_Fraction = 0.7,
  Feature_Fraction = 1,
  Feature_Fraction_Bynode = 1,
  Lambda_L1 = 4,
  Lambda_L2 = 4,
  Extra_Trees = FALSE,
  Early_Stopping_Round = 10,
  First_Metric_Only = TRUE,
  Max_Delta_Step = 0,
  Linear_Lambda = 0,
  Min_Gain_To_Split = 0,
  Drop_Rate_Dart = 0.1,
  Max_Drop_Dart = 50,
  Skip_Drop_Dart = 0.5,
  Uniform_Drop_Dart = FALSE,
  Top_Rate_Goss = FALSE,
  Other_Rate_Goss = FALSE,
  Monotone_Constraints = NULL,
  Monotone_Constraints_method = "advanced",
  Monotone_Penalty = 0,
  Forcedsplits_Filename = NULL,
  Refit_Decay_Rate = 0.9,
  Path_Smooth = 0,
  Max_Bin = 255,
  Min_Data_In_Bin = 3,
  Data_Random_Seed = 1,
  Is_Enable_Sparse = TRUE,
  Enable_Bundle = TRUE,
  Use_Missing = TRUE,
  Zero_As_Missing = FALSE,
  Two_Round = FALSE,
  Convert_Model = NULL,
  Convert_Model_Language = "cpp",
  Boost_From_Average = TRUE,
  Alpha = 0.9,
  Fair_C = 1,
  Poisson_Max_Delta_Step = 0.7,
  Tweedie_Variance_Power = 1.5,
  Lambdarank_Truncation_Level = 30,
  Is_Provide_Training_Metric = TRUE,
  Eval_At = c(1, 2, 3, 4, 5),
  Num_Machines = 1,
  Gpu_Platform_Id = -1,
  Gpu_Device_Id = -1,
  Gpu_Use_Dp = TRUE,
  Num_Gpu = 1,
  TVT = NULL
)

Arguments

`data`	Supply your full series data set here
`XREGS`	Additional data to use for model development and forecasting. The data must be a complete series: both the historical values and the forward-looking values over the specified forecast window need to be supplied.
`TimeWeights`	Supply a value that will be multiplied by the time trend value
`NonNegativePred`	TRUE or FALSE
`RoundPreds`	Rounding predictions to an integer value. TRUE or FALSE. Defaults to FALSE
`TrainOnFull`	Set to TRUE to train on full data
`TargetColumnName`	List the column name of your target variable's column. E.g. 'Target'
`DateColumnName`	List the column name of your date column. E.g. 'DateTime'
`HierarchGroups`	Defaults to NULL. Character vector, or NULL, with the names of the columns that form the interaction hierarchy
`GroupVariables`	Defaults to NULL. Use NULL when you have a single series. Add in GroupVariables when you have a series for every level of a group or multiple groups.
`FC_Periods`	Set the number of periods you want to have forecasts for. E.g. 52 for weekly data to forecast a year ahead
`NThreads`	Set the maximum number of threads you'd like to dedicate to the model run. E.g. 8
`SaveDataPath`	Path to save modeling data
`TimeUnit`	List the time unit your data is aggregated by. E.g. '1min', '5min', '10min', '15min', '30min', 'hour', 'day', 'week', 'month', 'quarter', 'year'
`TimeGroups`	Select time aggregations for adding various time aggregated GDL features.
`TargetTransformation`	Run Rodeo::AutoTransformationCreate() to find the best transformation for the target variable. Tests YeoJohnson, BoxCox, and Asinh (also Asin and Logit for proportion target variables).
`Methods`	Choose from 'YeoJohnson', 'BoxCox', 'Asinh', 'Log', 'LogPlus1', 'Sqrt', 'Asin', or 'Logit'. If more than one is selected, the one with the best normalization Pearson statistic will be used. Identity is automatically selected and compared.
`EncodingMethod`	Choose from 'binary', 'm_estimator', 'credibility', 'woe', 'target_encoding', 'poly_encode', 'backward_difference', 'helmert'
`AnomalyDetection`	NULL for not using the service. Otherwise, provide a list, e.g. AnomalyDetection = list('tstat_high' = 4, 'tstat_low' = -4)
`Lags`	Select the periods for all lag variables you want to create. E.g. c(1:5,52) or list('day' = c(1:10), 'weeks' = c(1:4))
`MA_Periods`	Select the periods for all moving average variables you want to create. E.g. c(1:5,52) or list('day' = c(2:10), 'weeks' = c(2:4))
`SD_Periods`	Select the periods for all moving standard deviation variables you want to create. E.g. c(1:5,52) or list('day' = c(2:10), 'weeks' = c(2:4))
`Skew_Periods`	Select the periods for all moving skewness variables you want to create. E.g. c(1:5,52) or list('day' = c(2:10), 'weeks' = c(2:4))
`Kurt_Periods`	Select the periods for all moving kurtosis variables you want to create. E.g. c(1:5,52) or list('day' = c(2:10), 'weeks' = c(2:4))
`Quantile_Periods`	Select the periods for all moving quantiles variables you want to create. E.g. c(1:5,52) or list('day' = c(2:10), 'weeks' = c(2:4))
`Quantiles_Selected`	Select from the following c('q5','q10','q15','q20','q25','q30','q35','q40','q45','q50','q55','q60','q65','q70','q75','q80','q85','q90','q95')
`Difference`	Set to TRUE to put the I in ARIMA
`FourierTerms`	Set to the max number of pairs
`CalendarVariables`	NULL, or select from 'second', 'minute', 'hour', 'wday', 'mday', 'yday', 'week', 'wom', 'isoweek', 'month', 'quarter', 'year'
`HolidayVariable`	NULL, or select from 'USPublicHolidays', 'EasterGroup', 'ChristmasGroup', 'OtherEcclesticalFeasts'
`HolidayLookback`	Number of days in range to compute number of holidays from a given date in the data. If NULL, the number of days is computed for you.
`HolidayLags`	Number of lags for the holiday counts
`HolidayMovingAverages`	Number of moving averages for holiday counts
`TimeTrendVariable`	Set to TRUE to have a time trend variable added to the model. Time trend is a numeric variable indicating the position of each record in the time series (by group). Time trend starts at 1 for the earliest point in time and increments by one for each successive time point.
`DataTruncate`	Set to TRUE to remove records with missing values from the lags and moving average features created
`ZeroPadSeries`	NULL to do nothing. Otherwise, set to 'maxmax', 'minmax', 'maxmin', 'minmin'. See `TimeSeriesFill` for explanations of each type
`SplitRatios`	E.g. c(0.7,0.2,0.1) for train, validation, and test sets
`PartitionType`	Select 'random' for random data partitioning, or 'time' for partitioning by time frames
`Timer`	Setting to TRUE prints out the forecast number while it is building
`SaveModel`	Logical. If TRUE, output ArgsList will have a named element 'Model' with the LightGBM model object
`ArgsList`	ArgsList is for scoring. Must contain named element 'Model' with a LightGBM model object
`DebugMode`	Setting to TRUE generates printout of all header code comments during run time of function
`ModelID`	Something to name your model if you want it saved
`GridTune`	Set to TRUE to run a grid tune
`GridEvalMetric`	This is the metric used to find the threshold 'poisson', 'mae', 'mape', 'mse', 'msle', 'kl', 'cs', 'r2'
`ModelCount`	Set the number of models to try in the grid tune
`MaxRunsWithoutNewWinner`	Number of consecutive runs without a new winner in order to terminate procedure
`MaxRunMinutes`	Default 24L*60L
`Device_Type`	= 'cpu'
`LossFunction`	= 'regression' (or 'mean_squared_error'), 'regression_l1' (or 'mean_absolute_error'), 'mape' (or 'mean_absolute_percentage_error'), 'huber', 'fair', 'poisson', 'quantile', 'gamma', 'tweedie'
`EvalMetric`	= 'mae'
`Input_Model`	= NULL
`Task`	= 'train'
`Boosting`	= 'gbdt'
`LinearTree`	= FALSE
`Trees`	= 500
`ETA`	= 0.5
`Num_Leaves`	= 31
`Deterministic`	= TRUE. The learning control parameters listed next are described at https://lightgbm.readthedocs.io/en/latest/Parameters.html#learning-control-parameters
`Force_Col_Wise`	= FALSE
`Force_Row_Wise`	= FALSE
`Max_Depth`	= 6
`Min_Data_In_Leaf`	= 20
`Min_Sum_Hessian_In_Leaf`	= 0.001
`Bagging_Freq`	= 1.0
`Bagging_Fraction`	= 0.7
`Feature_Fraction`	= 1.0
`Feature_Fraction_Bynode`	= 1.0
`Lambda_L1`	= 4
`Lambda_L2`	= 4
`Extra_Trees`	= FALSE
`Early_Stopping_Round`	= 10
`First_Metric_Only`	= TRUE
`Max_Delta_Step`	= 0.0
`Linear_Lambda`	= 0.0
`Min_Gain_To_Split`	= 0
`Drop_Rate_Dart`	= 0.10
`Max_Drop_Dart`	= 50
`Skip_Drop_Dart`	= 0.50
`Uniform_Drop_Dart`	= FALSE
`Top_Rate_Goss`	= FALSE
`Other_Rate_Goss`	= FALSE
`Monotone_Constraints`	= NULL
`Monotone_Constraints_method`	= 'advanced'
`Monotone_Penalty`	= 0.0
`Forcedsplits_Filename`	= NULL
`Refit_Decay_Rate`	= 0.90
`Path_Smooth`	= 0.0. The IO dataset parameters listed next are described at https://lightgbm.readthedocs.io/en/latest/Parameters.html#io-parameters
`Max_Bin`	= 255
`Min_Data_In_Bin`	= 3
`Data_Random_Seed`	= 1
`Is_Enable_Sparse`	= TRUE
`Enable_Bundle`	= TRUE
`Use_Missing`	= TRUE
`Zero_As_Missing`	= FALSE
`Two_Round`	= FALSE. The convert parameters are listed next.
`Convert_Model`	= NULL
`Convert_Model_Language`	= 'cpp'. The objective parameters listed next are described at https://lightgbm.readthedocs.io/en/latest/Parameters.html#objective-parameters
`Boost_From_Average`	= TRUE
`Alpha`	= 0.90
`Fair_C`	= 1.0
`Poisson_Max_Delta_Step`	= 0.70
`Tweedie_Variance_Power`	= 1.5
`Lambdarank_Truncation_Level`	= 30. The metric parameters listed next (metric itself is in Core) are described at https://lightgbm.readthedocs.io/en/latest/Parameters.html#metric-parameters
`Is_Provide_Training_Metric`	= TRUE
`Eval_At`	= c(1,2,3,4,5). The network parameters listed next are described at https://lightgbm.readthedocs.io/en/latest/Parameters.html#network-parameters
`Num_Machines`	= 1. The GPU parameters listed next are described at https://lightgbm.readthedocs.io/en/latest/Parameters.html#gpu-parameters
`Gpu_Platform_Id`	= -1
`Gpu_Device_Id`	= -1
`Gpu_Use_Dp`	= TRUE
`Num_Gpu`	= 1
`TVT`	Passthrough
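
The `TimeTrendVariable` and `FourierTerms` descriptions above can be illustrated with a short sketch. It follows the stated rule for the time trend (1 at the earliest record of each group, then one more per period) and uses the generic sine/cosine pair formula for the Fourier terms. The seasonal period of 52 and the column names `Sin_k` and `Cos_k` are assumptions for illustration, and the package may construct these features differently.

```r
library(data.table)

# Toy panel: two groups, four weekly periods each (hypothetical data)
dt <- data.table(
  Group = rep(c("A", "B"), each = 4L),
  Date  = rep(seq(as.Date("2024-01-07"), by = "week", length.out = 4L), times = 2L)
)
setorder(dt, Group, Date)

# Time trend: 1 for the earliest record of each group, then +1 per period
dt[, TimeTrend := seq_len(.N), by = Group]

# K sine/cosine pairs for an assumed seasonal period (52 weeks here)
K <- 2L
Period <- 52
for (k in seq_len(K)) {
  set(dt, j = paste0("Sin_", k), value = sin(2 * pi * k * dt$TimeTrend / Period))
  set(dt, j = paste0("Cos_", k), value = cos(2 * pi * k * dt$TimeTrend / Period))
}

print(dt)
```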

Value

See examples

Author(s)

Adrian Antico

Examples

## Not run: 

# Load data
data <- data.table::fread('https://www.dropbox.com/s/2str3ek4f4cheqi/walmart_train.csv?dl=1')

# Ensure series have no missing dates (also remove series with more than 25% missing values)
data <- AutoQuant::TimeSeriesFill(
  data,
  DateColumnName = 'Date',
  GroupVariables = c('Store','Dept'),
  TimeUnit = 'weeks',
  FillType = 'maxmax',
  MaxMissingPercent = 0.25,
  SimpleImpute = TRUE)

# Set negative numbers to 0
data <- data[, Weekly_Sales := data.table::fifelse(Weekly_Sales < 0, 0, Weekly_Sales)]

# Remove IsHoliday column
data[, IsHoliday := NULL]

# Create xregs (this is to include the categorical variables themselves instead of utilizing only their interaction)
xregs <- data[, .SD, .SDcols = c('Date', 'Store', 'Dept')]

# Change data types
data[, ':=' (Store = as.character(Store), Dept = as.character(Dept))]
xregs[, ':=' (Store = as.character(Store), Dept = as.character(Dept))]

# Build forecast
Results <- AutoLightGBMCARMA(

  # Data Artifacts
  data = data,
  NonNegativePred = FALSE,
  RoundPreds = FALSE,
  TargetColumnName = 'Weekly_Sales',
  DateColumnName = 'Date',
  HierarchGroups = NULL,
  GroupVariables = c('Store','Dept'),
  TimeUnit = 'weeks',
  TimeGroups = c('weeks','months'),

  # Data Wrangling Features
  EncodingMethod = 'binary',
  ZeroPadSeries = NULL,
  DataTruncate = FALSE,
  SplitRatios = c(1 - 10 / 138, 10 / 138),
  PartitionType = 'timeseries',
  AnomalyDetection = NULL,

  # Productionize
  FC_Periods = 0,
  TrainOnFull = FALSE,
  NThreads = 8,
  Timer = TRUE,
  DebugMode = FALSE,
  SaveDataPath = NULL,
  SaveModel = FALSE,
  ArgsList = NULL,

  # Target Transformations
  TargetTransformation = TRUE,
  Methods = c('BoxCox', 'Asinh', 'Asin', 'Log',
              'LogPlus1', 'Sqrt', 'Logit','YeoJohnson'),
  Difference = FALSE,

  # Features
  Lags = list('weeks' = seq(1L, 10L, 1L),
              'months' = seq(1L, 5L, 1L)),
  MA_Periods = list('weeks' = seq(5L, 20L, 5L),
                    'months' = seq(2L, 10L, 2L)),
  SD_Periods = NULL,
  Skew_Periods = NULL,
  Kurt_Periods = NULL,
  Quantile_Periods = NULL,
  Quantiles_Selected = c('q5','q95'),
  XREGS = xregs,
  FourierTerms = 4,
  CalendarVariables = c('week', 'wom', 'month', 'quarter'),
  HolidayVariable = c('USPublicHolidays','EasterGroup',
    'ChristmasGroup','OtherEcclesticalFeasts'),
  HolidayLookback = NULL,
  HolidayLags = 1,
  HolidayMovingAverages = 1:2,
  TimeTrendVariable = TRUE,

  # Grid tuning args
  GridTune = FALSE,
  GridEvalMetric = 'mae',
  ModelCount = 30L,
  MaxRunsWithoutNewWinner = 20L,
  MaxRunMinutes = 24L*60L,

  # LightGBM Args
  Device_Type = 'cpu',
  LossFunction = 'regression',
  EvalMetric = 'mae',
  Input_Model = NULL,
  Task = 'train',
  Boosting = 'gbdt',
  LinearTree = FALSE,
  Trees = 1000,
  ETA = 0.10,
  Num_Leaves = 31,
  Deterministic = TRUE,

  # Learning Parameters
  # https://lightgbm.readthedocs.io/en/latest/Parameters.html#learning-control-parameters
  Force_Col_Wise = FALSE,
  Force_Row_Wise = FALSE,
  Max_Depth = 6,
  Min_Data_In_Leaf = 20,
  Min_Sum_Hessian_In_Leaf = 0.001,
  Bagging_Freq = 1.0,
  Bagging_Fraction = 1.0,
  Feature_Fraction = 1.0,
  Feature_Fraction_Bynode = 1.0,
  Lambda_L1 = 0.0,
  Lambda_L2 = 0.0,
  Extra_Trees = FALSE,
  Early_Stopping_Round = 10,
  First_Metric_Only = TRUE,
  Max_Delta_Step = 0.0,
  Linear_Lambda = 0.0,
  Min_Gain_To_Split = 0,
  Drop_Rate_Dart = 0.10,
  Max_Drop_Dart = 50,
  Skip_Drop_Dart = 0.50,
  Uniform_Drop_Dart = FALSE,
  Top_Rate_Goss = FALSE,
  Other_Rate_Goss = FALSE,
  Monotone_Constraints = NULL,
  Monotone_Constraints_method = 'advanced',
  Monotone_Penalty = 0.0,
  Forcedsplits_Filename = NULL, # use for AutoStack option; .json file
  Refit_Decay_Rate = 0.90,
  Path_Smooth = 0.0,

  # IO Dataset Parameters
  # https://lightgbm.readthedocs.io/en/latest/Parameters.html#io-parameters
  Max_Bin = 255,
  Min_Data_In_Bin = 3,
  Data_Random_Seed = 1,
  Is_Enable_Sparse = TRUE,
  Enable_Bundle = TRUE,
  Use_Missing = TRUE,
  Zero_As_Missing = FALSE,
  Two_Round = FALSE,

  # Convert Parameters
  Convert_Model = NULL,
  Convert_Model_Language = 'cpp',

  # Objective Parameters
  # https://lightgbm.readthedocs.io/en/latest/Parameters.html#objective-parameters
  Boost_From_Average = TRUE,
  Alpha = 0.90,
  Fair_C = 1.0,
  Poisson_Max_Delta_Step = 0.70,
  Tweedie_Variance_Power = 1.5,
  Lambdarank_Truncation_Level = 30,

  # Metric Parameters (metric is in Core)
  # https://lightgbm.readthedocs.io/en/latest/Parameters.html#metric-parameters
  Is_Provide_Training_Metric = TRUE,
  Eval_At = c(1,2,3,4,5),

  # Network Parameters
  # https://lightgbm.readthedocs.io/en/latest/Parameters.html#network-parameters
  Num_Machines = 1,

  # GPU Parameters
  # https://lightgbm.readthedocs.io/en/latest/Parameters.html#gpu-parameters
  Gpu_Platform_Id = -1,
  Gpu_Device_Id = -1,
  Gpu_Use_Dp = TRUE,
  Num_Gpu = 1)

# Convert the MSE row to RMSE (the Metric label itself is left unchanged), then print once
UpdateMetrics <- Results$ModelInformation$EvaluationMetrics[
  Metric == 'MSE', MetricValue := sqrt(MetricValue)]
print(UpdateMetrics)
Results$ModelInformation$EvaluationMetricsByGroup[order(-R2_Metric)]
Results$ModelInformation$EvaluationMetricsByGroup[order(MAE_Metric)]
Results$ModelInformation$EvaluationMetricsByGroup[order(MSE_Metric)]
Results$ModelInformation$EvaluationMetricsByGroup[order(MAPE_Metric)]

## End(Not run)
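
The example passes `SplitRatios = c(1 - 10 / 138, 10 / 138)` with a time-based partition and then ranks groups by their error metrics. The sketch below shows one way such a split and per-group metrics could be computed on toy data. It assumes 138 weekly periods and hypothetical `Actual` and `Predicted` columns, and it is not the package's internal partitioning or evaluation code.

```r
library(data.table)

# Toy series: 138 weekly periods for two groups (hypothetical data)
set.seed(1)
n_periods <- 138L
dt <- CJ(
  Group = c("A", "B"),
  Date  = seq(as.Date("2020-01-05"), by = "week", length.out = n_periods)
)
dt[, Actual := 100 + rnorm(.N, sd = 5)]
dt[, Predicted := Actual + rnorm(.N, sd = 2)]

# Time-ordered split: the most recent share of periods becomes the holdout
ratios    <- c(1 - 10 / 138, 10 / 138)
dates     <- sort(unique(dt$Date))
n_holdout <- round(length(dates) * ratios[2L])
cutoff    <- dates[length(dates) - n_holdout]
train     <- dt[Date <= cutoff]
holdout   <- dt[Date > cutoff]

# Per-group error metrics on the holdout
metrics <- holdout[, .(
  MAE  = mean(abs(Actual - Predicted)),
  MSE  = mean((Actual - Predicted)^2),
  MAPE = mean(abs((Actual - Predicted) / Actual))
), by = Group]

# RMSE is the square root of MSE
metrics[, RMSE := sqrt(MSE)]
print(metrics[order(MAE)])
```

Splitting on the date cutoff rather than on row positions keeps every group's holdout aligned to the same calendar window.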

AdrianAntico/ModelingTools documentation built on June 10, 2025, 1:17 a.m.