AutoLightGBMCARMA: AutoLightGBMCARMA

View source: R/AutoLightGBMCARMA.R

AutoLightGBMCARMA    R Documentation

AutoLightGBMCARMA

Description

AutoLightGBMCARMA performs multivariate forecasting with calendar variables, holiday counts, holiday lags, holiday moving averages, differencing, transformations, and interaction-based categorical encoding. It uses the target variable and features to generate time-based aggregated lags, moving averages, moving standard deviations, moving skewness, moving kurtosis, and moving quantiles, along with parallelized interaction-based Fourier pairs by grouping variables and trend variables.

Usage

AutoLightGBMCARMA(
  data = NULL,
  XREGS = NULL,
  TimeWeights = NULL,
  NonNegativePred = FALSE,
  RoundPreds = FALSE,
  TrainOnFull = FALSE,
  TargetColumnName = NULL,
  DateColumnName = NULL,
  HierarchGroups = NULL,
  GroupVariables = NULL,
  FC_Periods = 1,
  NThreads = max(1, parallel::detectCores() - 2L),
  SaveDataPath = NULL,
  TimeUnit = NULL,
  TimeGroups = NULL,
  TargetTransformation = FALSE,
  Methods = c("Asinh", "Log", "LogPlus1", "Sqrt"),
  EncodingMethod = "target_encoding",
  AnomalyDetection = NULL,
  Lags = NULL,
  MA_Periods = NULL,
  SD_Periods = NULL,
  Skew_Periods = NULL,
  Kurt_Periods = NULL,
  Quantile_Periods = NULL,
  Quantiles_Selected = c("q5", "q95"),
  Difference = TRUE,
  FourierTerms = 0,
  CalendarVariables = NULL,
  HolidayVariable = NULL,
  HolidayLookback = NULL,
  HolidayLags = 1L,
  HolidayMovingAverages = 3L,
  TimeTrendVariable = FALSE,
  DataTruncate = FALSE,
  ZeroPadSeries = "maxmax",
  SplitRatios = c(0.95, 0.05),
  PartitionType = "random",
  Timer = TRUE,
  SaveModel = FALSE,
  ArgsList = NULL,
  DebugMode = FALSE,
  ModelID = "FC001",
  GridTune = FALSE,
  GridEvalMetric = "mae",
  ModelCount = 30L,
  MaxRunsWithoutNewWinner = 20L,
  MaxRunMinutes = 24L * 60L,
  Device_Type = "cpu",
  LossFunction = "regression",
  EvalMetric = "mae",
  Input_Model = NULL,
  Task = "train",
  Boosting = "gbdt",
  LinearTree = FALSE,
  Trees = 500L,
  ETA = 0.5,
  Num_Leaves = 31,
  Deterministic = TRUE,
  Force_Col_Wise = FALSE,
  Force_Row_Wise = FALSE,
  Max_Depth = 6,
  Min_Data_In_Leaf = 20,
  Min_Sum_Hessian_In_Leaf = 0.001,
  Bagging_Freq = 1,
  Bagging_Fraction = 0.7,
  Feature_Fraction = 1,
  Feature_Fraction_Bynode = 1,
  Lambda_L1 = 4,
  Lambda_L2 = 4,
  Extra_Trees = FALSE,
  Early_Stopping_Round = 10,
  First_Metric_Only = TRUE,
  Max_Delta_Step = 0,
  Linear_Lambda = 0,
  Min_Gain_To_Split = 0,
  Drop_Rate_Dart = 0.1,
  Max_Drop_Dart = 50,
  Skip_Drop_Dart = 0.5,
  Uniform_Drop_Dart = FALSE,
  Top_Rate_Goss = FALSE,
  Other_Rate_Goss = FALSE,
  Monotone_Constraints = NULL,
  Monotone_Constraints_method = "advanced",
  Monotone_Penalty = 0,
  Forcedsplits_Filename = NULL,
  Refit_Decay_Rate = 0.9,
  Path_Smooth = 0,
  Max_Bin = 255,
  Min_Data_In_Bin = 3,
  Data_Random_Seed = 1,
  Is_Enable_Sparse = TRUE,
  Enable_Bundle = TRUE,
  Use_Missing = TRUE,
  Zero_As_Missing = FALSE,
  Two_Round = FALSE,
  Convert_Model = NULL,
  Convert_Model_Language = "cpp",
  Boost_From_Average = TRUE,
  Alpha = 0.9,
  Fair_C = 1,
  Poisson_Max_Delta_Step = 0.7,
  Tweedie_Variance_Power = 1.5,
  Lambdarank_Truncation_Level = 30,
  Is_Provide_Training_Metric = TRUE,
  Eval_At = c(1, 2, 3, 4, 5),
  Num_Machines = 1,
  Gpu_Platform_Id = -1,
  Gpu_Device_Id = -1,
  Gpu_Use_Dp = TRUE,
  Num_Gpu = 1,
  TVT = NULL
)

Arguments

data

Supply your full series data set here

XREGS

Additional data to use for model development and forecasting. The data must be a complete series: both the historical values and the forward-looking values over the specified forecast window need to be supplied.

TimeWeights

Supply a value that will be multiplied by the time trend value

NonNegativePred

TRUE or FALSE

RoundPreds

Rounding predictions to an integer value. TRUE or FALSE. Defaults to FALSE

TrainOnFull

Set to TRUE to train on full data

TargetColumnName

List the column name of your target variable. E.g. 'Target'

DateColumnName

List the column name of your date column. E.g. 'DateTime'

HierarchGroups

Defaults to NULL. Character vector with the names of the columns that form the interaction hierarchy

GroupVariables

Defaults to NULL. Use NULL when you have a single series. Add in GroupVariables when you have a series for every level of a group or multiple groups.

FC_Periods

Set the number of periods you want to have forecasts for. E.g. 52 for weekly data to forecast a year ahead

NThreads

Set the maximum number of threads you'd like to dedicate to the model run. E.g. 8

SaveDataPath

Path to save modeling data

TimeUnit

List the time unit your data is aggregated by. E.g. '1min', '5min', '10min', '15min', '30min', 'hour', 'day', 'week', 'month', 'quarter', 'year'

TimeGroups

Select time aggregations for adding various time aggregated GDL features.

TargetTransformation

Run Rodeo::AutoTransformationCreate() to find the best transformation for the target variable. Tests YeoJohnson, BoxCox, and Asinh (also Asin and Logit for proportion target variables).

Methods

Choose from 'YeoJohnson', 'BoxCox', 'Asinh', 'Log', 'LogPlus1', 'Sqrt', 'Asin', or 'Logit'. If more than one is selected, the one with the best normalization Pearson statistic will be used. Identity is automatically selected and compared.
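As a rough illustration of what three of these methods do, the sketch below applies base R equivalents of 'Asinh', 'LogPlus1', and 'Sqrt' to a toy vector and then inverts them. It is not the package's internal code, and the assumption that forecasts are mapped back to the original scale with the matching inverse is ours.

```r
# Illustrative only: base R versions of three of the named transformations
# and their inverses (not the package's internal implementation).
y <- c(0, 1.5, 10, 250)

asinh_y    <- asinh(y)   # inverse: sinh()
logplus1_y <- log1p(y)   # log(1 + y); inverse: expm1()
sqrt_y     <- sqrt(y)    # inverse: squaring

# Values on the transformed scale can be mapped back with the inverse
back_asinh    <- sinh(asinh_y)
back_logplus1 <- expm1(logplus1_y)
back_sqrt     <- sqrt_y^2
```

Note that 'Log' and 'Sqrt' are only defined for positive and non-negative targets respectively, which is one reason to offer several candidates.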

EncodingMethod

Choose from 'binary', 'm_estimator', 'credibility', 'woe', 'target_encoding', 'poly_encode', 'backward_difference', 'helmert'

AnomalyDetection

NULL for not using the service. Otherwise, provide a list, e.g. AnomalyDetection = list('tstat_high' = 4, 'tstat_low' = -4)

Lags

Select the periods for all lag variables you want to create. E.g. c(1:5,52) or list('day' = c(1:10), 'weeks' = c(1:4))

MA_Periods

Select the periods for all moving average variables you want to create. E.g. c(1:5,52) or list('day' = c(2:10), 'weeks' = c(2:4))
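To make the lag and moving average periods concrete, here is a minimal base R sketch on a toy series. The helper names lag_k and ma_n are hypothetical, and whether the package computes moving averages over current or lagged values is not stated here, so treat the windowing as an assumption.

```r
x <- c(10, 12, 11, 15, 14, 18)

# Lag k: the value observed k periods earlier (NA where no history exists)
lag_k <- function(v, k) c(rep(NA_real_, k), head(v, -k))

# Trailing moving average over the current value and the previous n - 1 values
ma_n <- function(v, n) as.numeric(stats::filter(v, rep(1 / n, n), sides = 1))

lag1 <- lag_k(x, 1)
ma3  <- ma_n(x, 3)
```

A 1-period moving average equals the series itself, which is why the list examples for the moving statistics start at 2.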

SD_Periods

Select the periods for all moving standard deviation variables you want to create. E.g. c(1:5,52) or list('day' = c(2:10), 'weeks' = c(2:4))

Skew_Periods

Select the periods for all moving skewness variables you want to create. E.g. c(1:5,52) or list('day' = c(2:10), 'weeks' = c(2:4))

Kurt_Periods

Select the periods for all moving kurtosis variables you want to create. E.g. c(1:5,52) or list('day' = c(2:10), 'weeks' = c(2:4))

Quantile_Periods

Select the periods for all moving quantiles variables you want to create. E.g. c(1:5,52) or list('day' = c(2:10), 'weeks' = c(2:4))

Quantiles_Selected

Select from the following c('q5','q10','q15','q20','q25','q30','q35','q40','q45','q50','q55','q60','q65','q70','q75','q80','q85','q90','q95')
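The labels presumably map to percentiles, so 'q5' would be the 5th percentile. Under that assumption, a trailing-window quantile can be sketched in base R as follows; the window length of 5 is arbitrary.

```r
selected <- c("q5", "q95")

# Assumed mapping: strip the leading "q" and divide by 100
probs <- as.numeric(sub("^q", "", selected)) / 100

x <- c(3, 8, 1, 9, 4, 7, 2, 6)
window <- tail(x, 5)   # most recent 5 observations
q <- quantile(window, probs = probs, names = FALSE)
```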

Difference

Set to TRUE to difference the target variable before modeling (the 'I' in ARIMA)
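Differencing replaces each value with its change from the previous period, and forecasts of the changes are then accumulated back onto the last observed level. The base R sketch below shows that round trip on toy numbers; the forecasted differences are made up for illustration.

```r
y <- c(100, 104, 103, 110, 118)

d <- diff(y)                    # first differences
rebuilt <- cumsum(c(y[1], d))   # integrating the differences recovers y

# Hypothetical forecasted differences, accumulated onto the last observed level
fc_diff  <- c(2, -1, 3)
fc_level <- tail(y, 1) + cumsum(fc_diff)
```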

FourierTerms

Set to the max number of pairs
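A Fourier pair is a sine and cosine column at a given harmonic of the seasonal period. The sketch below builds K pairs in base R. The function name fourier_pairs and the period of 52 (weekly data with yearly seasonality) are assumptions for illustration; how the package chooses the period is not described here.

```r
# Build K sine/cosine pairs for time index t and a seasonal period
fourier_pairs <- function(t, period, K) {
  out <- lapply(seq_len(K), function(k) {
    cbind(sin(2 * pi * k * t / period), cos(2 * pi * k * t / period))
  })
  m <- do.call(cbind, out)
  colnames(m) <- paste0(rep(c("sin", "cos"), K), rep(seq_len(K), each = 2))
  m
}

# Two years of weekly time steps, four pairs (eight columns)
X <- fourier_pairs(t = 1:104, period = 52, K = 4)
dim(X)
```

Each additional pair lets the model represent a finer seasonal shape, at the cost of two more feature columns.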

CalendarVariables

NULL, or select from 'second', 'minute', 'hour', 'wday', 'mday', 'yday', 'week', 'wom', 'isoweek', 'month', 'quarter', 'year'

HolidayVariable

NULL, or select from 'USPublicHolidays', 'EasterGroup', 'ChristmasGroup', 'OtherEcclesticalFeasts'

HolidayLookback

Number of days in range to compute number of holidays from a given date in the data. If NULL, the number of days is computed for you.

HolidayLags

Number of lags for the holiday counts

HolidayMovingAverages

Number of moving averages for holiday counts

TimeTrendVariable

Set to TRUE to have a time trend variable added to the model. Time trend is a numeric variable indicating the position of each record in the time series (by group). It starts at 1 for the earliest point in time and increments by one for each successive time point.
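The by-group counter described above can be sketched in base R as follows. The column names Store, Date, and TimeTrend are hypothetical and this is not the package's own implementation.

```r
df <- data.frame(
  Store = c("A", "A", "A", "B", "B"),
  Date  = as.Date(c("2024-01-01", "2024-01-08", "2024-01-15",
                    "2024-01-01", "2024-01-08"))
)

# Sort by group and date, then count records within each group
df <- df[order(df$Store, df$Date), ]
df$TimeTrend <- ave(seq_along(df$Date), df$Store, FUN = seq_along)
```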

DataTruncate

Set to TRUE to remove records with missing values from the lags and moving average features created

ZeroPadSeries

NULL to do nothing. Otherwise, set to 'maxmax', 'minmax', 'maxmin', 'minmin'. See TimeSeriesFill for explanations of each type

SplitRatios

E.g. c(0.7,0.2,0.1) for train, validation, and test sets

PartitionType

Select 'random' for random data partitioning or 'time' for partitioning by time frames
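The Examples section uses SplitRatios = c(1 - 10 / 138, 10 / 138). Assuming a hypothetical series of 138 time periods partitioned in time order, those ratios hold out the last 10 periods for validation, as this base R sketch shows.

```r
n_periods <- 138
ratios <- c(1 - 10 / 138, 10 / 138)

# Time-ordered split: earliest periods train, latest periods validate
n_train   <- round(ratios[1] * n_periods)
train_idx <- seq_len(n_train)
valid_idx <- setdiff(seq_len(n_periods), train_idx)
```

With random partitioning the same ratios would apply, but the held-out rows would be scattered across the series instead of sitting at the end.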

Timer

Setting to TRUE prints out the forecast number while it is building

SaveModel

Logical. If TRUE, output ArgsList will have a named element 'Model' with the LightGBM model object

ArgsList

ArgsList is for scoring. Must contain named element 'Model' with a LightGBM model object

DebugMode

Setting to TRUE generates printout of all header code comments during run time of function

ModelID

Something to name your model if you want it saved

GridTune

Set to TRUE to run a grid tune

GridEvalMetric

This is the metric used to find the threshold. Choose from 'poisson', 'mae', 'mape', 'mse', 'msle', 'kl', 'cs', 'r2'

ModelCount

Set the number of models to try in the grid tune

MaxRunsWithoutNewWinner

Number of consecutive runs without a new winner in order to terminate procedure

MaxRunMinutes

Default 24L*60L

Device_Type

= 'cpu'

LossFunction

= 'regression' (or 'mean_squared_error'), 'regression_l1' (or 'mean_absolute_error'), 'mape' (or 'mean_absolute_percentage_error'), 'huber', 'fair', 'poisson', 'quantile', 'gamma', 'tweedie'

EvalMetric

= 'mae'

Input_Model

= NULL

Task

= 'train'

Boosting

= 'gbdt'

LinearTree

= FALSE

Trees

= 500

ETA

= 0.5

Num_Leaves

= 31

Deterministic

= TRUE

# Learning Parameters # https://lightgbm.readthedocs.io/en/latest/Parameters.html#learning-control-parameters

Force_Col_Wise

= FALSE

Force_Row_Wise

= FALSE

Max_Depth

= 6

Min_Data_In_Leaf

= 20

Min_Sum_Hessian_In_Leaf

= 0.001

Bagging_Freq

= 1.0

Bagging_Fraction

= 0.7

Feature_Fraction

= 1.0

Feature_Fraction_Bynode

= 1.0

Lambda_L1

= 4

Lambda_L2

= 4

Extra_Trees

= FALSE

Early_Stopping_Round

= 10

First_Metric_Only

= TRUE

Max_Delta_Step

= 0.0

Linear_Lambda

= 0.0

Min_Gain_To_Split

= 0

Drop_Rate_Dart

= 0.10

Max_Drop_Dart

= 50

Skip_Drop_Dart

= 0.50

Uniform_Drop_Dart

= FALSE

Top_Rate_Goss

= FALSE

Other_Rate_Goss

= FALSE

Monotone_Constraints

= NULL

Monotone_Constraints_method

= 'advanced'

Monotone_Penalty

= 0.0

Forcedsplits_Filename

= NULL

Refit_Decay_Rate

= 0.90

Path_Smooth

= 0.0

# IO Dataset Parameters # https://lightgbm.readthedocs.io/en/latest/Parameters.html#io-parameters

Max_Bin

= 255

Min_Data_In_Bin

= 3

Data_Random_Seed

= 1

Is_Enable_Sparse

= TRUE

Enable_Bundle

= TRUE

Use_Missing

= TRUE

Zero_As_Missing

= FALSE

Two_Round

= FALSE

# Convert Parameters

Convert_Model

= NULL

Convert_Model_Language

= 'cpp'

# Objective Parameters # https://lightgbm.readthedocs.io/en/latest/Parameters.html#objective-parameters

Boost_From_Average

= TRUE

Alpha

= 0.90

Fair_C

= 1.0

Poisson_Max_Delta_Step

= 0.70

Tweedie_Variance_Power

= 1.5

Lambdarank_Truncation_Level

= 30

# Metric Parameters (metric is in Core) # https://lightgbm.readthedocs.io/en/latest/Parameters.html#metric-parameters

Is_Provide_Training_Metric

= TRUE

Eval_At

= c(1,2,3,4,5)

# Network Parameters # https://lightgbm.readthedocs.io/en/latest/Parameters.html#network-parameters

Num_Machines

= 1

# GPU Parameters

Gpu_Platform_Id

= -1

Gpu_Device_Id

= -1

Gpu_Use_Dp

= TRUE

Num_Gpu

= 1

TVT

Passthrough

# ML Args begin

TreeMethod

Choose from 'hist', 'gpu_hist'

# https://lightgbm.readthedocs.io/en/latest/Parameters.html#gpu-parameters

Value

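See Examples, which inspect Results$ModelInformation$EvaluationMetrics and Results$ModelInformation$EvaluationMetricsByGroup on the returned object.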

Author(s)

Adrian Antico

See Also

Other Automated Panel Data Forecasting: AutoCatBoostCARMA(), AutoH2OCARMA(), AutoXGBoostCARMA()

Examples

## Not run: 

# Load data
data <- data.table::fread('https://www.dropbox.com/s/2str3ek4f4cheqi/walmart_train.csv?dl=1')

# Ensure series have no missing dates (also remove series with more than 25% missing values)
data <- AutoQuant::TimeSeriesFill(
  data,
  DateColumnName = 'Date',
  GroupVariables = c('Store','Dept'),
  TimeUnit = 'weeks',
  FillType = 'maxmax',
  MaxMissingPercent = 0.25,
  SimpleImpute = TRUE)

# Set negative numbers to 0
data <- data[, Weekly_Sales := data.table::fifelse(Weekly_Sales < 0, 0, Weekly_Sales)]

# Remove IsHoliday column
data[, IsHoliday := NULL]

# Create xregs (this is to include the categorical variables instead of utilizing only the interaction of them)
xregs <- data[, .SD, .SDcols = c('Date', 'Store', 'Dept')]

# Change data types
data[, ':=' (Store = as.character(Store), Dept = as.character(Dept))]
xregs[, ':=' (Store = as.character(Store), Dept = as.character(Dept))]

# Build forecast
Results <- AutoLightGBMCARMA(

  # Data Artifacts
  data = data,
  NonNegativePred = FALSE,
  RoundPreds = FALSE,
  TargetColumnName = 'Weekly_Sales',
  DateColumnName = 'Date',
  HierarchGroups = NULL,
  GroupVariables = c('Store','Dept'),
  TimeUnit = 'weeks',
  TimeGroups = c('weeks','months'),

  # Data Wrangling Features
  EncodingMethod = 'binary',
  ZeroPadSeries = NULL,
  DataTruncate = FALSE,
  SplitRatios = c(1 - 10 / 138, 10 / 138),
  PartitionType = 'timeseries',
  AnomalyDetection = NULL,

  # Productionize
  FC_Periods = 0,
  TrainOnFull = FALSE,
  NThreads = 8,
  Timer = TRUE,
  DebugMode = FALSE,
  SaveDataPath = NULL,
  SaveModel = FALSE,
  ArgsList = NULL,

  # Target Transformations
  TargetTransformation = TRUE,
  Methods = c('BoxCox', 'Asinh', 'Asin', 'Log',
              'LogPlus1', 'Sqrt', 'Logit','YeoJohnson'),
  Difference = FALSE,

  # Features
  Lags = list('weeks' = seq(1L, 10L, 1L),
              'months' = seq(1L, 5L, 1L)),
  MA_Periods = list('weeks' = seq(5L, 20L, 5L),
                    'months' = seq(2L, 10L, 2L)),
  SD_Periods = NULL,
  Skew_Periods = NULL,
  Kurt_Periods = NULL,
  Quantile_Periods = NULL,
  Quantiles_Selected = c('q5','q95'),
  XREGS = xregs,
  FourierTerms = 4,
  CalendarVariables = c('week', 'wom', 'month', 'quarter'),
  HolidayVariable = c('USPublicHolidays','EasterGroup',
    'ChristmasGroup','OtherEcclesticalFeasts'),
  HolidayLookback = NULL,
  HolidayLags = 1,
  HolidayMovingAverages = 1:2,
  TimeTrendVariable = TRUE,

  # Grid tuning args
  GridTune = FALSE,
  GridEvalMetric = 'mae',
  ModelCount = 30L,
  MaxRunsWithoutNewWinner = 20L,
  MaxRunMinutes = 24L*60L,

  # LightGBM Args
  Device_Type = 'cpu',
  LossFunction = 'regression',
  EvalMetric = 'mae',
  Input_Model = NULL,
  Task = 'train',
  Boosting = 'gbdt',
  LinearTree = FALSE,
  Trees = 1000,
  ETA = 0.10,
  Num_Leaves = 31,
  Deterministic = TRUE,

  # Learning Parameters
  # https://lightgbm.readthedocs.io/en/latest/Parameters.html#learning-control-parameters
  Force_Col_Wise = FALSE,
  Force_Row_Wise = FALSE,
  Max_Depth = 6,
  Min_Data_In_Leaf = 20,
  Min_Sum_Hessian_In_Leaf = 0.001,
  Bagging_Freq = 1.0,
  Bagging_Fraction = 1.0,
  Feature_Fraction = 1.0,
  Feature_Fraction_Bynode = 1.0,
  Lambda_L1 = 0.0,
  Lambda_L2 = 0.0,
  Extra_Trees = FALSE,
  Early_Stopping_Round = 10,
  First_Metric_Only = TRUE,
  Max_Delta_Step = 0.0,
  Linear_Lambda = 0.0,
  Min_Gain_To_Split = 0,
  Drop_Rate_Dart = 0.10,
  Max_Drop_Dart = 50,
  Skip_Drop_Dart = 0.50,
  Uniform_Drop_Dart = FALSE,
  Top_Rate_Goss = FALSE,
  Other_Rate_Goss = FALSE,
  Monotone_Constraints = NULL,
  Monotone_Constraints_method = 'advanced',
  Monotone_Penalty = 0.0,
  Forcedsplits_Filename = NULL, # use for AutoStack option; .json file
  Refit_Decay_Rate = 0.90,
  Path_Smooth = 0.0,

  # IO Dataset Parameters
  # https://lightgbm.readthedocs.io/en/latest/Parameters.html#io-parameters
  Max_Bin = 255,
  Min_Data_In_Bin = 3,
  Data_Random_Seed = 1,
  Is_Enable_Sparse = TRUE,
  Enable_Bundle = TRUE,
  Use_Missing = TRUE,
  Zero_As_Missing = FALSE,
  Two_Round = FALSE,

  # Convert Parameters
  Convert_Model = NULL,
  Convert_Model_Language = 'cpp',

  # Objective Parameters
  # https://lightgbm.readthedocs.io/en/latest/Parameters.html#objective-parameters
  Boost_From_Average = TRUE,
  Alpha = 0.90,
  Fair_C = 1.0,
  Poisson_Max_Delta_Step = 0.70,
  Tweedie_Variance_Power = 1.5,
  Lambdarank_Truncation_Level = 30,

  # Metric Parameters (metric is in Core)
  # https://lightgbm.readthedocs.io/en/latest/Parameters.html#metric-parameters
  Is_Provide_Training_Metric = TRUE,
  Eval_At = c(1,2,3,4,5),

  # Network Parameters
  # https://lightgbm.readthedocs.io/en/latest/Parameters.html#network-parameters
  Num_Machines = 1,

  # GPU Parameters
  # https://lightgbm.readthedocs.io/en/latest/Parameters.html#gpu-parameters
  Gpu_Platform_Id = -1,
  Gpu_Device_Id = -1,
  Gpu_Use_Dp = TRUE,
  Num_Gpu = 1)

# Convert MSE to RMSE in the evaluation metrics table, then print it once
UpdateMetrics <- Results$ModelInformation$EvaluationMetrics[
  Metric == 'MSE', MetricValue := sqrt(MetricValue)]
print(UpdateMetrics)
Results$ModelInformation$EvaluationMetricsByGroup[order(-R2_Metric)]
Results$ModelInformation$EvaluationMetricsByGroup[order(MAE_Metric)]
Results$ModelInformation$EvaluationMetricsByGroup[order(MSE_Metric)]
Results$ModelInformation$EvaluationMetricsByGroup[order(MAPE_Metric)]

## End(Not run)

AdrianAntico/ModelingTools documentation built on Feb. 1, 2024, 7:33 a.m.