ml_gridwfo: Walk-Forward Optimization using ML grid search at each WFO...


Description

Performs a grid search of machine learning models at each walk-forward optimization date.

Usage

ml_gridwfo(df, ycol = 1, featurelist, datecol = NA, IDcol = NULL,
  searchmethod = "list", wfodates = "months", wfo_offset = 0,
  mktseries = xts_gspc, trainwin = 8, meritFUN = "RMSE",
  meritFUNpar = NULL, tradePAR = NA, mlalgo = "xgboost", mlpar = NA)

Arguments

df

The dataframe containing the target variable (y) and the features.

ycol

The name of the column in df containing the target variable.

featurelist

A list containing vectors of feature names. Each vector may be a single feature name or a set of feature names. The grid search is performed using the method specified by parameter searchmethod (see below).

datecol

The name of the column in df containing the date at which a prediction is made.

IDcol

A vector of column names that identify a given trade. At a minimum, it should contain the ticker symbol column name in df to identify which stock is being traded.

searchmethod

The method used to search through featurelist. Two methods are supported: "list" and "forwardsearch". The list method iterates through the list one feature set at a time and picks the best-performing set as evaluated by meritFUN. The forwardsearch method instead performs a forward search, combining feature sets one at a time. This is slower, but it can find combinations of features that are more predictive than any single set.
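The forward search can be sketched as a greedy selection over feature sets. This is a minimal Python illustration under stated assumptions, not the package's R implementation: the helper name forward_search and the stopping rule (stop when no remaining set improves the score) are hypothetical, and score stands in for training a model on the pretrain period and evaluating it with meritFUN on the validation period.

```python
from typing import Callable, List, Optional, Sequence


def forward_search(
    featurelist: Sequence[Sequence[str]],
    score: Callable[[List[str]], float],
    higher_is_better: bool = True,
) -> List[str]:
    """Greedy forward search over feature sets (hypothetical helper).

    At each step, try adding each remaining feature set to the current
    selection, keep the best addition if it improves the score, and stop
    when no addition helps or no sets remain.
    """
    pick = max if higher_is_better else min
    selected: List[str] = []
    remaining = [list(s) for s in featurelist]
    best_score: Optional[float] = None
    while remaining:
        # Candidate selections: current features plus one remaining set
        # (features already selected are not duplicated).
        trials = []
        for i, fs in enumerate(remaining):
            cand = selected + [f for f in fs if f not in selected]
            trials.append((score(cand), i, cand))
        cand_score, cand_idx, cand = pick(trials, key=lambda t: t[0])
        improves = best_score is None or (
            cand_score > best_score if higher_is_better else cand_score < best_score
        )
        if not improves:
            break  # no remaining set improves the score
        best_score, selected = cand_score, cand
        remaining.pop(cand_idx)
    return selected


# Toy score (illustration only): +1 per useful feature, -0.5 per useless one.
USEFUL = {"a", "b", "c"}


def toy_score(features: List[str]) -> float:
    return sum(1.0 if f in USEFUL else -0.5 for f in features)


print(forward_search([["a"], ["b", "c"], ["d"]], toy_score))  # ['b', 'c', 'a']
```

Adding ["d"] in the last step lowers the toy score, so the search stops with three features; the list method would instead have returned only the single best set, ["b", "c"].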

wfodates

This is either a vector of database release dates or the name of a period, in which case the WFO dates are calculated at the periodic endpoints of the market open dates. To use periodic endpoints, wfodates must be one of "weeks", "months", "quarters" or "years". The WFO dates are then determined as the inner merge between the datecol dates in df and the periodic index endpoints extracted from mktseries, which are calculated with the endpoints function.
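As a rough sketch of the periodic case, the Python code below takes the last market date of each period and keeps only those dates that also appear in df. In R, the endpoints step corresponds to something like index(mktseries)[endpoints(mktseries, on = "months")]. The helper names are hypothetical, and grouping weeks by ISO week is an assumption that may differ from how endpoints splits weeks.

```python
from datetime import date
from typing import Iterable, List


def _period_key(d: date, period: str) -> tuple:
    """Label identifying which period a date falls in."""
    if period == "weeks":
        iso = d.isocalendar()  # assumption: ISO (Monday-start) weeks
        return (iso[0], iso[1])
    if period == "months":
        return (d.year, d.month)
    if period == "quarters":
        return (d.year, (d.month - 1) // 3)
    if period == "years":
        return (d.year,)
    raise ValueError(f"unsupported period: {period!r}")


def periodic_endpoints(market_dates: Iterable[date], period: str = "months") -> List[date]:
    """Last market date in each period, similar in spirit to xts::endpoints()."""
    dates = sorted(market_dates)
    ends = []
    for i, d in enumerate(dates):
        is_last = i == len(dates) - 1 or _period_key(d, period) != _period_key(dates[i + 1], period)
        if is_last:
            ends.append(d)
    return ends


def wfo_dates(df_dates: Iterable[date], market_dates: Iterable[date], period: str = "months") -> List[date]:
    """Inner merge: keep only endpoints that also occur in df's date column."""
    present = set(df_dates)
    return [d for d in periodic_endpoints(market_dates, period) if d in present]


market = [date(2020, 1, 30), date(2020, 1, 31), date(2020, 2, 3), date(2020, 2, 28), date(2020, 3, 2)]
df_dates = [date(2020, 1, 31), date(2020, 2, 3), date(2020, 3, 2)]
print(wfo_dates(df_dates, market))  # Feb 28 is a month end but is missing from df
```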

wfo_offset

The offset, in market days, between the end of the training period and the WFO date. For example, if wfo_offset = 5, the training period ends 5 market days before each WFO date.

mktseries

An xts time series of an index or security that includes all dates in df. It is used to calculate the WFO dates with the endpoints function.

trainwin

The size of the training period (a positive integer) used for training the final model. This number corresponds to the number of WFO periods used for training. For example, if wfodates = "months", and trainwin = 8, then the training period is 8 months ending at wfodate - wfo_offset.
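The window arithmetic for trainwin and wfo_offset can be sketched as follows. This is a hypothetical Python helper, assuming the window covers the trainwin WFO periods before the current WFO date (start exclusive, end inclusive) and that wfo_offset counts positions in the market calendar; the exact boundary handling in ml_gridwfo may differ.

```python
from datetime import date, timedelta
from typing import List, Tuple


def training_window(
    market_dates: List[date],
    wfodates: List[date],
    i: int,
    trainwin: int = 8,
    wfo_offset: int = 0,
) -> Tuple[date, date]:
    """(start_exclusive, end_inclusive) of the training window for wfodates[i].

    Assumptions: the window spans the `trainwin` WFO periods before
    wfodates[i], and its end is moved back `wfo_offset` market days.
    """
    if i < trainwin:
        raise ValueError("not enough WFO periods before this WFO date")
    end_pos = market_dates.index(wfodates[i]) - wfo_offset
    if end_pos < 0:
        raise ValueError("wfo_offset reaches past the first market date")
    return wfodates[i - trainwin], market_dates[end_pos]


# Toy calendar: ten consecutive days, with a WFO date every other day.
market = [date(2020, 1, 1) + timedelta(days=k) for k in range(10)]
wfo = market[1::2]  # Jan 2, 4, 6, 8, 10
print(training_window(market, wfo, i=4, trainwin=2, wfo_offset=1))  # window (Jan 6, Jan 9]
```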

meritFUN

The name of the scoring function used to evaluate the performance of each grid ML model. This is performed on the validation set, which consists of the period between the last two WFO dates in the training window. Supported scoring functions include the following:

  • overnight_returns: This function uses the trade_overnight function to build an equity curve based on trading parameters passed to trade_overnight via meritFUNpar. The raw return is used as the score, so the model with the highest return is selected.

  • overnight_MDD: This function computes the worst (maximum) drawdown during the validation period from an equity curve generated by trade_overnight. The model with the lowest MDD is selected.

  • RMSE: This function computes the root mean square error between y and yhat over the validation period. The model with the lowest RMSE score is selected.

  • overnight_MAR: This function uses the trade_overnight function to build an equity curve, then calculates the MDD and the total return over the validation period to obtain the MAR ratio. The model with the highest MAR is selected.

  • overnight_F1score: not yet implemented.

  • overnight_Sharpe: not yet implemented.
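The scoring rules above can be illustrated on a plain equity curve. In this Python sketch the function names are hypothetical stand-ins, not the package's API; the equity curve is assumed to come from something like trade_overnight, and MAR is computed from the total return over the validation period, as described above, rather than from the annualized return used in some MAR definitions.

```python
import math
from typing import Sequence


def rmse(y: Sequence[float], yhat: Sequence[float]) -> float:
    """Root mean square error between targets and predictions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))


def max_drawdown(equity: Sequence[float]) -> float:
    """Worst peak-to-trough decline, as a positive fraction of the running peak."""
    peak, mdd = equity[0], 0.0
    for v in equity:
        peak = max(peak, v)
        mdd = max(mdd, (peak - v) / peak)
    return mdd


def total_return(equity: Sequence[float]) -> float:
    """Raw return from the first to the last point of the curve."""
    return equity[-1] / equity[0] - 1.0


def mar_ratio(equity: Sequence[float]) -> float:
    """Total return over the period divided by its maximum drawdown."""
    mdd = max_drawdown(equity)
    return math.inf if mdd == 0 else total_return(equity) / mdd


curve = [100.0, 120.0, 90.0, 110.0]
print(max_drawdown(curve), mar_ratio(curve))  # drawdown 0.25 (120 -> 90)
```

Lower is better for rmse and max_drawdown, while higher is better for total_return and mar_ratio, which matches the selection rule stated for each scoring function.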

meritFUNpar

A list of parameters to be passed on to meritFUN. If meritFUN has no parameters, this can be set to NULL.

tradePAR

A dataframe of parameters used to convert yhat into long and short trades. The dataframe must have four columns: "long_thresh", "short_thresh", "max_posn" and "max_weight". Each row corresponds to one evaluation of yhat against the long and short thresholds. Long trades result when yhat > long_thresh, and short trades result when yhat < short_thresh. max_posn is the maximum number of simultaneous positions held, and max_weight is the maximum weight any position may have in the portfolio. On days when more trades are available than max_posn, only the best trades are executed. Here, the best trades are those with the largest absolute yhat: the trades with the highest absolute yhat are taken, up to max_posn, and each is then given the correct sign to go long or short. The default NA means long_thresh = 0, short_thresh = -1000, max_posn = 10, max_weight = 0.25.
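The threshold and position rules can be sketched for a single day's predictions. This Python helper is hypothetical; in particular, how weights are assigned is not documented here, so the sketch assumes equal weights capped at max_weight, and it assumes long_thresh >= short_thresh.

```python
from typing import Dict, List, Tuple


def select_trades(
    yhat: Dict[str, float],
    long_thresh: float = 0.0,
    short_thresh: float = -1000.0,
    max_posn: int = 10,
    max_weight: float = 0.25,
) -> List[Tuple[str, int, float]]:
    """Turn one day's predictions into (ticker, side, weight) trades; side is +1 long, -1 short."""
    # Candidates: long if yhat > long_thresh, short if yhat < short_thresh.
    cands = [
        (ticker, 1 if y > long_thresh else -1, y)
        for ticker, y in yhat.items()
        if y > long_thresh or y < short_thresh
    ]
    # Keep only the max_posn strongest signals, ranked by |yhat|.
    cands.sort(key=lambda c: abs(c[2]), reverse=True)
    cands = cands[:max_posn]
    if not cands:
        return []
    # Assumption: equal weights, capped at max_weight.
    w = min(1.0 / len(cands), max_weight)
    return [(ticker, side, w) for ticker, side, _ in cands]


preds = {"AAA": 0.03, "BBB": -0.05, "CCC": 0.01, "DDD": -0.002, "EEE": 0.02}
print(select_trades(preds, long_thresh=0.015, short_thresh=-0.01, max_posn=2))
```

With the default values (long_thresh = 0, short_thresh = -1000), short trades essentially never trigger for return-like yhat values, so the sketch then behaves as a long-only selector.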

mlalgo

The name of the machine learning algorithm used. Currently, "xgboost" and "h2o.rf" are supported.

mlpar

A list containing the machine learning algorithm parameters. If empty or incomplete, it is padded using function pad_mlpar().

Details

This approach differs from standard WFO as follows. At each WFO date, a number of ML models are trained, each using a different set of features. A subset of the training window, called pretrain, is used to build each model. The pretrain set starts at the beginning of the training window but ends one WFO period early. The last WFO period in the training window is used for model validation (the validation period). In other words, the validation period is the period between the last two WFO dates in the training window.
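The pretrain/validation split can be sketched as below. The helper is hypothetical Python, and the boundary convention (each WFO period runs from just after one WFO date up to and including the next) is an assumption.

```python
from datetime import date
from typing import List, Sequence, Tuple

Row = Tuple[date, str]  # (prediction date, payload); a stand-in for a df row


def split_train_window(rows: Sequence[Row], window_dates: Sequence[date]) -> Tuple[List[Row], List[Row]]:
    """Split the rows of one training window into pretrain and validation sets.

    window_dates are the WFO dates bounding the window, oldest first.
    Validation is the last WFO period, (window_dates[-2], window_dates[-1]];
    pretrain is everything earlier in the window.
    """
    start, val_start, end = window_dates[0], window_dates[-2], window_dates[-1]
    pretrain = [r for r in rows if start < r[0] <= val_start]
    validation = [r for r in rows if val_start < r[0] <= end]
    return pretrain, validation


rows = [
    (date(2020, 1, 31), "a"), (date(2020, 2, 10), "b"), (date(2020, 2, 28), "c"),
    (date(2020, 3, 5), "d"), (date(2020, 3, 31), "e"), (date(2020, 4, 2), "f"),
]
bounds = [date(2020, 1, 31), date(2020, 2, 28), date(2020, 3, 31)]
pre, val = split_train_window(rows, bounds)
print([p for _, p in pre], [p for _, p in val])
```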

Each of these models is then validated against each row of thresholds in the tradePAR dataframe. A performance score is given to each model based on the chosen scoring function, meritFUN. The model with the best score (the highest or the lowest, depending on meritFUN) wins, and its feature set is then used to build the final model. The final model is trained on the entire training window and used to predict the next WFO period, with the thresholds and maximum positions taken from the best-performing tradePAR row.
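The selection step amounts to scoring every (feature set, tradePAR row) pair and keeping the best one, where "best" means highest or lowest depending on meritFUN. The Python sketch below follows the list search method; the names pick_best and HIGHER_IS_BETTER are hypothetical, and score stands in for training on the pretrain set and scoring on the validation set.

```python
from itertools import product
from typing import Any, Callable, Dict, Sequence, Tuple

# Whether a larger score is better, for the scoring functions listed under meritFUN.
HIGHER_IS_BETTER = {
    "overnight_returns": True,
    "overnight_MAR": True,
    "overnight_MDD": False,
    "RMSE": False,
}


def pick_best(
    feature_sets: Sequence[Tuple[str, ...]],
    trade_rows: Sequence[Dict[str, Any]],
    score: Callable[[Tuple[str, ...], Dict[str, Any]], float],
    merit: str = "RMSE",
):
    """Score every (feature set, tradePAR row) pair; return ((fs, row), score) of the winner."""
    results = [((fs, row), score(fs, row)) for fs, row in product(feature_sets, trade_rows)]
    choose = max if HIGHER_IS_BETTER[merit] else min
    return choose(results, key=lambda r: r[1])


# Toy scorer (illustration only): favors more features and fewer positions.
rows = [{"max_posn": 1}, {"max_posn": 5}]
toy = lambda fs, row: 10 * len(fs) - row["max_posn"]
print(pick_best([("a",), ("a", "b")], rows, toy, merit="overnight_returns"))
```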

Value

Returns a list of two elements. The first element is an xts matrix with indices made from the inner merge of mktseries with the datecol dates in df. The details for this matrix are below. The second element is a dataframe with the details of all trades executed during the WFO run.

The xts matrix includes the following 3 columns:


jeanmarcgp/mlStocks documentation built on May 19, 2019, 12:38 a.m.