match_forecast_model: match_forecast_model
In fortunar/input_uncertainty_model: Match Forecast


Builds a match_forecast_model that forecasts match outcomes by bagging with two-level modeling. The approach accounts for uncertainty in the inputs to avoid overconfident ML models. Because the inputs to a match outcome prediction problem are often uncertain, the approach can improve the predictive performance of ML models.


match_forecast_model(data, input_model_specification, num_models,
  transformation = NULL, weighting = NULL, get_model, priors = NULL,
  report_time = F)

data

data.frame in the match-objects format required by this package. Columns must include object IDs in the format ID_<number> and attributes in the format <attr_name>_<number>. The <number> suffix connects objects to attributes: an object and its attributes must share the same suffix. The column for the outcome (dependent variable) of the match must be named y. The data can also include a TIME attribute.

An example of the columns for basketball data:

TIME	y	ID_1	ID_2	P2M_1	P3M_1	P2M_2	P3M_2	..

In the basketball case ID_1 refers to ID of the home team and P2M_1, P3M_1 to the attributes of the home team. ID_2 refers to the ID of away team and P2M_2, P3M_2 to its attributes. y is the outcome of the match.
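As a minimal sketch of this layout, the following builds a toy data frame with the columns listed above. The team names, the numeric TIME values, the 0/1 coding of y, and all numbers are invented for illustration; only the column naming convention comes from the description.

```r
# Toy data in match-objects format; every value here is made up.
data_example <- data.frame(
  TIME  = c(1, 2, 3),
  y     = c(1, 0, 1),                      # outcome of the match
  ID_1  = c("Team A", "Team B", "Team C"), # home team
  ID_2  = c("Team B", "Team C", "Team A"), # away team
  P2M_1 = c(30, 28, 33),                   # home team attributes
  P3M_1 = c(9, 11, 7),
  P2M_2 = c(27, 31, 29),                   # away team attributes
  P3M_2 = c(10, 8, 12),
  stringsAsFactors = FALSE
)

# Check that every attribute suffix has a matching ID_<number> column.
attribute_columns <- setdiff(names(data_example), c("TIME", "y"))
attribute_columns <- attribute_columns[!grepl("^ID_", attribute_columns)]
suffixes <- sub(".*_", "", attribute_columns)
all(paste0("ID_", suffixes) %in% names(data_example))
```

The final check is only a convenience for catching mismatched suffixes before the data is passed on; it is not part of the package.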

input_model_specification

Specifies the parametric assumptions about object attributes. Attributes of objects can be modeled as independent with univariate models or as dependent with multivariate models.

Four univariate Bayesian models are supported out of the box:

Poisson (Poisson-Gamma model), appropriate for count data, see input_model_poisson
Bernoulli (Bernoulli-Beta model), appropriate for binary data, see input_model_bernoulli
Normal (Normal model with sample variance), appropriate for numeric data when modeling only mean, see input_model_normal
Normal (Normal-Inverse Gamma model), appropriate for jointly modeling mean and variance, see input_model_normal_ig

One multivariate Bayesian model is supported: a multivariate normal model with an inverse Wishart prior for the covariance matrix (see input_model_mvnormal_iw). It is appropriate for jointly modeling numeric attributes of objects. A custom model can also be used by passing a function as the input_model_specification parameter. See the source code of the other input models for an example of how to write a custom model.

To invoke the independent Bayesian models described above, pass a string or a list as the input_model_specification parameter. If it is a string, all attributes share the same parametric assumption. If it is a list of name-value pairs, the names should correspond to columns (attributes) and the values to parametric assumptions, so different attributes can have different assumptions. The strings for the supported parametric assumptions are 'poisson', 'bernoulli', 'normal', 'normal_ig'. The multivariate normal model with inverse Wishart prior is invoked by passing in input_model_specification = list(dependent = T, type = "mvnormal_iw"). See examples below and input models linked above for more details.
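The three forms of input_model_specification described above can be sketched as plain R values. The attribute names P2M and P3M are taken from the basketball columns; whether list names should carry the _<number> suffix is not stated here, so the unsuffixed form is an assumption.

```r
# One parametric assumption shared by all attributes.
spec_single <- "poisson"

# Different assumptions per attribute; names are attribute names
# (assumed to be written without the _<number> suffix).
spec_per_attribute <- list(P2M = "poisson", P3M = "normal_ig")

# Dependent attributes modeled jointly. TRUE is used instead of T,
# since T can be reassigned in user code.
spec_multivariate <- list(dependent = TRUE, type = "mvnormal_iw")
```

Any one of these values would then be passed as the input_model_specification argument.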

num_models

Number of models (distributions) to obtain per attribute. Also matches the number of resulting bagged ML models, since each bagged model is obtained on one set of distributions.
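To make "several distributions per attribute" concrete, here is a sketch under the assumption of a Poisson-Gamma model. It is one plausible reading of the idea, not the package's internal code. With a Gamma(a, b) prior (shape a, rate b) and observed counts x_1, ..., x_n, the posterior is Gamma(a + sum(x), b + n); drawing num_models rates from it yields num_models Poisson distributions for the attribute. The observed counts are invented.

```r
set.seed(1)

observed_counts <- c(21, 18, 25, 19, 23)  # invented 3-point attempts
prior_shape <- 241.2                      # a
prior_rate  <- 12.4                       # b
num_models  <- 100

# Conjugate update for the Poisson rate.
posterior_shape <- prior_shape + sum(observed_counts)
posterior_rate  <- prior_rate + length(observed_counts)

# One sampled Poisson rate per bagged model. The mean of a Poisson
# distribution is its rate, so each draw is also that distribution's mean.
sampled_rates <- rgamma(num_models, shape = posterior_shape, rate = posterior_rate)
length(sampled_rates)
```

The spread of sampled_rates reflects how uncertain the attribute is: an object with few observed matches gets more widely scattered rates, and the bagged models see that variation.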

transformation

Specifies how attribute distributions are transformed into the features that are fed into the ML algorithm. For each of the parametric assumptions mentioned above, a mean transformation is supported out of the box. It is invoked by passing transformation = "means" as the parameter. A custom function can also be passed in. In this case it is called for every object with a list of the distributions that the package fitted to the object's attributes. See the source code of object_model and transformation_means for more information.

weighting

Optional parameter. A function that uses the TIME attribute, if present, to weight previous matches by importance when obtaining attribute distributions.
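The arguments the package passes to a weighting function are not described here, so the signature below is purely hypothetical. The sketch only illustrates one common choice, exponential decay, where a match loses half its weight every half_life time units; both the function name and the half_life parameter are made up for this example.

```r
# Hypothetical weighting function: takes a numeric vector of match times
# and returns one weight per match. The most recent match gets weight 1.
exponential_decay_weights <- function(time, half_life = 10) {
  age <- max(time) - time
  0.5 ^ (age / half_life)
}

exponential_decay_weights(c(1, 11, 21))
```

Consult the package source for the interface that weighting actually expects before adapting this.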

get_model

Function that builds an ML model on one of the num_models datasets generated by the package. It receives match data with the object features produced by the transformation function. The returned ML model needs to support the standard predict() notation.

priors

Optional parameter. Specifies conjugate priors for the supported Bayesian models as a list of lists, one for each object. Names in the outer list correspond to object IDs. The values in the outer list are lists whose names correspond to attributes and whose values specify the parameters of the attribute prior distributions. The values in the inner lists depend on the parametric assumption used. See the documentation of supported parametric assumptions (e.g. input_model_poisson) for details on how to specify priors and examples below on how to use them.
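When there are many objects, writing the nested list by hand gets tedious. The sketch below generates the same nested shape programmatically, assuming a Poisson assumption with parameter names a and b as in the Poisson example on this page. The attribute names and the shared prior values a = 1, b = 0.1 are placeholders, not recommendations.

```r
team_ids <- c("San Antonio Spurs", "Golden State Warriors")
attribute_names <- c("P2M", "P3M")

# Outer list: one entry per object ID.
# Inner list: one entry per attribute, holding the prior parameters.
priors_generated <- lapply(
  setNames(team_ids, team_ids),
  function(team) {
    lapply(
      setNames(attribute_names, attribute_names),
      function(attribute) list(a = 1, b = 0.1)  # placeholder values
    )
  }
)

str(priors_generated[["San Antonio Spurs"]])
```

In practice the inner function would look up object-specific values, for example from a previous season, instead of returning the same prior for everyone.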

report_time

Boolean denoting whether to report the execution time. Default is FALSE.

For a detailed explanation, see the introduction vignette by running: vignette("introduction", package = "matchForecast")

Structure of class match_forecast_model containing properties:

first_level_model: list whose keys are instance IDs and whose values are the corresponding first-level object models
second_level_model: list of length num_models with each entry being one of the bagged predictive models
some other implicit parameters (inspect the object for more info)
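Since second_level_model is a list of models that each support predict(), a bagged prediction can be thought of as an average over that list. The sketch below shows the averaging step on a stand-in list: the toy data and the bootstrap resampling are substitutes used only to obtain several different models, whereas the package derives its datasets from sampled attribute distributions. It is not the package's own prediction routine.

```r
set.seed(42)

# Toy data with a single feature x; invented for illustration.
toy_data <- data.frame(x = rnorm(200))
toy_data$y <- rbinom(200, 1, plogis(0.8 * toy_data$x))

# Stand-in for second_level_model: logistic regressions fitted on
# bootstrap resamples of the toy data.
bagged_models <- lapply(1:10, function(i) {
  resample <- toy_data[sample(nrow(toy_data), replace = TRUE), ]
  glm(y ~ x, family = binomial(link = "logit"), data = resample)
})

# Predict with every model, then average the probabilities per new match.
new_matches <- data.frame(x = c(-1, 0, 1))
per_model <- sapply(bagged_models, predict, newdata = new_matches, type = "response")
bagged_probability <- rowMeans(per_model)
bagged_probability
```

Here per_model has one row per new match and one column per model, so the row means are the bagged probabilities.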

# Builds a logistic regression model on a single sampled train data set obtained
# from object models by applying a transformation function (e.g. means)
get_model <- function(data) {
  return(glm(y ~ ., family = binomial(link = "logit"), data = data))
}

# Example of a prior specification (basketball example) for an attribute P3A,
# which is a Poisson variable and measures 3-point attempts. The conjugate prior
# for the Poisson distribution is the gamma distribution with parameters
# a - shape and b - rate
priors_example <- list(
  "San Antonio Spurs" =
    list(
      "P3A" = list(a = 241.2, b = 12.4),
      ...
    ),
  "Golden State Warriors" =
    list(
      "P3A" = list(a = 280.9, b = 13.2),
      ...
    ),
  ...
)

# Builds the Match Forecast Model
mf_model <- match_forecast_model(
  # Data frame in match-object format
  data = data_train,
  # Parametric assumption about attributes
  input_model_specification = "poisson",
  # How many distributions to fit to each attribute
  num_models = 100,
  # How to build feature vectors from distributions
  transformation = "means",
  get_model = get_model,
  priors = priors_example
)