R/implied_probabilities.R

Defines functions jsd_solvefor jsd_func binom_jsd kld pwr_solvefor pwr_func or_solvefor shin_solvefor or_func shin_func uniroot2 default_uniroot_opts implied_probabilities

Documented in implied_probabilities

#' Implied probabilities from bookmaker odds.
#'
#' This function calculate the implied probabilities from bookmaker odds in decimal format, while
#' accounting for over-round in the odds.
#'
#' The method 'basic' is the simplest method, and computes the implied probabilities by
#' dividing the inverted odds by the sum of the inverted odds.
#'
#' The methods 'wpo' (Weights Proportional to the Odds), 'or' (Odds Ratio) and 'power' are form the Wisdom of the Crowds document (the updated version) by
#' Joseph Buchdahl. The method 'or' is originally by Cheung (2015), and the method 'power' is there referred
#' to as the logarithmic method.
#'
#' The method 'shin' uses the method by Shin (1992, 1993). This model assumes that there is a fraction of
#' insider trading, and that the bookmakers tries to maximize their profits. In addition to providing
#' implied probabilities, the method also gives an estimate of the proportion if inside trade, denoted z. Two algorithms
#' are implemented for finding the probabilities and z. Which algorithm to use is chosen via the shin_method argument.
#' The default method (shin_method = 'js') is based on the algorithm in Jullien & Salanié (1994). The 'uniroot'
#' method uses R's built in equation solver to find the probabilities. The uniroot approach is also used for the
#' 'pwr' and 'or' methods. The two methods might give slightly different answers, especially when the bookmaker margin
#' (and z) is small.
#'
#' The 'bb' (short for "balanced books") method is from Fingleton & Waldron (1999), and is a variant of Shin's method. It too assume
#' a fraction of insiders, but instead of assuming that the bookmakers maximize their profits, they
#' minimize their risk.
#'
#' Both the 'shin' and 'bb' methods can be used together with the 'grossmargin' argument. This is also
#' from the Fingleton & Waldron (1999) paper, and adds some further assumption to the calculations,
#' related to operating costs. grossmargin should be 0 (default) or greater, typical range is 0 to 0.05.
#' For values other than 0, this might sometimes cause some probabilities to not be identifiable. A warning
#' will be given if this happens.
#'
#' The method 'jsd' was developed by Christopher D. Long, and described in a series of Twitter postings
#' and a python implementation posted on GitHub.
#'
#' Methods 'shin', 'or', 'power', and 'jsd' use the uniroot solver to find the correct probabilities. Sometimes it will fail
#' to find a solution, but it can be made to work by tuning some setting. The uniroot_options argument accepts a list with
#' options that are passed on to the uniroot function. Currently the interval, maxit, tol and extendInt argument of
#' uniroot can be changed. See the Troubleshooting vignette for more details.
#'
#'
#' @param odds A matrix or numeric of bookmaker odds. The odds must be in the decimal format.
#' @param method A string giving the method to use. Valid methods are 'basic', 'shin', 'bb',
#' 'wpo', 'or', 'power', 'additive', and 'jsd'.
#' @param normalize Logical. Some of the methods will give small rounding errors. If TRUE (default)
#' a final normalization is applied to make absolutely sure the
#' probabilities sum to 1.
#' @param target_probability Numeric. The value the probabilities should sum to. Default is 1.
#' @param grossmargin Numeric. Must be 0 or greater. See the details.
#' @param shin_method Character. Either 'js' (default) or 'uniroot'. See the details.
#' @param shin_maxiter numeric. Max number of iterations for shin method 'js'.
#' @param uniroot_options list. Option passed on to the uniroot solver, for those methods where it is applicable. See 'details'.
#'
#'
#' @return A named list. The first component is named 'probabilities' and contain a matrix of
#' implied probabilities. The second is the bookmaker margins (aka the overround). The third
#' depends on the method used to compute the probabilities:
#' \itemize{
#'  \item{ zvalues (method = 'shin' and method='bb'): The estimated amount of insider trade.}
#'  \item{ specific_margins (method = 'wpo'): Matrix of the margins applied to each outcome.}
#'  \item{ odds_ratios (method = 'or'): Numeric with the odds ratio that are used to convert true
#'  probabilities to bookmaker probabilities.}
#'  \item{ exponents (method = 'power'): The (inverse) exponents that are used to convert true
#'  probabilities to bookmaker probabilities.}
#'  \item{ distance (method = 'jsd'): The Jensen-Shannon distances that are used to convert true
#'  probabilities to bookmaker probabilities.}
#' }
#'
#' The fourth component 'problematic' is a logical vector called indicating if any probabilities has fallen
#' outside the 0-1 range, or if there were some other problem computing the probabilities.
#'
#'
#' @section References:
#' \itemize{
#'  \item{Hyun Song Shin (1992) Prices of State Contingent Claims with Insider Traders, and the Favourite-Longshot Bias }
#'  \item{Hyun Song Shin (1993) Measuring the Incidence of Insider Trading in a Market for State-Contingent Claims}
#'  \item{Bruno Jullien & Bernard Salanié (1994) Measuring the incidence of insider trading: A comment on Shin.}
#'  \item{John Fingleton & Patrick Waldron (1999) Optimal Determination of Bookmakers' Betting Odds: Theory and Tests.}
#'  \item{Joseph Buchdahl - USING THE WISDOM OF THE CROWD TO FIND VALUE IN A FOOTBALL MATCH BETTING MARKET (https://www.football-data.co.uk/wisdom_of_crowd_bets)}
#'  \item{Keith Cheung (2015) Fixed-odds betting and traditional odds (https://www.sportstradingnetwork.com/article/fixed-odds-betting-traditional-odds/)}
#' }
#'
#' @examples
#'# Two sets of odds for a three-outcome game.
#'my_odds <- rbind(c(4.20, 3.70, 1.95),
#'                 c(2.45, 3.70, 2.90))
#'
#'# Convert to probabilities using Shin's method.
#'converted_odds <- implied_probabilities(my_odds, method='shin')
#'
#'# Look at the probabilities
#'converted_odds$probabilities
#'
#' @export
implied_probabilities <- function(odds, method='basic', normalize=TRUE, target_probability = 1,
                                  grossmargin = 0, shin_method = 'js', shin_maxiter = 1000,
                                  uniroot_options = NULL){

  stopifnot(length(method) == 1,
            tolower(method) %in% c('basic', 'shin', 'bb', 'wpo', 'or', 'power', 'additive', 'jsd'),
            all(odds >= 1, na.rm=TRUE),
            length(target_probability) == 1,
            target_probability > 0,
            grossmargin >= 0,
            shin_method %in% c('js', 'uniroot'),
            length(shin_method) == 1,
            length(shin_maxiter) == 1,
            shin_maxiter > 1,
            is.null(uniroot_options) | is.list(uniroot_options))


  if (method == 'shin' & shin_method == 'uniroot' & grossmargin != 0){
    shin_method <- 'js'
    message('shin_method uniroot does not work when grossmargin is not 0. Method js will be used.')
  }

  if (method == 'shin' & shin_method == 'js' & target_probability != 1){
    shin_method <- 'uniroot'
    grossmargin <- 0
    message('shin_method js does not work when target_probability is not 1. Method uniroot will be used with grossmargin = 0.')
  }


  if (!is.matrix(odds)){

    if ('data.frame' %in% class(odds)){
      odds <- as.matrix(odds)
    } else {
      odds <- matrix(odds, nrow=1,
                     dimnames = list(NULL, names(odds)))
    }
  }

  if (method %in% c('shin', 'or', 'power', 'jsd')){
    uniroot_opts <- default_uniroot_opts(method = method)

    if (is.list(uniroot_options)){
      uniroot_opts <- utils::modifyList(uniroot_opts, uniroot_options)
    }

  }


  # Prepare the list that will be returned.
  out <- vector(mode='list', length=2)
  names(out) <- c('probabilities', 'margin')

  # Some useful quantities
  n_odds <- nrow(odds)
  n_outcomes <- ncol(odds)

  # Inverted odds and margins
  inverted_odds <- 1 / odds
  inverted_odds_sum <- rowSums(inverted_odds)
  out$margin <- inverted_odds_sum - target_probability

  # Missing values
  missing_idx <- apply(odds, MARGIN = 1,
                       FUN = function(x) any(is.na(x)))


  if (any(inverted_odds_sum[!missing_idx] < 1)){
    stop('Some inverse odds sum to less than 1.')
  }

  # Vector to keep track of uniroot problems.
  use_uniroot <- method %in% c('or', 'power', 'jsd') | (method %in% 'shin' & shin_method == 'uniroot')
  if (use_uniroot){
    problematic_uniroot <- logical(n_odds)
    problematic_uniroot_messages <- character(n_odds)
  }

  if (method == 'basic'){
    out$probabilities <- (target_probability * inverted_odds) / inverted_odds_sum

  } else if (method == 'shin'){

    zvalues <- numeric(n_odds) # The proportion of insider trading.
    probs <- matrix(nrow=n_odds, ncol=n_outcomes)

    problematic_shin <- logical(n_odds)

    if (shin_method == 'js'){
      for (ii in 1:n_odds){

        # Skip rows with missing values.
        if (missing_idx[ii] == TRUE){
          next
        }

        # initialize zz at 0
        zz_tmp <- 0

        for (jj in 1:shin_maxiter){
          zz_prev <- zz_tmp

          if (grossmargin != 0){
            zz_tmp <- (sum(sqrt(zz_prev^2 + 4*(1 - zz_prev) * (((inverted_odds[ii,]^2 * (1-grossmargin)))/inverted_odds_sum[ii])))-2) / (n_outcomes - 2)
          } else {
            zz_tmp <- (sum(sqrt(zz_prev^2 + 4*(1 - zz_prev) * (((inverted_odds[ii,])^2)/inverted_odds_sum[ii])))-2) / (n_outcomes - 2)
          }

          if (abs(zz_tmp - zz_prev)  <= .Machine$double.eps^0.25){
            break
          } else if (jj >= shin_maxiter){
            problematic_shin[ii] <- TRUE
          }

          zvalues[ii] <- zz_tmp
          probs[ii,] <- shin_func(zz=zz_tmp, io = inverted_odds[ii,])
        }
      }
    } else if (shin_method == 'uniroot'){
      for (ii in 1:n_odds){

        # Skip rows with missing values.
        if (missing_idx[ii] == TRUE){
          next
        }


        res <- uniroot2(f=shin_solvefor, io=inverted_odds[ii,], trgtprob = target_probability,
                        interval = uniroot_opts$interval, extendInt = uniroot_opts$extendInt,
                        tol = uniroot_opts$tol, maxiter = uniroot_opts$maxiter)

        zvalues[ii] <- res$root
        probs[ii,] <- shin_func(zz=res$root, io = inverted_odds[ii,])

        if (!is.null(res$message)){
          problematic_uniroot[ii] <- TRUE
          problematic_uniroot_messages[ii] <- res$message
        }

      }
    }

    out$probabilities <- probs
    out$zvalues <- zvalues

    if (any(problematic_shin[!missing_idx])){
      warning(sprintf('Could not find z: Did not converge in %d instances. Some results may be unreliable.',
                      sum(problematic_shin)))
    }

  } else if (method == 'bb'){

    # zz <- (((1-grossmargin) * inverted_odds_sum) - 1) / (n_outcomes-1)
    zz <- (((1-grossmargin) * inverted_odds_sum) - target_probability) / (n_outcomes-target_probability)
    probs <- (((1-grossmargin) * inverted_odds) - zz) / (1-zz)

    out$probabilities <- probs
    out$zvalues <- zz


  } else if (method == 'wpo'){
    # Margin Weights Proportional to the Odds.
    # Method from the Wisdom of the Crowds pdf.
    fair_odds <- (n_outcomes * odds) / (n_outcomes - (out$margin * odds))
    out$probabilities <- 1 / fair_odds
    out$specific_margins <- (out$margin * fair_odds) / n_outcomes
  } else if (method == 'or'){

    odds_ratios <- numeric(n_odds)
    probs <- matrix(nrow=n_odds, ncol=n_outcomes)

    for (ii in 1:n_odds){

      # Skip rows with missing values.
      if (missing_idx[ii] == TRUE){
        next
      }

      res <- uniroot2(f=or_solvefor, io=inverted_odds[ii,], trgtprob = target_probability,
                      interval = uniroot_opts$interval, extendInt = uniroot_opts$extendInt,
                      tol = uniroot_opts$tol, maxiter = uniroot_opts$maxiter)

      odds_ratios[ii] <- res$root
      probs[ii,] <- or_func(cc=res$root, io = inverted_odds[ii,])

      if (!is.null(res$message)){
        problematic_uniroot[ii] <- TRUE
        problematic_uniroot_messages[ii] <- res$message
      }

    }

    out$probabilities <- probs
    out$odds_ratios <- odds_ratios

  } else if (method == 'power'){

    probs <- matrix(nrow=n_odds, ncol=n_outcomes)
    exponents <- numeric(n_odds)

    for (ii in 1:n_odds){

      # Skip rows with missing values.
      if (missing_idx[ii] == TRUE){
        next
      }

      res <- uniroot2(f=pwr_solvefor, io=inverted_odds[ii,], trgtprob = target_probability,
                      interval = uniroot_opts$interval, extendInt = uniroot_opts$extendInt,
                       tol = uniroot_opts$tol, maxiter = uniroot_opts$maxiter)
      exponents[ii] <- res$root
      probs[ii,] <- pwr_func(nn=res$root, io = inverted_odds[ii,])

      if (!is.null(res$message)){
        problematic_uniroot[ii] <- TRUE
        problematic_uniroot_messages[ii] <- res$message
      }
    }

    out$probabilities <- probs
    out$exponents <- exponents

  } else if (method == 'additive'){

    probs <- matrix(nrow=n_odds, ncol=n_outcomes)

    for (ii in 1:n_odds){

      # Skip rows with missing values.
      if (missing_idx[ii] == TRUE){
        next
      }

      probs[ii,] <- inverted_odds[ii,] - ((inverted_odds_sum[ii] - target_probability) / n_outcomes)
    }

    out$probabilities <- probs

  } else if (method == 'jsd'){

    probs <- matrix(nrow=n_odds, ncol=n_outcomes)
    jsds <- numeric(n_odds)

    for (ii in 1:n_odds){
      # Skip rows with missing values.
      if (missing_idx[ii] == TRUE){
        next
      }


      res <- uniroot2(f=jsd_solvefor, io=inverted_odds[ii,], trgtprob = target_probability,
                      interval = uniroot_opts$interval, extendInt = uniroot_opts$extendInt,
                      tol = uniroot_opts$tol, maxiter = uniroot_opts$maxiter)
      jsds[ii] <- res$root
      probs[ii,] <- jsd_func(jsd=res$root, io = inverted_odds[ii,])

      if (!is.null(res$message)){
        problematic_uniroot[ii] <- TRUE
        problematic_uniroot_messages[ii] <- res$message
      }
    }

    out$probabilities <- probs
    out$distance <- jsds

  }

  ## do a final normalization to make sure the probabilities
  ## sum to 1 without rounding errors.
  if (normalize){
    out$probabilities <- (target_probability * out$probabilities) / rowSums(out$probabilities)
  }

  # Make sure the matrix of implied probabilities has column names.
  if (!is.null(colnames(odds))){
    colnames(out$probabilities) <- colnames(odds)
  }

  # check if there are any probabilities outside the 0-1 range.
  problematic <- apply(out$probabilities, MARGIN = 1, FUN=function(x){any(x > 1 | x < 0)})
  problematic[is.na(problematic)] <- TRUE
  problematic[missing_idx] <- NA

  if (any(problematic, na.rm=TRUE)){
    warning(sprintf('Probabilities outside the 0-1 range produced at %d instances.\n',
                    sum(problematic)))
  }

  # Give warnings for problems when uniroot was used.
  if (use_uniroot){

    if (any(problematic_uniroot)){
      problematic[problematic_uniroot] <- TRUE

      problematic_uniroot_messages <- unique(problematic_uniroot_messages)
      for (ww in 1:length(problematic_uniroot_messages)){
        warning(problematic_uniroot_messages[ww])
      }
    }
  }


  if (method == 'shin'){
    problematic <- problematic | problematic_shin
  }

  if (method %in% c('shin', 'bb')){
    negative_z <- out$zvalues < 0
    if (any(negative_z[!missing_idx])){
      warning(sprintf('z estimated to be negative: Some results may be unreliable.',
                      negative_z))
    }
  }

  out$problematic <- problematic


  return(out)
}


#########################################################
# Internal functions used to transform probabilities
# and be used with uniroot.
#########################################################


default_uniroot_opts <- function(method){
  opts <- list(extendInt = 'yes',
               maxiter = 1000,
               tol = .Machine$double.eps^0.25)

  if (method == 'shin'){
    opts$interval <- c(0, 0.4)
  } else if (method == 'or'){
    opts$interval <- c(0.95, 5)
  } else if (method == 'power'){
    opts$interval <- c(0.0001, 1)
  } else if (method == 'jsd'){
    opts$interval <- c(0.0000001, 0.1)
    opts$tol = 0.000001
  }

  return(opts)

}



# Wrapper around uniroot, but returns a list with NA results
# if the solver fails.
uniroot2 <- function(f, interval, ...,
                     extendInt = 'no', tol = .Machine$double.eps^0.25, maxiter = 1000){


  res <- tryCatch({
    stats::uniroot(f = f, ..., interval = interval, extendInt = extendInt, tol = tol, maxiter = maxiter)
  }, error = function(e){
    list(root = NA, message = as.character(e))
  })

  return(res)

}




# Calculate the probabilities using Shin's formula, for a given value of z.
# io = inverted odds.
shin_func <- function(zz, io){
  bb <- sum(io)
  (sqrt(zz^2 + 4*(1 - zz) * (((io)^2)/bb)) - zz) / (2*(1-zz))
}


# Calculate the probabilities using the odds ratio method,
# for a given value of the odds ratio cc.
# io = inverted odds.
or_func <- function(cc, io){
  io / (cc + io - (cc*io))
}


# the condition that the sum of the probabilites must sum to 1.
# Used with uniroot.
shin_solvefor <- function(zz, io, trgtprob){
  tmp <- shin_func(zz, io)
  sum(tmp) - trgtprob # 0 when the condition is satisfied.
}

# The condition that the sum of the probabilites must sum to 1.
# This function calulates the true probability, given bookmaker
# probabilites xx, and the odds ratio cc.
or_solvefor <- function(cc, io, trgtprob){
  tmp <- or_func(cc, io)
  sum(tmp) - trgtprob
}

# power function.
pwr_func <- function(nn, io){
  io^(1/nn)
}

# The condition that the sum of the probabilites must sum to 1.
# This function calulates the true probability, given bookmaker
# probabilites xx, and the inverse exponent. nn.
pwr_solvefor <- function(nn, io, trgtprob){
  tmp <- pwr_func(nn, io)
  sum(tmp) - trgtprob
}

# Simple discrete KL-divergence.
kld <- function(x, y){
  sum(x * log(x/y))
}

# The binomial symmetric Jensen–Shannon distance
# assuming p and io have length 1.
binom_jsd <- function(p, io){

  pvec <- c(p, 1-p)
  iovec <- c(io, 1-io)

  mm <- (pvec + iovec) / 2
  sqrt((kld(pvec, mm)/2) + (kld(iovec, mm)/2))

}

# Find the probabilties for a given JS distance and inverted odds.
jsd_func <- function(jsd, io){

  # The function to be used by uniroot to find p from kl and io.
  tosolve <- function(p, io, jsd){
    binom_jsd(p=p, io = io) - jsd
  }

  pp <- numeric(length(io))
  for (ii in 1:length(io)){
    # Intervall from approx 0 to io, implying
    # That the underlying probability i less than the
    # inverse odds.
    pp[ii] <- stats::uniroot(f = tosolve,
                      interval = c(0.000001, io[ii]),
                      extendInt = 'yes',
                      io = io[ii], jsd = jsd)$root

  }
  return(pp)
}

# Calculate the probabilities using the Jensen-Shannon distance method,
# for a given value of the odds ratio cc.
# io = inverted odds.
jsd_solvefor <- function(jsd, io, trgtprob){
  sum(jsd_func(jsd=jsd, io = io)) - trgtprob
}

Try the implied package in your browser

Any scripts or data that you put into this service are public.

implied documentation built on July 9, 2023, 6:51 p.m.