In RobinHankin/hyper2: The Hyperdirichlet Distribution, Mark 2

defaultW <- getOption("warn") 
options(warn = -1) 
library("hyper2",quietly=TRUE)
library("magrittr")
options(warn = defaultW)

knitr::include_graphics(system.file("help/figures/hyper2.png", package = "hyper2"))

most of the time is taken calculating the contour graph right at the end; takes about an hour to run without cache for n=10

To cite the hyper2 package in publications, please use @hankin2017_rmd.

Objects of class \code{hyper3} are a generalization of \code{hyper2} objects that allow the brackets to contain weighted probabilities. Likelihood functions are defined on non-negative $p_1,\ldots, p_n$ subject to the unit-sum constraint $\sum p_i=1$. Given known weights $w^i_j$ with $1\leq i\leq j$ we have

$$\mathcal{L}\left(p_1,\ldots, p_n\right)=\prod_j\left(\sum_{i=1}^n w^i_jp_i\right)^{n_j}.$$

As a motivating example, suppose two teams (players) with Bradley-Terry strengths $p_1,p_2$ play football where we quantify the home-ground advantage with a term $\lambda$. If $p_1$ plays at home $a+b$ times with $a$ wins and $b$ losses, and plays away [so $p_2$ is at home] $c+d$ times with $c$ wins and $d$ losses, then a sensible likelihood function might be

$$\mathcal{L}(p_1,p_2,\lambda;A,B,C,D)= \underbrace{ \overbrace{\left(\frac{\lambda p_1}{\lambda p_1 + p_2}\right)^{A}}^{\mbox{$A$ home wins for $p_1$}}\quad \overbrace{\left(\frac{p_2 }{\lambda p_1 + p_2}\right)^{B}}^{\mbox{$B$ away wins for $p_2$}} }\mbox{$p_1$ plays at home to $p_2$ $A+B$ times}\quad \underbrace{ \overbrace{\left(\frac{\lambda p_2}{p_1 + \lambda p_2}\right)^{C}}^{\mbox{$C$ home wins for $p_2$}}\quad \overbrace{\left(\frac{ p_1}{p_1 + \lambda p_2}\right)^{D}}^{\mbox{$D$ away wins for $p_1$}} }\mbox{$p_2$ plays at home to $p_1$ $C+D$ times} $$

where we understand that $p_1+p_2=1$, $p_1,p_2,\lambda\geqslant 0$. Elementary techniques allow us to identify a maximum and we find

$$ \hat{p_1}=\frac{\sqrt{AD}}{\sqrt{AD}+\sqrt{BC}}\qquad \hat{p_2}=\frac{\sqrt{BC}}{\sqrt{AD}+\sqrt{BC}}\qquad \hat{\lambda}=\sqrt{\frac{AC}{BD}} $$

Note that the evaluate is unchanged if $A,B,C,D$ are all multiplied by a positive constant.

Aside: `hyper2` analysis of the same scoreline

Now the likelihood function would be

$$\mathcal{L}(p_1,p_2,\lambda;A,B,C,D)= \underbrace{ \overbrace{\left(\frac{p_1+M}{p_1 + p_2+M}\right)^{A}}^{\mbox{$A$ home wins for $p_1$}}\quad \overbrace{\left(\frac{p_2 }{p_1 + p_2+M}\right)^{B}}^{\mbox{$B$ away wins for $p_2$}} }\mbox{$p_1$ plays at home to $p_2$ $A+B$ times}\quad \underbrace{ \overbrace{\left(\frac{p_2+M}{p_1 + p_2+M}\right)^{C}}^{\mbox{$C$ home wins for $p_2$}}\quad \overbrace{\left(\frac{p_1 }{p_1 + p_2+M}\right)^{D}}^{\mbox{$D$ away wins for $p_1$}} }\mbox{$p_2$ plays at home to $p_1$ $C+D$ times} $$

The maximum would be

$$ \hat{p_1}=\frac{D}{C+D}\qquad \hat{p_2}=\frac{B}{A+B}\qquad \hat{M}=\frac{AC-BD}{(A+B)(C+D)} $$

although it is not clear to me what the constrained optimal point [that is, requiring $M\geq 0$] is.

Numerical analysis

A <- 9
B <- 5
C <- 7
D <- 3

(p1hat <- sqrt(A*D)/(sqrt(A*D) + sqrt(B*C)))
(p2hat <- sqrt(B*C)/(sqrt(A*D) + sqrt(B*C)))

(lam_hat <- sqrt((A*C)/(B*D)))

$$ \hat{p_1}=\frac{\sqrt{21}}{\sqrt{21} + \sqrt{35}}\simeq 0.47\qquad \hat{p_2}=\frac{\sqrt{63}}{\sqrt{21} + \sqrt{35}}\simeq 0.53\qquad \hat{\lambda}=\sqrt{\frac{AC}{BD}}=2.05. $$

Moving to the monster formulation we have

$$ \hat{p_1}=\frac{3}{7+3} = 0.3\qquad \hat{p_2}=\frac{5}{5+9}\simeq 0.36\qquad \hat{M}=\frac{63-21}{14\cdot 10} = 0.3 $$

Keeping $A=9,B=5, C=7,D=3$, and $\lambda=2.05$ [assumed for the moment to be known, estimation techniques are discussed later], we can represent this information in a standard format:

library("hyper2",quietly=TRUE)
M <- matrix(c(NA, A+B*1i ,C+D*1i, NA),2,2,byrow=TRUE)
teams <- c("p1","p2")
dimnames(M) <- list("@home" = teams,"@away"=teams)
dimnames(M) <- list("@home" = teams,"@away"=teams)
print(M)

Above, object M has real parts being home wins and imaginary parts being away wins. Appropriate package idiom to specify a likelihood function might be to use bespoke function home_away3():

(H <- home_away3(M,lambda=2.05))

Keeping $A=9,B=5, C=7,D=3$, and $\lambda=2.05$ [assumed for the moment to be known, estimation techniques are discussed later], we may estimate $p_1$ and $p_2$ using maximum likelihood, maxp() in package idiom:

H
maxp(H)

Further, we can test whether the players are in fact of equal strength:

equalp.test(H)

Showing that we fail to reject the null that $p_1=p_2$, at least with this value of $\lambda$. Observe that a naive analysis would have $p_1$ winning $w=A+C=r A+C$ games and losing $l=B+D=r B+D$ games, from which we would obtain $\hat{p_1}=r A+C/r A+B+C+D$; a likelihood ratio for $H_0\colon p_1=0.5$ would be

$$\frac { {{w+l}\choose {w\, l}}\left(\frac{w}{w+l}\right)^w\left(\frac{l}{w+l}\right)^l }{ {{w+l}\choose {w\, l}}\left(\frac{1}{2}\right)^w\left(\frac{1}{2}\right)^l }=\frac{2^{w+l}w^wl^l}{(w+l)^{w+l}}\simeq 1.94, $$

not significant.

Now, how to estimate $\lambda$?

f <- function(lambda){maxp(home_away3(M,lambda=lambda),give=TRUE)$value}
f(2)
f(2.1)
lam <- seq(from=1,to=10,len=17)
Supp <- sapply(lam,f)

plot(log(lam),Supp-max(Supp),type="b")

Or even

op <- optimize(f,interval=c(2,3),maximum=TRUE)
op

We can proceed in two ways. Firstly we can use $\hat{\lambda}$ directly:

equalp.test(home_away3(M,lambda=op$maximum))

The flaw in this analysis is that it is conditional on the estimated value of $\lambda$ [which is not too bad, IMO]. Secondly we can perform a fully 2D analysis, which allows for the fact that the evaluate might have a different value of $\lambda$ from the $\hat{\lambda}$:

probs <- seq(from=0.1,to=0.8,len=2)
jj <- as.matrix(expand.grid(lam,probs))
S <- rep(NA,nrow(jj))
for(i in seq_len(nrow(jj))){
    lambda <- jj[i,1]
    p1 <- jj[i,2]
    p <- c(p1,1-p1)
    names(p) <- c("p1","p2")
   S[i] <- loglik(p=p,H=home_away3(M,lambda=lambda),log=TRUE)
}
S <- matrix(S,length(lam),length(probs))

contour(lam,probs,S,levels=seq(from=-20,to=-10,by=0.5),xlab="lambda",ylab="p1")
abline(h=0.5)

Ternary representation

library(Ternary)

par(mar = rep(0.2, 4))
TernaryPlot(alab = "p1", blab = "p2", clab = "M")

f <- function(p1,p2,M) {
pmax(-20,log(M+p1)*A + log(p2)*B - log(M+p1+p2)*(A+B+C+D) + log(p2+M)*C + log(p1)*D )
}
FunctionToContour <- function(a,b,c){f(p1=a, p2=b, M=c)}

# Compute and plot colour tiles
values <- TernaryPointValues(FunctionToContour, resolution = 24L)
ColourTernary(values)

# Add contour lines
TernaryContour(FunctionToContour, resolution = 36L)

A slightly more complicated example:

home_games_won <- matrix(c(
    NA, 16, 12, 11,
    19, NA, 19, 16,
    17, 12, NA, 11,
    11, 12, 12, NA),
    nrow=4,ncol=4,byrow=TRUE)

away_games_won <- matrix(c(
    NA, 05, 02, 02,
     9, NA, 10, 02,
     3, 04, NA, 07,
     8, 06, 04, NA),
    nrow=4,ncol=4,byrow=TRUE)

teams <- LETTERS[1:4]
dimnames(home_games_won) <- list("@home" = teams,"@away"=teams)
dimnames(away_games_won) <- list("@home" = teams,"@away"=teams)

home_games_won
away_games_won

Thus home_games_won[1,2] == 16 means A played at home against B and won 16 times; home_games_won[2,1] == 19 means B played at home against A and won 19 times. Also, away_games_won[1,2] == 5 means A played away against B and won 5 times, and away_games_won[2,1] == 2 means B played away against A and won 2 times. Alternatively, A played B 16+19+5+2=45 times; A played at home 16+2=18 times and won 16 times and lost 2 times; and B played at home 19+5=24 times and won 19 times and lost 5 times. Alternatively we may use complex numbers to represent the same dataset:

home_games_won + 1i*away_games_won

We will create a hyper2 object with the teams and home ground advantage term:

H <- home_away(home_games_won,away_games_won)
H
options("use_alabama" = FALSE) # to avoid the stupid wmmin bug
specificp.gt.test(H,"home",0)
options("use_alabama" = TRUE) # to avoid the stupid wmmin bug

We see strong evidence to support the contention that home advantage is real. Further, we may test the hypothesis that all the teams have the same strength, after accounting for the home team advantage:

samep.test(H,teams)

Above, we see no evidence for a difference in team strengths. Visually:

(mt <- maxp(H))

A profile likelihood diagram:

home_strength <- seq(from=0.15,to=0.45,length=13)
plot(home_strength,profsupp(H,"home",home_strength),type="b",pch=16,
     xlab="home strength",ylab="support",main="profile likelihood for home advantage")
abline(h=c(0,-2))

The figure give a support interval between about 0.3 and 0.5 for home ground advantage.

Example in Agresti

@agresti2002 (p438) considers a dataset of seven baseball teams who play one another repeatedly. Each game is played at the home ground of one team, the other playing away. The dataset is as follows:

baseball_table <- matrix(c(
NA,   4+3i, 4+2i, 4+3i, 6+1i, 4+2i, 6+0i,
3+3i, NA  , 4+2i, 4+3i, 6+0i, 6+1i, 4+3i,
2+5i, 4+3i, NA  , 2+4i, 4+3i, 4+2i, 6+0i,
3+3i, 5+1i, 2+5i, NA  , 4+3i, 4+2i, 6+1i,
5+1i, 2+5i, 3+3i, 4+2i, NA  , 5+2i, 6+0i,
2+5i, 3+3i, 3+4i, 4+3i, 4+2i, NA  , 2+4i,
2+5i, 1+5i, 1+6i, 2+4i, 1+6i, 3+4i, NA),7,7)
teams <- c("Milwaukee", "Detroit", "Toronto", "New York", "Boston","Cleveland","Baltimore")
rownames(baseball_table) <- teams
colnames(baseball_table) <- teams
M <- baseball_table # saves typing
M # for convenience

Above, we represent home team wins with the real part of the matrix, the imaginary component being away wins. Now process it:

baseball <- home_away(M)
baseball

Maximum likelihood:

baseball_maxp <- maxp(baseball)
baseball_maxp

Visually:

dotchart(baseball_maxp,pch=16)
pie(baseball_maxp)

Above, we see that the teams are all of roughly equal strength, and the home advantage is small. However, we may easily assess the null that the home strength is zero:

specificp.gt.test(baseball,"home",0)

home_strength <- seq(from=0.005,to=0.1,length=20)
plot(home_strength,profsupp(baseball,"home",home_strength),type="b",pch=16,
     xlab="home strength",ylab="support",main="profile likelihood for home advantage")
abline(h=c(0,-2))

Further, we may test the hypothesis that the baseball teams have the same strength:

samep.test(baseball,teams)

Above, we see no evidence for the teams having differential strengths.

As an interesting aside, suppose we have some additional data from games in which (for some reason), the home advantage was not operating.

M2 <- matrix(c(0,1,2,0),2,2)
dimnames(M2) <- list(winner=teams[1:2],loser=teams[1:2])
M2

Thus we have additional data from three games in which Milwaukee won twice and lost once. This additional information may easily be incorporated into our likelihood function, at least on the assumption of independence:

baseball <- baseball + pairwise(M2)

We may assess the difference that the new information makes:

ordertransplot(baseball_maxp[1:7],maxp(baseball)[1:7],xlab="old",ylab="new",plotlims=c(0.01,0.16))

Above, we see very little difference in the two evaluates, except for Detroit and Milwaukee, as expected.

Some further thoughts on the home ground monster and Agresti's analysis.

Considering the simplest case of two teams, a and b, with results:

$a$ at home, $b$ away, $a,b$ wins with probability $\frac{\lambda a,b}{\lambda a+b}$ (Agresti), or $\frac{a+H,b}{a+b+H}$ (me).
$a$ away, $b$ at home, $a,b$ wins with probability $\frac{a,\lambda b}{\lambda a+b}$ (Agresti), or $\frac{a,b+H}{a+b+H}$ (me).

Matching the probabilities for $a$ at home gives us $H=a(\lambda-1)$ and matching the probabilities for $a$ away gives $H=b(\lambda-1)$. These cannot both be true unless $a=b$ or $\lambda=1$. But we can, in principle, test which one is a better model.

hyper3 idiom

We may apply hyper3 idiom to the dataset, using the likelihood function of Davidson and Beaver (1977). First a formal test:

baseball_table
f <- function(l){ maxp(home_away3(baseball_table,l),give=TRUE,n=1)$likes}
o <- optimize(f,c(0.8,1.6),maximum=TRUE)
o
lambda_baseball <- o$maximum
maxL <- f(lambda_baseball)
W <- maxL - f(1)    # f(1) being the likelihood of the null
W
pchisq(2*W,df=1,lower.tail=FALSE)

The pvalue is significant. Now plot a profile likelihood (support) curve:

jj1  <- seq(from=log(lambda_baseball),to=log(1.9),len=7)
jj2  <- seq(from=log(lambda_baseball),to=log(0.8),len=7)
l <- sort(unique(exp(c(jj1,jj2))))
lam  <- sort(c(l,lambda_baseball))
L <- sapply(lam,f) - maxL

plot(log(lam),L,type="b")
abline(h=c(0,-2))
abline(v=0)

plot(lam,L,type="b")
abline(h=c(0,-2))

Analysis of Work by Davidson and Beaver

DB <- # Davidson and Beaver
matrix(c(
    NA, 32+18i, 36+14i, 47+03i, 47+03i,
14+36i,     NA, 34+16i, 43+07i, 46+04i,
06+44i, 14+36i,     NA, 34+16i, 40+10i,
02+48i, 07+43i, 12+38i,     NA, 28+22i,
01+49i, 02+48i, 05+45i, 10+40i,     NA
),5,5,byrow=TRUE
)
jj <- c("90g","95g","100g","105g","110g")
dimnames(DB) <- list(`@home` = jj, `@away` = jj)
DB

f <- function(l){ maxp(home_away3(DB,l),give=TRUE,n=1)$likes}
lam <- seq(from=0.6,to=1,len=19)
date()
loglike <- sapply(lam,f)
date()

plot(lam,loglike-max(loglike),type='b')
abline(h=c(0,-2))
abline(v=1)

Creating a home/away results table from scratch

Function home_away_table() takes a data frame of results and produces a table amenable to analysis by home_away() or home_away3().

a <- read.table("celtic_rangers_livingston_falkirk.txt", header=TRUE)
head(a)
teams <- c("Rangers","Celtic","Falkirk","Livingston")
(aF <- home_away_table(a, give=FALSE,teams = teams))
(aT <- home_away_table(a, give=TRUE, teams = teams))

(We will ignore draws for the moment, and consider only won games)

(H <- home_away(aF))
pnames(H) <- c(teams,"home")
H
(mH <- maxp(H))

samep.test(H,c("Rangers","Celtic"))


```r
specificp.gt.test(H,"home",0)

home_away3(aF,lambda=1.88)
maxp(home_away3(aF,lambda=1.88),give=1)
maxp(home_away3(aF,lambda=1.11),give=1)

f <- function(l){ maxp(home_away3(aF,l),give=TRUE,n=1)$likes}
lam <- seq(from=1.4,to=2.3,len=10)
date()
loglike <- sapply(lam,f)
date()

(o <- optimize(f,c(1.1,4.6),maximum=TRUE))

o
RCLF3_lambda_max <- o$maximum
maxL <- f(RCLF3_lambda_max)

W <- maxL - f(1)    # f(1) being the likelihood of the null
W
pchisq(2*W,df=1,lower.tail=FALSE)

(H <- home_away3(home_away_table(a,give=FALSE),RCLF3_lambda_max))  # o$maximum ~= 1.353
maxp(H,give=1)
RCLF3 <- H

plot(lam,loglike-max(loglike),type='b')
abline(h=c(0,-2))
abline(v=1)
grid()

(mH2 <- round(mH*1000))

RCLF3_lambda_max <- o$maximum
RCLF3_maxp <- maxp(home_away3(home_away_table(a,give=FALSE),RCLF3_lambda_max),give=FALSE)
round(RCLF3_maxp * 1000)

Rangers vs Falkirk {-}

(Rangers at home)

hyper2: P(Rangers;Falkirk) = $\frac{448+67; 56}{448 + 56 + 67}\simeq (0.902; 0.098)$
hyper3: P(Rangers;Falkirk) = $\frac{1.77448; 56}{1.77448+56}\simeq(0.934; 0.066)$

actual scoreline: 69(R) / 2(F) or about $(0.972; 0.028)$. Expectation would be (64;7) (H2) or (66.3;4.7) (H3)

(Rangers away)

hyper2: P(Rangers;Falkirk) = $\frac{448; 56+67}{448 + 56 + 67}\simeq (0.78;0.22)$
hyper3: P(Rangers;Falkirk) = $\frac{448; 1.7756}{448 + 1.7756}\simeq(0.82; 0.18)$

actual scoreline: 57(R) /17(F) - or about $(0.77; 0.23)$

Rangers vs Livingston {-}

(Rangers at home)

hyper2: P(Rangers;Livingston) = $\frac{448+67; 39}{448 + 39 + 67}\simeq (0.930; 0.070)$
hyper3: P(Rangers;Livingston) = $\frac{1.77448; 39}{1.77448+39}\simeq(0.953; 0.047)$

actual scoreline: 15(R) / 0(L) or $(1; 0)$

(Rangers away)

hyper2: P(Rangers;Livingston) = $\frac{448; 39+67}{448 + 39 + 67}\simeq (0.81;0.19)$
hyper3: P(Rangers;Livingston) = $\frac{448; 1.7739}{448 + 1.7739}\simeq(0.86; 0.14)$

actual scoreline: 12(R) /2(L) - or about $(0.86; 0.14)$

Celtic vs Falkirk {-}

(Celtic at home)

hyper2: P(Celtic;Falkirk) = $\frac{390+67; 56}{390 + 56 + 67}\simeq (0.89; 0.11)$
hyper3: P(Celtic;Falkirk) = $\frac{1.77412; 74}{1.77412+74}\simeq(0.92; 0.08)$

actual scoreline: 61-8 or about $(0.88; 0.12)$

(Celtic away)

hyper2: P(Celtic;Falkirk) = $\frac{390; 56+67}{390 + 56 + 67}\simeq (0.76;0.24)$
hyper3: P(Celtic;Falkirk) = $\frac{412; 1.7774}{412 + 1.7774}\simeq(0.76; 0.24)$

actual scoreline: 43-17 or about $(0.72; 0.28)$

Livingston vs Falkirk {-}

(Livingston at home)

hyper2: P(Livingston;Falkirk) = $\frac{39+67; 56}{39 + 67 + 56}\simeq (0.65; 0.35)$
hyper3: P(Livingston;Falkirk) = $\frac{1.7742; 74}{1.7742+74}\simeq(0.50; 0.50)$

actual scoreline: 3-9 or about $(0.25; 0.75)$

(Livingston away)

hyper2: P(Livingston;Falkirk) = $\frac{39; 56+67}{39 + 56 + 67}\simeq (0.24;0.76)$
hyper3: P(Livingston;Falkirk) = $\frac{42; 1.7774}{42 + 1.7774}\simeq(0.24; 0.76)$

actual scoreline: 6-8 or about $(0.43; 0.57)$

Expectations

First, hyper2:

mH
mH["Rangers"]
mH["home"]
RCLF3_maxp
lambda <- RCLF3_lambda_max
h2_exp <- matrix(0,4,4)
rownames(h2_exp) <- teams
colnames(h2_exp) <- teams
h3_exp <- h2_exp
n_matches <- Re(aF) + Im(aF)
for(hometeam in teams){
    for(awayteam in teams){

        h2_exp[hometeam,awayteam] <- (
            n_matches[hometeam,awayteam] * (mH[hometeam] + mH["home"])/
            (mH[hometeam] + mH[awayteam] + mH["home"])
        )
        h2_exp[hometeam,awayteam] %<>%
            add(1i*
                n_matches[hometeam,awayteam] * (mH[awayteam])/
                (mH[hometeam] + mH[awayteam] + mH["home"])
                )

        h3_exp[hometeam,awayteam] <- (
            n_matches[hometeam,awayteam] * (lambda* RCLF3_maxp[hometeam])/
            (lambda* RCLF3_maxp[hometeam] + RCLF3_maxp[awayteam])
        )

        h3_exp[hometeam,awayteam] %<>%
            add(1i*
                n_matches[hometeam,awayteam] * (RCLF3_maxp[awayteam])/
                (lambda* RCLF3_maxp[hometeam] + RCLF3_maxp[awayteam])
        )
    }
}
aF
round(h2_exp,1)
round(h3_exp,1)

makev <- function(M){
    out <- c(Re(M),Im(M))
    out[!is.na(out)]
}


(obs <- makev(aF))
(exp2 <- makev(h2_exp))
(exp3 <- makev(h3_exp))

plot((obs-exp2)^2/exp2)
plot((obs-exp3)^2/exp3)


chisq2 <- sum((obs-exp2)^2/exp2)
chisq3 <- sum((obs-exp3)^2/exp3)

pchisq(chisq2,df=12-4,lower.tail=FALSE)
pchisq(chisq3,df=12-4,lower.tail=FALSE)

par(pty='s')
plot(obs,exp2,log="xy",asp=1,xlim=c(1,100),ylim=c(1,100))
abline(0,1)
plot(obs,exp3,log="xy",asp=1,xlim=c(1,100),ylim=c(1,100))
abline(0,1)

expectation2 <- function(probs, results){
    teams <- names(probs[names(probs) != "home"])
    h2_exp <- matrix(0,4,4)
    rownames(h2_exp) <- teams
    colnames(h2_exp) <- teams
    n_matches <- Re(aF) + Im(aF)
    for(hometeam in teams){
        for(awayteam in teams){

            h2_exp[hometeam,awayteam] <- (
                n_matches[hometeam,awayteam] * (mH[hometeam] + mH["home"])/
                (mH[hometeam] + mH[awayteam] + mH["home"])
            )

            h2_exp[hometeam,awayteam] %<>%
                add(1i*
                    n_matches[hometeam,awayteam] * (mH[awayteam])/
                    (mH[hometeam] + mH[awayteam] + mH["home"])
                    )
        }
    }
    return(h2_exp)
}
round(expectation2(mH,aF),1)

expectation3 <- function(probs, results, lambda){
    teams <- names(probs)
    h3_exp <- matrix(0,4,4)
    rownames(h3_exp) <- teams
    colnames(h3_exp) <- teams
    n_matches <- Re(aF) + Im(aF)
    for(hometeam in teams){
        for(awayteam in teams){

            h3_exp[hometeam,awayteam] <- (
                n_matches[hometeam,awayteam] * (lambda* RCLF3_maxp[hometeam])/
                (lambda* RCLF3_maxp[hometeam] + RCLF3_maxp[awayteam])
            )

            h3_exp[hometeam,awayteam] %<>%
                add(1i*
                    n_matches[hometeam,awayteam] * (RCLF3_maxp[awayteam])/
                    (lambda* RCLF3_maxp[hometeam] + RCLF3_maxp[awayteam])
                    )
        }
    }
    return(h3_exp)
}

round(expectation3(RCLF3_maxp[c("Rangers","Celtic","Falkirk","Livingston")],aF,RCLF3_lambda_max),1)

Bring back draws

aT
home_draw_away3(aT,lambda=1.888,D=0.3)
f <- function(v){
    H <- home_draw_away3(aT, lambda = v[1], D=v[2])
    maxp(H,give=1)$`Log-likelihood`
}
n <- 2
lambda_try <- seq(from=1,to=3,len=n)
D_try <- seq(from=0.1,to=0.6,len=n)
V <- as.matrix(expand.grid(lambda_try,D_try))
LL <- apply(V,1,f)

LL <- matrix(LL,n,n) - max(LL)
LL <- pmax(LL,-10)
contour(lambda_try, D_try, LL,xlab="lambda",ylab="D")

Now find the evaluate:

maxv <- optim(par=c(3,0.7), fn=f, control=list(fnscale = -1))

maxv

MLparams <- maxv$par
mH <- maxp(home_draw_away3(aT, lambda = MLparams[1], D=MLparams[2]),give=1)

mH

Appendix

Here is a transcript of a mathematica session which establishes the estimates above. First the hyper2 likelihood:

rhankin@rhrpi4:~ $ wolfram
Mathematica 12.2.0 Kernel for Linux ARM (32-bit)
Copyright 1988-2021 Wolfram Research, Inc.


In[1]:= L = Log[M+p1]*A +Log[p2]*B- Log[M+p1+p2]*(A+B+C+D) + Log[p2+M]*C + Log[p1]*D 

Out[1]= D Log[p1] + A Log[M + p1] + B Log[p2] + C Log[M + p2] - (A + B + C + D) Log[M + p1 + p2] 

In[2]:= LL = L /. {p2 -> 1-p1-M} 

Out[2]= C Log[1 - p1] + B Log[1 - M - p1] + D Log[p1] + A Log[M + p1] 

In[3]:= dp = D[LL,p1]//FullSimplify 

            C      D         B          A
Out[3]=  ------- + -- + ----------- + ------
         -1 + p1   p1   -1 + M + p1   M + p1

In[4]:= dM = D[LL,M]//FullSimplify 

              B          A
Out[4]= ----------- + ------
         -1 + M + p1   M + p1

In[5]:= Solve[{dp==0,dM==0},{p1,M}] 

                   D            A C - B D
Out[5]=  {{p1 -> -----, M -> ---------------}}
                 C + D       (A + B) (C + D)


In[6]:= p2 = 1-p1-M /. {%} //FullSimplify 

             B
Out[6]=  {{-----}}
           A + B

Now the hyper3 likelihood:

rhankin@rhrpi4:~ $ wolfram
Mathematica 12.2.0 Kernel for Linux ARM (32-bit)
Copyright 1988-2021 Wolfram Research, Inc.

In[1]:= L = A*Log[lambda*p1] + B*Log[p2] -(A+B)*Log[lambda*p1 + p2] + C*Log[lambda*p2] + D*Log[p1] -(C+D)*Log[p1 + lambda*p2] 

Out[1]= D Log[p1] + A Log[lambda p1] + B Log[p2] + C Log[lambda p2] - (A + B) Log[lambda p1 + p2] - (C + D) Log[p1 + lambda p2]

In[2]:= LL = L /. {p2 -> 1-p1} 

Out[2]= B Log[1 - p1] + C Log[lambda (1 - p1)] + D Log[p1] + A Log[lambda p1] - (C + D) Log[lambda (1 - p1) + p1] - 

>    (A + B) Log[1 - p1 + lambda p1]

In[3]:= dp1 = D[LL,p1] //FullSimplify 

         B + C    A + D   (A + B) (-1 + lambda)    (C + D) (-1 + lambda)
Out[3]= ------- + ----- - --------------------- + -----------------------
        -1 + p1    p1     1 + (-1 + lambda) p1    lambda + p1 - lambda p1


In[4]:= dlam = D[LL,lambda]//FullSimplify                                                                                                     

          A        C           (A + B) p1           (C + D) (-1 + p1)
Out[4]= ------ + ------ - -------------------- + -----------------------
        lambda   lambda   1 + (-1 + lambda) p1   lambda + p1 - lambda p1


In[5]:= Solve[{dp1==0,dlam==0},{p1,lambda}]//FullSimplify 

[snip]


In[6]:= %[[1]] 

                        1                     Sqrt[A] Sqrt[C]
Out[6]= {p1 -> -------------------, lambda -> ---------------}
                   Sqrt[B] Sqrt[C]            Sqrt[B] Sqrt[D]
               1 + ---------------
                   Sqrt[A] Sqrt[D]

References

R. R. Davidson and R. J. Beaver 1977. "On extending the Bradley-Terry model to incorporate within-pair order effects" Biometrics, 33:693--702

Package dataset {-}

Following lines create RCLF.rda, residing in the data/ directory of the package.

RCLF3_table <- home_away_table(a, give=TRUE, teams = teams)

save(RCLF3_table,  RCLF3, RCLF3_maxp, RCLF3_lambda_max, file="RCLF.rda")

RobinHankin/hyper2 documentation built on April 13, 2025, 9:33 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

RobinHankin/hyper2
The Hyperdirichlet Distribution, Mark 2

In RobinHankin/hyper2: The Hyperdirichlet Distribution, Mark 2

Aside: `hyper2` analysis of the same scoreline

Numerical analysis

Ternary representation

Example in Agresti

Some further thoughts on the home ground monster and Agresti's analysis.

hyper3 idiom

Analysis of Work by Davidson and Beaver

Creating a home/away results table from scratch

Rangers vs Falkirk {-}

Rangers vs Livingston {-}

Celtic vs Falkirk {-}

Livingston vs Falkirk {-}

Expectations

Bring back draws

Appendix

References

Package dataset {-}

R Package Documentation

Browse R Packages

We want your feedback!

RobinHankin/hyper2 The Hyperdirichlet Distribution, Mark 2

In RobinHankin/hyper2: The Hyperdirichlet Distribution, Mark 2

Aside: hyper2 analysis of the same scoreline

Numerical analysis

Ternary representation

Example in Agresti

Some further thoughts on the home ground monster and Agresti's analysis.

hyper3 idiom

Analysis of Work by Davidson and Beaver

Creating a home/away results table from scratch

Rangers vs Falkirk {-}

Rangers vs Livingston {-}

Celtic vs Falkirk {-}

Livingston vs Falkirk {-}

Expectations

Bring back draws

Appendix

References

Package dataset {-}

R Package Documentation

Browse R Packages

We want your feedback!

RobinHankin/hyper2
The Hyperdirichlet Distribution, Mark 2

Aside: `hyper2` analysis of the same scoreline