In RobinHankin/hyper2: The Hyperdirichlet Distribution, Mark 2

set.seed(1)
knitr::opts_chunk$set(echo = TRUE)
library("hyper2")

knitr::include_graphics(system.file("help/figures/hyper2.png", package = "hyper2"))

To cite the hyper2 package in publications, please use @hankin2017_rmd; takes about an hour to process without cache. Here I consider pairwise comparisons with a home-ground advantage and wish to compare two likelihood functions. First, the monster parametrization:

$$ \mathcal{L_M}\left(p_1,p_2,M\right)= \left( \frac{p_1+M;p_2}{p_1+p_2+M} \right)^{a;b}\cdot \left( \frac{p_1;p_2+M}{p_1+p_2+M} \right)^{c;d} $$

[here the first expression is for $p_1$ at home and $p_2$ away; the second is $p_1$ away and $p_2$ at home; $a$ is number of home wins for $p_1$, $b$ number of away wins for $p_2$, $c$ number of away wins for $p_1$ and $d$ number of home wins for $p_2$]. Secondly the monster parameterization:

$$ \mathcal{L_\lambda}\left(p_1,p_2,\lambda\right)= \left( \frac{\lambda p_1;p_2}{\lambda p_1+p_2} \right)^{a;b}\cdot \left( \frac{p_1;\lambda p_2}{p_1+\lambda p_2} \right)^{c;d} $$

A suitable generalized function would be

$$ \mathcal{L}\left(p_1,p_2,\lambda,M\right)= \left( \frac{\lambda p_1+M;p_2}{\lambda p_1+p_2+M} \right)^{a;b}\cdot \left( \frac{p_1;\lambda p_2+M}{p_1+\lambda p_2+M} \right)^{c;d} $$

Start simple: two teams

jj <- matrix(c(NA, 6+2i ,6+5i, NA),2,2,byrow=TRUE)
teams <- LETTERS[1:2]
dimnames(jj) <- list("@home" = teams,"@away"=teams)
dimnames(jj) <- list("@home" = teams,"@away"=teams)
print(jj)

Above we see that A played at home 3 times, winning 1 and losing 2 [row 1 column 2 is $1+2i$]; B played at home 9 times, winning 4 and losing 5 [row 2 column 1 is $4+5i$]. In its most general form, the $2\times 2$ result matrix would be

$$ \begin{bmatrix} -&\alpha+\beta i \ \gamma + \delta i&-\ \end{bmatrix} $$

and the log-likelihood function would be, algebraically,

$$ \mathcal{S}=\log\mathcal{L}= \log\left(\frac{A^\delta\cdot(A + M)^\alpha\cdot B^\beta (B + M)^\gamma }{ (A + B + M)^{\alpha+\beta+\gamma+\delta} }\right) = \delta\log A + \alpha\log(1-B) + \beta\log B + \gamma\log(1-A) $$

where $M$ is the strength of the home monster home and $A+B+M=1$, so we understand that $M=1-A-B$.

$$ \frac{\partial\mathcal{S}}{\partial A}=\frac{\delta}{A}-\frac{\gamma}{1-A}\qquad \frac{\partial\mathcal{S}}{\partial B}=\frac{\beta }{B}-\frac{\alpha}{1-A} $$

At a maximum, therefore,

$$ \hat{A}=\frac{\delta}{\delta + \gamma}\qquad \hat{B}=\frac{\beta }{\alpha + \beta} $$

Just to check, we evaluate the Hessian matrix and find

$$H=- \left(\begin{array}{cc} \frac{\partial^2\mathcal{S}}{\partial A\partial A} & \frac{\partial^2\mathcal{S}}{\partial A\partial B} \ \frac{\partial^2\mathcal{S}}{\partial B\partial A} & \frac{\partial^2\mathcal{S}}{\partial b\partial B} \end{array}\right) = \left(\begin{array}{cc} \delta A^{-2} + \gamma(1-A)^{-2}&0\ 0&\beta B^{-2} + \alpha(1-B)^{-2} \end{array}\right) $$

which is clearly strictly positive-definite if $00$, $\gamma+\delta>0$.

We will calculate numerical estimates using the monster-parameterization for a simple case:

M <- matrix(c(
    NA, 5+1i, 9+2i, NA),
    nrow=2,ncol=2,byrow=TRUE)
teams <- LETTERS[seq_len(2)]
dimnames(M) <- list("@home" = teams,"@away"=teams)
dimnames(M) <- list("@home" = teams,"@away"=teams)
M
(Hsimp <- home_away(M))
maxp(Hsimp)

Compare the theoretical value of

theoretical <- c(A=2/(2+9),B=1/(1+5))
fillup(theoretical,Hsimp)

showing good agreement. We note that the model predicts a probability of $\hat{A}/(\hat{A}+\hat{B})=\frac{12}{23}\simeq 0.52$ for $A$ winning in a match between $A$ and $B$ where the home team advantage is not operating (on neutral ground, for example).

Now the $\lambda$ parameterization. The likelihood function for the same dataset is

$$ \mathcal{S}=\log\mathcal{L}= \log\left( \frac{(\lambda A)^\alpha B^\beta}{(\lambda A+B)^{\alpha+\beta}} \cdot \frac{A^\gamma (\lambda B)^\delta}{(A+\lambda B)^{\gamma+\delta}} \right)$$

This is equal to $(\alpha+\delta)\log\lambda +(\alpha+\gamma)\log A+(\beta+\delta)\log B -(\alpha+\beta)\log(\lambda A+B) -(\gamma+\delta)\log(A+\lambda B)$ and even after simplifying [we may assume $A+B=1$] the result does not appear to have a simple analytical solution; we must proceed numerically. To do so,

we need to define a slight generalization of home_away3():

`ha3` <- function(home_games_won,away_games_won,lambda){  # based on home_away3()

    if(is.complex(home_games_won)){
        if(missing(lambda)){lambda <- away_games_won}
        away_games_won <- Im(home_games_won)
        home_games_won <- Re(home_games_won) 
    }

    teams <- rownames(home_games_won)
    stopifnot(identical(teams,colnames(home_games_won)))
    stopifnot(identical(teams,rownames(away_games_won)))
    stopifnot(identical(teams,colnames(away_games_won)))

    H <- hyper3(pnames=c(teams,"lambda","M"))

    for(i in seq_len(nrow(home_games_won))){
        for(j in seq_len(ncol(home_games_won))){
            if(i != j){  
                home_team <- teams[i]
                away_team <- teams[j]

        home_wins <- home_games_won[i,j]
        away_wins <- away_games_won[i,j] 
                no_of_matches <- home_wins + away_wins 

                ## home wins:
                jj <- c(lambda,1)
                names(jj) <- c(home_team,"M")
                H[jj] %<>% inc(home_wins)

                ## away wins:
                jj <- 1
                names(jj) <- away_team
                H[jj] %<>% inc(away_wins)

                ## denominator
                jj <- c(lambda,1,1)
                names(jj) <- c(home_team,away_team,"M")
                H[jj] %<>% dec(no_of_matches)
            } # if(i != j) closes
        } # j loop closes
    } # i loop closes
    return(H)
}

(maybe one day I will include function ha3() in the hyper2 package).

Now a more complicated example, a modified version of the 4x4 matrix on home_advantage.Rmd:

home_games_won <- matrix(c(
    NA, 16, 12, 11,
    19, NA, 19, 16,
    17, 12, NA, 11,
    11, 12, 12, NA),
    nrow=4,ncol=4,byrow=TRUE)

away_games_won <- matrix(c(
    NA, 05, 02, 02,
     9, NA, 10, 02,
     3, 04, NA, 07,
     8, 06, 04, NA),
    nrow=4,ncol=4,byrow=TRUE)

teams <- LETTERS[1:4]
dimnames(home_games_won) <- list("@home" = teams,"@away"=teams)
dimnames(away_games_won) <- list("@home" = teams,"@away"=teams)

D <- home_games_won + 1i*away_games_won
D

Above we see, from row 1 column 2 [36+10i], that A played at home against B a total of $36+10=46$ times, winning $36$ and losing $10$; From row 2 column 1 [11+10i] we see that B played at home against A a total of 21 times, winning 11 and losing 10. We can use the Monster likelihood function and test $H_0\colon M=0$:

H <- ha3(D,lambda=1.01)
print(H)

maxp(H)

s <- specificp.gt.test(H,"M",0,n=10)  # NB lambda = 1.88
s

We see strong evidence that $M>0$.

s$null_support

We want to plot a profile log-likelihood function, that is, a likelihood function corresponding to the evaluate constrained to having a particular value of $M$:

write(date(),file="marginallambdaprofilelike",append=FALSE)
f <- function(lambda){
 write(date(),file="marginallambdaprofilelike",append=TRUE)  # so we can check its progress
 return(maxp(ha3(D,lambda=lambda),n=1,give=TRUE)$value)
}

lam <- seq(from=0.1,to=4.2,len=5)
supp_M_eq_0 <- sapply(lam,f)

plot(lam,supp_M_eq_0,type="b",xlab="lambda")
plot(lam,supp_M_eq_0-max(supp_M_eq_0),type="b",xlab="lambda")
grid()
abline(h=c(0,-2))

Now enforce the constraint $\lambda=1$ [and allow $M$ to vary], thereby reducing the likelihood function from a hyper3 to a hyper2 object:

(H <- home_away(D))  # observe that the monster is 'home'
home <- seq(from=0.15,to=0.4,len=15)
supp_lambda_eq_0 <- profsupp(H,"home",home,relative=FALSE)
plot(home,supp_lambda_eq_0 - max(supp_lambda_eq_0),type='b')
abline(h=c(0,-2))

maxp(H)

plot(lam,supp_M_eq_0-max(supp_M_eq_0),type='b',main='Profile likelihood with M=0')
grid()
plot(home,supp_lambda_eq_0 - max(supp_lambda_eq_0),type='b')

both <- c(supp_lambda_eq_0,NA,supp_M_eq_0)
plot(both-max(both,na.rm=TRUE),type='b',col=c(rep("black",length(supp_lambda_eq_0)),"green",rep("red",length(supp_M_eq_0))))
abline(h=c(0,-2))

Support difference:

max(supp_M_eq_0)-max(supp_lambda_eq_0)

exceeding Edwards's two-units-of-support criterion by a comfortable margin. Note that both have equal numbers of degrees of freedom.

Likelihood as a function of $M,\lambda$

likef <- function(M,lambda,...){specificp.test(ha3(D,lambda=lambda),"M",M,...)$null_support}
like2 <- function(theta,...){likef(M=theta[1],lambda=theta[2],n=1,...)}

Use it:

Mval <- seq(from=0.01,to=0.65,len=15)
lval <- seq(from=0.2,to=5,by = 0.4) # we need lambda=1 (this is the null)
jj <- as.matrix(expand.grid(M=Mval,lam=lval))
options(use_alabama = FALSE) # setting to TRUE gives a "initial value not feasible" error, go figure
L <- apply(jj,1,like2)

options(digits=2)
options(width=150)
L <- matrix(L,length(Mval),length(lval))
L <- L-max(L)
rownames(L) <- paste("M = ",round(Mval,2))
colnames(L) <- paste("lam=",lval,sep="")
L
L <- pmax(L,-9)
L
Mval
lval
contour(Mval,lval,L,xlab='M',ylab='lambda')
abline(h=1,lwd=5)
abline(v=0,lwd=5)
filled.contour(Mval,lval,L,xlab='M',ylab='lambda')
matplot(Mval,L,type='b')
matplot(lval,t(L),type='b')

Create a new dataset by direct simulation

set.seed(0)
P <- matrix(0,4,4)
teams <- LETTERS[1:4]
dimnames(P) <- list("@home" = teams,"@away"=teams)
n <- 90 # number of matches between a pair 
M <- 0.15  # home monster
strengths <- (4:1)/10
strengths <- strengths/sum(strengths)
names(strengths) <- teams
strengths
for(home_team in 1:4){
  for(away_team in 1:4){
    if(home_team != away_team){
      prob_home_win <- (strengths[home_team] + M)/(strengths[home_team] + strengths[away_team] + M)
      home_wins <- rbinom(1,n,prob_home_win)
      home_losses <- n - home_wins
      P[home_team,away_team] <- home_wins + 1i*home_losses
    }
  } 
}
diag(P) <- NA
P
HP <- home_away(P)

HP

mH <- maxp(HP,n=3,give=T)
mH

home <- seq(from=0.12,to=0.16,len=15)
supp_lambda_eq_1 <- profsupp(HP,"home",home,relative=FALSE)

jjH <- ha3(P,lambda=1) # should match HP above
jjH
as.hyper3(HP) - jjH

plot(home,supp_lambda_eq_1,type='b')
plot(home,supp_lambda_eq_1 - max(supp_lambda_eq_1),type='b')
abline(h=c(0,-2))

specificp.gt.test(ha3(P,lambda=1.4),"M",0)
specificp.gt.test(ha3(P,lambda=1.4),"M",1e-4)

f <- function(lambda){
   o <- ha3(P,lambda=lambda)
   return(specificp.gt.test(o,"M",0)$null_support)
}

lam <- seq(from=1.9,to=2.4,len=5)
supp_M_eq_0 <- sapply(lam,f)

plot(lam,supp_M_eq_0,type='b')
plot(lam,supp_M_eq_0-max(supp_M_eq_0),type='b')
abline(h=c(0,-2))

jj <- c(supp_M_eq_0,NA,supp_lambda_eq_1)
plot(jj-max(jj,na.rm=TRUE),type="b")
abline(h=c(0,-2))

References {-}

RobinHankin/hyper2 documentation built on April 21, 2024, 11:38 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

RobinHankin/hyper2
The Hyperdirichlet Distribution, Mark 2

In RobinHankin/hyper2: The Hyperdirichlet Distribution, Mark 2

Start simple: two teams

Likelihood as a function of $M,\lambda$

Create a new dataset by direct simulation

References {-}

R Package Documentation

Browse R Packages

We want your feedback!

RobinHankin/hyper2 The Hyperdirichlet Distribution, Mark 2

In RobinHankin/hyper2: The Hyperdirichlet Distribution, Mark 2

Start simple: two teams

Likelihood as a function of $M,\lambda$

Create a new dataset by direct simulation

References {-}

R Package Documentation

Browse R Packages

We want your feedback!

RobinHankin/hyper2
The Hyperdirichlet Distribution, Mark 2