Hi Michail - I've made good progress on folding in the multiple experts into the Gibbs sampler that we've written as part of the right whale analysis. I got it to work with no obvious problems when I used simulated data but with actual priors. However when I got to the point of using actual data and actual priors, I am running into issues. These are primarily related to the fact that with actual data, I'm getting 0 values returned for the c
coefficient that is used to assemble the K
probability vector.
Here's my pseudo code for the loop. Recall that data are being imputed at each iteration - hence the need for sampling.
c
value for each of 8 expertsK
probability vectorrmultinom()
using K
to get $k_i$In R
code, that looks like this chunk below. (Note that somewhat confusingly, we're indexing over k
for the months. So below, idx
is the same as $k_i$.)
for(k in 1:12){ malePrior <- moveMale[, , k] femalePrior <- moveFem[, , k] # sampling for the experts' prior if the month == December if(k == 12){ # create C - combined from data and priors cvecM <- lapply(priorList$males, function(x) calcC(maleMove[, , k], x)) cvecF <- lapply(priorList$females, function(x) calcC(femMove[, , k], x)) # create K - combined from C klistM <- lapply(cvecM, function(x) calcK(x)) klistF <- lapply(cvecF, function(x) calcK(x)) # Then sample to choose the prior: idx <- which(rmultinom(1, 1, klistM) == 1) malePrior <- matrix(data = unlist(priorList[['males']][idx]), nrow = length(regID), ncol = length(regID)) idx <- which(rmultinom(1, 1, klistF) == 1) femalePrior <- matrix(data = unlist(priorList[['females']][idx]), nrow = length(regID), ncol = length(regID)) } di <- matrix(rgamma(nreg * nreg, shape = maleMove[, , k] + malePrior, scale = 1), nreg, nreg) maleMoveProb[, , k] <- di / matrix(colSums(di), nreg, nreg, byrow = T) di <- matrix(rgamma(nreg * nreg, shape = femMove[, , k] + femalePrior, scale = 1), nreg, nreg) femMoveProb[, , k] <- di / matrix(colSums(di), nreg, nreg, byrow = T) }
Recall that for the first run, we'll have flat priors for all months except December. Once we get this working, we can compare these with Philip's existing priors. But to start with, we'll use the flat ones. So let's look at some values and functions to explain this a bit more.
The way the algorithm works in the code right now is that we have data (maleMove
) being updated at each turn, and we have a prior, e.g. moveMale
. Both of these are 9 by 9 by 12 arrays, corresponding to 9 regions, and 12 monthly transitions. So you can see the loop is over 1:12.
In the runs to date, those just using Philip's priors, the prior never changes. However in this run, we'll need to choose an experts prior based on the machinery that we've worked out previously. For January through November, this static prior will be set to malePrior
or femalePrior
.
I then have an if()
statement to incorporate the machinery to bring in the new prior. The logic is, if it's December, then loop over the priors to calculate c
and k
. In particular, the priors are stored in a recursive list called priorList
, and I use lapply()
to calculate c
with the current month's data:
cvecM <- lapply(priorList$males, function(x) calcC(maleEx, x))
UPDATE However, after discussion with Michail on 16 August 2016, I need to change this, so only evaluate 1 region at a time. This means that the input to calcC
will be slightly different. Now I think I want something along these lines:
cvecM <- apply(maleEx, 2, function(x) calcC(x, priorList$females$exp1[,1]))
The function I use is calcC()
:
library(eliciteg) calcC <- function(data, prior){ dd <- as.vector(data) pp <- as.vector(prior) if(length(dd) != length(pp)) stop('Data and prior lengths do not match') logc <- lgamma(sum(pp)) - sum(lgamma(pp)) - (lgamma(sum(pp + data)) - sum(lgamma(pp + data))) return(exp(logc)) }
With that assembled, I then call lapply()
again to assemble the K
vector of probabilities:
klistM <- lapply(cvecM, function(x) calcK(x))
where, calcK
is:
calcK <- function(cdat){ numexp <- length(unlist(cdat)) up <- (1 / numexp) * unlist(cdat) down <- (1 / numexp) * sum(unlist(cdat)) up / down }
That makes sense to me, and works with simulated data. For example, here I have a matrix of all 1's:
data <- matrix(1, nrow = 9, ncol = 9) data
And using two real priors:
prior1 <- priorList$males$exp7 prior2 <- priorList$males$exp4 prior1
...we calculate c
:
c1 <- calcC(data, prior1) c2 <- calcC(data, prior2)
And then K
:
kvec <- calcK(cdat = c(c1, c2)) kvec
Ok, Expert 7 looks to swamp the answers, but we are getting reasonable results.
That gave me confidence, but when I started to implement this with real data, it fell apart straight away. Here's a slice of real data from one run:
data <- maleEx
data
When I try to derive c
, I get 0's, which will then return NaN from calcK
:
c1 <- calcC(data, prior1) c2 <- calcC(data, prior2)
And then K
:
kvec <- calcK(cdat = c(c1, c2)) kvec
What appears to be happening is that the logc
values are << 0, so when we call exp(logc)
we get 0. I tried working with gamma, instead of lgamma, but got similar results.
That means I'm now stuck, and want/need to touch base with you to get unstuck.
ok, after discussions with Michail on the 16th, it appears that while my functions are correct, my data input is not. I was sending all of the Dirichlet's to the calcC
when in fact I think I just want to send each in turn, i.e. I want to loop over columns. So that way I will have the movement probabilities be conditioned upon current location.
What this looks like in pseudocode is:
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.