rm(list = ls()) library(eliciteg) data(moveData)
In the multipleExpertProblem
vignette, I raised the issue of how I was struggling to get good c
values returned from the calcC
function, which in return was causing calcK
to return NA
values. I showed these to Michail, and he corrected my indexing of the data/prior combo that was going into these functions. Specifically, I was sending the whole 9 x 9
matrix, when I really wanted to be sending a 1 x 9
vector for both of the data and the prior.
I tried first tackling this with mapply()
, but didn't get very far - as it wasn't returning what I wanted. (Most likely due to my admittedly poor comprehension of the apply
family of functions.) What I want is a matrix that holds all c
values. The dimensions would be nrow
= # experts, and ncol = nreg
. So for one element of that matrix, I'd see something like this:
outMat[1, 1] <- calcC(dat[, 1], priorList$males$exp1[, 1]) outMat[2, 1] <- calcC(dat[, 1], priorList$males$exp2[, 1]) outMat[3, 1] <- calcC(dat[, 1], priorList$males$exp3[, 1])
And so on. What that would mean is that the first column of the matrix (outMat[, 1]
) would contain all the c
values for each expert/data combination for the first region. The second column of the matrix (outMat[, 2]
) would contain all the c
values for each expert/data combination for the second region, etc.
Here's my first attempt and calculating that for one whole expert/region combination. I've pared down the data & prior dimension to make it a bit easier to see and puzzle through. First the (pared) data, and prior (from expert 4):
maleExT <- maleEx[5:9, 5:9] priorT <- priorList$males$exp4[5:9, 5:9] maleExT priorT
And then we set up the matrix to hold the data, conduct the calculation, and print the output:
cmatOut <- matrix(NA, nrow = 1, ncol = dim(maleExT)[2]) for(i in seq_along(1:dim(maleExT)[2])){ cmatOut[, i] <- calcC(maleExT[, i], priorT[, i]) } cmatOut
That is one row matrix that has the c
weight for this one individual expert over all five regions. However I need to do this for all experts and regions (actually I'm still only using 5 regions here). While my solution that uses a loop is a kludgy way to do this, I want to get it right first.
Here's the initialisation of the output array:
cmatOut <- matrix(NA, nrow = length(priorList$males), ncol = dim(maleExT)[2], dimnames = list(c('exp1', 'exp2', 'exp3', 'exp4', 'exp5', 'exp6', 'exp7', 'exp8'), colnames(maleExT)))
Note that the rows correspond to the number of experts, while the columns correspond to the number of regions. With that set up, we can populate it with a nested set of loops:
for(i in seq_along(1:dim(maleExT)[2])){ for(j in seq_along(1:length(priorList$males))){ priorT <- priorList$males[[j]][5:9, 5:9] cmatOut[j, i] <- calcCNew(maleExT[, i], priorT[, i]) } } cmatOut
calcK
Next up is to send the results from cmatOut
on to calcK
for processing. I'll still do this for the subset of regions just to make sure things are working. Once done, I can expand outward. The way I did this previously was to loop over a list with lapply
, which I'll replace with apply
since I no longer have an input list. And while I can still use the function the output will no longer be a list, so I'll change the name of the output object. The processing logic, then, is to take a column's worth of c-vales, and from them, calculate the K
values:
kmat <- apply(cmatOut, 1, function(x) calcK(x)) kmat colSums(kmat)
That output, then, is a matrix where each column is the probability vector (that sums to 1), which we will use to select the expert within the Gibbs loop.
Let's start with males. To do this, I have to first pare down the list to match the number of regions in the example above.
priorListMalesT <- lapply(priorList$males, function(x) x[5:9, 5:9])
With the dimensions now reduced, we can move on to the sampling. Here I'll simulate a Gibbs loop for males.
ng <- 100 nregions <- ncol(kmat) post <- matrix(0, nregions, nregions) summh <- post for(i in seq_along(1:ng)){ for(j in seq_along(1:nregions)){ idx <- which(rmultinom(1, 1, kmat[, j]) == 1) # chooses the expert prior <- priorListMalesT[[idx]][, j] # gets prior for that expert di <- matrix(rgamma(nregions * 1, shape = maleExT[, j] + prior, scale = 1), nregions, 1) post[, j] <- di / matrix(colSums(di), nregions, 1, byrow = T) } summh <- summh + post } malemovePostprob <- summh / ng round(malemovePostprob, 3)
That looks good as a quick first cut at it. Next up is to do it for all 9 regions.
I'd pared the size of the issue to try and make sense of these algorithm, which makes sense to me at first glance. Here I'll work with the full size, and work it out for both males and females. For the Gibbs sampler, I need three inputs:
cmatOut
kmat
data
(n.b. In this vignette I don't really need to do this in a Gibbs loop, because I'm not imputing the data, however in the actual code we will be imputing the data. So I'll preserve the sampling structure here.)
This will be the actual c
matrix. Inputs here are:
maleEx
which is a 9 x 9 matrix of one actual slice of data.priorList$males
which is a list with 8 elements, one for each expert, and each element containing the prior for that expert.cmatOut <- matrix(NA, nrow = length(priorList$males), ncol = dim(maleEx)[2], dimnames = list(c('exp1', 'exp2', 'exp3', 'exp4', 'exp5', 'exp6', 'exp7', 'exp8'), colnames(maleEx)))
for(i in seq_along(1:dim(maleEx)[2])){ # regions for(j in seq_along(1:length(priorList$males))){ #experts priorT <- priorList$males[[j]] #jth expert cmatOut[j, i] <- calcCNew(maleEx[, i], priorT[, i]) #ith region for data; ith region of jth expert for prior } }
At first blush above with the small example this worked, but again, when we have some large data values, things appear to be a bit off. Specifically, for GOM, we end up with all 0 values again for the cmatOut
, which will be problematic when we send those to calcK
:
cmatOut[, 2]
And in that column in the data matrix, we have some big observed values - in particular the GOM to GOM transition:
maleEx[, 2]
What is weird, though, is that in JL, we also have some big values for the observed data:
maleEx[, 4]
And yet, we are able to get finite, albeit very small values:
cmatOut[, 4]
I played around with changing a few numbers in the data (see commented lines in the code block that calculate cmatOut
above), and these will yield finite values for the GOM transition, and all 0s for the JL transition. For example:
data(moveData) # I'll show the raw data # then change the data # then the updated data: maleEx maleEx[2, 2] <- 10 maleEx for(i in seq_along(1:dim(maleEx)[2])){ # regions for(j in seq_along(1:length(priorList$males))){ #experts priorT <- priorList$males[[j]] #jth expert cmatOut[j, i] <- calcC(maleEx[, i], priorT[, i]) #ith region for data; ith region of jth expert for prior } } cmatOut[, 2]
And then we'll change the values in the JL data column:
data(moveData) # I'll show the raw data # then change the data # then the updated data: maleEx maleEx[6, 4] <- 150 maleEx for(i in seq_along(1:dim(maleEx)[2])){ # regions for(j in seq_along(1:length(priorList$males))){ #experts priorT <- priorList$males[[j]] #jth expert cmatOut[j, i] <- calcC(maleEx[, i], priorT[, i]) #ith region for data; ith region of jth expert for prior } } cmatOut[, 4]
So I'm not sure what gives here, and could use your thoughts on this.
For reference, here are the two functions I'm using:
calcC calcK
After discussion of the above issue with M Papathomas, he noted that these problems are basically a numerical issue. Michail proposed a solution that involves adding a scalar to very large negative values in the $C$ matrix, such that we can still preserve the relationships among the experts, but not get NA values when we calculate $K$.
What I've done, then, is to break up the calcC
function into two different functions, so that the first one simply calculates the log values:
calcC
Then the second one checks for the very large negative values, and if present, transforms the column:
transformC
The logic of this last one is to create a logical vector tvec
if all values in a column are less than the tval
threshold. If so, then, we calculate the max value for each of those columns with apply()
. Then I use sweep
to add these back into the matrix. Finally after that processing, we return the exponentiated matrix.
Let's try it with the real data
cmatOut <- matrix(NA, nrow = length(priorList$males), ncol = dim(maleEx)[2], dimnames = list(c('exp1', 'exp2', 'exp3', 'exp4', 'exp5', 'exp6', 'exp7', 'exp8'), colnames(maleEx)))
for(i in seq_along(1:dim(maleEx)[2])){ # regions for(j in seq_along(1:length(priorList$males))){ #experts priorT <- priorList$males[[j]] #jth expert cmatOut[j, i] <- calcC(maleEx[, i], priorT[, i]) #ith region for data; ith region of jth expert for prior } } cmatOut
And then we transform it:
transformC(cmatOut) cmatOut <- transformC(cmatOut)
Which should leave us with something we can work with when we calculate $K$:
kmat <- apply(cmatOut, 1, function(x) calcK(x)) kmat colSums(kmat)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.