rm(list = ls())
library(eliciteg)
data(moveData)

In the multipleExpertProblem vignette, I raised the issue of how I was struggling to get good c values returned from the calcC function, which in return was causing calcK to return NA values. I showed these to Michail, and he corrected my indexing of the data/prior combo that was going into these functions. Specifically, I was sending the whole 9 x 9 matrix, when I really wanted to be sending a 1 x 9 vector for both of the data and the prior.

I tried first tackling this with mapply(), but didn't get very far - as it wasn't returning what I wanted. (Most likely due to my admittedly poor comprehension of the apply family of functions.) What I want is a matrix that holds all c values. The dimensions would be nrow = # experts, and ncol = nreg. So for one element of that matrix, I'd see something like this:

outMat[1, 1] <- calcC(dat[, 1], priorList$males$exp1[, 1])
outMat[2, 1] <- calcC(dat[, 1], priorList$males$exp2[, 1])
outMat[3, 1] <- calcC(dat[, 1], priorList$males$exp3[, 1])

And so on. What that would mean is that the first column of the matrix (outMat[, 1]) would contain all the c values for each expert/data combination for the first region. The second column of the matrix (outMat[, 2]) would contain all the c values for each expert/data combination for the second region, etc.

Here's my first attempt and calculating that for one whole expert/region combination. I've pared down the data & prior dimension to make it a bit easier to see and puzzle through. First the (pared) data, and prior (from expert 4):

maleExT <- maleEx[5:9, 5:9]
priorT <- priorList$males$exp4[5:9, 5:9]
maleExT
priorT

And then we set up the matrix to hold the data, conduct the calculation, and print the output:

cmatOut <- matrix(NA, nrow = 1, ncol = dim(maleExT)[2])

for(i in seq_along(1:dim(maleExT)[2])){
 cmatOut[, i] <- calcC(maleExT[, i], priorT[, i])
}
cmatOut

That is one row matrix that has the c weight for this one individual expert over all five regions. However I need to do this for all experts and regions (actually I'm still only using 5 regions here). While my solution that uses a loop is a kludgy way to do this, I want to get it right first.

Here's the initialisation of the output array:

cmatOut <- matrix(NA, nrow = length(priorList$males), 
                  ncol = dim(maleExT)[2],
                  dimnames = list(c('exp1', 'exp2', 'exp3', 'exp4', 
                                           'exp5', 'exp6', 'exp7', 'exp8'),
                                  colnames(maleExT)))

Note that the rows correspond to the number of experts, while the columns correspond to the number of regions. With that set up, we can populate it with a nested set of loops:

for(i in seq_along(1:dim(maleExT)[2])){
  for(j in seq_along(1:length(priorList$males))){
    priorT <- priorList$males[[j]][5:9, 5:9]
    cmatOut[j, i] <- calcCNew(maleExT[, i], priorT[, i])
  }
}
cmatOut

calcK

Next up is to send the results from cmatOut on to calcK for processing. I'll still do this for the subset of regions just to make sure things are working. Once done, I can expand outward. The way I did this previously was to loop over a list with lapply, which I'll replace with apply since I no longer have an input list. And while I can still use the function the output will no longer be a list, so I'll change the name of the output object. The processing logic, then, is to take a column's worth of c-vales, and from them, calculate the K values:

kmat <- apply(cmatOut, 1, function(x) calcK(x))
kmat
colSums(kmat)

That output, then, is a matrix where each column is the probability vector (that sums to 1), which we will use to select the expert within the Gibbs loop.

Sample Individual Experts

Let's start with males. To do this, I have to first pare down the list to match the number of regions in the example above.

priorListMalesT <- lapply(priorList$males, function(x) x[5:9, 5:9])

With the dimensions now reduced, we can move on to the sampling. Here I'll simulate a Gibbs loop for males.

ng <- 100
nregions <- ncol(kmat)
post <- matrix(0, nregions, nregions)  
summh <- post 
for(i in seq_along(1:ng)){

    for(j in seq_along(1:nregions)){

      idx <- which(rmultinom(1, 1, kmat[, j]) == 1) # chooses the expert
      prior <- priorListMalesT[[idx]][, j] # gets prior for that expert

      di <- matrix(rgamma(nregions * 1, 
                    shape = maleExT[, j] + prior, 
                    scale = 1), nregions, 1) 

      post[, j] <- di / matrix(colSums(di), nregions, 1, byrow = T) 

    } 
  summh <- summh + post
}

malemovePostprob <- summh / ng
round(malemovePostprob, 3)

That looks good as a quick first cut at it. Next up is to do it for all 9 regions.

All 9 Regions for Both Sexes

I'd pared the size of the issue to try and make sense of these algorithm, which makes sense to me at first glance. Here I'll work with the full size, and work it out for both males and females. For the Gibbs sampler, I need three inputs:

  1. cmatOut
  2. kmat
  3. data

(n.b. In this vignette I don't really need to do this in a Gibbs loop, because I'm not imputing the data, however in the actual code we will be imputing the data. So I'll preserve the sampling structure here.)

Create C Matrix

This will be the actual c matrix. Inputs here are:

  1. maleEx which is a 9 x 9 matrix of one actual slice of data.
  2. priorList$males which is a list with 8 elements, one for each expert, and each element containing the prior for that expert.
cmatOut <- matrix(NA, nrow = length(priorList$males), 
                  ncol = dim(maleEx)[2],
                  dimnames = list(c('exp1', 'exp2', 'exp3', 'exp4', 
                                    'exp5', 'exp6', 'exp7', 'exp8'),
                                  colnames(maleEx)))
for(i in seq_along(1:dim(maleEx)[2])){ # regions
  for(j in seq_along(1:length(priorList$males))){ #experts
    priorT <- priorList$males[[j]] #jth expert
    cmatOut[j, i] <- calcCNew(maleEx[, i], priorT[, i]) #ith region for data; ith region of jth expert for prior
  }
}

Issues Arising - Again

At first blush above with the small example this worked, but again, when we have some large data values, things appear to be a bit off. Specifically, for GOM, we end up with all 0 values again for the cmatOut, which will be problematic when we send those to calcK:

cmatOut[, 2]

And in that column in the data matrix, we have some big observed values - in particular the GOM to GOM transition:

maleEx[, 2]

What is weird, though, is that in JL, we also have some big values for the observed data:

maleEx[, 4]

And yet, we are able to get finite, albeit very small values:

cmatOut[, 4]

I played around with changing a few numbers in the data (see commented lines in the code block that calculate cmatOut above), and these will yield finite values for the GOM transition, and all 0s for the JL transition. For example:

data(moveData)
# I'll show the raw data
# then change the data
# then the updated data:
maleEx
maleEx[2, 2] <- 10
maleEx

for(i in seq_along(1:dim(maleEx)[2])){ # regions
  for(j in seq_along(1:length(priorList$males))){ #experts
    priorT <- priorList$males[[j]] #jth expert
    cmatOut[j, i] <- calcC(maleEx[, i], priorT[, i]) #ith region for data; ith region of jth expert for prior
  }
}
cmatOut[, 2]

And then we'll change the values in the JL data column:

data(moveData)
# I'll show the raw data
# then change the data
# then the updated data:
maleEx
maleEx[6, 4] <- 150
maleEx

for(i in seq_along(1:dim(maleEx)[2])){ # regions
  for(j in seq_along(1:length(priorList$males))){ #experts
    priorT <- priorList$males[[j]] #jth expert
    cmatOut[j, i] <- calcC(maleEx[, i], priorT[, i]) #ith region for data; ith region of jth expert for prior
  }
}
cmatOut[, 4]

So I'm not sure what gives here, and could use your thoughts on this.

For reference, here are the two functions I'm using:

calcC

calcK

Update with New Code

After discussion of the above issue with M Papathomas, he noted that these problems are basically a numerical issue. Michail proposed a solution that involves adding a scalar to very large negative values in the $C$ matrix, such that we can still preserve the relationships among the experts, but not get NA values when we calculate $K$.

What I've done, then, is to break up the calcC function into two different functions, so that the first one simply calculates the log values:

calcC

Then the second one checks for the very large negative values, and if present, transforms the column:

transformC

The logic of this last one is to create a logical vector tvec if all values in a column are less than the tval threshold. If so, then, we calculate the max value for each of those columns with apply(). Then I use sweep to add these back into the matrix. Finally after that processing, we return the exponentiated matrix.

Let's try it with the real data

cmatOut <- matrix(NA, nrow = length(priorList$males), 
                  ncol = dim(maleEx)[2],
                  dimnames = list(c('exp1', 'exp2', 'exp3', 'exp4', 
                                    'exp5', 'exp6', 'exp7', 'exp8'),
                                  colnames(maleEx)))
for(i in seq_along(1:dim(maleEx)[2])){ # regions
  for(j in seq_along(1:length(priorList$males))){ #experts
    priorT <- priorList$males[[j]] #jth expert
    cmatOut[j, i] <- calcC(maleEx[, i], priorT[, i]) #ith region for data; ith region of jth expert for prior
  }
}
cmatOut

And then we transform it:

transformC(cmatOut)
cmatOut <- transformC(cmatOut)

Which should leave us with something we can work with when we calculate $K$:

kmat <- apply(cmatOut, 1, function(x) calcK(x))
kmat
colSums(kmat)


robschick/eliciteg documentation built on May 27, 2019, 11:58 a.m.