```{R,eval = TRUE}
library(knitr)
options(width = 120)
opts_chunk$set(fig.width = 12, fig.height = 8, fig.path = 'Figs/',
               echo = TRUE, results = "markup", eval = TRUE, include = TRUE,
               warning = FALSE, message = FALSE)

# select_max returns a binary vector with a 1 at the position of the maximum
# of x and 0's elsewhere; ties are broken at random
select_max <- function(x){
  t.f = x >= max(x)
  if (length(x[t.f]) > 1){
    y = rep(0, length = length(x))
    seq.1 = c(1:length(x))
    index = sample(seq.1[t.f], 1)
    y[index] = 1
  } else {
    y = ifelse(x >= max(x), 1, 0)
  }
  return(y)
}

# prob_to_binary transforms a matrix of class probabilities into a binary
# matrix: the maximum probability in each row becomes 1, all others become 0
prob_to_binary = function(pred){
  pred.type = t(apply(pred, 1, select_max))
  return(pred.type)
}
```
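As a quick illustration of the helper defined above (the toy matrix below is not part of the original examples), `prob_to_binary` turns each row of class probabilities into a one-hot vector for the most probable class:

```{R,eval = TRUE}
# toy example (illustration only): the most probable class in each row
# becomes 1, all other classes become 0
toy = matrix(c(0.2, 0.5, 0.3,
               0.6, 0.1, 0.3), nrow = 2, byrow = TRUE)
prob_to_binary(toy)
```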
First install the packages. Here we use the library gtools to simulate data for testing purposes.
```{R,eval = TRUE}
devtools::install_github("bobdouma/kappa_multinomial")
library(multinomialperform)
library(gtools)
```
In the simplest case there is a matrix (or data frame) of predictions and a data frame of observations. Both of these can contain either binary values or continuous probabilities, but all cells of both matrices must satisfy $0 \le y_{ik} \le 1$. The columns of both matrices correspond to $q$ land cover classes and each row is a sample $i$. Furthermore, the modelled class probabilities and the class observations should satisfy $\sum_{k=1}^q y_{ik} = 1$ and $\sum_{k=1}^q p_{ik} = 1$. Note that one of the advantages of this approach is that uncertainty in the observations can be represented. The order of the rows (samples) and columns (land cover classes) in the observed and predicted matrices should correspond exactly.

$\kappa_{multinomial}$ can be calculated as the product $\kappa_{loc} \times \kappa_{prob} = \frac{p_{o,multinomial} - p_{e,multinomial}}{p_{max} - p_{e,multinomial}} \times \frac{p_{max} - p_{e,multinomial}}{1 - p_{e,multinomial}}$.

$\kappa_{prob}$ measures the degree to which the ranks of the predicted class probabilities correspond to the ranks of the observed class frequencies. It reaches one if there is a perfect match between the rank orders of the observations and predictions, and zero if the model performs similarly to the null model. $\kappa_{loc}$, in turn, measures the certainty of the model in the case of discrete observations. For continuous observations, $\kappa_{loc}$ measures the mean match of the sorted observed and predicted sample frequencies. $\kappa_{loc}$ equals zero if the performance of the multinomial model equals that of the null model, and one if, for each sample, the sorted predictions exactly match the sorted observations.

### Case 1: Predictions are continuous probabilities and observations discrete

In this case we show an example where the observed classes are predicted to be most likely. This implies that $\kappa_{prob}$ should equal one, while $\kappa_{loc}$ will depend on the average certainty with which the observed classes are predicted.

```{R,eval = TRUE}
pred = as.data.frame(rdirichlet(100, c(0.1,0.1,0.5,0.5))) # generate multinomial probabilities with four classes
pred = t(apply(pred,1,sample)) # randomly shuffle the columns for each sample; otherwise only one class is most likely
obs = as.data.frame(prob_to_binary(pred)) # prob_to_binary transforms probabilities to discrete outcomes: 1 for the most probable class, 0 for the remaining classes
kappa_multinomial(obs=obs,pred=pred) # calculate kappa
```
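As a quick, package-independent sanity check of the input requirements described above (cells in $[0,1]$ and rows summing to one), one could run something like the following on the matrices from this example; `kappa_multinomial` is assumed to perform its own input validation:

```{R,eval = TRUE}
# sanity checks on the input requirements (illustration only)
range(pred)                          # all predicted cells should lie between 0 and 1
range(as.matrix(obs))                # the same holds for the observations
all(abs(rowSums(pred) - 1) < 1e-8)   # every predicted sample sums to one
all(abs(rowSums(obs) - 1) < 1e-8)    # every observed sample sums to one
```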
### Case 2: Predictions are continuous probabilities and observations discrete

In this case we show an example where the observed classes are again predicted to be most likely, but the certainty with which the classes are predicted is much higher. Again, $\kappa_{prob}$ should equal one and $\kappa_{loc}$ will depend on the average certainty with which the observed classes are predicted.
```{R,eval = TRUE}
pred = as.data.frame(rdirichlet(100, c(0.1,0.1,4.5,0.5))) # generate multinomial probabilities with four classes, one of which is predicted with high certainty
pred = t(apply(pred,1,sample)) # randomly shuffle the columns for each sample
obs = as.data.frame(prob_to_binary(pred)) # transform probabilities to discrete outcomes
kappa_multinomial(obs=obs,pred=pred) # calculate kappa
```
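To see what the higher certainty amounts to, one can compare the average probability assigned to the most likely class per sample with that of Case 1 (a base-R illustration, not part of the package):

```{R,eval = TRUE}
# average probability assigned to the most likely class; with
# alpha = c(0.1, 0.1, 4.5, 0.5) this is considerably higher than in Case 1
mean(apply(pred, 1, max))
```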
### Case 3: Predictions are continuous probabilities and observations discrete

In this case we show an example where there is a mismatch between the observed classes and the classes predicted to be most likely. $\kappa_{loc}$ remains similar to the previous example, but $\kappa_{prob}$ becomes lower than one.

```{R,eval = TRUE}
obs = as.data.frame(prob_to_binary(pred)) # transform probabilities to 0/1 (1 being the most likely class)
resample = sample(c(1:100),20) # randomly pick 20 observations
obs[resample,] = obs[sample(resample),] # randomly shuffle these 20 observations
kappa_multinomial(obs=obs,pred=pred) # calculate kappa
```
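Because `sample()` may map some of the 20 selected rows onto themselves, the number of samples that actually change class can be smaller than 20. A quick count (illustration only, not part of the package):

```{R,eval = TRUE}
# number of samples whose observed class no longer matches the most likely
# predicted class after the shuffle (illustration only)
sum(rowSums(abs(obs - prob_to_binary(pred))) > 0)
```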
### Case 4: Observations not discrete and prediction frequency equal to observed

In this case we show an example where the predicted frequencies are equal to the observed frequencies. In this case $\kappa_{multinomial}$ should equal one, indicating perfect model performance.
```{R,eval = TRUE}
pred = as.data.frame(rdirichlet(100, c(0.1,0.1,0.5,0.5))) # generate multinomial probabilities with four classes
obs = pred # observations are identical to the predictions
kappa_multinomial(obs=obs,pred=pred) # calculate kappa
```
### Case 5: Observations not discrete and prediction frequency random relative to observed

In this case the evaluation should show that $\kappa_{prob}$ performs poorly, while $\kappa_{loc}$, the maximum fit that a model with this set of modelled class distributions can attain for this set of observations, is one.

```{R,eval = TRUE}
pred = as.data.frame(rdirichlet(100, c(0.1,0.1,0.5,0.5))) # generate multinomial probabilities with four classes
obs = as.data.frame(t(apply(pred,1,sample))) # randomly shuffle the class probabilities within each observation
kappa_multinomial(obs=obs,pred=pred,nsim=10000) # calculate kappa
```
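A rough, package-independent way to see the loss of rank correspondence that $\kappa_{prob}$ is described as measuring is the per-sample Spearman rank correlation between the predicted and observed class probabilities, which should be close to zero on average after the shuffle:

```{R,eval = TRUE}
# per-sample Spearman rank correlation between predicted and observed class
# probabilities (illustration only, not part of the package)
pred.m = as.matrix(pred)
obs.m = as.matrix(obs)
mean(sapply(1:nrow(pred.m), function(i) cor(pred.m[i,], obs.m[i,], method = "spearman")))
```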
### Case 6: Observations not discrete

The observations are generated such that the class predicted with the highest probability is observed to be slightly less likely (by `val`). The ranking of the sample probabilities remains the same, so $\kappa_{prob}$ is close to one (not exactly one, because if the class predicted with the highest probability drops below the second highest class after subtracting `val`, the ranking changes). In contrast, $\kappa_{loc}$ decreases.
```{R,eval = TRUE}
# increase() lowers the highest probability in each row by val and spreads
# val evenly over the remaining classes, so each row still sums to one
increase = function(x, val = 0.05){
  for (i in 1:nrow(x)){
    i.max = which.max(x[i,]) # determine the most likely class once per row
    x[i, i.max] = x[i, i.max] - val
    x[i, -i.max] = x[i, -i.max] + val/(ncol(x)-1)
  }
  return(x)
}
pred = as.data.frame(rdirichlet(100, c(0.1,0.1,0.5,0.5))) # generate multinomial probabilities with four classes
obs = pred # observations are identical to the predictions
pred.lower = increase(pred, val = 0.1) # make the predictions less certain than the observations
kappa_multinomial(obs=obs,pred=pred) # calculate kappa for predictions equal to the observations
kappa_multinomial(obs=obs,pred=pred.lower) # calculate kappa for the less certain predictions
```
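To explore how the decrease in certainty feeds through to the kappa statistics, one could repeat the comparison for a few values of `val` (a sketch, assuming `kappa_multinomial` can simply be printed as in the examples above):

```{R,eval = TRUE}
# effect of increasingly less certain predictions on kappa (illustration only)
for (val in c(0.05, 0.1, 0.2)){
  pred.lower = increase(pred, val = val)
  print(kappa_multinomial(obs = obs, pred = pred.lower))
}
```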