library(SatTagSim)
library(knitr)

username =  as.character(Sys.info()[7])
setwd(paste0('C:/Users/', username,'/Google Drive/PHD/COLLABORATIONS/SCRS_SIMS/'))
datadir = 'D:/SKYDRIVE_BACKUP/PHD_SIMS/SCRS_SIMS'

mycol = colorRampPalette(c("lightcyan", "royalblue", "blue", "lemonchiffon", "orange", "red"), space = "Lab")
load('sim_comparison_models.Rdata')

Variables used in this example (three datasets/runs)

eatl.ct = get.trans.prob(eatl.datbox, perc = F)
watl.big.ct = get.trans.prob(watl.big.datbox, perc = F)
watl.small.ct = get.trans.prob(watl.small.datbox, perc = F)
eatl = get.trans.prob(eatl.datbox, perc = T)
watl.big = get.trans.prob(watl.big.datbox, perc = T)
watl.small = get.trans.prob(watl.small.datbox, perc = T)

functions for multinomial variance.

The expected number of times the outcome $i$ observed over $n$ trials is \begin{equation} {\displaystyle \operatorname {E} (X_{i})=np_{i}\,} \end{equation}

\begin{equation} \operatorname{v}(X_i) = np_i(1-p_i) \end{equation}

where $n$ is the number of trials and $p$ is the probability. Here, I consider each row of a movement matrix a multinomial. This is what we discussed in the proposal defense. code modified from \color{blue}here

\color{red}Question:

I'm not sure how to represent $n$, but I assumed the transition between seasons was a single trial for each row.

# function to calculate variance of a multinomial row
get.mvar = function(x){
  p = drop(x) # use the percentages
  v = p*(1-p)
  # c = outer(p, 1-p)
  # vcv = diag(c)
  v
}

# function to sample from a posterior multinomial distribution
get.mvar.samp = function(vmu, vsd) {
  o = rtruncnorm(1, mean = vmu, sd = vsd, a = 0, 1)
  o[is.nan(o)] = 0
  o
}

# example for seasonal transition matrices
vmat = lapply(watl.big, function(x) apply(x, 1, get.mvar))
kable(vmat[[1]], row.names = 1:7, col.names = 1:7, caption = "Multinomial variance for Western bluefin >185cm, in Winter", digits = 4)

\newpage

Plot the variance for each cell, each season (colorbar)

Overlay with original movement matrix values

Example using Western bluefin >185cm

ex7 = expand.grid(1:7, 1:7)

par(mfrow=c(2,2))
sapply(1:4, function(x) {
  image.plot(1:7, 1:7, vmat[[x]], col = mycol(49), xlab = '', ylab = '')
  text(ex7[,1], ex7[,2], round(as.vector(watl.big[[x]]),3), font = 2)
  title(names(watl.big)[x])
  }
  )

\newpage

sample from posterior

samp = data.frame(sapply(1:7, function(x) get.mvar.samp(watl.big[[1]][x,], vmat[[1]][x,])))
names(samp) = 1:7
kable(samp, row.names = 1:7, caption = 'Posterior sample: Western Atlantic bluefin > 185cm, Winter season', digits = 4)

\newpage

Goodness of Fit

function: compare pairs of rows of two matrices, using loglinear model

returns the pchisq value

$\chi^{2}$ multinomial goodness of fit test from \color{blue}here

\begin{equation} {\displaystyle \operatorname {\chi^{2}} = \sum_{i} \frac{(f_{i}-e_{i})^2}{e_{i}}\,} \end{equation}

# first arguement is counts, second argument is probability. In this way, row-wise comparisons of two matrices may be made
# p-values of the log-linear (chisq test) are returned

get.loglin <- function(mat1, mat2){
  fm = NULL
  for(i in 1:nrow(mat1)){
    fm[[i]] =  loglin(rbind(mat1[i,], mat2[i,]+1e-30),c(1), print = F, fit = F)
  }
  unlist(lapply(fm, function(x) 1-pchisq(x$lrt, x$df)))
}

get.mgof <- function(mat1, mat2){
  fm = NULL
  for(i in 1:nrow(mat1)){
    fm[i] =  chisq.test(mat1[i,], p = mat2[i,]+1e-30)$p.value
  }
 fm
}

I tried several versions of log-linear and $\chi^2$ tests. Log linear models, and their GLM versions, are essentially wrappers on $\chi^2$ tests for ease of use and flexibility in using contingency tables in various formats.

Since each row of the movments matrices is dependent on it's associated region i nthe previous time step, the matrix as a whole cannot be considered a contingency table. Each row, however, may be considered it's own 1-dimensional contingency table.

To compare any two rows, within a matrix or between matrices, the best method appeared to be the $\chi^2$ test. To set this up, counts within a row are compared to the probablity in another row. The probability from the second matirx row is simply $\frac{x_{i}}{\sum x_{i} }$, and is the markovian result we have used to date.

To do goodness of fit comparisons, I used the get.mgof function. If any probabilities are $0$, the $\chi^2$ has issues. I added $1e-30$ to alleviate this problem. Perhaps there is a better way?

\newpage

Compare between seasons of same run/stock (e.g. Western Atlantic > 185cm)

bw.bs = get.mgof(watl.big.ct[[1]], watl.big[[2]])
print(bw.bs)

Compare between runs, same season

# same row, same season, large and small WABFT
# (bg.sm = get.loglin(watl.big.ct[[1]], watl.small[[1]]))

bg.sm = data.frame(sapply(1:4, function(x) get.mgof(watl.big.ct[[x]], watl.small[[x]])))
names(bg.sm) =  names(watl.big)
kable(bg.sm, row.names = 1:7, caption = "Western Atlantic bluefin > 185 cm vs < 185 cm", digits = 4)

\newpage

Compare stocks and season

e.w.s = data.frame(sapply(1:4, function(x) get.mgof(watl.big.ct[[x]], eatl[[x]])))
names(e.w.s) =  names(watl.big)

# library(xtable)

# print(xtable(e.w.s, include.rownames = T), type = 'html')
kable(e.w.s, row.names = 1:7, caption = "Western Atlantic bluefin > 185 cm vs Eastern Atlantic bluefin", digits = 4)

Compare between stocks, non-seasonal

this doesn't work... need a different test if this comparison is neccesary

# get overall transition rate; multiply the seasonal matrices
b1 = watl.big
b2 = eatl

ball.w = b1[[1]]%*%b1[[2]]%*%b1[[3]]%*%b1[[4]]
ball.e = b2[[1]]%*%b2[[2]]%*%b2[[3]]%*%b2[[4]]

(e.w = get.loglin(ball.e, ball.w))

\newpage

Tradeoff between degrees of freedom and likelihood ratio within the $\chi^{2}$ test

this helped answer a question I had earlier

plot(pchisq(rep(50,100), df = 1:100), typ = 'l', lwd= 2, col = 4, xlab = 'DF', ylab = 'Value')
title('pchisq value')


galuardi/SatTagSim documentation built on Nov. 15, 2020, 6:28 a.m.