library(knitr)
library(jpeg)
library(biosplit)
knitr::opts_chunk$set(echo=FALSE, comment='>', dpi=600)
realWd <- system.file(package = "biosplit", "docs")
figList <- list()
source(paste(realWd,"rmdUtils.R",sep='/'))
naiveOffset <- function(popNum, intDivide = 7){
    return(round(1 - (2*log10(intDivide) / (log10(popNum) + log10(popNum))), 2))
}

This report will compare these runs:

if(is.character(niceGrid)){
    stop(paste("Supply a raster with the final projection in 'niceGrid'"))
}

#runs <- c('runDataRedo','runHelloWorld','runInterYear','runFLwinter')
#years <- c('2011','2012','2013','2014')
goldLoc <- "C:/Program Files/Golden Software/Surfer 12"
runScripter <- makeRunFun(goldLoc, "BAS")
# params <- list()
#params$runs <- c('runInterYear','runAbsFlightLimit','runNightDurDel','runFlightAndNight')
# params$runs <- c("runBaseFullNight", "runBFNTOcueTemp", "runBFNTOcuePrec")
# params$runs <- c("runBaseNightandFlight", "runFDHighLatThres","runFNHighLatThres")
# params$years <- c("2013", "2013", "2013")
rwyear <- paste(params$years, params$runs, sep='/')
# params$dirSim <- "C:/Users/Siddarta.Jairam/Documents/Hysplit Out Moth table"
fullDir <- paste(params$dirSim, rwyear,sep='/')
numRuns <- length(rwyear)
compareRuns <- 1:numRuns
cat(paste(rwyear,colllapse='|'))

The conditions that the run was done in are as follows:

condFiles <- vapply(fullDir,function(x){
    readLines(paste(x,'Conditions.txt',sep='/'))
    },rep('one',5))
cat(condFiles[3:4,2])

The differences between the different runs are as follows:

bioDiff <- findDiff(condFiles[3,], cleanLabel(rwyear), compareRuns)
simDiff <- findDiff(condFiles[4,], cleanLabel(rwyear), compareRuns)

figList <- updateCntList(figList,"bioDiff")

Table r figList$bioDiff. All the differences between the biological Assumptions between the different runs

kable(as.data.frame(bioDiff),align='c')
figList <- updateCntList(figList,"simDiff")

Table r figList$simDiff. All the differences between the parameters for the HYSPLIT execution between the different runs

kable(as.data.frame(simDiff),align='c')
figList <- updateCntList(figList,"jpgTable")
times <- c('051','053','063','073','091')
dates <- strftime(strptime(paste0(times,'0'),'%m%d'),'%b\n%d')
numTimes <- length(times)

#make a mosaic of all the images specified
conPaths <- paste(fullDir,'jpeg','Contours',sep='/')
contours <- vapply(conPaths,function(pth){
    vapply(times,function(tim){
        potJpegs <- list.files(pth,
            pattern=paste0('Moth_',tim),
            full.names = TRUE)
        return(potJpegs[!grepl('gsr2',potJpegs)][1])
        },'hi')
    },rep('hi',numTimes))

contHeight <- 6.5 * mosaicImage(contours,offsetx=.08,marb=.2,marl=.18, onlyScale = TRUE)

Here is the contour plots of the different runs at different times of the year:

Table r figList$jpgTable. Contour plots of instaneous snapshots of the Moth population in the different runs at specific times throughout the migration time for FAW throughout late spring to fall.

mosaicImage(contours,labelx=rwyear,labely=dates,offsetx=.08,marb=.2,marl=.18)

There are 5 main areas of mixing that are of interest to this project; Southern Mississippi and Alabama along the gulf (Southern MSAL), Northern Mississippi and Alabama near the tail end of the Appalachian mountains (Northern MSAL), Eastern PA near the eastern seaboard (Eastern PA), Central Pennsylvania in the plains region (Central PA), and Western Pennsylvania near Lake Erie (Western PA). One of the analysis that will be used in this project is the first occurrence of a population in that area (firstOcc). For this use the defined area is the area of interest where most other times it will be done per grid cell. The other tool will be the mixing ratio of the two populations in the area of interest. This is defined as follows:

$$Mix_{ratio} =\frac{\log_{10} (FLMoth + 1) - \log_{10} (TXMoth + 1)}{\log_{10} (FLMoth + 1) + \log_{10} (TXMoth + 1)}$$

This is done to normalize the output on a range of -1 (All TX moths) to 1 (all FL moths). Also this puts it on a similar scale as the Haplotype ratio h2/h4 which goes from 0 (All TX) to > 2 (mostly FL) with the mixing region between 0.5 to 1.5.

For the moth populations there are 2 different datasets that are produced, the Sum which is a summation for that whole week, or Snap, which is a snapshot of the moths at the end of the week. By design Sum double counts a population if they are there for multiple days in a week. The thinking behind doing this was that the validation data is on a trap is active over a period of time. At this point, we don't know when the moths actually entered the trap and they are all pooled at the end of the week of that trap period. The haplotype data is pooled even more aggressively, sometimes over a month. Comparing these validation datasets 1 to 1 with a per day snap shot from the model is a not fair. Using the Sum "pools" the potential population over that time which is a better comparison with the validation data.

However, this presents questions about these aggregate statistics of firstOcc and Mix. firstOcc would only query the population one out of 7 days for the presence of moths. For this test, Sum would query the simulation every day. Both values would only give an answer in the order of weeks and the difference is only going to be a week because of the spreading of the emitted population over 7 days.

Mix is much more complicated because the population studied is different. In Snap the population is still defined by raw simulation. In Sum, the population is something like "the exposure of that grid to moths given in units of moth * day." This can be explained as "the average moths in this grid over this week" by dividing by 7 days. Mix does not make that much sense on units of "moth * day" so this population should be divided by 7 each time. This does not seem like a big issue except when the calculations are carried out (assuming large values for moth counts).

$$Mix_{ratio, naive} =\frac{\log_{10} (FLMoth_{sum} + 1) - \log_{10} (TXMoth_{sum} + 1)}{\log_{10} (FLMoth_{sum} + 1) + \log_{10} (TXMoth_{sum} + 1)}$$

$$Mix_{ratio, real} =\frac{\log_{10} (\frac{FLMoth_{sum}}{7} + 1) - \log_{10} (\frac{TXMoth_{sum}}{7} + 1)}{\log_{10} (\frac{FLMoth_{sum}}{7} + 1) + \log_{10} (\frac{TXMoth_{sum}}{7} + 1)}$$

$$Mix_{ratio, real} =\frac{\log_{10} (FLMoth_{sum} + 1) - \log_{10} (7) - \log_{10} (TXMoth_{sum} + 1) + \log_{10} (7)}{\log_{10} (FLMoth_{sum} + 1) - \log_{10} (7) + \log_{10} (TXMoth_{sum} + 1) - \log_{10} (7)}$$

$$Mix_{ratio, real} =\frac{\log_{10} (FLMoth_{sum} + 1) - \log_{10} (TXMoth_{sum} + 1)}{\log_{10} (FLMoth_{sum} + 1) + \log_{10} (TXMoth_{sum} + 1) - 2*\log_{10} (7)}$$

$$Mix_{ratio, real} = Mix_{ratio, naive} * (1 - \frac{2*\log_{10} (7)}{\log_{10} (FLMoth_{sum} + 1) + \log_{10} (TXMoth_{sum} + 1)}$$

For the qualitative definition of Mix being the mixing ratio between the two strains, one would assume that a factor of 7 in both populations would cancel out. This is not the case and is actually a large factor difference. For small but equal population of 500 moths this factor is r naiveOffset(500) while for a large population of 500,000,000 moths the factor is still r naiveOffset(500000000). Now this can be dismissed as having an implicit denominator of 1 moth just to get rid of the units in the regular formula and when using the Sum population would have a denominator of 1 moth*day. This begs the question "What does this new equation mean?"

More generally, Mix should be done on real populations to preserve its meaning. Doing it on the Snap values individually is perfectly fine since they are real simulation outputs. We can assume that weekly sums does not have too much double counting due to the conditions changing over this period, the cohorts grow on this time scale and the lifetime of the moths after flight is on the order of 1 week. This allows even the summation of all the weekly Snap populations to be applicable with Mix since the population is "all the moths that enter the grid cell through the year". The problem is when something other than actual populations are passed through. This includes the Sum dataset, spatial averages of the population and other aggregate statistics.

For this example, the Moth populations in question will be all moths that enter the region summed over the entire year. For the firstOcc, the Sum population will be used, while for Mix the Snap dataset will be used. Below is the summary for each of the regions and these values.

baseArea <- "C:/Users/Siddarta.Jairam/Documents/Documentation/"
areas <- c("Southern MSAL.kml", "Northern MSAL.kml", "Western PA.kml", "Central PA.kml", "Eastern PA.kml")

regStats <- vapply(fullDir,function(f){
    vapply(areas, function(area){
        return(c(
            simAreaStats(f, paste0(baseArea, area), useSum = TRUE)[1:2],
            simAreaStats(f, paste0(baseArea, area))[[3]], recursive=TRUE))
    }, rep(1,3))

}, matrix(0, nrow = 3, ncol = length(areas)))

mixTable <- cleanAllLabels(t(regStats[3,,]))
arrFLTable <- cleanAllLabels(t(regStats[1,,]))
arrTXTable <- cleanAllLabels(t(regStats[2,,]))

arrFinal <- matrix(paste(arrFLTable, arrTXTable, sep=', '),
    nrow=nrow(arrFLTable),
    dimnames=dimnames(arrFLTable))

figList <- updateCntList(figList,"mixTable")
figList <- updateCntList(figList,"arrTable")

Table r figList$mixTable. The mixing ratio of the simulation in specific regions. The spatial range is the region and the temporal range is a summation of the whole year. The result is the average of all the moths that enter this region.

r kable(mixTable)

Table r figList$arrTable. The simulation first arrival for both populations in the specific regions. The first arrival is reported as a pair of the different populations (FL first occurance, TX first occurance) in weeks in the year.

r kable(arrFinal)

Validation

For this project we have two main validation sets:

  1. PestWatch data on trap captures across the U.S.
  2. Haplotype analysis on a smaller subset of 1.

From this data we can compare the same output variables as that was looked at in the simulation output. The week of first occurrence will be the same in the validation set. The difference is in the temporal resolution. The simulation has daily values which can easily be converted into their weekly forms. The trap captures are not so regular and often have odd temporal resolution. Farms with less than 10 captures for the entire year were thought to be untraceable amount on the fringes of the moth population's extent. There is also a distinction with what first occurrence means in this case. Should first date that the moths were found be the first occurrence date (real method) or should it be the earliest that moths could have entered the traps and recorded the same record into the system (pred method)? The real method is more conservative as it will always at least have moths at that point, but the moths probably came earlier. The pred method shows when the lower bound to the value but could very well be a week where nothing was in the traps. The solution was to allow both methods to be calculated but report the real method.

qualTable <- rbind(
    c("-1 - -0.33", "0 - 0.5", "Mostly Texas"),
    c("-0.33 - 0.33","0.5 - 1.5", "Mixed population"),
    c("0..33 - 1", "1.5 - inf", "Mostly Florida")
)
colnames(qualTable) <- c("Sim Mixing ratio", "h2/h4 ratio", "Qualitative meaning")
figList <- updateCntList(figList,"qualTable")

The genetic information is on a much smaller scale than the entire pestwatch system. Moths have to be pooled, transported and analyzed which takes time, effort and a sufficient number of moths (>15 moths). An effort was made in having samples spread out over the U.S. but southern US does not have many samples while the north-east has a rich genetic map. The genetic marker used to distinguish the Texas and Florida populations is the ratio of the h2 and h4 haplotypes. This range of possible numbers are all non-negative real numbers. To compare this to the simulation, the mixing ratio was introduced previously. These values are compared in r figList$qualTable along with their qualitative meaning of each range.

Table r figList$qualTable. The qualitative meaning of the genetic information.

r kable(qualTable)

valStats <- vapply(params$years, function(yr){
    trapRec <- scrubTrap(params$trapDump, yr)
    csvVal <- summarizeValid(trapRec, yr,
        hapDat = sprintf(params$scrubbedHap, yr),
        pathCsvOut = sprintf(params$summarizedVal, yr))
    vapply(areas, function(area){
        c(valAreaStats(csvVal, paste0(baseArea, area), params$niceGrid), recursive = TRUE)
    }, rep(1, 2))

}, matrix(0, nrow = 2, ncol = length(areas)))

dimnames(valStats)[[3]] <- params$years
valArrTable <- cleanAllLabels(t(valStats[1,,]))
valMixTable <- cleanAllLabels(t(valStats[2,,]))

figList <- updateCntList(figList,"valFirstOcc")
figList <- updateCntList(figList,"valMix")

The validation results for the defined areas of interest is shown below.

Table r figList$valMix. The mixing ratio of thetrap captures in specific regions. The spatial range is the region specified and the temporal range is the whole year. The result is the average of all the moths captured in this range.

r kable(valMixTable)

Table r figList$valFirstOcc. The PestWatch data for first arrival for either population in the specific regions. The first arrival is reported as weeks in the year.

r kable(valArrTable)

To show the exact spatial differences, the full map extent is shown below. For both the mixing and first occurrence, the contour plots are the simulation values and the points are the validation data. The shade of gray represents the mixing amount or the progression through the seasons.

mixMapPaths <- vapply(fullDir, function(f){
    pathMap <- paste(f, "SimMixRatio.nc",sep='/')
    ncdf2trapgrid(f, pathOut = pathMap, shDoMix = TRUE)
    pathHap <- sprintf(params$scrubbedHap, file2year(f))
    runScripter("ncdf2contour.BAS", c(pathMap, 3, "bw", pathHap, 3, 2, 8, 1))
    paste(f, "jpeg", "Contours", "SimMixRatio.jpg", sep = '/')
}, "e")

occMapPaths <- vapply(fullDir, function(f){
    pathMap <- paste(f, "SimFirstOcc.nc",sep='/')
    ncdf2trapgrid(f, pathOut = pathMap)
    pathSum <- sprintf(params$summarizedVal, file2year(f))
    runScripter("ncdf2contour.BAS", c(pathMap, 4, "bw", pathSum, 3, 2, 4, 0, "First Occ (PestWatch wk)"))
    paste(f, "jpeg", "Contours", "SimFirstOcc.jpg", sep = '/')
},'e')

figList <- updateCntList(figList,"mapMix")
figList <- updateCntList(figList,"mapFirstOcc")

#Add .2 cause word stuff
compWidth <- .2 + 7 / mosaicImage(mixMapPaths, offsetx=0, marb=.05, marl=.75, onlyScale = TRUE)

Table r figList$mapMix. The spatial distribution of the simulation and validation results for the relative amounts of FL and TX strains. The contours represent the simulation output and the points represent the haplotype data. On both data sets, black represents a pure TX strain and white represent a pure FL strain.

mosaicImage(mixMapPaths,labelx=rep("SimMixRatio",4), labely=rwyear,
                        offsetx=0, marb=.05, marl=.75, leftAlignYaxis = TRUE)

Table r figList$mapFirstOcc. The spatial distribution of the simulation and validation results for the first arrival of moths. The contours represent the simulation output and the points represent the PestWatch data. On both data sets, black represents an arrival before April and white represents an arrival in late October.

mosaicImage(occMapPaths,labelx=rep("SimFirstOcc",4), labely=rwyear,
                    offsetx=0, marb=.05, marl=.75, leftAlignYaxis = TRUE)


sidjai/biosplit documentation built on May 29, 2019, 9:59 p.m.