#' @export
#'
#' @title F.bootstrap.passage
#'
#' @description Bootstrap or Monte-Carlo simulate data sufficient to compute
#' confidence intervals for passage.
#'
#' @param grand.df A data frame containing both daily estimated passage and
#' efficiency, for each trap.
#' @param catch.fits A list of Poisson fitted \code{glm} objects for each trap,
#' possibly with basis spline covariates, used to impute missing catches.
#' @param catch.Xmiss A list containing, for each trap, the spline basis
#' matrix for days on which missing catch must be imputed.
#' @param catch.gapLens A list, for each trap, containing a numeric vector of
#' hours of time spent \code{"Not fishing"}, originating from variable
#' \code{TrapStatus} in the original catch data queries. All values are
#' necessarily less than 24.
#' @param catch.bDates.miss A list containing a POSIX vector of \code{"Not fishing"}
#' \code{batchDate}s for missing catches, for each trap. Necessary because
#' one \code{batchDate} may have two (or more) periods with no fishing.
#' @param eff.fits A list of binomial logistic regression fitted objects used to
#' compute efficiency. One per trap.
#' @param eff.X A list containing the basis matrix associated with each
#' efficiency-spline model for each trap. These matrices originate from use
#' of function \code{bs} in function \code{F.efficiency.model}.
#' @param eff.ind.inside A list containing the first and last days of
#' efficiency trials for each trap.
#' @param eff.X.dates A list containing the dates for which missing efficiency
#' must be estimated, for each trap.
#' @param eff.X.obs.data A list containing the raw observed efficiency data used
#' to fit efficiency models and estimate bias-corrected efficiencies.
#' @param eff.type A list containing the type of efficiency model utilized for
#' each trap. See \code{eff_model.r} for details.
#' @param sum.by A text string indicating the temporal unit over which daily
#' estimated catch is to be summarized. Can be one of \code{day},
#' \code{week}, \code{month}, \code{year}.
#' @param R An integer specifying the number of Monte Carlo iterations to do.
#' @param ci A logical indicating if 95\% bootstrapped confidence intervals
#' should be estimated along with passage estimates.
#'
#' @return A data frame containing passage estimates, 95\% bias-adjusted
#' confidence intervals, and bootstrap standard errors for all unique
#' temporal units summarized via specification of \code{sum.by}.
#'
#' @details In order to bootstrap the estimated passage for a particular trap,
#' random realizations of passage must be generated. Variability in passage
#' can originate from two sources: imputed catch and imputed efficiency.
#' Imputed catch originates from periods of \code{"Not fishing"} in excess of two
#' hours, while imputed efficiency results from days between the first and
#' last days of recorded efficiency trials. Any one day may lead to several
#' instances of imputed catch, but at most one instance of imputed
#' efficiency. Since days of operation vary over different traps, the
#' imputation periods vary as well.
#'
#' Bootstrapping of each of catch and efficiency is organized via matrices of
#' dimension \eqn{nrow(grand.df) \times R}{nrow(grand.df) x R}, where rows hold unique
#' trapping instances, and the columns the bootstrapping replicates. Because
#' efficiency is only estimated on a per-day basis, but multiple trapping
#' instances can take place on any one day, catch data are summarized per day
#' following initial bootstrap sampling, with corresponding multiple intra-day
#' replicates summed. Imputed values within each of the resulting daily catch
#' and efficiency matrices thus contribute to underlying stochastic
#' variability.
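#'
#' A minimal sketch of the matrix setup (mirroring the function body
#' below):
#'
#' \preformatted{
#' c.pred <- matrix(grand.df$totalEstimatedCatch,
#'                  nrow = nrow(grand.df), ncol = R)
#' e.pred <- matrix(grand.df$efficiency,
#'                  nrow = nrow(grand.df), ncol = R)
#' }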
#'
#' Following sampling completion, the catch matrix for each trap is divided by
#' the efficiency matrix, where the \eqn{(i,j)}th entry of the resulting
#' passage sampling matrix corresponds to the \eqn{j}th passage replicate of
#' the \eqn{i}th day. These daily passage estimates, over each replicate, are
#' then summarized via function \code{F.summarize.passage} over the temporal
#' unit specified via \code{sum.by}. In this way, \code{R} samples for each
#' unique temporal unit within the date range of \code{grand.df} are
#' obtained.
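#'
#' A minimal sketch of this step (mirroring internal function
#' \code{f.sumize.pass} in the function body below):
#'
#' \preformatted{
#' pass <- c.pred / e.pred            # daily passage realizations
#' pass <- apply(pass, 2, function(p) {
#'   df <- data.frame(batchDate = grand.df$batchDate, passage = p,
#'                    imputed.catch = 1)
#'   F.summarize.passage(df, sum.by)$passage
#' })
#' }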
#'
#' Given the \code{R} replicates for each unique time period, 95\%
#' bias-corrected confidence intervals are obtained. These confidence
#' intervals correct for non-symmetric passage replicates.
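#'
#' A minimal sketch of the bias-corrected percentile interval for one
#' temporal unit, given its replicates \code{x} and original estimate
#' \code{x0} (mirroring internal function \code{f.bias.acc.ci} below):
#'
#' \preformatted{
#' z.0 <- qnorm(1 - mean(x > x0, na.rm = TRUE))  # bias-correction constant
#' z.alpha <- qnorm(1 - 0.05 / 2)
#' p.L <- pnorm(2 * z.0 - z.alpha)               # shifted lower percentile
#' p.H <- pnorm(2 * z.0 + z.alpha)               # shifted upper percentile
#' ci <- quantile(x[!is.na(x) & x < Inf], p = c(p.L, p.H))
#' }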
#'
#' @section Variance Matrices: Catch models are fit via a Poisson generalized
#' linear model. These models are often overdispersed, with a Pearson
#' overdispersion parameter much larger than one, because seasonal fish
#' pulses give catch a much higher variance than the Poisson model expects.
#' To prevent outliers from inflating the dispersion estimate, the largest
#' and smallest 20\% of Pearson residuals are removed, and the dispersion
#' statistic is recalculated. If the resulting dispersion statistic is less
#' than one, it is set to one.
#'
#' The modified overdispersion statistic is then multiplied by the
#' variance-covariance matrix of the original model-fit; in this way,
#' standard errors are recalculated via a modified quasilikelihood approach.
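#'
#' A minimal sketch of this adjustment, assuming a fitted Poisson
#' \code{glm} object \code{c.fit} (the package helper \code{overDphi}
#' wraps similar logic):
#'
#' \preformatted{
#' r <- residuals(c.fit, type = "pearson")
#' keep <- r >= quantile(r, 0.2) & r <= quantile(r, 0.8)  # trim extremes
#' disp <- sum(r[keep]^2) / (c.fit$df.residual - sum(!keep))
#' disp <- max(disp, 1)                                   # floor at one
#' sig <- disp * vcov(c.fit)                              # scaled variance
#' }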
#'
#' Efficiency models are fit via a binomial generalized linear model. As a
#' discrete model, these also can be overdispersed. However, the efficiency
#' trial data are generally sparse. As a result, rather than removing the
#' top and bottom proportions of residuals beyond some percentile, those
#' greater than an absolute cut-off are removed instead: any Pearson
#' residual with an absolute value greater than 8 is discarded. Following
#' the removal of these extreme residuals, the overdispersion statistic is
#' recalculated and applied to the variance-covariance matrix of the
#' original binomial fit. The dispersion statistic is set to one if it
#' calculates as less than one. Traps with only one efficiency trial also have
#' overdispersions set to one. Similar to the variance adjustment applied to
#' catch, this is a modified quasilikelihood approach.
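#'
#' A comparable sketch for a fitted binomial \code{glm} object
#' \code{e.fit}, using the absolute cut-off in place of the percentile
#' trim:
#'
#' \preformatted{
#' r <- residuals(e.fit, type = "pearson")
#' keep <- abs(r) <= 8                      # absolute cut-off on residuals
#' disp <- sum(r[keep]^2) / (e.fit$df.residual - sum(!keep))
#' disp <- max(disp, 1)                     # floor at one
#' sig <- disp * vcov(e.fit)                # scaled variance
#' }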
#'
#' In the case when there are fewer than ten observed efficiency trials for one
#' trap, a bias-corrected efficiency is calculated in lieu of a model fit.
#' This efficiency is calculated simply as the sum of the \code{nCaught} fish
#' plus one, divided by the sum of the \code{nReleased} fish plus one. The
#' plus-one manipulation precludes direct estimation of the variance via a
#' formal generalized linear (or additive) model. In this case, bootstrap
#' samples originate from a multivariate normal distribution with mean
#' equal to the bias-corrected efficiency (on the logit scale), and a
#' variance equal to the traditional generalized linear model (glm)
#' variance, but with the back-transformed model-derived estimates of
#' observed efficiencies replaced by their bias-corrected equivalents. See
#' McCulloch and Searle (2001), or any other mathematical treatment of the
#' generalized linear model, for details on the glm variance.
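#'
#' A minimal sketch of this calculation for an intercept-only model, given
#' observed trial vectors \code{nCaught} and \code{nReleased} and the
#' dispersion \code{disp} from above (mirroring the function body below):
#'
#' \preformatted{
#' p <- (sum(nCaught) + 1) / (sum(nReleased) + 1)  # bias-corrected efficiency
#' w <- nReleased * p * (1 - p)                    # binomial glm weights
#' sig <- as.matrix(disp / sum(w))                 # 1x1 glm variance, scaled
#' rbeta <- mvtnorm::rmvnorm(n = R, mean = log(p / (1 - p)),
#'                           sigma = sig, method = "chol")
#' }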
#'
#' @section Random Realizations: Catch fit models are utilized to generate
#' random realizations of catch for each individual trap. To do this, we
#' use the \code{mvtnorm::rmvnorm} function to randomly sample from a
#' multivariate normal distribution, with dimension equal to the number of
#' \eqn{\beta} coefficients utilized in the trap's catch model. The
#' \code{mvtnorm::rmvnorm} routine uses the vector of model coefficients as
#' the mean of the multivariate distribution and the modified
#' quasilikelihood variance-covariance matrix as the variance. To speed
#' calculations, we use the Cholesky decomposition to compute the square
#' root of the variance matrix. All betas and variances are on the log
#' scale because the Poisson catch models assume a log link.
#'
#' After generation, we use each of the \code{R} multivariate-normal
#' samples to create a new prediction for missing catches. Random missing
#' catches are expanded by the log of the trap down-time, i.e., trap
#' down-time, or the gap in fishing, serves as an offset. We exponentiate
#' the resulting imputed catch predictions and combine them with the
#' observed catch to create a dataset containing a catch record for every
#' day of the season.
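#'
#' A minimal sketch of one trap's catch realizations, assuming the basis
#' matrix \code{X} for the missing days, gap lengths \code{gaps} in hours,
#' and the scaled variance \code{sig} described above (mirroring the
#' function body below):
#'
#' \preformatted{
#' rbeta <- mvtnorm::rmvnorm(n = R, mean = coef(c.fit),
#'                           sigma = sig, method = "chol")
#' pred <- exp(X \%*\% t(rbeta) + log(gaps))  # nrow(X) x R imputed catches
#' }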
#'
#' @references Manly, B. F. J. Randomization, Bootstrap and Monte Carlo Methods
#' in Biology, Third Edition, 2006. Chapman and Hall/CRC.
#'
#' McCulloch, C. E. and Searle, S. R. Generalized, Linear, and Mixed Models,
#' 2001. Wiley Interscience.
#'
#' @seealso \code{F.est.catch}, \code{F.est.eff}, \code{F.summarize.passage}, \code{F.efficiency.model}
#'
#' @author WEST Inc.
#'
F.bootstrap.passage <- function( grand.df, catch.fits, catch.Xmiss, catch.gapLens, catch.bDates.miss, eff.fits, eff.X, eff.ind.inside, eff.X.dates, eff.X.obs.data, eff.type, sum.by, R, ci=T ){
# grand.df <- grand.df
# catch.fits <- catch.and.fits$fits
# catch.Xmiss <- catch.and.fits$X.miss
# catch.gapLens <- catch.and.fits$gaps
# catch.bDates.miss <- catch.and.fits$bDates.miss
# eff.fits <- eff.and.fits$fits
# eff.X <- eff.and.fits$X
# eff.ind.inside <- eff.and.fits$ind.inside
# eff.X.dates <- eff.and.fits$X.dates
# eff.X.obs.data <- eff.and.fits$obs.data
# eff.type <- eff.and.fits$eff.type
# sum.by <- summarize.by
# R <- 100
# ci <- T
# ---- Obtain necessary variables from the global environment.
bootstrap.CI.fx <- get("bootstrap.CI.fx",envir=.GlobalEnv)
# ---- Set the confidence level of the intervals.
conf <- 0.95
# ---- Get Julian weeks for this timeframe.
if(sum.by == 'week'){
# ---- Obtain information from the global environment.
min.date <- get("min.date",envir=.GlobalEnv)
max.date <- get("max.date",envir=.GlobalEnv)
# ---- Obtain Julian dates so days can be mapped to specialized Julian weeks.
db <- get( "db.file", envir=.GlobalEnv )
ch <- odbcConnectAccess(db)
JDates <- sqlFetch( ch, "Dates" )
close(ch)
# ---- Clean up the Julian week information.
theDates <- data.frame(Date=as.Date(seq(as.Date(attr(grand.df,"min.date"),format="%Y-%m-%d"),as.Date(attr(grand.df,"max.date"),format="%Y-%m-%d"),by="days")))
JDates$Year <- as.numeric(format(JDates$uniqueDate,"%Y"))
JDates$Date <- as.Date(JDates$uniqueDate,format="%Y-%m-%d")
theDates <- merge(theDates,JDates[,c('Date','julianWeek','Year','julianWeekLabel')],by=c('Date'))
names(theDates)[names(theDates) == 'julianWeek'] <- 'JWeek'
jDates <- JDates[as.Date(JDates$uniqueDate) >= min.date & as.Date(JDates$uniqueDate) <= max.date,]
jDates <- jDates[,c('uniqueDate','year','julianWeek','julianWeekLabel')]
#jDates <- subset(JDates, as.Date(uniqueDate) >= min.date & as.Date(uniqueDate) <= max.date,c(uniqueDate,year,julianWeek,julianWeekLabel))
attr(grand.df,"JDates") <- jDates
}
# ---- Compute THE estimate. F.summarize.passage first averages over traps,
# ---- then sums by sum.by. When this returns, n.orig is a data frame with columns
# ---- s.by, passage, date, and pct.imputed.catch.
n.orig <- F.summarize.passage( grand.df, sum.by )
cat("back from summarize\n")
# ---- If confidence intervals are called for, do them; otherwise return.
n.len <- nrow(n.orig)
n.grand.df <- nrow(grand.df)
if( !ci ){
na <- rep(NA, n.len)
ans <- data.frame(l = na, u= na)
correct <- NA
} else {
n.traps <- length(catch.fits)
# ---- These giant matrices will hold bootstrap iterations.
c.pred <- matrix( grand.df$totalEstimatedCatch, nrow=nrow(grand.df), ncol=R )
e.pred <- matrix( grand.df$efficiency, nrow=nrow(grand.df), ncol=R )
cat(paste("n.traps=", n.traps, "\n"))
cat(paste("n.len=", n.len, "\n"))
# ---- Main iteration loop (over traps).
for(trap in 1:n.traps){
trapID <- names(catch.fits)[trap]
trap.ind <- grand.df$trapPositionID == trapID
cat(paste("trap=", trapID, "\n" ))
# ---- Generate random realization of catch, where required, i.e., for
# ---- days (time periods) requiring imputation.
ind <- which(trapID == names(catch.fits))
if( length(ind) > 0 ){
c.fit <- catch.fits[[ind]]
X <- catch.Xmiss[[ind]]
gaps <- catch.gapLens[[ind]]
bd.miss <- catch.bDates.miss[[trap]]
cat("in bootstrap_passage.r (hit return)...")
if( all(!is.na(bd.miss)) ){
# ---- We have some gaps.
# ---- Estimate an overdispersion parameter.
disp <- overDphi(model=c.fit,family="poisson",type="pearson")
# ---- Function vcov returns unscaled variance-covariance matrix, so scale by overdispersion.
sig <- disp * vcov( c.fit )
cat(paste("...Poisson over-dispersion in catch model for trap ", trapID, " = ", disp, "\n"))
cat("in bootstrap_passage.r (hit return)...")
# ---- Coefficients.
beta <- coef( c.fit )
# ---- If there is only one coefficient, then number of non-zero catches is <= 10.
# ---- It is possible for number of non-zero catches to be 0 (they never caught anything)
# ---- In the latter case, the coefficient is large and negative and causes estimates to blow up.
# ---- Trap this situation. The correct estimate in this case is 0.
# ---- Not sure the trapped scenario can still happen, given that we now get rid of
# ---- precedent and antecedent zeros. We specifically exclude zero-catch traps from
# ---- being estimated.
if( (length(beta) == 1) && (beta[1] < -10) ){
rbeta <- matrix( -10, nrow=R, ncol=1 )
} else {
# ---- Generate random coefficients.
rbeta <- mvtnorm::rmvnorm(n=R, mean=beta, sigma=sig, method="chol") # R random realizations of the beta vector. rbeta is R X (n coef)
}
# ---- Predict catches using random coefficients.
# ---- When computed, pred is a matrix where each column is a random realization of model
# ---- predictions for missing catches. There are R columns (sets of predictions).
pred <- X %*% t(rbeta) + log(gaps)
pred <- exp(pred) # ---- Pred is nrow(X) X R.
# plot(seq(1,nrow(pred)),pred[,1])
# apply(pred,2,function(x) lines(seq(1,nrow(pred)),x))
# ---- But remember, it is possible for a gap to be small, say 3 hours, and the resulting gap be
# ---- on the same batch date as another. So, we need to sum over batch dates. Do this for every column.
pred <- apply( pred, 2, function(x,bd){tapply(x,bd,sum)}, bd=bd.miss )
pred <- matrix( unlist(pred), nrow=length(unique(bd.miss)), ncol=R )
# ---- Make catch matrix the correct size by including the observed counts
ind.mat <- matrix( trap.ind & (grand.df$imputed.catch > 0), nrow=n.grand.df, ncol=R )
# ---- Replaces imputed values with other realizations contained in pred. This is
# ---- nrow(grand.df) X R, or (n.batch.days*n.traps) X R.
c.pred[ind.mat] <- pred
}
}
# ---- Generate random realizations of efficiency. We utilize the non-decimal trapID, since we never fit
# ---- efficiency on the decimal traps alone. This maps decimal traps of catch to non-decimal traps of
# ---- efficiency.
ind <- which( as.character(round(as.numeric(trapID),0)) == names(eff.fits) )
if( length(ind) > 0 ){
e.fit <- eff.fits[[ind]]
e.X <- eff.X[[ind]]
eff.obs.data <- eff.X.obs.data[[ind]]
e.type <- eff.type[[ind]]
# ---- The e.X design matrix has columns in an order that was convenient in eff_model.r. This
# ---- order needs to be checked, to ensure alignment with the order in e.fit. It is easier to
# ---- manipulate the column order in e.X (one matrix), than all the stuff in e.fit (a list).
if( (length(e.X) > 1) & !is.na(e.type) ){ # <--- Exclude cases when e.X == NA.
if(ncol(e.X) > 1 & e.type == 5){
cat(paste0("Sorting variables ",colnames(e.X)," for e.X in bootstrap.\n"))
timeVar <- sort(colnames(e.X)[grepl("time",colnames(e.X),fixed=TRUE)])
fit.Vars <- names(coef(e.fit))
notTimeVar <- fit.Vars[!(grepl("tmp.bs",fit.Vars,fixed=TRUE))]
notTimeVar <- notTimeVar[notTimeVar != "(Intercept)"]
thisOrder <- c("Intercept",timeVar,notTimeVar)
e.X <- e.X[,thisOrder]
}
}
# ---- This is a 2-vector of the first and last efficiency trials.
e.ind <- eff.ind.inside[[ind]]
# ---- This is an indicator for days inside the efficiency season.
e.ind <- (e.ind[1] <= grand.df$batchDate) & (grand.df$batchDate <= e.ind[2]) & trap.ind
# ---- The vector of dates inside efficiency season. This is used to
# ---- line up with catches because catch seasons vary.
e.dts <- eff.X.dates[[ind]]
# ---- Check if there are no efficiency trials. Not sure what this is in the enh eff framework.
if( (!is.list(e.fit) & e.type != 5) | length(e.fit) == 0 | (e.type == 5 & length(e.fit) == 0) ){
# ---- In this case, we have NA for efficiency. Couldn't fit enhanced efficiency model,
# ---- probably because we lacked a covariate. If there were additionally no efficiency
# ---- trials for the min.date and max.date period, there is not much we can do.
# ---- The commented out code below puts NA for ALL ROWS in e.pred. This is not correct.
# ---- Update 6/18/2018: we also end up here if we put in a fake release.df row in run_passage.
# ---- While eff.obs.data has one fake row of data, it really shouldn't be there.
#e.pred <- matrix( NA, nrow(c.pred), ncol(c.pred) )
} else {
# ---- Variance matrix,
# ---- Check if less than eff.min.spline.samp.size.
if( e.fit$df.residual == 0 | nrow(e.fit$data) < 10 ){
disp <- 1
} else {
# ---- Estimate an overdispersion parameter.
disp <- overDphi(model=e.fit,family="binomial",type="pearson")
}
cat(paste0("...Binomial over-dispersion in efficiency model for trap ",as.character(round(as.numeric(trapID),0))," = ",disp, ".\n"))
# ---- Function vcov returns unscaled variance-covariance matrix.
# ---- Scale by overdispersion.
sig <- disp * vcov( e.fit )
# ---- Coefficients.
beta <- coef( e.fit )
# ---- Generate R random coefficients of the beta vector.
# ---- Addition of bias-corrected efficiency complicates things.
if( length(coef(e.fit)) == 1 ){
# ---- Reproduce the variance with our bias-corrected estimate of the mean efficiency.
# ---- Construct, via the general expression for a generalized linear model's variance,
# ---- the number needed, assuming a logit link. McCulloch, C.E. & Searle, S. R.
# ---- Generalized, Linear, and Mixed Models, p. 147. Note we also use an adjusted
# ---- mean that incorporates the bias.
# # ---- Example from the above book, p. 147.
# egWeight <- c(5,5,5,5,5,5,5,5,5,5)
# egY <- c(0,0,2,2,3,4,5,5,5,5)
# egConc <- c(1/128,1/64,1/32,1/16,1/8,1/4,1/2,1,2,4)
#
# # ---- CAMP RST style.
# eg <- glm(egY / egWeight ~ 1 + log(egConc), family=binomial, weights=egWeight )
# vcov(eg)
#
# # ---- Emulate our intercept-only model.
# eg2 <- glm(egY / egWeight ~ 1, family=binomial, weights=egWeight )
# vcov(eg2)
#
# # ---- Manipulate the dispersion.
# resids <- residuals(eg2, type="pearson")
# toss.ind <- (abs(resids) > 8)
# resids <- resids[!toss.ind]
# disp <- sum( resids*resids ) / (e.fit$df.residual - sum(toss.ind))
# if( disp < 1.0 ){
# disp <- 1.0
# }
#
# # ---- reproduce our below.
# X <- rep(1,length(egWeight))
# p <- (sum(egY) + 0) / (sum(egWeight) + 0)
# w <- egWeight*rep(p*(1 - p),length(X))
#
# diagonal <- matrix(rep(0,length(X)*length(X)),length(X),length(X))
# for(i in 1:length(X)){
# diagonal[i,i] <- w[i]
# }
#
# # ---- This "matrix" is 1x1 here by design...only doing this for intercept-only models.
# sig <- as.matrix(disp*( solve(t(X) %*% diagonal %*% X) ))
if(e.type == 5){
# ---- Do the bias adjustment based on the data that went into the enh eff estimation.
X <- rep(1,length(e.X))
p <- (sum(e.fit$data$nCaught) + 1) / (sum(e.fit$data$nReleased) + 1)
w <- e.fit$data$nReleased*rep(p*(1 - p),length(X))
diagonal <- matrix(rep(0,length(X)*length(X)),length(X),length(X))
for(i in 1:length(X)){
diagonal[i,i] <- w[i]
}
# ---- This "matrix" is 1x1 here by design...only doing this for intercept-only models.
sig <- as.matrix(disp*( solve(t(X) %*% diagonal %*% X) ))
rbeta <- mvtnorm::rmvnorm(n=R, mean=log(p/(1-p)),sigma=sig,method="chol")
} else {
X <- rep(1,length(eff.obs.data$nCaught))
p <- (sum(eff.obs.data$nCaught) + 1) / (sum(eff.obs.data$nReleased) + 1) # What if we have a fake release.df row, because we had no trials?
w <- eff.obs.data$nReleased*rep(p*(1 - p),length(X))
diagonal <- matrix(rep(0,length(X)*length(X)),length(X),length(X))
for(i in 1:length(X)){
diagonal[i,i] <- w[i]
}
# ---- This "matrix" is 1x1 here by design...only doing this for intercept-only models.
sig <- as.matrix(disp*( solve(t(X) %*% diagonal %*% X) ))
rbeta <- mvtnorm::rmvnorm(n=R, mean=log(p/(1-p)),sigma=sig,method="chol")
}
# # ---- Bootstrap on the observed efficiency trials.
# bsDF <- lapply(1:100000, function(x) eff.obs.data[sample(seq(1:length(eff.obs.data$nReleased)),replace=TRUE),])
# bsp <- sapply(bsDF, function(x) (sum(x$nCaught) + 1) / (sum(x$nReleased) + 1))
#
# hist(bsp)
# bspMean <- mean(bsp)
# bspMedian <- median(bsp)
# bspVar <- var(bsp)  # this is wrong
} else {
rbeta <- mvtnorm::rmvnorm(n=R, mean=beta, sigma=sig, method="chol")
}
# ---- Predict efficiency using random coefficients
# ---- When computed, pred is a matrix where each column is a random realization of model predictions
# ---- for missing catches. There are R columns (sets of predictions).
pred <- (e.X %*% t(rbeta))
pred <- 1 / (1 + exp(-pred)) # Pred is nrow(e.X) X R.
# plot(seq(1,nrow(pred)),pred[,1])
# apply(pred,2,function(x) lines(seq(1,nrow(pred)),x))
# ---- Use mean-predicted efficiency for times outside first and last trials.
ind.mat <- matrix( trap.ind, nrow=n.grand.df, ncol=R )
e.means <- matrix( colMeans( pred ), byrow=T, nrow=sum(trap.ind), ncol=R )
# ---- This is complicated, but we have to line up the catch dates with
# ---- the efficiency dates. Because length of seasons vary, this is necessary.
df.c <- data.frame(batchDate=format(grand.df$batchDate[trap.ind]), in.catch = TRUE, stringsAsFactors=FALSE )
df.e <- data.frame(batchDate=format(e.dts), in.eff = TRUE, stringsAsFactors=FALSE )
df.ce <- merge( df.c, df.e, all.x=TRUE )
df.ec <- merge( df.e, df.c, all.x=TRUE )
df.ec$in.catch[ is.na(df.ec$in.catch) ] <- FALSE
df.ce$in.eff[ is.na(df.ce$in.eff) ] <- FALSE
# ---- Predictions that are in the catch data set,
pred <- pred[ df.ec$in.catch, ]
# ---- Predictions inside the season, on the right dates,
e.means[ df.ce$in.eff ] <- pred
# ---- Assign mean outside of eff.ind.inside[[trap]], and efficiency model inside season.
e.pred[ind.mat] <- e.means
}
}
cat("...BS complete\n")
}
# Utilize this construction to avoid NOTEs about assigning variables to the
# .GlobalEnv when running devtools::check().
pos <- 1
envir <- as.environment(pos)
assign("c.pred", c.pred, pos=envir)
assign("e.pred", e.pred, pos=envir)
# ---- Estimate passage.
# ---- Matrices c.pred and e.pred are the same size, so just divide.
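# ---- For partially-imputed days, add the observed portion of catch back
# ---- onto the bootstrapped imputed values before dividing.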
test <- ifelse(grand.df$imputed.catch > 0 & grand.df$imputed.catch < 1,grand.df$totalEstimatedCatch - grand.df$imputedCatch,0)
c.pred <- apply(c.pred,2,function(x) x + test)
c.pred2 <- c.pred / e.pred  # Passage realizations: catch divided by efficiency.
# grand.df[as.Date(grand.df$batchDate) == "2013-04-12",]
# c.pred2[c(79,229,300),]
# e.pred[c(79,229,300),]
# ---- Now, average over traps
# ---- At this point, c.pred is a (n.batch.day*n.trap) X R matrix, with each cell containing the passage estimated
# ---- at a particular trap at the site for a particular batch day for a particular iteration of the bootstrap.
# ---- Row dimension of list items corresponds to (batch days x trap), columns correspond to iterations.
# ---- We now need to average the cells over the traps, and summarize by time. Do this by calling
# ---- F.summarize.passage on each column.
# ---- Get Julian week information in the case of weeks. Only adds a couple of seconds, so don't check if
# ---- the temporal summary is by week; just do it.
db <- get( "db.file", envir=.GlobalEnv )
ch <- odbcConnectAccess(db)
JDates <- sqlFetch( ch, "Dates" )
close(ch)
JDates$uniqueDate <- as.Date(JDates$uniqueDate,format="%Y-%m-%d")
JDates <- JDates[JDates$uniqueDate >= attr(grand.df,"min.date") & JDates$uniqueDate <= attr(grand.df,"max.date"),]
# ---- Internal function to summarize catch by s.by by applying
# ---- F.summarize to every column of pass.
f.sumize.pass <- function(p, s.by, bd){
# p <- c.pred
df <- data.frame( batchDate=bd, passage=p, imputed.catch=1 )
# ---- Attach JDates if we need to bootstrap by week.
attr(df,"JDates") <- JDates
n <- F.summarize.passage( df, s.by )
n$passage
}
pass <- apply( c.pred2, 2, f.sumize.pass, s.by=sum.by, bd=grand.df$batchDate)
pass <- matrix( unlist(pass), n.len, R )
# ---- Compute bias corrected bootstrap CIs by applying
# ---- f.bias.acc.ci to every row of pass to get bootstrap intervals.
f.bias.acc.ci <- function( x, alpha, x.orig ){
# x <- pass[1,]
# alpha <- 1 - conf
# x.orig <- n.orig
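# ---- p is the proportion of replicates exceeding the original estimate;
# ---- z.0 = qnorm(1 - p) is the bias-correction constant (Manly 2006)
# ---- used to shift the percentile end points p.L and p.H.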
p <- mean( x > rep(x.orig$passage,R), na.rm=TRUE)
z.0 <- qnorm( 1 - p )
z.alpha <- qnorm( 1 - (alpha/2))
p.L <- pnorm( 2*z.0 - z.alpha )
p.H <- pnorm( 2*z.0 + z.alpha )
ci <- quantile( x[ !is.na(x) & (x < Inf) ], p=c(p.L, p.H) )
ci
}
# ---- Compute "regular" boostrap CIs, based on defined conf.
f.ci <- function( x, alpha, x.orig ){
# x <- pass[1,]
# alpha <- 1 - conf
# x.orig <- n.orig
ci <- quantile( x[ !is.na(x) & (x < Inf) ], p=c(alpha/2, 1-(alpha/2)) )
ci
}
# ---- Get (1 - alpha)% confidence bounds via GlobalVars bootstrap.CI.fx.
ans <- tryCatch({
apply( pass, 1, get(bootstrap.CI.fx), alpha=(1-conf), x.orig=n.orig )
},
error=function(cond) {
stop("I don't see a value for GlobalVars function bootstrap.CI.fx. Examine GlobalVars() for valid value(s).\n")
# message(cond)
return(NA)
})
ans <- as.data.frame(t(matrix( unlist(ans), 2, n.len )))
# lookie <- data.frame(passage=n.orig$passage,LCL=ans$V1,UCL=ans$V2)
# maxY <- max(lookie)
# days <- seq(1,nrow(lookie))
#
# png("L:/PSMFC_CampRST/Support/bootStrapFun/lookie.png",res=600,height=40,width=12,units="in")
# plot(days,lookie$passage,col="blue",ylim=c(0,maxY),type="l")
# polygon(c(days,rev(days)),c(lookie$UCL,rev(lookie$LCL)),col="skyblue",border="skyblue")
# lines(days,lookie$passage,col="blue",ylim=c(0,maxY),type="l",lwd=0.25)
# dev.off()
# ---- Estimate the bootstrap standard error of each summarized passage estimate.
correct <- apply(pass, 1, sd)
}
# approx <- (ans$V2 - ans$V1)/4
# cor(correct,approx)
# plot(correct,approx)
# ---- Append lower and upper end points and return
names(ans) <- paste0( c("lower.", "upper."), conf*100 )
ans <- data.frame( n.orig, ans, error=correct, stringsAsFactors=F )
ans
}