F.catch.model | R Documentation |
Compute and estimate catch for all days missing in the input data set; i.e., impute a value for catch when missing.
F.catch.model(catch.df)
catch.df |
A data frame containing all trapping data for all
|
Function F.catch.model serves two main purposes. The first utilizes cubic
splines to fit Poisson generalized linear models to observed catch for each
trapping position over the date range provided. The second utilizes spline
results to impute estimates for periods whenever a trap was not functioning.
A data frame with more rows than input data frame catch.df. The extra rows
correspond to "Not fishing" periods, identifiable via variable TrapStatus in
data frame catch.df.
First, given a trap, function glm fits a null Poisson model of total catch
with a log link. A null model is synonymous with an intercept-only model.
Log-transformed trap sampling time, in hours, serves as an offset. The Akaike
Information Criterion (AIC) measures subsequent quality of fit.
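The null model described above can be sketched as follows. This is a minimal illustration, not the code used inside F.catch.model: the data are simulated, and the column names n.tot and sampleHours are assumptions chosen for the example.

```r
# Simulated data for one trap; column names are hypothetical.
set.seed(1)
trap <- data.frame(
  n.tot       = rpois(20, lambda = 30),   # total catch per trap visit
  sampleHours = runif(20, 20, 26)         # hours fished per trap visit
)

# Intercept-only (null) Poisson model with a log link.
# Log-transformed sampling time, in hours, enters as an offset.
fit0 <- glm(n.tot ~ 1, family = poisson(link = "log"),
            offset = log(sampleHours), data = trap)

exp(coef(fit0))   # estimated catch per hour of fishing
AIC(fit0)         # baseline AIC for comparison with richer models
```

With only an intercept and an offset, the fitted rate equals total catch divided by total hours fished, which makes the role of the offset easy to verify.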
After the initial intercept-only model, increasingly complex Poisson log-link
models are fit via glm. More complex models require at least ten unique
trapVisitID fishing instances to be considered.
Following the initial intercept-only model, linear, quadratic, and cubic
polynomial models are sequentially fit to the data. Assuming the data
support it (see below), cubic splines are fit following the rejection of a
cubic polynomial model. Note that a cubic polynomial model is a cubic
spline with no internal knots.
A more complex model is considered only if four conditions are met. First, the difference in the Akaike Information Criterion (AIC), when comparing the current model to the previous model, must be greater than two, after rounding both AIC values to four decimals.
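The AIC condition can be sketched as a small helper. The function name aicImproves is hypothetical, and the sketch assumes that "difference" means the drop in AIC when moving from the simpler model to the more complex one.

```r
# Hypothetical helper: TRUE when the more complex model lowers the AIC
# by more than two, after rounding both values to four decimals.
aicImproves <- function(aicSimpler, aicComplex) {
  (round(aicSimpler, 4) - round(aicComplex, 4)) > 2
}

aicImproves(512.34567, 509.99999)   # TRUE: a drop of 2.3457
aicImproves(512.3, 510.9)           # FALSE: a drop of only 1.4
```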
Second, the number of unique trapping instances, divided by 15, rounded
down, must be greater than or equal to the model's degrees of freedom,
excluding an intercept. This means that a linear model, which has one
corrected degree of freedom, requires at least 15 data points. Similar
logic requires 30 unique trapping instances for a quadratic, and so on.
Global variable knotMesh in function GlobalVars sets the number of unique
trapping instances required for consideration of a more complex model.
Third, resulting parameter estimates must not be on the boundary of the attainable values. Due to the log-linked models utilized here, this means that parameter estimates must not be positively or negatively infinite.
Finally, models can incorporate at most 16 degrees of freedom. This means all cubic splines fit to 240 or more unique trapping instances top out at 13 internal knots, and so 14 piecewise cubic polynomials.
The table below summarizes the relationship between the number of data points, i.e., unique trapping instances, and the maximal model possible. Here, "DF" represents "Degrees of Freedom." Note that all polynomial pieces must be of the same degree. Thus, given a particular catch time series, it is not possible to fit an Intercept-only model to the first half, say, and a Quadratic to the second. Both pieces must be Intercept-only, both Quadratic, or both of some other common polynomial form.
DF | Maximal Model Type | N Trapping Instances |
0 | Intercept-only | 1 ≤ N ≤ 14 |
1 | Linear | 15 ≤ N ≤ 29 |
2 | Quadratic | 30 ≤ N ≤ 44 |
3 | Cubic | 45 ≤ N ≤ 59 |
4 | Cubic Spline with One Internal Knot | 60 ≤ N ≤ 74 |
5 | Cubic Spline with Two Internal Knots | 75 ≤ N ≤ 89 |
... | ... | ... |
k | Cubic Spline with (k - 3) Internal Knots | 15*k ≤ N ≤ 15*(k + 1) - 1 |
... | ... | ... |
16 | Cubic Spline with 13 Internal Knots | 240 ≤ N |
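The table can be reproduced with integer division. The sketch below is illustrative only: the function name maxDf and the argument dfCap are hypothetical, and the default knotMesh = 15 is inferred from the table rather than read from GlobalVars.

```r
# Largest model degrees of freedom (excluding the intercept) supported by
# n unique trapping instances: floor(n / knotMesh), capped at dfCap.
maxDf <- function(n, knotMesh = 15, dfCap = 16) {
  pmin(n %/% knotMesh, dfCap)
}

maxDf(c(14, 15, 44, 60, 239, 240, 1000))
# 0 1 2 4 15 16 16

# Internal knots implied for cubic splines (zero for DF of 3 or less).
pmax(maxDf(c(60, 75, 240)) - 3, 0)
# 1 2 13
```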
Models with at least 60 unique trapping instances incorporate the possibility
of a B-spline basis matrix via function bs. This means that piecewise
polynomials are utilized to fit observed trends, with one piece covering a
particular subset in the date range covered by trapping. The points covered
by one polynomial piece correspond to quantiles in the temporal range.
Each piecewise polynomial is at most a cubic polynomial such that the end point of one piece connects with the start point of the next. Additionally, at each join, the first- and second-order derivatives of adjacent pieces are equal; thus, resulting splines, which may be composed of several individual polynomial pieces, appear smooth over their entire sample range with respect to their local slope (first derivative condition) and their local convexity (second derivative condition).
Parameter df, or the model degrees of freedom in bs, determines the number of
internal knots utilized. The value of df corresponds to the values in the
table above. Function bs makes no consideration of model intercept; thus all
glm-fit Poisson models utilize an additional overall intercept. This serves
to vertically center models along the outcome axis.
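The relationship between df in bs and the number of internal knots can be checked directly. The sketch below uses simulated data; the variable names day, hours, and catch are assumptions for illustration and are not columns of catch.df.

```r
library(splines)

# Simulated daily catch for one trap over 90 days (hypothetical data).
set.seed(2)
d <- data.frame(day = 1:90, hours = rep(24, 90))
d$catch <- rpois(90, lambda = exp(2 + sin(d$day / 15)))

# With 90 instances, floor(90 / 15) = 6, so df = 5 is admissible.
# bs() omits an intercept by default; glm() supplies the overall one.
fit <- glm(catch ~ bs(day, df = 5) + offset(log(hours)),
           family = poisson, data = d)

# A cubic B-spline basis with df = 5 has df - 3 = 2 internal knots,
# placed at quantiles of day.
attr(bs(d$day, df = 5), "knots")

length(coef(fit))   # 6: one intercept plus five spline coefficients
```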
The trap-specific imputation procedure utilizes the final catch spline result
obtained via the process described above. Specifically, it sweeps through all
temporally sorted rows of the catch data frame for the trap of interest,
replacing all instances of "Not fishing" in variable TrapStatus with
spline-estimated fish. Estimation loops over periods of "Not fishing" one at
a time, predicting catch for at most 24 hours at a time. Durations of all
"Not fishing" periods are expressed in hours, matching the temporal unit
utilized in Poisson model offsets.
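The imputation step can be sketched with predict on a fitted spline model. This is a simplified illustration on simulated data, assuming one row per day and gaps of exactly 24 hours; the column names day, hours, and catch are hypothetical, and the real routine works from StartTime and EndTime instead.

```r
library(splines)

# Simulated trap record with a three-day "Not fishing" gap (hypothetical).
set.seed(3)
d <- data.frame(day = 1:90, hours = 24, TrapStatus = "Fishing",
                stringsAsFactors = FALSE)
d$catch <- rpois(90, lambda = exp(2 + sin(d$day / 15)))
gap <- d$day %in% 40:42
d$TrapStatus[gap] <- "Not fishing"
d$catch[gap] <- NA

# Fit the catch model to valid fishing periods only.
fit <- glm(catch ~ bs(day, df = 5) + offset(log(hours)),
           family = poisson, data = d[!gap, ])

# Predict catch for each 24-hour "Not fishing" period; the offset scales
# the prediction by the number of hours being imputed.
d$catch[gap] <- predict(fit, newdata = d[gap, ], type = "response")

d[gap, c("day", "hours", "TrapStatus", "catch")]
```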
One extra line is inserted into catch.df for each unique 24-hour
"Not fishing" period larger than global variable max.ok.gap. Currently,
max.ok.gap is set at 2 hours in function GlobalVars. Thus, catch is not
estimated for individual "Not fishing" episodes of duration less than two
hours. In these cases, the most immediately preceding valid fishing period
subsumes the sampleMinutes associated with these small time frames.
For example, for a 56-hour period of "Not fishing", predictions occur for
each unique 24-hour period, with catch estimated proportionally for any
"leftover" time before or after the full 24-hour periods. Assuming that a
"Not fishing" period coincides with the start of a day, three resulting rows
would be inserted into catch.df: two for the first two 24-hour periods, and a
third for the leftover 8-hour period. The leftover 8-hour period would
necessarily impute one-third the number of fish specified by that trap's
catch model for that day. This number would then be added to the observed
catch for that day, obtained over the remaining valid fishing of 16 hours.
The sum of the imputed and observed catch comprises the total catch for that
day.
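The arithmetic of the 56-hour example can be sketched as below. The helper splitGap is hypothetical and assumes the gap begins at the start of a day; the modeled and observed catch values are invented purely to show the proportional calculation.

```r
# Split a "Not fishing" gap into full 24-hour periods plus any leftover.
splitGap <- function(gapHours, period = 24) {
  full <- gapHours %/% period
  left <- gapHours %% period
  hours <- c(rep(period, full), if (left > 0) left)
  data.frame(row = seq_along(hours), hours = hours,
             fractionOfDay = hours / period)
}

g <- splitGap(56)
g$hours            # 24 24 8
g$fractionOfDay    # 1 1 0.3333333

# Hypothetical numbers for the partial third day: the model predicts
# 30 fish for a full day, and 18 fish were observed in the 16 fished hours.
modelFishPerDay <- 30
observedCatch   <- 18
imputedCatch    <- modelFishPerDay * g$fractionOfDay[3]   # 10
totalCatch      <- imputedCatch + observedCatch           # 28
```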
The StartTime and EndTime variables for each of the new lines inserted into
catch.df are defined so that no "Not fishing" periods remain. For these
lines, variable gamEstimated is set to TRUE. Assignment of variable batchDate
is based on EndTime, as usual. This methodology applies for all days between
the time period requested via min.date and max.date in associated passage
functions, for each unique trap trapPositionID in catch.df.
If catch.df contains no periods of "Not fishing", no imputation is performed.
Starting with campR version 1.0.0, unassigned fish can be partitioned into
decimal fractions during the plus-count routine. As a result, catch values
may be non-integer, with the number of digits after the decimal dictated by
global variable unassd.sig.digit in GlobalVar.r. Usually, this value is set
to 1. However, the use of decimal fish in Poisson-fitting algorithms prevents
calculation of the AIC, since the functions utilized to calculate its
likelihood assume integer outcome data. To get around this, the
log-likelihood is reconstructed; to estimate the value of log(n!) inherent to
the calculation, the method of Nemes (2010) is used.
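A reconstruction of this kind might look like the sketch below. It uses the closed-form Gamma approximation commonly attributed to Nemes, applied as log(n!) = log Gamma(n + 1); whether campR applies the approximation in exactly this form is an assumption, and the function names logFactNemes and poisLogLik are hypothetical.

```r
# Approximate log(n!) = log(Gamma(n + 1)) for possibly non-integer n, using
# Gamma(z) ~ sqrt(2*pi/z) * ((z + 1/(12*z - 1/(10*z))) / e)^z.
logFactNemes <- function(n) {
  z <- n + 1
  0.5 * (log(2 * pi) - log(z)) +
    z * (log(z + 1 / (12 * z - 1 / (10 * z))) - 1)
}

# Poisson log-likelihood that accepts decimal catch values.
poisLogLik <- function(y, mu) {
  sum(y * log(mu) - mu - logFactNemes(y))
}

# Hypothetical decimal catches and fitted means from a one-parameter model.
y  <- c(3.5, 10.2, 0.4, 7)
mu <- c(4, 9, 1, 6)
ll  <- poisLogLik(y, mu)
aic <- -2 * ll + 2 * 1   # AIC = -2 * log-likelihood + 2 * (number of parameters)

# The approximation tracks lgamma() closely, with the largest error near n = 0.
max(abs(logFactNemes(c(1, 5.3, 20, 100.7)) - lgamma(c(1, 5.3, 20, 100.7) + 1)))
```

For integer catches the reconstructed log-likelihood agrees with sum(dpois(y, mu, log = TRUE)) to within the approximation error, which gives a convenient sanity check.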
WEST Inc.
Nemes, G. (2010) "New asymptotic expansion for the Gamma function", Archiv der Mathematik, 95 (2): 161-169.
F.efficiency.model
## Not run:
# ---- Fit splines and impute for missing data for each unique
# ---- trapPositionID in data frame catch.df.
fitCatch <- F.catch.model(catch.df)
## End(Not run)