F.catch.model: F.catch.model

F.catch.model    R Documentation

F.catch.model

Description

Compute and estimate catch for all days missing in the input data set; i.e., impute a value for catch when missing.

Usage

F.catch.model(catch.df)

Arguments

catch.df

A data frame containing all trapping data for all trapPositionIDs of interest.

Details

Function F.catch.model serves two main purposes. The first utilizes cubic splines to fit Poisson generalized linear models to observed catch for each trapping position over the date range provided. The second utilizes spline results to impute estimates for periods whenever a trap was not functioning.

Value

A data frame containing the rows of input data frame catch.df, plus extra rows for imputed "Not fishing" periods. These periods are identifiable via variable TrapStatus in data frame catch.df.

Cubic splines

First, given a trap, function glm fits a null Poisson model of total catch with a log link. A null model is synonymous with an intercept-only model. Log-transformed trap sampling time, in hours, serves as an offset. The Akaike Information Criterion (AIC) then measures quality of fit.
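As a rough illustration of the null model described above, the following sketch fits an intercept-only Poisson model with a log(hours) offset to simulated data. The data frame visits and its columns n.tot and sampleHours are hypothetical stand-ins, not the structure of catch.df.

```r
# Simulated visits for one hypothetical trap (all names are illustrative).
set.seed(1)
visits <- data.frame(
  n.tot       = rpois(20, lambda = 12),         # total catch per trap visit
  sampleHours = runif(20, min = 20, max = 26)   # hours fished per visit
)

# Intercept-only Poisson model, log link, log-transformed hours as offset.
null.fit <- glm(n.tot ~ 1, family = poisson(link = "log"),
                offset = log(sampleHours), data = visits)

# AIC of the null model, used as the baseline for later comparisons.
null.aic <- AIC(null.fit)

# The fitted intercept is the log of the estimated catch per hour.
rate.per.hour <- unname(exp(coef(null.fit)[1]))
```

For an intercept-only model with this offset, the estimated rate should match total catch divided by total hours fished, which gives a quick sanity check on the fit.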

After the initial intercept-only model, increasingly complex Poisson log-link models are fit via glm. More complex models require at least ten unique trapVisitID fishing instances to be considered. Following the initial intercept-only model, linear, quadratic, and cubic polynomial models are sequentially fit to the data. Assuming the data support it (see below), cubic splines are fit following the rejection of a cubic polynomial model. Note that a cubic polynomial model is a cubic spline with no internal knots.

The next, more complex model is considered only if four conditions are met. First, the difference in the Akaike Information Criterion (AIC), when comparing the current model to the previous model, must be greater than two, after rounding both AIC values to four decimals.

Second, the number of unique trapping instances, divided by 15, rounded down, must be greater than or equal to the model's degrees of freedom, excluding an intercept. This means that a linear model, which has one corrected degree of freedom, requires at least 15 data points. Similar logic requires 30 unique trapping instances for a quadratic, and so on. Global variable knotMesh in function GlobalVars sets the number of unique trapping instances required for consideration of a more complex model.

Third, resulting parameter estimates must not be on the boundary of the attainable values. Due to the log-linked models utilized here, this means that parameter estimates must not be positively or negatively infinite.

Finally, models can incorporate at most 16 degrees of freedom. This means that all cubic splines with 240 or more unique trapping instances top out at 13 internal knots, and so 14 piecewise cubic polynomials.
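The four conditions can be sketched as a simple forward search. This is a simplified illustration on simulated data, not the package's code: it assumes the AIC difference is taken as previous minus current, it treats "on the boundary" as any non-finite coefficient, and it only covers the polynomial stage (up to cubic). The names catch, day, hours, and fit.poly are hypothetical.

```r
# Simulated daily catch with an increasing trend (illustrative only).
set.seed(42)
n     <- 50
day   <- 1:n
hours <- rep(24, n)
catch <- rpois(n, hours * exp(-1 + 0.03 * day))

# Fit a Poisson polynomial of the requested degree (0 = intercept-only).
fit.poly <- function(degree) {
  if (degree == 0) {
    glm(catch ~ 1, family = poisson, offset = log(hours))
  } else {
    glm(catch ~ poly(day, degree), family = poisson, offset = log(hours))
  }
}

# Conditions 2 and 4: floor(N / 15), capped at 16 degrees of freedom.
max.df <- min(n %/% 15, 16)

chosen    <- fit.poly(0)
chosen.df <- 0
for (d in seq_len(max.df)) {
  candidate <- fit.poly(d)
  # Condition 1: AIC must drop by more than two, after rounding to 4 decimals.
  gain <- round(AIC(chosen), 4) - round(AIC(candidate), 4)
  # Condition 3 (simplified): all parameter estimates must be finite.
  finite <- all(is.finite(coef(candidate)))
  if (gain > 2 && finite) {
    chosen    <- candidate
    chosen.df <- d
  } else {
    break
  }
}
```

With 50 simulated visits the search can reach a cubic at most; spline models would enter only at 60 or more visits.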

The table below summarizes the relationship between the number of data points, i.e., unique trapping instances, and the maximal model possible. Here, "DF" represents "Degrees of Freedom." Note that all polynomial pieces must be of the same degree. Thus, given a particular catch time series, it is not possible to fit an Intercept-only model to the first half, say, and a Quadratic to the second. Both pieces must be Intercept-only, both Quadratic, or both of some other single polynomial form.

DF    Maximal Model Type                           N Trapping Instances
0     Intercept-only                               1 ≤ N ≤ 14
1     Linear                                       15 ≤ N ≤ 29
2     Quadratic                                    30 ≤ N ≤ 44
3     Cubic                                        45 ≤ N ≤ 59
4     Cubic Spline with One Internal Knot          60 ≤ N ≤ 74
5     Cubic Spline with Two Internal Knots         75 ≤ N ≤ 89
...   ...                                          ...
k     Cubic Spline with (k - 3) Internal Knots     15*k ≤ N ≤ 15*(k + 1) - 1
...   ...                                          ...
16    Cubic Spline with 13 Internal Knots          240 ≤ N
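The table can be reproduced with a one-line rule. The helper below is a hypothetical sketch; the defaults knotMesh = 15 and max.df = 16 are taken from the description above, not read from GlobalVars.

```r
# Maximal degrees of freedom (excluding the intercept) for n trapping instances.
# knotMesh = 15 and max.df = 16 are assumed from the text above.
max.model.df <- function(n, knotMesh = 15, max.df = 16) {
  pmin(n %/% knotMesh, max.df)
}

# Number of internal knots implied by a given degrees of freedom:
# a cubic uses 3, so anything beyond that is an internal knot.
internal.knots <- function(df) pmax(df - 3, 0)

n.visits <- c(14, 15, 44, 60, 239, 240, 1000)
df.max   <- max.model.df(n.visits)
knots    <- internal.knots(df.max)
```

Here df.max works out to 0, 1, 2, 4, 15, 16, 16 and knots to 0, 0, 0, 1, 12, 13, 13, matching the rows of the table.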

Models with at least 60 unique trapping instances incorporate the possibility of a B-spline basis matrix via function bs. This means that piecewise polynomials are utilized to fit observed trends, with one piece covering a particular subset in the date range covered by trapping. The points covered by one polynomial piece correspond to quantiles in the temporal range.

Each piecewise polynomial is at most a cubic polynomial such that the end point of one piece connects with the start point of the next. Additionally, the first- and second-order derivatives of adjoining pieces are equal where they meet; thus, resulting splines, which may be composed of several individual polynomial pieces, appear smooth over their entire sample range with respect to their local slope (first derivative condition) and their local convexity (second derivative condition).

Parameter df, or the model degrees of freedom in bs, determines the number of internal knots utilized. The value of df corresponds to the values in the Table above. Function bs makes no consideration of model intercept; thus all glm-fit Poisson models utilize an additional overall intercept. This serves to vertically center models along the outcome axis.
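A minimal sketch of how df in bs maps to internal knots and to the number of glm coefficients follows. The simulated vectors day, hours, and catch are hypothetical, and the choice of 80 visits (so df = 5) is only for illustration.

```r
library(splines)

# Simulated catch for 80 visits; floor(80 / 15) = 5 degrees of freedom.
set.seed(7)
n     <- 80
day   <- 1:n
hours <- rep(24, n)
catch <- rpois(n, hours * exp(-1 + sin(day / 15)))

spline.df <- 5

# Cubic B-spline basis; bs() places df - 3 internal knots at quantiles of day.
basis   <- bs(day, df = spline.df)
n.knots <- length(attr(basis, "knots"))

# bs() has no intercept column by default, so glm supplies the overall intercept.
spline.fit <- glm(catch ~ basis, family = poisson, offset = log(hours))
n.coef     <- length(coef(spline.fit))
```

With df = 5 the basis has two internal knots, and the fitted model carries six coefficients: the five basis terms plus the overall intercept.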

Imputation

The trap-specific imputation procedure utilizes the final catch spline result obtained via the process described above. Specifically, it sweeps through all temporally sorted rows of the catch data frame for the trap of interest, replacing all instances of "Not fishing" in variable TrapStatus with spline-estimated catch. The procedure loops over periods of "Not fishing" one at a time, predicting catch for at most 24 hours per inserted row. Durations of "Not fishing" periods are measured in hours, matching the temporal unit utilized in the Poisson model offsets.

One extra line is inserted into catch.df for each unique 24-hour "Not fishing" period larger than global variable max.ok.gap. Currently, max.ok.gap is set at 2 hours in function GlobalVars. Thus, catch is not estimated for individual "Not fishing" episodes of duration less than two hours. In these cases, the most immediately preceding valid fishing period subsumes the sampleMinutes associated with these small time frames.

For example, for a 56-hour period of "Not fishing", predictions occur for each unique 24-hour period, with catch estimated proportionally for any "leftover" preceding and subsequent times. Assuming that a "Not fishing" period coincides with the start of a day, three resulting rows would be inserted into catch.df – two for the first two 24-hour periods, and a third for the leftover 8-hour period. The leftover 8-hour period would necessarily impute one-third the number of fish specified by that trap's catch model for that day. This number would then be added to the observed catch for that day, obtained over the remaining valid fishing of 16 hours. The sum of the imputed and observed catch comprises the total catch for that day.
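The arithmetic of the 56-hour example can be sketched as below. The helper splitGap is hypothetical and assumes the gap begins at the start of a day; how a gap of exactly max.ok.gap hours is treated is also an assumption, since the text does not say.

```r
# Split one "Not fishing" gap (in hours) into blocks of at most 24 hours.
# Gaps shorter than max.ok.gap are not imputed (assumed strict inequality).
splitGap <- function(gap.hours, max.ok.gap = 2, block = 24) {
  if (gap.hours < max.ok.gap) return(numeric(0))
  full     <- gap.hours %/% block
  leftover <- gap.hours %% block
  c(rep(block, full), if (leftover > 0) leftover)
}

pieces <- splitGap(56)        # one inserted row per element
day.fraction <- pieces / 24   # share of a full day's predicted catch per row

short.gap <- splitGap(1.5)    # below max.ok.gap: no rows inserted
```

For the 56-hour gap, pieces holds 24, 24, and 8 hours, so the last row receives one-third of that day's model-predicted catch.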

The StartTime and EndTime variables for each of the new lines inserted into catch.df are defined so that no "Not fishing" periods remain. For these lines, variable gamEstimated is set to TRUE. Assignment of variable batchDate is based on EndTime, as usual. This methodology applies for all days between the time period requested via min.date and max.date in associated passage functions, for each unique trapPositionID in catch.df.

If catch.df contains no periods of "Not fishing", no imputation is performed.

Unassigned Decimal Catch

Starting with campR version 1.0.0, unassigned fish could be partitioned into decimal fractions during the plus-count routine. As a result, catch values may be non-integer, with the number of digits after the decimal point dictated by global variable unassd.sig.digit in GlobalVar.r. Usually, this value is set to 1. However, the use of decimal fish in Poisson-fitting algorithms prevents calculation of the AIC, since the functions utilized to calculate its likelihood assume integer outcome data. To get around this, the log-likelihood is reconstructed; to estimate the value of log(n!) inherent to the calculation, the method of Nemes (2010) is used.
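A sketch of the idea, under assumptions: the closed-form approximation commonly attributed to Nemes is used here for log Gamma(z), with log(n!) taken as log Gamma(n + 1). Whether the package uses exactly this form is not confirmed by the text, and the function names below are hypothetical.

```r
# Nemes-style closed-form approximation to log(Gamma(z)), z > 0 (assumed form).
nemes.lgamma <- function(z) {
  0.5 * (log(2 * pi) - log(z)) +
    z * (log(z + 1 / (12 * z - 1 / (10 * z))) - 1)
}

# log(y!) extended to decimal y via Gamma(y + 1).
log.factorial <- function(y) nemes.lgamma(y + 1)

# Poisson log-likelihood that accepts decimal catch y with fitted means mu.
pois.loglik <- function(y, mu) sum(y * log(mu) - mu - log.factorial(y))

# AIC rebuilt from the log-likelihood; k is the number of model parameters.
pois.aic <- function(y, mu, k) -2 * pois.loglik(y, mu) + 2 * k

y  <- c(3.5, 10, 0.2, 7.8)     # decimal catch values (illustrative)
mu <- c(4, 9, 1, 8)            # fitted means (illustrative)
aic.decimal <- pois.aic(y, mu, k = 1)
```

Comparing log.factorial against base R's lgamma(y + 1) gives a quick check of the approximation; the discrepancy is largest near y = 0 and shrinks rapidly as y grows.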

Author(s)

WEST Inc.

References

Nemes, G. (2010) "New asymptotic expansion for the Gamma function", Archiv der Mathematik, 95 (2): 161-169.

See Also

F.efficiency.model

Examples

## Not run: 
#   ---- Fit splines and impute for missing data for each unique
#   ---- trapPositionID in data frame catch.df.
fitCatch <- F.catch.model(catch.df)

## End(Not run)

tmcd82070/CAMP_RST documentation built on April 6, 2022, 12:07 a.m.