KnownDifferencesOldVersusNewStoX: Known differences between StoX 2.7 and the new StoX >= 3.0.0

KnownDifferencesOldVersusNewStoX    R Documentation

Known differences between StoX 2.7 and the new StoX >= 3.0.0

Description

For several reasons, the results from the old StoX 2.7 and the new StoX >= 3.0.0 cannot always be expected to be identical. The expected differences and their causes are described below.

Details

  • Random operations: In StoX 2.7 all calculations except bootstrapping and imputation were Java-based, whereas in StoX >= 3.0.0 all calculations are performed in R. As a consequence, operations involving random sampling in StoX 2.7 cannot be exactly recreated in StoX >= 3.0.0.

    As of R 3.6.0, which was released after StoX 2.7, the default sampling routine of R changed (sample.kind "Rejection" replaced "Rounding"), which discourages any effort to exactly recreate in StoX >= 3.0.0 the R code used by StoX 2.7 for resampling.

    The functions in StoX 2.7 that contained random sampling were the Java-coded FillMissingData() in the Baseline report model and the R- and Java-coded runBootstrap() and imputeByAge() in the R model.

    In StoX >= 3.0.0 random sampling takes place in the function ImputeSuperIndividuals() in the Baseline model and in Bootstrap() in the Analysis model.

  • Rounding errors: In StoX >= 3.0.0 the following rule is applied to all output from StoX functions: round off to the maximum of 12 digits and 12 significant digits, that is, whichever of the two retains more digits. For example, the value 1.23456789012E-10 would be rounded off to 1.23e-10 using only the first rule (rounded off to 12 digits), but retains 12 significant digits with the second rule, resulting in 1.23456789012e-10. In StoX 2.7 rounding was not consistently applied.
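
    The rounding rule can be illustrated with a small Python sketch. This is only one reading of the rule described above, not the code used by StoX; the function name round_stox_like and its arguments are hypothetical.

```python
import math


def round_stox_like(x, digits=12, signif=12):
    """Round x to `digits` decimals or `signif` significant digits,
    whichever keeps more decimals (hypothetical helper, not StoX code)."""
    if x == 0 or not math.isfinite(x):
        return x
    # Number of decimals needed to retain `signif` significant digits.
    decimals_for_signif = signif - 1 - math.floor(math.log10(abs(x)))
    return round(x, max(digits, decimals_for_signif))


value = 1.23456789012e-10
print(round(value, 12))        # first rule only: 1.23e-10
print(round_stox_like(value))  # both rules: 1.23456789012e-10
```

    For values of ordinary magnitude the 12-decimal rule dominates; the significant-digit rule only takes over for very small numbers such as the one above.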

  • RaisingFactorPriority: In StoX 2.7 the raising factor used when summing length distributions from several samples in the function StationLengthDist is first calculated as sample weight divided by catch weight. If either sample weight or catch weight is missing, a new attempt is made using sample count divided by catch count. If both raising factors are missing, the sample is discarded with a warning, leaving only the remaining samples of the haul that have a positive raising factor. If all samples are discarded, the station is discarded as well in the calculation of StationLengthDist. The exception to this procedure is when LengthDistType = "PercentLengthDist", in which case the raising factor is set to 1 for all samples, regardless of whether the raising factor is missing or not.

    In StoX >= 3.0.0 the parameter RaisingFactorPriority must be set to "Weight" to coincide with StoX 2.7, whereas "Number" is also an option. When a sample with missing raising factor is encountered, StoX 3.4.0 throws an error stating that the missing raising factor is considered an error in the data and that the Haul or Sample could be filtered out. To recreate the behavior of StoX 2.7, apply the suggested filter on Sample.

    When LengthDistributionType = "Percent", the raising factor is set to 1 if the Haul contains only one sample and that sample has a missing raising factor.
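
    The fallback and discarding logic described for StoX 2.7 can be sketched as follows. This is a minimal illustration in Python, not StoX code: the function names and the dictionary keys weight_based and count_based (the two candidate raising factors, with None meaning missing) are assumptions made for the example.

```python
def choose_raising_factor(weight_based, count_based):
    """Prefer the weight-based factor; fall back to the count-based one.
    None represents a missing raising factor."""
    return weight_based if weight_based is not None else count_based


def usable_samples(samples):
    """Keep samples with a positive raising factor.
    An empty result means the whole station would be discarded."""
    kept = []
    for sample in samples:
        factor = choose_raising_factor(
            sample.get("weight_based"), sample.get("count_based")
        )
        if factor is not None and factor > 0:
            kept.append({**sample, "raising_factor": factor})
    return kept


# Hypothetical haul with three samples.
haul = [
    {"id": "a", "weight_based": 4.0, "count_based": None},
    {"id": "b", "weight_based": None, "count_based": 2.5},
    {"id": "c", "weight_based": None, "count_based": None},  # discarded
]
kept = usable_samples(haul)
```

    In StoX >= 3.0.0 a sample like "c" is not silently dropped but raises an error, so the equivalent of this discarding step has to be expressed as an explicit filter.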

  • Stratum allocation: In StoX 2.7 the method JTSUtils.within was used to position stations in each stratum, defining the stations to average over when calculating mean swept-area density. In StoX 3.0.0 - 3.6.2 the allocation of stations to strata is performed using the function over() in the (retired) package sp using the WGS84 projection. As of StoX 4.0.0 the allocation is done using the function st_intersects() of the sf package. It has been observed that stations close to the border between two strata can be assigned to the wrong stratum in StoX 2.7. If differences in the output from SweptAreaDensity() are found between StoX 2.7 and StoX >= 3.0.0, it may be wise to check for any differences between the allocation of the stations done by the function DefineSweptAreaPSU() in StoX 2.7 and the Resolution table of the output from MeanLengthDistribution() in StoX >= 3.0.0.
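
    Such a check could be as simple as comparing two station-to-stratum tables. The Python sketch below assumes the allocations have already been exported from the two StoX versions into plain dictionaries; the function name and the station and stratum names are hypothetical.

```python
def allocation_differences(old_alloc, new_alloc):
    """Return {station: (old_stratum, new_stratum)} for every station
    that is allocated differently (None = station absent in that table)."""
    stations = sorted(set(old_alloc) | set(new_alloc))
    return {
        station: (old_alloc.get(station), new_alloc.get(station))
        for station in stations
        if old_alloc.get(station) != new_alloc.get(station)
    }


# Hypothetical allocations; S2 lies close to a stratum border.
stox27 = {"S1": "North", "S2": "North", "S3": "South"}
stox3 = {"S1": "North", "S2": "South", "S3": "South"}
diff = allocation_differences(stox27, stox3)
```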

  • Imputation across all strata: In StoX 2.7 the final level of imputation was all strata that were included in the Baseline via the IncludeInTotal column of the input WKT file to the DefineStrata function. In StoX >= 3.0.0 the strata can be grouped into different surveys (or all in the same survey). The final step of the imputation searches for individuals to impute only from inside the survey, so to reproduce estimates from StoX 2.7 exactly one survey must be created from the strata with IncludeInTotal = true. This can be done automatically by using the project.xml file of the StoX 2.7 project as input to the function DefineSurvey() in StoX >= 3.0.0.

  • NumberOfLengthSamples: The WeightingMethod "NumberOfLengthSamples" in BioticAssignmentWeighting() has changed from counting all individuals, length measured or not, in StoX 2.7 to counting only the individuals for which IndividualTotalLengthCentimeter is not NA in StoX >= 3.0.0.
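
    The difference between the two counting rules can be shown with a few made-up individuals. In this Python sketch None stands in for NA in R, and the function names are hypothetical; only the variable name IndividualTotalLengthCentimeter is taken from StoX.

```python
def count_all_individuals(individuals):
    """Counting rule as described for StoX 2.7: every individual counts."""
    return len(individuals)


def count_length_measured(individuals):
    """Counting rule as described for StoX >= 3.0.0: only individuals
    with a non-missing length count."""
    return sum(
        1
        for individual in individuals
        if individual["IndividualTotalLengthCentimeter"] is not None
    )


# Made-up individuals from one haul; the second has no length measurement.
individuals = [
    {"IndividualTotalLengthCentimeter": 31.5},
    {"IndividualTotalLengthCentimeter": None},
    {"IndividualTotalLengthCentimeter": 28.0},
]
old_weight = count_all_individuals(individuals)
new_weight = count_length_measured(individuals)
```

    Hauls where every individual is length measured get the same weight under both rules, so differences only arise for hauls with unmeasured individuals.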

  • LogKey: In StoX 2.7 the EDSU IDs are given by a concatenation of cruise, log and start_time of the NMDEchosounder data. In StoX >= 3.0.0 only the cruise and the ISO 8601 formatted time are used. As a result, for data where the times are not unique within a cruise (e.g. if the time resolution is minutes, as in the PGNAPES database), the logs with a LogKey identical to a previous LogKey are removed with a warning. Results from a StoX 2.7 project may therefore not be reproducible unless the input data are manipulated to obtain unique times for each cruise (e.g. using the function TranslateAcoustic()).
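
    The effect of dropping the log from the key can be illustrated with a short Python sketch. The field names cruise, log and time, the time strings and the function name are assumptions made for the example; the actual key format in StoX is not reproduced here.

```python
def drop_repeated_keys(logs):
    """Keep the first log for each (cruise, time) pair and collect the rest,
    mimicking the removal of logs with a repeated key described above."""
    seen = set()
    kept, dropped = [], []
    for log in logs:
        key = (log["cruise"], log["time"])
        if key in seen:
            dropped.append(log)
        else:
            seen.add(key)
            kept.append(log)
    return kept, dropped


# Made-up logs with minute resolution: two logs share the same time.
logs = [
    {"cruise": "2020101", "log": 10.0, "time": "2020-05-01T10:15:00Z"},
    {"cruise": "2020101", "log": 11.0, "time": "2020-05-01T10:15:00Z"},
    {"cruise": "2020101", "log": 12.0, "time": "2020-05-01T10:16:00Z"},
]
# Keys that include the log are all distinct ...
keys_with_log = {(l["cruise"], l["log"], l["time"]) for l in logs}
# ... but keys built from cruise and time alone are not.
kept, dropped = drop_repeated_keys(logs)
```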

  • Stratum area: StoX >= 3.0.0 can read shapefiles, StoX WKT files and GeoJSON files. To make this possible, a consistent calculation of the centroid of each polygon is defined. This may result in marginally different stratum areas compared to StoX 2.7.


StoXProject/RstoxFramework documentation built on Oct. 17, 2023, 1:24 p.m.