getExpansion_2: Second-stage expansion of composition samples up to catch...

Second-stage expansion of composition samples up to catch level


The second-stage expansion calculates the expansion factor based on the ratio of total catch within a stratification (e.g., year, gear group) to the amount of that catch that was sampled.


  Units = c("MT", "LB"),
  Convert = NULL,
  maxExp = 0.95,
  verbose = TRUE,



A data frame of biological samples from the Pacific Fisheries Information Network (PacFIN) data warehouse, which originated in 2014. Data are pulled using SQL calls; see PullBDS.PacFIN().


A data frame of catch data, in pounds or in metric tonnes.


The units of the Catch data frame, see measurements::conv_unit_options[["mass"]] for options. Typical units are metric tonnes (e.g., "metric_ton") because that is the unit used in Stock Synthesis, but expansions are done in pounds because fish weights are in pounds. Thus, catches also need to be in pounds and will be converted as such.
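The conversion to pounds can be illustrated with a minimal base R sketch. The catch table and its column names (Year, TWL, HKL) are hypothetical, and the sketch uses the exact definition of the pound rather than the measurements package, so it is not necessarily how the function performs the conversion internally.

```r
# Hypothetical catch table: one row per year, one column per gear group,
# with values in metric tonnes.
catch_mt <- data.frame(
  Year = c(2019, 2020),
  TWL = c(10, 20),
  HKL = c(1, 2)
)

# 1 metric tonne = 1000 kg and 1 lb = 0.45359237 kg (exact by definition).
lbs_per_mt <- 1000 / 0.45359237

# Convert every column except Year to pounds.
catch_lb <- catch_mt
catch_lb[, -1] <- catch_mt[, -1] * lbs_per_mt
```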


A deprecated argument that is now set to NULL. Previously, it was a logical that defined whether Catch should be converted from metric tonnes to pounds: TRUE is now equivalent to Units = "MT", and FALSE, the former default, is equivalent to Units = "LB". Normally, catch is in metric tonnes, i.e., Convert = TRUE or Units = "MT", so that it can be used within Stock Synthesis. todo: remove this input argument


The maximum expansion factor (either a number or a quantile) for building expansions. The default, 0.95, is typically used. Set maxExp = Inf to leave the expansion factors uncapped and see the largest values.
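A minimal sketch of how such a cap might behave follows. The helper cap_expansion is hypothetical, and the rule that a value below 1 is read as a quantile while any other value is read as an absolute cap is an assumption made for illustration, not a statement of the package's implementation.

```r
# Hypothetical helper, not part of the package.
cap_expansion <- function(x, max_exp = 0.95) {
  # Assumption: values below 1 are quantiles, anything else is an absolute cap.
  cap <- if (max_exp < 1) {
    stats::quantile(x, probs = max_exp, na.rm = TRUE, names = FALSE)
  } else {
    max_exp
  }
  # Values above the cap are reduced to the cap; others are unchanged.
  pmin(x, cap)
}

factors <- c(1, 2, 3, 4, 100)
capped <- cap_expansion(factors, 0.95)   # the extreme value is pulled down
uncapped <- cap_expansion(factors, Inf)  # nothing is capped
```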


A vector of column names in Pdata that you want to use as strata. These will match the way in which the catches are transformed from long to wide prior to inputting them into this function. If you leave this argument empty, then Pdata must already have a column named stratification. The function will look in the column names of the Catch data to determine the appropriate separator to use between columns when pasting the words together, which is done using apply and paste. Historically, it was mandatory to make this column yourself, but in 2021, this input argument was added to reduce the number of extraneous calls that were needed between functions. You can use as many levels of stratification as you want except year because it is already included in the call to stats::aggregate.
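As an illustration, the stratification column could be built by hand along these lines. The column names (fishyr, state, geargroup), the catch values, and the "." separator are hypothetical; the function itself inspects the column names of Catch to choose the separator.

```r
# Hypothetical biological data with two stratification columns.
pdata <- data.frame(
  fishyr = c(2019, 2019, 2020),
  state = c("WA", "OR", "WA"),
  geargroup = c("TWL", "HKL", "TWL"),
  stringsAsFactors = FALSE
)

# Hypothetical wide catch table whose column names use "." as the separator.
catch <- data.frame(
  Year = c(2019, 2020),
  WA.TWL = c(4000, 8000),
  OR.HKL = c(500, 1000)
)

# Paste the strata together row by row with the same separator.
strat_cols <- c("state", "geargroup")
pdata$stratification <- unname(
  apply(pdata[, strat_cols, drop = FALSE], 1, paste, collapse = ".")
)
```

Every value of pdata$stratification should then match a column name in the catch table, which is what allows catches and samples to be paired.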


A logical specifying if output should be written to the screen or not. Good for testing and exploring your data but can be turned off when output indicates information that you already know. The printing of output to the screen does not affect any of the returned objects. The default is to always print to the screen, i.e., verbose = TRUE.


A file path to the directory where the results will be saved. The default is the current working directory. The path can be relative or absolute.


Find the catch for each year and grouping in Catch and divide it by the pounds of fish that were sampled for that same year and grouping. Sampled biomass is stored in All_Trips_Sampled_Lbs, which is the sum of Trip_Sampled_Lbs across sample numbers. Catches must already be stratified, i.e., summed by group, with each group in its own column and each year in its own row. Catches are converted to pounds prior to dividing. Thus, the per-stratum Expansion_Factor_2 is catch / sampled catch.
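The calculation can be sketched in base R as follows. This is a simplified illustration rather than the function's actual code: the data values and the column names fishyr, SAMPLE_NO, and Catch_Lbs are assumptions, and the catch is taken to be in pounds already.

```r
# Hypothetical sample-level data: one row per sample.
pdata <- data.frame(
  fishyr = c(2019, 2019, 2020),
  stratification = c("TWL", "TWL", "HKL"),
  SAMPLE_NO = c("s1", "s2", "s3"),
  Trip_Sampled_Lbs = c(100, 300, 50),
  stringsAsFactors = FALSE
)

# Hypothetical wide catch table in pounds: years in rows, strata in columns.
catch_lb <- data.frame(
  Year = c(2019, 2020),
  TWL = c(4000, 8000),
  HKL = c(500, 1000)
)

# Sum the sampled pounds within each year and stratum.
sampled <- stats::aggregate(
  Trip_Sampled_Lbs ~ fishyr + stratification,
  data = pdata, FUN = sum
)
names(sampled)[3] <- "All_Trips_Sampled_Lbs"

# Look up the matching catch by year (row) and stratum (column).
row_id <- match(sampled$fishyr, catch_lb$Year)
col_id <- match(sampled$stratification, names(catch_lb))
sampled$Catch_Lbs <- as.matrix(catch_lb)[cbind(row_id, col_id)]

# Expansion factor is catch divided by sampled catch.
sampled$Expansion_Factor_2 <- sampled$Catch_Lbs / sampled$All_Trips_Sampled_Lbs

# Append the per-stratum factor to every sample.
pdata <- merge(
  pdata,
  sampled[, c("fishyr", "stratification", "Expansion_Factor_2")]
)
```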


The input PacFIN dataset, with column Expansion_Factor_2 appended.


  • Age data are expanded separately from lengths.

  • WA fish are generally only expanded using Expansion_Factor_2.

  • Other expansions are the product of Expansion_Factor_1 * Expansion_Factor_2.

  • For age-at-length comps, set Final_Expansion_Factor to 1 because each fish represents only itself.
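The notes above might translate into something like the following sketch. The state column and the values are hypothetical, and the treatment of WA fish follows the general rule stated in the list rather than any particular assessment.

```r
# Hypothetical data with both expansion factors already computed.
pdata <- data.frame(
  state = c("WA", "OR", "CA"),
  Expansion_Factor_1 = c(2, 3, 4),
  Expansion_Factor_2 = c(10, 5, 1.5),
  stringsAsFactors = FALSE
)

# Default: the final factor is the product of both stages.
pdata$Final_Expansion_Factor <-
  pdata$Expansion_Factor_1 * pdata$Expansion_Factor_2

# WA fish: use only the second-stage factor.
is_wa <- pdata$state == "WA"
pdata$Final_Expansion_Factor[is_wa] <- pdata$Expansion_Factor_2[is_wa]

# Age-at-length comps: every fish represents only itself.
pdata_aal <- pdata
pdata_aal$Final_Expansion_Factor <- 1
```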


Andi Stephens

getExpansion_2 is run after getExpansion_1, using the data frame that getExpansion_1 returns.

