Statistical Comparison of Length Frequencies from Stratified Random Cluster Sampling
Description
Statistical comparison of length frequencies is performed using the twosample Kolmogorov & Smirnov test. Randomization procedures are used to derive the null probability distribution.
Usage
1 2 3 
Arguments
group 
vector containing the identifier used for group membership of length data. This variable is used to determine the number of groups and comparisons. Identifier can be numeric or character. 
strata 
vector containing the numeric identifier used for strata membership of length data. There must be a unique identifier for each stratum regardless of group membership. 
weights 
vector containing the strata weights (e.g., area, size, etc.) used to calculate the stratified mean length for a group. 
haul 
vector containing the variable used to identify the sampling unit (e.g., haul) of length data. Identifier can be numeric or character. 
len 
vector containing the length class. Each length class record must have associated group, strata, weights, and haul identifiers. 
number 
vector containing the number of fish in each length class. 
binsize 
size of the length class (e.g., 5cm, 10, cm, etc.) used to construct the cumulative length frequency
from raw length data. The formula used to create bins is
trunc(len/binsize)*binsize+binsize/2. If use of the raw length classes is desired, then 
resamples 
number of randomizations. Default = 100. 
Details
Length frequency distributions of fishes are commonly tested for differences among groups (e.g., regions, sexes, etc.) using a twosample KolmogovSmirnov test (KS). Like most statistical tests, the KS test requires that observations are collected at random and are independent of each other to satisfy assumptions. These basic assumptions are violated when gears (e.g., trawls, haul seines, gillnets, etc.) are used to sample fish because individuals are collected in clusters . In this case, the "haul", not the individual fish, is the primary sampling unit and statistical comparisons must take this into account.
To test for difference between length frequency distributions from stratified random cluster sampling, a randomization test that uses "hauls" as the primary sampling unit can be used to generate the null probability distribution. In a randomization test, an observed test statistic is compared to an empirical probability density distribution of a test statistic under the null hypothesis of no difference. The observed test statistic used here is the KolmogorovSmirnov statistic (Ds) under a twotailed test:
Ds= maxS1(X)S2(X)
where S1(X) and S2(X) are the observed cumulative proportions at length for group 1 and group 2 in the paired comparisons.
Proportion of fish of length class j in strataset (group variable) used to derive Ds
is calculated as
p(j)=sum(Ak Xjk)/sum(Ak Xk)
where A_k is the weight of stratum k, \bar{X}_{jk} is the mean number per haul of length class j
in stratum k
, and
\bar{X}_k is the mean number per haul in stratum k
. The numerator and denominator are summed over all k
. Before calculation of
cumulative proportions, the length class distributions for each group are corrected for missing lengths and are
constructed so that the range and intervals of each distribution match.
It is assumed all fish caught are measured. If subsampling occurs, the numbers at length (measured) must be expanded to the total caught.
To generate the empirical probability density function (pdf), length data of hauls from all strata are pooled and then hauls are randomly assigned without replacement
to each stratum with haul sizes equal to the original number of stratum hauls. Cumulative proportions are
then calculated as described above. The KS statistic is calculated from the cumulative length frequency distributions of the two groups
of randomized data. The randomization procedure is repeated resamples
times to
obtain the pdf of D. To estimate the significance of Ds, the proportion of all randomized D values
that were greater than or equal to Ds is calculated (Manly, 1997).
Data vectors described in arguments
should be aggregated so that each record contains the number of fish in each length class by group, strata, weights, and haul identifier. For example,
group  stratum  weights  tow  length  number 
North  10  88  1  10  2 
North  10  88  1  12  5 
North  10  88  2  11  3 
North  11  103  1  10  17 
North  11  103  2  14  21 
.  .  .  .  .  . 
.  .  .  .  .  . 
South  31  43  1  12  34 
South  31  43  1  14  3 
To correctly calculate the stratified mean number per haul, zero tows must be included in the dataset. To designate records for zero tows, fill the length class and number at length with zeros. The first line in the following table shows the appropriate coding for zero tows:
group  stratum  weights  tow  length  number 
North  10  88  1  0  0 
North  10  88  2  11  3 
North  11  103  1  10  17 
North  11  103  2  14  21 
.  .  .  .  .  . 
.  .  .  .  .  . 
South  31  43  1  12  34 
South  31  43  1  14  3 
Value
results 
list element containing the Ds statistics from the observed data comparisons and significance probabilities. 
obs_prop 
list element containing the cumulative proportions from each group. 
Drandom 
list element containing the D statistics from randomization for each comparison. 
Author(s)
Gary A. Nelson, Massachusetts Division of Marine Fisheries gary.nelson@state.ma.us
References
Manly, B. F. J. 1997. Randomization, Bootstrap and Monte Carlos Methods in Biology. Chapman and Hall, New York, NY, 399 pp.
Seigel, S. 1956. Nonparametric Statistics for Behavioral Sciences. McGrawHill, New York, NY. 312 p.
See Also
clus.lf
Examples
1 2 3 4 5 6  data(codstrcluslen)
clus.str.lf(
group=codstrcluslen$region,strata=codstrcluslen$stratum,
weights=codstrcluslen$weights,haul=codstrcluslen$tow,
len=codstrcluslen$length,number=codstrcluslen$number,
binsize=5,resamples=100)
