clus.lf | R Documentation |
Statistical comparison of length frequencies is performed using the two-sample Kolmogorov-Smirnov test. Randomization procedures are used to derive the null probability distribution.
clus.lf(group = NULL, haul = NULL, len = NULL, number = NULL,
        binsize = NULL, resamples = 100)
group |
vector containing the identifier used for group membership of length data. This variable is used to determine the number of groups and comparisons. Identifier can be numeric or character. |
haul |
vector containing the variable used to identify the sampling unit (e.g., haul) of length data. Identifier can be numeric or character. |
len |
vector containing the length class data. There should be one record for each length class by group and haul. |
number |
vector containing the numbers of fish in each length class. |
binsize |
size of the length class (e.g., 5 cm, 10 cm, etc.) used to construct the cumulative length frequency
from raw length data. The formula used to create bins is |
resamples |
number of randomizations. Default = 100. |
Length frequency distributions of fishes are commonly tested for differences among groups (e.g., regions, sexes, etc.) using a two-sample Kolmogorov-Smirnov (K-S) test. Like most statistical tests, the K-S test assumes that observations are collected at random and are independent of each other. These basic assumptions are violated when gears (e.g., trawls, haul seines, gillnets, etc.) are used to sample fish because individuals are collected in clusters. In this case, the "haul", not the individual fish, is the primary sampling unit and statistical comparisons must take this into account.
To test for differences between length frequency distributions from simple random cluster sampling, a randomization test that uses "hauls" as the primary sampling unit can be used to generate the null probability distribution. In a randomization test, an observed test statistic is compared to an empirical probability density distribution of the test statistic under the null hypothesis of no difference. The observed test statistic used here is the Kolmogorov-Smirnov statistic (Ds) under a two-tailed test:
Ds = max|S1(X) - S2(X)|
where S1(X) and S2(X) are the observed cumulative length frequency distributions of group 1 and group 2 in the paired comparisons.
S1(X) and S2(X) are calculated such that S(X) = K/n, where K is the number of scores equal to or less than X and n is the total number of length observations (Siegel, 1956).
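The calculation of Ds described above can be sketched in a few lines of R. This is a minimal illustration, not the code used inside clus.lf: the helper name ks_stat and the counts are hypothetical, and the lengths are assumed to be already binned into length classes.

```r
# Hypothetical helper (not part of the package): two-sample K-S statistic
# computed from length classes and the number of fish in each class.
ks_stat <- function(len1, num1, len2, num2) {
  x <- sort(unique(c(len1, len2)))   # every length class seen in either group
  # S(X) = K / n: number of fish at or below X divided by the group total
  s1 <- sapply(x, function(v) sum(num1[len1 <= v])) / sum(num1)
  s2 <- sapply(x, function(v) sum(num2[len2 <= v])) / sum(num2)
  max(abs(s1 - s2))                  # Ds = max|S1(X) - S2(X)|
}

# Made-up counts, for illustration only
d <- ks_stat(len1 = c(10, 12, 14), num1 = c(2, 5, 3),
             len2 = c(10, 12, 14), num2 = c(1, 1, 8))
```

Because both cumulative distributions are step functions that change only at observed length classes, evaluating the difference at those classes is enough to find the maximum.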
To generate the empirical probability density function (pdf), haul data are randomly assigned without replacement to the two groups with sample sizes equal to the original number of hauls in each group under comparison.
The K-S statistic is calculated from the cumulative length frequency distributions of the two groups of randomized data. The randomization procedure is repeated resamples times to obtain the pdf of D. To estimate the significance of Ds, the proportion of all randomized D values that were greater than or equal to Ds is calculated.
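The randomization procedure can be sketched as follows. This is a simplified illustration under stated assumptions, not the internals of clus.lf: the names ks_stat and rand_ks are hypothetical, exactly two groups are assumed, haul identifiers are assumed to be unique across groups, and the data are made up.

```r
# Hypothetical helper: K-S statistic from length classes and counts
ks_stat <- function(len1, num1, len2, num2) {
  x <- sort(unique(c(len1, len2)))
  s1 <- sapply(x, function(v) sum(num1[len1 <= v])) / sum(num1)
  s2 <- sapply(x, function(v) sum(num2[len2 <= v])) / sum(num2)
  max(abs(s1 - s2))
}

# Hypothetical sketch of a haul-level randomization test for two groups
rand_ks <- function(group, haul, len, number, resamples = 100) {
  # One row per haul with its original group label
  hauls <- unique(data.frame(group = group, haul = haul,
                             stringsAsFactors = FALSE))
  g1 <- hauls$group[1]                     # label of the first group
  d_given <- function(labels) {
    # Give every record the label currently assigned to its haul
    rec <- labels[match(haul, hauls$haul)]
    in1 <- rec == g1
    ks_stat(len[in1], number[in1], len[!in1], number[!in1])
  }
  d_obs <- d_given(hauls$group)            # observed Ds
  # Permuting the labels reassigns whole hauls without replacement and
  # keeps the number of hauls in each group equal to the original
  d_rand <- replicate(resamples, d_given(sample(hauls$group)))
  list(Ds = d_obs, p = mean(d_rand >= d_obs), Drandom = d_rand)
}

# Made-up data: two hauls per group
set.seed(1)
group  <- c("N", "N", "N", "N", "S", "S", "S", "S")
haul   <- c("h1", "h1", "h2", "h2", "h3", "h3", "h4", "h4")
len    <- c(10, 12, 10, 12, 12, 14, 12, 14)
number <- c(5, 2, 4, 3, 2, 6, 3, 5)
out <- rand_ks(group, haul, len, number, resamples = 50)
```

Shuffling group labels at the haul level, rather than at the level of individual fish, is what keeps the haul as the primary sampling unit.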
It is assumed all fish caught are measured. If subsampling occurs, the number at length (measured) must be expanded to the total caught.
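One simple way to expand subsampled counts is shown below. It is only a sketch: it assumes the measured fish are a random subsample of the haul, so that a single proportional expansion factor applies, and the data frame, column names, and catch total are hypothetical.

```r
# Hypothetical haul: 40 fish measured out of 200 caught
measured <- data.frame(len = c(10, 12, 14), number = c(10, 20, 10))
total_caught <- 200

# Proportional expansion factor: total caught / number measured
expansion <- total_caught / sum(measured$number)

# Expanded numbers at length, suitable for the number argument
measured$number_expanded <- measured$number * expansion
```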
Data vectors described in arguments should be aggregated so that each record contains the number of fish in each length class by group and haul identifier. For example,
group | tow | length | number |
North | 1 | 10 | 2 |
North | 1 | 12 | 5 |
North | 2 | 11 | 3 |
North | 1 | 10 | 17 |
North | 2 | 14 | 21 |
. | . | . | . |
. | . | . | . |
South | 1 | 12 | 34 |
South | 1 | 14 | 3 |
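If the data are stored as one record per fish, they can be aggregated into the layout above with base R. The sketch below uses made-up records and hypothetical column names; it is one possible approach, not a required preprocessing step.

```r
# Made-up raw data: one row per measured fish
raw <- data.frame(
  group  = c("North", "North", "North", "South", "South"),
  tow    = c(1, 1, 1, 1, 1),
  length = c(10, 10, 12, 12, 12)
)

# Each fish counts once; sum the counts within group, tow and length class
raw$number <- 1
agg <- aggregate(number ~ group + tow + length, data = raw, FUN = sum)
```

The columns of agg can then be passed to the group, haul, len and number arguments.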
results |
list element containing the Ds statistics from the observed data comparisons and significance probabilities. |
obs_prop |
list element containing the observed cumulative proportions for each group. |
Drandom |
list element containing the D statistics from randomization for each comparison. |
Gary A. Nelson, Massachusetts Division of Marine Fisheries gary.nelson@mass.gov
Manly, B. F. J. 1997. Randomization, Bootstrap and Monte Carlo Methods in Biology. Chapman and Hall, New York, NY. 399 p.
Siegel, S. 1956. Nonparametric Statistics for the Behavioral Sciences. McGraw-Hill, New York, NY. 312 p.
clus.str.lf
# Compare length frequencies between regions, using tows as the sampling unit
data(codcluslen)
clus.lf(group = codcluslen$region,
        haul = paste("ST-", codcluslen$tow, sep = ""),
        len = codcluslen$length, number = codcluslen$number,
        binsize = 5, resamples = 100)