expandCounts | R Documentation |
Repeats individual fish data, including lengths, from tallied counts and, optionally, adds a random digit to length measurements to simulate the actual length of each fish in the bin. This is useful as a precursor to summaries that require information about individual fish, such as lengths (e.g., length frequency histograms, mean lengths).
expandCounts(
data,
cform,
lform = NULL,
removeCount = TRUE,
lprec = 0.1,
new.name = "newlen",
cwid = 0,
verbose = TRUE,
...
)
data |
A data.frame that contains variables in |
cform |
A formula of the form |
lform |
An optional formula of the form |
removeCount |
A single logical that indicates if the variable that contains the counts of individuals (as given in |
lprec |
A single numeric that controls the precision to which the random lengths are recorded. See details. |
new.name |
A single string that contains a name for the new length variable if random lengths are to be created. |
cwid |
A single positive numeric that will be added to the lower length bin value in instances where the count exceeds one but only a lower (and not an upper) length was recorded. See details. |
verbose |
A logical indicating whether progress messages should be printed or not. |
... |
Not yet implemented. |
Fisheries data may be recorded as tallied counts in the field. For example, field biologists may have simply recorded that there were 10 fish in one group, 15 in another, etc. More specifically, the biologist may have recorded that there were 10 male Bluegill from the first sampling event between 100 and 124 mm, 15 male Bluegill from the first sampling event between 125 and 149 mm, and so on. At times, it may be necessary to expand these counts such that the repeated information appears in individual rows in a new data.frame. In this specific example, the tallied counts would be repeated such that the male, Bluegill, first sampling event, 100-124 mm information would be repeated 10 times; the male, Bluegill, first sampling event, 125-149 mm information would be repeated 15 times, and so on. This function facilitates this type of expansion.
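The row repetition described above can be sketched in base R. This is a minimal illustration of the idea, not the function's actual implementation; the data.frame and its variable names (species, sex, lwr, upr, count) are hypothetical.

```r
# Tallied counts: one row per group, with the number of fish in `count`
# (hypothetical data and variable names, for illustration only)
tallied <- data.frame(species = c("Bluegill", "Bluegill"),
                      sex     = c("male", "male"),
                      lwr     = c(100, 125),
                      upr     = c(124, 149),
                      count   = c(10, 15))

# Repeat each row index `count` times, then drop the count variable
idx <- rep(seq_len(nrow(tallied)), times = tallied$count)
expanded <- tallied[idx, names(tallied) != "count"]
rownames(expanded) <- NULL

nrow(expanded)        # equals sum(tallied$count)
table(expanded$lwr)   # number of repeated rows per lower bin value
```

Each row of the expanded data.frame now represents one fish, which is the structure that per-individual summaries expect.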
Length data has often been collected in a “binned-and-tallied” format (e.g., 10 fish in the 100-124 mm group, 15 in the 125-149 mm group, etc.). This type of data collection does not facilitate easy or precise calculations of summary statistics of length (e.g., means and standard deviations of length). Expanding the data as described above does not solve this problem because the length data are still essentially categorical (i.e., which group the fish belongs to rather than what its actual length is). To facilitate computation of summary statistics, the data can be expanded as described above and then a length can be randomly selected from within the recorded length bin to serve as a “measured” length for that fish. This function performs this type of expansion by randomly selecting the length from a uniform distribution within the length bin (e.g., each value between 100 and 124 mm has the same probability of being selected).
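The random selection of a length within each bin can be sketched with base R's runif(). This is an illustration under stated assumptions: the rounding step only mimics the role of lprec, and how the function itself applies the precision and treats bin boundaries is not described here.

```r
set.seed(1)  # only so that the illustration is reproducible

# Bin limits for 25 already-expanded fish (hypothetical values, in mm)
lwr <- rep(c(100, 125), times = c(10, 15))
upr <- rep(c(124, 149), times = c(10, 15))

# Assumed precision for the simulated lengths (plays the role of lprec)
prec <- 1

# Draw one uniform length per fish within its own bin, then round
# to the assumed precision
newlen <- round(runif(length(lwr), min = lwr, max = upr) / prec) * prec

# Summary statistics can now be computed on individual "measured" lengths
mean(newlen)
sd(newlen)
```

Because the lengths are simulated, summaries computed from them will vary slightly from run to run unless a seed is set.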
This function makes some assumptions for some coding situations. First, it assumes that all lowerbin values are actually lower than all upperbin values. The function will throw an error if this is not true. Second, it assumes that if a lowerbin but no upperbin value is given then the lowerbin value is the exact measurement for those fish. Third, it assumes that if an upperbin but no lowerbin value is given that this is a data entry error and that the upperbin value should be the lowerbin value. Fourth, it assumes that it is a data entry error if varcount is zero or NA and lowerbin or upperbin contains values (i.e., why would there be lengths if no fish were captured?).
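The four assumptions above can be expressed as simple checks in base R. This is only a sketch of the stated rules, not the function's internal code. The data are hypothetical, the comparison of lower and upper values is assumed to be made row by row, and setting the upper value to NA after the swap is likewise an assumption.

```r
# Hypothetical tallied data covering several of the coding situations
d <- data.frame(lwr.bin = c(15,   16, NA, NA),
                upr.bin = c(15.5, NA, 17, NA),
                freq    = c(3,    2,  1,  0))

# First: a lower value should not exceed its upper value
# (row-wise comparison assumed; rows with an NA are ignored)
stopifnot(all(d$lwr.bin <= d$upr.bin, na.rm = TRUE))

# Third: an upper value without a lower value is treated as an entry
# error, so the upper value is moved to the lower variable
swap <- is.na(d$lwr.bin) & !is.na(d$upr.bin)
d$lwr.bin[swap] <- d$upr.bin[swap]
d$upr.bin[swap] <- NA

# Second: a lower value without an upper value is an exact measurement
exact <- !is.na(d$lwr.bin) & is.na(d$upr.bin)

# Fourth: zero or NA counts should not be accompanied by lengths
bad <- (is.na(d$freq) | d$freq == 0) &
       (!is.na(d$lwr.bin) | !is.na(d$upr.bin))

d[exact, ]   # rows whose lower value is taken as the measured length
any(bad)     # TRUE would indicate a likely data entry error
```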
A data.frame of the same structure as data except that the variable in cform may be deleted and the variable in new.name may be added. The returned data.frame will have more rows than data because of the potential addition of new individuals expanded from the counts in cform.
Derek H. Ogle, DerekOgle51@gmail.com
See expandLenFreq for expanding length frequencies where individual fish measurements were made on individual fish in a subsample and the remaining fish were simply counted.
# all need expansion
( d1 <- data.frame(name=c("Johnson","Johnson","Jones","Frank","Frank","Max"),
lwr.bin=c(15,15.5,16,16,17,17),
upr.bin=c(15.5,16,16.5,16.5,17.5,17.5),
freq=c(6,4,2,3,1,1)) )
expandCounts(d1,~freq)
expandCounts(d1,~freq,~lwr.bin+upr.bin)
# some need expansion
( d2 <- data.frame(name=c("Johnson","Johnson","Jones","Frank","Frank","Max"),
lwr.bin=c(15,15.5,16,16,17.1,17.3),
upr.bin=c(15.5,16,16.5,16.5,17.1,17.3),
freq=c(6,4,2,3,1,1)) )
expandCounts(d2,~freq)
expandCounts(d2,~freq,~lwr.bin+upr.bin)
# none need expansion
( d3 <- data.frame(name=c("Johnson","Johnson","Jones","Frank","Frank","Max"),
lwr.bin=c(15,15.5,16,16,17.1,17.3),
upr.bin=c(15,15.5,16,16,17.1,17.3),
freq=c(6,4,2,3,1,1)) )
expandCounts(d3,~freq)
expandCounts(d3,~freq,~lwr.bin+upr.bin)
# some need expansion, but different bin widths
( d4 <- data.frame(name=c("Johnson","Johnson","Jones","Frank","Frank","Max"),
lwr.bin=c(15, 15, 16, 16, 17.1,17.3),
upr.bin=c(15.5,15.9,16.5,16.9,17.1,17.3),
freq=c(6,4,2,3,1,1)) )
expandCounts(d4,~freq)
expandCounts(d4,~freq,~lwr.bin+upr.bin)
# some need expansion but include zeros and NAs for counts
( d2a <- data.frame(name=c("Johnson","Johnson","Jones","Frank","Frank","Max","Max","Max","Max"),
lwr.bin=c(15, 15.5,16 ,16 ,17.1,17.3,NA,NA,NA),
upr.bin=c(15.5,16 ,16.5,16.5,17.1,17.3,NA,NA,NA),
freq=c(6,4,2,3,1,1,NA,0,NA)) )
expandCounts(d2a,~freq,~lwr.bin+upr.bin)
# some need expansion but include NAs for upper values
( d2b <- data.frame(name=c("Johnson","Johnson","Jones","Frank","Frank","Max"),
lwr.bin=c(15, 15.5,16 ,16 ,17.1,17.3),
upr.bin=c(NA ,NA ,16.5,16.5,17.1,17.3),
freq=c(6,4,2,3,1,1)) )
expandCounts(d2b,~freq,~lwr.bin+upr.bin)
# some need expansion but include NAs for lower values
( d2c <- data.frame(name=c("Johnson","Johnson","Jones","Frank","Frank","Max"),
lwr.bin=c(NA,NA, 16 ,16 ,17.1,17.3),
upr.bin=c(15,15.5,16.5,16.5,17.1,17.3),
freq=c(6,4,2,3,1,1)) )
expandCounts(d2c,~freq,~lwr.bin+upr.bin)
## Not run:
##!!##!!## Change path to where example file is and then run to demo
## Read in datafile (note periods in names)
df <- read.csv("c:/aaawork/consulting/R_WiDNR/Statewide/Surveysummaries2010.csv")
str(df)
## narrow variables for simplicity
df1 <- df[,c("County","Waterbody.Name","Survey.Year","Gear","Species",
"Number.of.Fish","Length.or.Lower.Length.IN","Length.Upper.IN",
"Weight.Pounds","Gender")]
## Sum the count to see how many fish there should be after expansion
sum(df1$Number.of.Fish)
## Simple expansion
df2 <- expandCounts(df1,~Number.of.Fish)
## Same expansion but include random component to lengths (thus new variable)
## also note default lprec=0.1
df3 <- expandCounts(df1,~Number.of.Fish,~Length.or.Lower.Length.IN+Length.Upper.IN)
## End(Not run)