Description
Aggregate trawl data to specified levels of biological, spatial, and temporal dimensions.
Usage
trawlAgg(X, FUN = NULL,
  bio_lvl = c("individual", "sex", "spp", "species", "genus"),
  space_lvl = c("haulid", "lon-lat", "lat", "lon", "stratum", "reg"),
  time_lvl = c("haulid", "datetime", "day", "month", "season", "year"),
  bioFun = FUN, envFun = FUN, bioCols = c("wtcpue", "cntcpue"),
  envCols = c("stemp", "btemp", "depth"), metaCols = NULL,
  meta.action = c("drop", "unique1", "collapse", "lu", "FUN"),
  metaFun = NULL, use_nAgg = TRUE, na.rm = TRUE)
Arguments
X: A data.table containing trawl data.
FUN: Function used for aggregating each subset of X.
bio_lvl, space_lvl, time_lvl: Level of biological, spatial, and temporal specificity used in subsetting. Can be abbreviated. If an abbreviated match is not found, the supplied character string is assumed to refer to a column in X.
bioFun, envFun: Functions to be applied to the measured biological and environmental columns, respectively. Default is to use FUN.
bioCols, envCols: Character vectors specifying the names of columns to be treated as measured biological and environmental columns, respectively.
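metaCols: Columns treated as metadata, which are handled by meta.action rather than by bioFun or envFun. If NULL (default), includes the remaining columns of X; see Details for exceptions.
meta.action: Method for handling metaCols. The default, "drop", removes them from the output.
metaFun: If meta.action="FUN", the function, or named list of functions, applied to metaCols. See Details.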
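use_nAgg: Logical (default TRUE). If TRUE, adds a column named "nAgg" to the output giving the number of elements aggregated (see Note).
na.rm: Logical (default TRUE). Passed as the na.rm argument to FUN, bioFun, envFun, and metaFun, and also used by the "unique1", "collapse", and "lu" options of meta.action. See Details.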
Details
In each of bio_lvl, space_lvl, and time_lvl, the default arguments are listed in order of decreasing specificity. Some of these levels are crossed, others nested. The default behaviors, and the side effects of how these factors are organized, are described below; in general they do not limit the user. The defaults are meant to be intuitive for common analyses.
The bio_lvl columns are "crossed" below the "spp" level, so referring to "sex" also refers to "spp" (that is, aggregation is by sex within species). This implied reference can be avoided by, for example, creating a new column called "sex2" and setting bio_lvl="sex2". Note that in this special case (of "sex"), "spp" is not included in the default of metaCols; thus, when using bio_lvl="sex", "spp" will not be included in metaCols and will not be affected by meta.action. However, if bio_lvl="individual", bio_lvl is internally NULL and refers to no columns, so it may be advisable to include columns like "sex" and "spp" in metaCols (which the default of metaCols would do anyway) and to retain those columns by using a non-default value for meta.action. See Examples.
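The difference between a crossed level and a standalone copy of the same column can be seen with plain grouping. The base R sketch below does not call trawlAgg; it only mimics the grouping described above, using hypothetical toy columns, to show that grouping by "sex" under "spp" yields one row per sex within each species, while a "sex2" copy pools the species.

```r
# Toy trawl records: two species, each with both sexes (hypothetical data)
catch <- data.frame(
  spp    = c("A", "A", "A", "B", "B"),
  sex    = c("M", "F", "M", "M", "F"),
  wtcpue = c(1.0, 2.0, 3.0, 4.0, 5.0),
  stringsAsFactors = FALSE
)

# Crossed levels: grouping by sex implicitly keeps species separate,
# so each output row is one sex within one species
by_sex_in_spp <- aggregate(wtcpue ~ spp + sex, data = catch, FUN = sum)

# Workaround described above: a copy of the column ("sex2") grouped on
# its own, which pools both species together
catch$sex2 <- catch$sex
by_sex2 <- aggregate(wtcpue ~ sex2, data = catch, FUN = sum)

print(by_sex_in_spp)
print(by_sex2)
```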
In time_lvl, all levels above "datetime" are assumed to be crossed; thus, referring to "season" aggregates with a temporal grain of seasons within a year. Specifying time_lvl="season" does not imply a reference to "year" in the sense that "year" is still included in metaCols by default and is therefore affected by meta.action (whose default is "drop"). However, the temporal grouping is recorded in a new "time_lvl" column added to the output data.table. When possible, this new column is of class POSIXct; otherwise, it is character. Note that "day" refers to day of year.
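The two temporal conventions above (seasons crossed with year, and "day" meaning day of year) can be sketched in base R. This is an illustration under stated assumptions, not trawlAgg's internal code: the season_of helper and its meteorological month-to-season mapping are hypothetical, since the mapping trawlAgg uses is not given here.

```r
# Hypothetical season mapping (meteorological seasons); the mapping
# trawlAgg actually uses is not documented here and may differ.
season_of <- function(month) {
  c("winter", "winter", "spring", "spring", "spring", "summer",
    "summer", "summer", "fall", "fall", "fall", "winter")[month]
}

datetime <- as.POSIXct(c("2010-03-15 08:00:00", "2010-10-02 14:30:00",
                         "2011-03-20 09:45:00"), tz = "UTC")
lt <- as.POSIXlt(datetime)

# Season crossed with year: spring 2010 and spring 2011 are separate groups.
# (Simplification: December and January of one calendar year share a key.)
season_key <- paste(lt$year + 1900, season_of(lt$mon + 1), sep = "-")

# "day" means day of year (1 to 366), not day of month
day_of_year <- lt$yday + 1

print(season_key)   # "2010-spring" "2010-fall" "2011-spring"
print(day_of_year)  # 74 275 79
```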
The levels of space_lvl are nested, not crossed, so their handling is more intuitive than for biological or temporal levels. The only oddity is the "lon-lat" indicator, which simply means that both the "lon" and "lat" columns are used in aggregation.
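As a rough analogy in base R (hypothetical data, not a call to trawlAgg), "lon-lat" corresponds to grouping on the pair of columns, whereas "lat" alone pools hauls that share a latitude but differ in longitude:

```r
# Hypothetical haul-level records
hauls <- data.frame(
  lon   = c(-70.5, -70.5, -70.5, -69.0),
  lat   = c(41.0,  41.0,  42.0,  41.0),
  btemp = c(8.0,   10.0,  6.0,   7.0)
)

# "lon-lat": one group per distinct (lon, lat) pair
by_lonlat <- aggregate(btemp ~ lon + lat, data = hauls, FUN = mean)

# "lat" alone: hauls at different longitudes but the same latitude are pooled
by_lat <- aggregate(btemp ~ lat, data = hauls, FUN = mean)

print(by_lonlat)
print(by_lat)
```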
The value "haulid" is special because this column refers to both space and time. In general, space and time are correlated within a region, because different places tend to be sampled at different times.
When meta.action="FUN", metaFun can be a named list whose names refer to columns in metaCols, or a single function applied to all columns (e.g., metaFun=function(x, ...)length(x)). When metaFun is a list, functions are matched to columns by name, not by order of listing. For example, if metaCols=c("reg", "trophicLevel", "lon"), it might be useful to take the unique values of the regions, the means of the trophic levels, and the first unique value of longitude after rounding to 1 decimal place. In that case, one could use metaFun=list(reg=unique, trophicLevel=mean, lon=function(x, ...)unique(round(x, 1))[1]).
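Name-based matching can be sketched as follows, assuming each function receives one group's column plus na.rm. The apply_meta helper is hypothetical and only illustrates the idea; it is not trawlData's internal code. The list is deliberately written in a different order from the columns.

```r
# Hypothetical helper (not part of trawlData): apply a named list of
# functions to the columns of one group, matching by name, not position.
apply_meta <- function(group, funs, na.rm = TRUE) {
  stopifnot(!is.null(names(funs)), all(names(funs) %in% names(group)))
  out <- lapply(names(funs), function(col) {
    funs[[col]](group[[col]], na.rm = na.rm)  # na.rm forwarded to every function
  })
  names(out) <- names(funs)
  out
}

# One aggregation group (hypothetical values)
group <- list(
  reg          = c("neus", "neus", "neus"),
  trophicLevel = c(3.1, 3.5, NA),
  lon          = c(-70.52, -70.54, -70.61)
)

# Listed in a different order from the columns; matching is by name
meta <- apply_meta(group, list(
  lon          = function(x, ...) unique(round(x, 1))[1],  # first unique rounded value
  reg          = unique,  # unique() tolerates the extra na.rm argument via ...
  trophicLevel = mean
))
str(meta)
```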
The argument na.rm is passed to every function supplied as an argument, including custom functions. Therefore, all such functions must accept na.rm as an argument; if a function does not, rewrite it so that it does (e.g., function(x, ...)length(x)). na.rm also affects the "unique1", "collapse", and "lu" options of meta.action. In instances where mean or sum would be used with na.rm=TRUE, consider instead using meanna and sumna, respectively.
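Why the function(x, ...) wrapper matters can be shown in base R. The call_with_na_rm helper below is hypothetical; it only mimics an aggregator that forwards na.rm to whatever function it is given. The last two lines show base R's results for an all-NA group, which are worth checking whenever mean or sum is used with na.rm=TRUE.

```r
# Hypothetical helper mimicking how an aggregator might forward na.rm
# to every user-supplied function (not trawlData's actual code).
call_with_na_rm <- function(f, x, na.rm = TRUE) f(x, na.rm = na.rm)

x <- c(2, NA, 4)

# mean() and sum() accept na.rm directly
call_with_na_rm(mean, x)  # 3
call_with_na_rm(sum, x)   # 6

# length() takes a single argument, so passing na.rm is an error
res <- tryCatch(call_with_na_rm(length, x), error = function(e) "error")

# Wrapping it in function(x, ...) absorbs na.rm; note that NA is still counted
n <- call_with_na_rm(function(x, ...) length(x), x)  # 3

# Base R behavior for a group that is entirely NA
all_na <- c(NA_real_, NA_real_)
sum(all_na, na.rm = TRUE)   # 0
mean(all_na, na.rm = TRUE)  # NaN
```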
Value
Returns an aggregated data.table. See Details for the columns returned.
Note
The use of use_nAgg is complicated by the fact that when na.rm=TRUE, each column may have a different number of aggregated values. Currently, use_nAgg does not respect na.rm=TRUE: the nAgg count includes NA values.
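If per-column counts are needed, they can be computed separately. The base R sketch below (hypothetical data, not a trawlAgg call) contrasts an nAgg-style count, which includes NA values as described above, with the number of non-NA values each column actually contributes when na.rm=TRUE.

```r
# One aggregation group with missing values (hypothetical data)
grp <- data.frame(wtcpue = c(1.2, NA, 0.8), btemp = c(NA, NA, 7.5))

# nAgg as described in the Note: rows aggregated, NA values included
nAgg <- nrow(grp)  # 3

# Values actually used per column when NAs are removed; these can differ
n_used <- vapply(grp, function(col) sum(!is.na(col)), integer(1))
print(n_used)  # wtcpue 2, btemp 1
```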
Examples
trim.neus <- trawlTrim("neus", c.add=c("length","sex"))
mini_data <- trim.neus[
  pick(spp, 2, w=TRUE)
  & pick(stratum, 5, w=TRUE)
  & pick(year, 5, w=TRUE)
]
# aggregate species within a haul (among individuals)
# this means taking the sum of many bio metrics
# Note that I put 'sex' in metaCols because I don't want
# bioFun applied to it (preferring instead to
# take the first unique value as a form of aggregating)
neus1 <- trawlAgg(
  X=mini_data,
  bioFun=sumna,
  envFun=meanna,
  bio_lvl="spp", space_lvl="haulid", time_lvl="season",
  bioCols=c("wtcpue","cntcpue"),
  envCols=c("btemp"),
  metaCols=c("reg","common","datetime","stratum","sex"),
  meta.action=c("unique1")
)
# aggregate within a species within stratum
# refer to the time_lvl column from the previous trawlAgg()
# can use mean for both bio and env
neus2 <- trawlAgg(
  X=neus1,
  FUN=meanna,
  bio_lvl="spp", space_lvl="stratum", time_lvl="time_lvl",
  bioCols=c("wtcpue","cntcpue"),
  envCols=c("btemp"),
  metaCols=c("reg","common","datetime"),
  meta.action=c("unique1")
)
# A more complex example
# Say we want the weight, count, and length
# within a stratum, of a given sex of a given species, during a season.
# To illustrate a complex situation, let's take the
# mean of the weight and length, and the sum of the count.
# Because only one function can be applied to bioCols,
# we can exercise the flexibility of
# metaCols and metaFun to achieve our goals.
# Also, notice how we transform the "datetime" column to year.
trawlAgg(
  X=mini_data,
  FUN=meanna,
  bio_lvl="individual", space_lvl="stratum", time_lvl="season",
  bioCols=c("weight","length"),
  envCols=c("stemp","btemp","depth"),
  metaCols=c("datetime","reg","cnt","spp","common","sex"),
  meta.action=c("FUN"),
  metaFun=list(
    # note that these are named, and don't need
    # to be in the same order as metaCols
    sex = function(x, ...)una(x, ...)[1],
    reg = function(x, ...)una(x, ...)[1], # this is unique1
    datetime = function(x, ...)una(data.table::year(x), ...)[1],
    common = function(x, ...)una(x, ...)[1],
    spp = function(x, ...)una(x, ...)[1],
    cnt = sumna
  )
) # not surprisingly, there wasn't any aggregation at the level of individuals