trawlAgg: Aggregate Trawl Data


View source: R/trawlAgg.R

Description

Aggregate trawl data to specified levels of biological, spatial, and temporal dimensions

Usage

trawlAgg(X, FUN = NULL, bio_lvl = c("individual", "sex", "spp", "species",
  "genus"), space_lvl = c("haulid", "lon-lat", "lat", "lon", "stratum",
  "reg"), time_lvl = c("haulid", "datetime", "day", "month", "season",
  "year"), bioFun = FUN, envFun = FUN, bioCols = c("wtcpue", "cntcpue"),
  envCols = c("stemp", "btemp", "depth"), metaCols = NULL,
  meta.action = c("drop", "unique1", "collapse", "lu", "FUN"),
  metaFun = NULL, use_nAgg = TRUE, na.rm = TRUE)

Arguments

X

A data.table containing trawl data

FUN

Function used for aggregating each subset of X.

bio_lvl, space_lvl, time_lvl

Level of biological, spatial, and temporal specificity used in subsetting. Can be abbreviated. If an abbreviated match is not found, the supplied character is assumed to refer to a column in X. Each must be of length 1. See 'Details'.

bioFun, envFun

Functions to be applied to measured biological and environmental columns, respectively. Default is to use FUN.

bioCols, envCols

Character vector specifying names of columns to be considered as measured biological and environmental columns, respectively.

metaCols

If NULL (default), includes all columns of X not used by bio_lvl, space_lvl, time_lvl, bioCols, or envCols.

meta.action

Method for handling metaCols variables. "drop" results in the columns being dropped (the default), "unique1" returns the first unique value, "collapse" returns a character of unique values separated by a comma, "lu" returns the number of unique elements (via lu), and "FUN" indicates the use of function(s) specified by metaFun.

metaFun

If meta.action is "FUN", a function or list of functions to be applied to each column in metaCols during aggregation. The default, NULL, results in an error if meta.action is "FUN". If a single function is to be applied to all columns, it will be recycled and does not need to be in a list.

use_nAgg

Logical; if TRUE (the default), adds a column to the output indicating the number of elements aggregated. The name of this column is "nAgg".

na.rm

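Logical (default TRUE). Passed to the aggregating functions, and also affects the "unique1", "collapse", and "lu" options of meta.action. See 'Details'.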

Details

In each of bio_lvl, space_lvl, and time_lvl, the default arguments are listed in order of decreasing specificity. Some of these levels are crossed, others nested. Default behavior and side-effects of the organization of these factors are described below, but in general they are not limiting to the user. The default behaviors are meant to be intuitive for common analyses.

The bio_lvl columns are "crossed" below the "spp" level, thus referring to "sex" also refers to "spp". This implied reference can be avoided by, for example, creating a new column called "sex2" and setting bio_lvl="sex2". Note that in this special case (of "sex"), "spp" is not included in the default of metaCols, and thus when using bio_lvl="sex", "spp" will not be affected by meta.action. However, if bio_lvl="individual", internally bio_lvl is just NULL, referring to no columns, so it may be advisable to include columns like "sex" and "spp" in metaCols (which would be done anyway in the default of metaCols), and to retain those columns by using a non-default for meta.action. See Examples.

In time_lvl all levels above "datetime" are assumed to be crossed, thus referring to "season" will aggregate with a temporal grain of seasons within a year. Specifying time_lvl="season" does not imply a reference to "year" in the sense that "year" will still be included in metaCols by default, and thus affected by meta.action (whose default is "drop"). However, temporal factors are created by adding a new "time_lvl" column in the output data.table. When possible, this new column will be of class POSIXct; otherwise, a character. Note that "day" refers to "day of year".

The levels of space_lvl are nested, not crossed, and thus their handling is more intuitive than for biological or temporal levels. The only oddity here is the "lon-lat" indicator, which simply indicates that both "lon" and "lat" columns are to be used in aggregation.

The value "haulid" is special because this column refers to both space and time. In general, space and time are correlated within a region, because different places tend to be sampled at different times.

When meta.action="FUN", metaFun can be a named list with one function per column in metaCols, or a single function (e.g., metaFun=length) that is applied to all columns. When metaFun is a list, functions are matched to columns by name, not by order of listing. For example, if metaCols=c("reg","trophicLevel", "lon"), it might be useful to take the unique values of the regions, the means of the trophic levels, and the first unique value of longitude after rounding to 1 decimal place. In that case, one could do metaFun=list(reg=unique, trophicLevel=mean, lon=function(x)unique(round(x,1))).
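The name-based matching described above can be sketched in base R. This is only an illustration of the idea, not the internals of trawlAgg(); the helper apply_meta_funs and the toy data are hypothetical.

```r
# Hypothetical helper (not part of trawlData): apply a named list of
# functions to columns, matching each function to its column by name.
apply_meta_funs <- function(df, metaCols, metaFun) {
  # A single function is recycled across all columns
  if (is.function(metaFun)) {
    metaFun <- setNames(rep(list(metaFun), length(metaCols)), metaCols)
  }
  stopifnot(all(metaCols %in% names(metaFun)))
  # Look up each function by column name, so list order is irrelevant
  lapply(setNames(metaCols, metaCols), function(col) metaFun[[col]](df[[col]]))
}

# Toy data, invented for illustration
toy <- data.frame(
  reg = c("neus", "neus", "neus"),
  trophicLevel = c(3.0, 3.5, 4.0),
  lon = c(-70.12, -70.14, -70.11),
  stringsAsFactors = FALSE
)

# Functions deliberately listed in a different order than metaCols
funs <- list(
  lon = function(x) unique(round(x, 1))[1],
  trophicLevel = mean,
  reg = unique
)

res <- apply_meta_funs(toy, c("reg", "trophicLevel", "lon"), funs)
str(res)
```

The result is a list with one element per entry of metaCols, in the order of metaCols rather than the order of the function list.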

The argument na.rm affects any of the functions passed as arguments, even custom functions (so long as they accept "na.rm" as an argument). All functions must accept na.rm as an argument; if a function does not, rewrite it so that it does (e.g., function(x, ...)length(x)). na.rm also affects "unique1", "collapse", and "lu" in meta.action. In instances where the functions mean or sum are used and na.rm=TRUE, consider instead using meanna and sumna, respectively.
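As a rough sketch of what "accepting na.rm" can look like, the wrappers below are written in base R. All function names here are hypothetical and are not the package's own helpers; the comma separator used for collapsing is likewise an assumption.

```r
# Hypothetical wrappers, for illustration only.

# length() has no na.rm argument, so accept it explicitly and honor it
length_narm <- function(x, na.rm = FALSE, ...) {
  if (na.rm) x <- x[!is.na(x)]
  length(x)
}

# Alternatively, absorb na.rm through ... and ignore it
length_any <- function(x, ...) length(x)

# na.rm-aware versions of the "unique1", "collapse", and "lu" ideas
unique_narm <- function(x, na.rm = FALSE) {
  u <- unique(x)
  if (na.rm) u[!is.na(u)] else u
}
first_unique <- function(x, na.rm = FALSE, ...) unique_narm(x, na.rm)[1]
collapse_unique <- function(x, na.rm = FALSE, ...) {
  paste(unique_narm(x, na.rm), collapse = ",") # separator is an assumption
}
n_unique <- function(x, na.rm = FALSE, ...) length(unique_narm(x, na.rm))

x <- c(NA, "a", "b", "a")
first_unique(x, na.rm = TRUE)
collapse_unique(x, na.rm = TRUE)
n_unique(x, na.rm = TRUE)
length_narm(x, na.rm = TRUE)
length_any(x, na.rm = TRUE)
```

Note the difference between the two length variants: the first drops NA values before counting, while the second merely tolerates the na.rm argument and counts every element.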

Value

Returns an aggregated data.table. See 'Details' for columns returned.

Note

The use of use_nAgg is complicated by the fact that when na.rm=TRUE, each column may very well have a different number of aggregated values. So right now, use_nAgg does not adhere to na.rm=TRUE, and includes NA values in its count.
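The discrepancy can be seen in a small base R sketch. The toy group below is invented for illustration; it only mimics the counting issue and does not call trawlAgg().

```r
# Toy group of three rows that would be aggregated into one
grp <- data.frame(
  wtcpue = c(1.2, NA, 0.8),
  btemp  = c(NA, NA, 7.5)
)

# Count of rows in the group, NA values included
# (the kind of count the Note describes for nAgg)
n_rows <- nrow(grp)

# Number of non-missing values each column contributes when na.rm = TRUE
n_used <- colSums(!is.na(grp))

# The column means are therefore based on different sample sizes
col_means <- colMeans(grp, na.rm = TRUE)

n_rows
n_used
col_means
```

Here a single row count cannot describe both columns, because wtcpue contributes two values and btemp only one.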

Examples

trim.neus <- trawlTrim("neus", c.add=c("length","sex"))
mini_data <- trim.neus[
	pick(spp, 2, w=TRUE)
	& pick(stratum, 5, w=TRUE)
	& pick(year,5, w=TRUE)
]

# aggregate species within a haul (among individuals)
# this means taking the sum of many bio metrics
# Note that I put 'sex' in metaCols, b/c I don't want
# the bioFun applied to it (preferring instead to
# take the first unique value as a form of aggregating)
neus1 <- trawlAgg(
	X=mini_data,
	bioFun=sumna,
	envFun=meanna,
	bio_lvl="spp", space_lvl="haulid", time_lvl="season",
	bioCols=c("wtcpue","cntcpue"),
	envCols=c("btemp"),
	metaCols=c("reg","common","datetime","stratum","sex"),
	meta.action=c("unique1")
)

# aggregate within a species within stratum
# refer to the time_lvl column from previous trawlAgg()
# can use mean for both bio and env
neus2 <- trawlAgg(
	X=neus1,
	FUN=meanna,
	bio_lvl="spp", space_lvl="stratum", time_lvl="time_lvl",
	bioCols=c("wtcpue","cntcpue"),
	envCols=c("btemp"),
	metaCols=c("reg","common","datetime"),
	meta.action=c("unique1")
)

# A more complex example
# Say we want the weight, count, and length
# Within a stratum, of a given sex of a given species, during a season
# To illustrate a complex situation, let's take the
# mean of the weight and length, and sum of count
# Because only 1 type of function can be applied to bioCols,
# we can just exercise the extreme flexibility of
# metaCols and metaFun to achieve our goals.
# Also, notice how we transform the "datetime" column to year
trawlAgg(
	X=mini_data,
	FUN=meanna,
	bio_lvl="individual", space_lvl="stratum",time_lvl="season",
	bioCols=c("weight","length"),
	envCols=c("stemp","btemp", "depth"),
	metaCols=c("datetime","reg", "cnt", "spp", "common", "sex"),
	meta.action=c("FUN"),
	metaFun=list(
	# note that these are named, and don't need
	# to be in the same order as metaCols
		sex = function(x, ...)una(x, ...)[1],
		reg = function(x, ...)una(x, ...)[1], # this is unique1
		datetime = function(x, ...)una(data.table::year(x), ...)[1],
		common = function(x, ...)una(x, ...)[1],
		spp = function(x, ...)una(x, ...)[1],
		cnt = sumna
	)
) # not surprisingly, there wasn't any aggregation at the level of individuals

rBatt/trawlData documentation built on May 26, 2019, 7:45 p.m.