trawlCast: Cast Trawl

View source: R/trawlCast.R

Description

Cast a data.table of trawl data to an array

Usage

trawlCast(x, formula = stratum ~ K ~ spp ~ year, valueName = "wtcpue",
  valFill = NA, fixAbsent = TRUE, allNA_noSamp = "spp", valAbsent = 0,
  grandNamesOut = c("j", "k", "i", "t"), ...)

Arguments

x

A data.table with column names to be used in formula, valueName, and potentially allNA_noSamp

formula

Formula describing array dimensions, in order as would be given by dim. Passed to formula in acast.

valueName

Column name whose elements will fill the array. Passed to value.var in acast.

valFill

Value to use for filling in missing combinations; defaults to NA. Passed to fill in acast.

fixAbsent

Logical (default TRUE) to indicate the need to fill one value for no sampling (valFill), and another for a true absence (valAbsent). See 'Details'.

allNA_noSamp

A character indicating the column/dimension which, if all of its elements are NA, indicates a no-sampling event as opposed to an absence. When all(is.na(allNA_noSamp)) is FALSE for a combination of the other dimensions in formula, valAbsent will be used instead of valFill.

valAbsent

Value to be used in lieu of valFill to indicate an absence as opposed to not-sampled.

grandNamesOut

Grand dimension names for output array (e.g., names(dimnames(x)))

...

Other arguments to be passed to acast.
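
As a rough illustration of how formula, valueName, and valFill map onto acast, the sketch below calls reshape2::acast directly on a small made-up data.frame. The toy data and the dimension order are assumptions for illustration only; this is not the internal code of trawlCast.

```r
library(reshape2)

# Hypothetical toy data: 2 strata, 2 species, 2 years.
# The combination (s1, hake, 2002) is deliberately missing.
toy <- data.frame(
  stratum = c("s1", "s1", "s2", "s2", "s1", "s2", "s2"),
  year    = c(2001, 2001, 2001, 2001, 2002, 2002, 2002),
  spp     = c("cod", "hake", "cod", "hake", "cod", "cod", "hake"),
  wtcpue  = c(1.5, 0.2, 3.0, 0.7, 2.1, 0.9, 1.1)
)

# formula -> array dimensions (in order), value.var -> valueName, fill -> valFill
arr <- acast(toy, stratum ~ spp ~ year, value.var = "wtcpue", fill = NA_real_)

dim(arr)                    # one dimension per term in the formula
arr["s1", "hake", "2002"]   # the missing combination is filled with NA
```

Each term of the formula becomes one array dimension, so its length is the number of unique values in that column.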

Details

Many columns in bottom trawl data can be described as summarizing three aspects of metadata: when, where, and what. This same logic is expressed in the function trawlAgg, which prompts users to conceptualize aggregating trawl data as aggregating at different specificities for time, space, and biological dimensions. In this function's default for formula, the "where" is described by "stratum" (a sampling site), "when" by "year", and "what" by "spp" (species). The "K" value is a replicate, which could mean either "when" or "where" (and is similar to "haulid" in trawlAgg, which describes it as being indicative of both time and space). Given those identifying dimensions, we can then appropriately contextualize a measured value, e.g. "weight". Not all cases need these same dimensions to be in formula (e.g., if the measured value is bottom temperature ("btemp"), the "what" dimension is not needed), which is why this function does not impose as much structure on which categories of columns should comprise formula.

However, it can be useful to think of that structure for formula when trying to understand the distinction between elements to be filled with valFill vs. valAbsent.

For species data, there is an important distinction between a species not being present, and no sampling occurring. For example, entries for species data often do not include 0's, but 0's are implied for Species X when a site is sampled and no value is reported for Species X, even though a value is reported for other species in this instance and Species X is reported in other sampling events. In this case, the observation is 0, not NA.

In the context just described, valFill would be NA (the default); to change values like those of Species X from NA to 0 (under appropriate conditions), set fixAbsent to TRUE (default) and valAbsent to 0 (default). More generally, the allNA_noSamp argument defines the array dimension(s) for which, if all elements are NA while varying allNA_noSamp and holding the other dimensions constant, the NA values are appropriate and should not be switched to valAbsent when fixAbsent=TRUE. For the species example given above, the default allNA_noSamp="spp" would be appropriate. In general, it may be fair to say that allNA_noSamp should be set to the "what" dimension(s) (as described above), and that valAbsent should be set to the value taken on by valueName when a measurement is attempted for a particular factor level of valueName that is absent.
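
The rule described above can be sketched in base R on a small hand-built array. This is only an illustration of the logic under the default settings (valFill = NA, valAbsent = 0, allNA_noSamp = "spp"); the toy array, the names sampled and fixed, and the loop are assumptions and are not taken from the source of trawlCast.

```r
# Toy array: 2 strata (j) x 3 species (i) x 2 years (t), all NA to start
arr <- array(
  NA_real_, dim = c(2, 3, 2),
  dimnames = list(j = c("s1", "s2"), i = c("cod", "hake", "sole"),
                  t = c("2001", "2002"))
)
arr["s1", "cod", "2001"] <- 1.5                      # s1 sampled in 2001
arr["s2", c("cod", "hake"), "2002"] <- c(0.9, 1.1)   # s2 sampled in 2002
# s1 in 2002 and s2 in 2001 have no values at all: treated as not sampled

# For each stratum-year, TRUE if at least one species has a non-NA value
sampled <- apply(arr, c(1, 3), function(v) !all(is.na(v)))

# Replace NA with 0 (absence) only where the stratum-year was sampled
fixed <- arr
for (sp in dimnames(arr)$i) {
  slice <- fixed[, sp, ]               # stratum x year matrix for one species
  slice[is.na(slice) & sampled] <- 0
  fixed[, sp, ] <- slice
}

fixed["s1", "hake", "2001"]   # sampled, species not reported: absence
fixed["s1", "hake", "2002"]   # no species reported: stays NA (no sampling)
```

Here the species dimension plays the role of allNA_noSamp: NA's are left alone only when every species is NA for that stratum and year.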

As implied by the preceding Details, casting data expands the number of explicit valueName elements in the data set. This function casts to an array because casting to a data.frame or data.table will take up far more RAM. The difference in RAM increases with the number of identifying variables and how many unique levels they have (but also depends on whether those identifying variables are encoded as characters, factors, integers, doubles, etc).
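
The memory argument can be checked informally with object.size. The sketch below compares a fully expanded array with the equivalent long-format data.frame; the sizes of the dimensions and the use of character identifiers are arbitrary assumptions, and the exact numbers will vary with the encoding of the identifying variables.

```r
# Hypothetical identifying variables
strata <- paste0("s", 1:50)
spp    <- paste0("sp", 1:40)
years  <- 1990:2014

# Array form: one numeric value per cell, identifiers stored once in dimnames
arr <- array(
  0, dim = c(length(strata), length(spp), length(years)),
  dimnames = list(j = strata, i = spp, t = as.character(years))
)

# Long form: every row repeats its stratum, species, and year
long <- expand.grid(stratum = strata, spp = spp, year = years,
                    stringsAsFactors = FALSE)
long$wtcpue <- 0

size_arr  <- as.numeric(object.size(arr))
size_long <- as.numeric(object.size(long))
size_long / size_arr   # ratio of long-format size to array size
```

Both objects hold the same number of measured values; the long format is larger because each identifying variable is stored once per row rather than once per dimension.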

Value

An array with dimensions equal to the number of unique values in each column in formula.

Examples

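# Trim the cleaned EBS data, keep records with Picture=="y", then subset with pick(spp, 9)
mini.t <- trawlTrim(copy(clean.ebs), c.add="Picture")[Picture=="y"][pick(spp, 9)]

# Aggregate to one value per species, stratum, and year
mini.a <- trawlAgg(
	mini.t,
	FUN=meanna,
	bio_lvl="spp",
	space_lvl="stratum",
	time_lvl="year",
	metaCols="common",
	meta.action="unique1"
)

# Cast to a 3-dimensional array: time x stratum x species
mini.c <- trawlCast(mini.a, time_lvl~stratum~spp, grandNamesOut=c("t","j","i"))

# For each species (3rd dimension), count NA, zero, and positive cells
(smry <- t(apply(mini.c, 3,
	function(x){
		c(
			"NA"=sum(is.na(x)),
			"0"=sum(!is.na(x)&x==0),
			">0"=sum(!is.na(x)&x>0)
		)
	}
)))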

## Not run: 
# One panel per species, passing its common name to sppImg
par(mfrow=c(3,3), mar=c(0.15,0.15,0.15,0), ps=8)
for(i in seq_len(nrow(smry))){
	tspp <- rownames(smry)[i]
	sppImg(tspp,
		mini.a[spp==tspp,unique(common)],
		side=3, line=-2, xpd=TRUE
	)
}

## End(Not run)

rBatt/trawlData documentation built on July 19, 2018, 1:26 a.m.