trawlData: trawlData


Description

The trawlData package emphasizes an organized and reliable presentation of bottom trawl survey data. The package includes the functions used to read and clean these data, as well as the read-in and cleaned versions of the raw data sets. Also included are gridded data sets of environmental variables and several functions designed to make these data easier to manipulate and explore.

Regions

Most of the data sets follow a convention of prefix.___, where prefix is either "raw" or "clean", and ___ is the abbreviation of a region name. The region names are:

ai Aleutian Islands
ebs Eastern Bering Sea
gmex Gulf of Mexico
goa Gulf of Alaska
neus Northeast US
newf Newfoundland
sa US South Atlantic
sgulf Southern Gulf of St. Lawrence
shelf Scotian Shelf
wcann West Coast Annual
wctri West Coast Triennial
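
As a quick illustration of the naming convention, the sketch below builds the expected data set names from the region abbreviations listed above. It only constructs the names as strings; whether every combination is actually shipped with the package is an assumption, not something this snippet checks.

```r
# Region abbreviations, copied from the list above
regions <- c("ai", "ebs", "gmex", "goa", "neus", "newf",
             "sa", "sgulf", "shelf", "wcann", "wctri")

# Build the expected data set names under the prefix.___ convention
raw_names   <- paste0("raw.", regions)
clean_names <- paste0("clean.", regions)

print(raw_names)
print(clean_names)
```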

Raw Data

The truly raw data sets (from the data provider) can be found with a call to system.file(package="trawlData"). Explore the zip files in inst (except for neus, which is not zipped). For the raw data that have already been read into R, just check out raw.___. To read these data into R yourself, try read.trawl.
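
A minimal sketch of how one might look for the zipped raw files, using only base R. Note that when a package is installed, the contents of its source inst directory are copied to the top level of the installed package directory, so the search below is recursive rather than assuming a particular subfolder. The variable names (pkg_dir, zip_files) are just for illustration.

```r
# Locate the installed package; system.file() returns "" if it is not installed
pkg_dir <- system.file(package = "trawlData")

if (nzchar(pkg_dir)) {
  # List any zip archives shipped with the package, searching subdirectories
  zip_files <- list.files(pkg_dir, pattern = "\\.zip$",
                          recursive = TRUE, full.names = TRUE)
} else {
  zip_files <- character(0)
  message("trawlData does not appear to be installed")
}

print(basename(zip_files))
```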

Clean Data

Cleaned data sets take the form of clean.___. These data sets are equivalent to sequentially processing a raw data set with clean.names, clean.format, clean.columns, clean.tax, and clean.trimRow. There is 1 additional cleaning function, clean.trimCol, which, unlike the others, actually results in loss of information relative to the original data set.

Gridded Environmental Data

There are surface temperature data sets in HadISST, and bottom temperatures in soda. Gridded depth data can be found in depth (from ETOPO). A rugosity data set is coming soon (stay tuned!).

Taxonomy

There are actually several taxonomy-related data sets, but most of them are aimed at producing our diamond in the rough: spp.key. There is also a .csv version of this data set that can be found in 'inst' (again, use system.file). If you find any corrections that need to be made here, please, let us know by [opening an Issue on GitHub](https://github.com/rBatt/trawlData)! Or email Ryan Batt. Better yet, create a pull request with the appropriate edits in the .csv ^_^

Data Manipulation

There are only a few of these. For aggregating data, check out trawlAgg. For "casting" data (e.g., reshape2::acast), see trawlCast (note: this is the function you want for adding 0's for unobserved species into the data set). To accomplish the tasks of clean.trimCol and clean.trimRow simultaneously for multiple regions (and combine the results for those regions into 1 data.table), see trawlTrim.
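
To make the idea of "adding 0's for unobserved species" concrete, here is a small sketch on toy data. It uses base R's xtabs as a stand-in for trawlCast, whose arguments are not described here, so treat it as an illustration of the concept rather than of the package's function. The data frame and its column names (haul, spp, wt) are made up.

```r
# Toy long-format catch data: one row per (haul, species) that was observed
catch <- data.frame(
  haul = c("h1", "h1", "h2", "h3"),
  spp  = c("cod", "hake", "cod", "skate"),
  wt   = c(2.5, 1.0, 4.0, 0.5),
  stringsAsFactors = FALSE
)

# Cast to a haul-by-species table; combinations never observed become 0
wide <- xtabs(wt ~ haul + spp, data = catch)
print(wide)

# Back to long format, now with explicit zero rows for unobserved species
long <- as.data.frame(wide, responseName = "wt", stringsAsFactors = FALSE)
print(long)
```

The long version has one row for every haul and species pair, which is usually what you want before computing means or occupancy across hauls.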

Helper Functions

There are many, but I'll list a few that I use a lot. First up are pick and mpick: together, these functions can do amazing things for helping you get good test data sets! At times mpick can be slow as it uses a brute-force approach when it has to; if you have a better approach, post it as an answer to my long and unpopular [question on SO](http://stackoverflow.com/q/33714985/2343633)!

Another handy function is match.tbl. When exact=TRUE it behaves similarly to match; otherwise, it tries a bunch of approximations. Due to the input-output format, it has often been able to clean up sections of my code, making them much more reliable and easy to read. I often use it when trying to pull in information from multiple sources before merging it into a master data.table (it is used heavily in creating and maintaining spp.key).

On the simpler end of things, I use lu very often, although it (and una) are being replaced by data.table::uniqueN (a newer function in data.table 1.9.6 or 1.9.7, versions that aren't compatible with this package at the moment). For gridding data, I use ll2strat and roundGrid.

Finally, if you use data.table a lot (which you will be with this package!), you'll notice that you often have a character vector that you need to evaluate in the middle of a complex j= expression. Yeah, you can use with=FALSE, but then you lose a lot of functionality elsewhere. Enter s2c, which I often use in the form of dt[,eval(s2c(character_vector))]. You'll get a list out of it, so be mindful of that! Anyway, very handy.
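
The sketch below shows the general idea of turning a character vector into an expression that can be evaluated against a table's columns. s2c_sketch is a hypothetical stand-in written for this example, not the package's actual s2c, and it uses a plain data.frame with base eval so that it runs without data.table.

```r
# Hypothetical stand-in for s2c: turn a character vector into an unevaluated
# call such as list(a, b). This is an assumption about the idea, not the
# package's actual code.
s2c_sketch <- function(x, type = "list") {
  as.call(lapply(c(type, x), as.symbol))
}

df <- data.frame(a = 1:3, b = c(10, 20, 30), z = letters[1:3])
cols <- c("a", "b")

# Build the call from the character vector; nothing is evaluated yet
cl <- s2c_sketch(cols)
print(cl)

# Evaluate the call with the data frame's columns in scope; the result is a list
out <- eval(cl, envir = df)
str(out)
```

With a data.table, the analogous step would be evaluating the call inside j, as in the dt[,eval(s2c(character_vector))] form mentioned above. As noted, the result is a list, here one element per requested column.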

Plotting

To check how often each stratum was sampled, use check_strat. Forthcoming is plot_check, which will be useful for checking raw data. A fun one is sppImg!


rBatt/trawlData documentation built on May 26, 2019, 7:45 p.m.