## Description
The trawlData package emphasizes an organized and reliable presentation of bottom trawl survey data. The package includes the functions used to read and clean these data, as well as read-in and cleaned versions of the raw data sets. Also included are gridded data sets of environmental variables and several functions designed to enhance the manipulation and exploration of these data.
## Regions

Most of the data sets follow a convention of `prefix.___`, where `prefix` is either "raw" or "clean", and `___` is the abbreviation of a region name (e.g., `raw.ebs`, `clean.goa`). The region names are:
| Abbreviation | Region |
|--------------|--------|
| ai | Aleutian Islands |
| ebs | Eastern Bering Sea |
| gmex | Gulf of Mexico |
| goa | Gulf of Alaska |
| neus | Northeast US |
| newf | Newfoundland |
| sa | US South Atlantic |
| sgulf | Southern Gulf of St. Lawrence |
| shelf | Scotian Shelf |
| wcann | West Coast Annual |
| wctri | West Coast Triennial |
## Raw Data

The truly raw data sets (as received from the data providers) can be located with a call to `system.file(package="trawlData")`. Explore the zip files in `inst` (except for neus, which is not zipped). For the raw data that have already been read into R, just check out `raw.___`. To read these data into R yourself, try `read.trawl`.
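For example, a minimal session might look like the following sketch; the `reg` argument to `read.trawl` is an assumption, so check `?read.trawl` for the real interface:

```r
library(trawlData)

# Where the package's raw files live (zip archives, except neus)
system.file(package = "trawlData")

# Raw data already read into R follow the prefix convention
data(raw.ebs)  # Eastern Bering Sea
str(raw.ebs)

# Reading the provider files yourself; the `reg` argument name is
# an assumption -- see ?read.trawl for the actual signature:
# ebs <- read.trawl(reg = "ebs")
```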
## Clean Data

Cleaned data sets take the form of `clean.___`. These files are equivalent to sequentially processing a raw data set with `clean.names`, `clean.format`, `clean.columns`, `clean.tax`, and `clean.trimRow`. There is one additional cleaning function, `clean.trimCol`, which, unlike the others, actually results in a loss of information relative to the original data set.
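As a sketch of how a `clean.___` object relates to its raw counterpart, the pipeline looks roughly like this; whether these functions return a modified copy or modify a data.table by reference is an assumption, so verify against the help pages:

```r
library(trawlData)

# Rough equivalent of how clean.ebs relates to raw.ebs
# (return-vs-reference semantics are an assumption; see the help pages)
x <- raw.ebs
x <- clean.names(x)    # standardize column names
x <- clean.format(x)   # enforce consistent column formats/classes
x <- clean.columns(x)  # fix problematic column values
x <- clean.tax(x)      # correct taxonomic information
x <- clean.trimRow(x)  # drop low-quality rows

# The only step that discards information relative to the original:
x <- clean.trimCol(x)
```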
## Gridded Environmental Data

Surface temperature data are in `HadISST`, and bottom temperatures are in `soda`. Gridded depth data can be found in `depth` (from ETOPO). Coming soon will be a rugosity data set (stay tuned!).
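Loading and inspecting these data sets is straightforward (a minimal sketch; the data set names come from the package, but their internal structure is not assumed here):

```r
library(trawlData)

data(HadISST)  # sea surface temperature (HadISST)
data(soda)     # bottom temperature (SODA)
data(depth)    # depth (ETOPO)

# Inspect structure and dimensions
str(depth)
```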
## Taxonomy

There are actually several taxonomy-related data sets, but most of them are aimed at producing our diamond in the rough: `spp.key`. There is also a .csv version of this data set in `inst` (again, use `system.file`). If you find any corrections that need to be made, please let us know by [opening an Issue on GitHub](https://github.com/rBatt/trawlData), or email Ryan Batt. Better yet, create a pull request with the appropriate edits to the .csv ^_^
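For instance, to load the key and locate its .csv counterpart (the exact file name is not stated here, so this sketch just lists candidates):

```r
library(trawlData)

data(spp.key)
spp.key  # the curated species key (a data.table)

# Find the .csv version shipped with the package; the exact file
# name is an assumption, so list matching files instead:
list.files(
  system.file(package = "trawlData"),
  pattern = "\\.csv$", recursive = TRUE, full.names = TRUE
)
```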
## Data Manipulation

There are only a few of these. For aggregating data, check out `trawlAgg`. For "casting" data (e.g., as with `reshape2::acast`), see `trawlCast` (note: this is the function you want for adding 0's for unobserved species to the data set). To perform the tasks accomplished by `clean.trimCol` and `clean.trimRow` simultaneously for multiple regions (combining the results into one data.table), see `trawlTrim`. A rough sketch of how these fit together follows.
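The calls below are hypothetical: the argument names and how regions are specified are assumptions, so consult `?trawlTrim`, `?trawlAgg`, and `?trawlCast` for the actual interfaces:

```r
library(trawlData)

# Trim rows and columns for one or more regions, returning a single
# combined data.table (how regions are specified is an assumption):
# X <- trawlTrim(c("ebs", "goa"))

# Aggregate, then cast to wide format with 0's for unobserved species
# (the grouping and formula arguments are assumptions):
# agg  <- trawlAgg(X, ...)
# wide <- trawlCast(X, ...)
```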
## Helper Functions

There are many, but I'll list a few that I use a lot. First up are `pick` and `mpick`: together, these functions can do amazing things for helping you get good test data sets! At times `mpick` can be slow, because it falls back on a brute-force approach when it has to; if you have a better approach, post it as an answer to my long and unpopular [question on SO](http://stackoverflow.com/q/33714985/2343633)!
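A hypothetical usage sketch (the argument names below are assumptions; see `?pick` and `?mpick`):

```r
library(trawlData)

# Subset a cleaned data set down to a small test set that still has,
# e.g., several species observed across several years; the arguments
# shown are hypothetical -- see ?pick and ?mpick:
# small <- mpick(clean.ebs, p = c(spp = 2, year = 3))
# tiny  <- pick(clean.ebs, n = 5)
```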
Another handy function is `match.tbl`. When `exact=TRUE` it behaves similarly to `match`; otherwise, it tries a series of approximations. Because of its input-output format, it has often been able to tidy up sections of my code, making them much more reliable and easier to read. I often use it when trying to pull in information from multiple sources before merging it into a master data.table (it is used heavily in creating and maintaining `spp.key`).
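In spirit, the exact case is close to base `match`; the `match.tbl` call below is a hypothetical sketch of the interface:

```r
# Base R analogue of the exact case:
ref <- c("Gadus morhua", "Limanda aspera")
obs <- c("Limanda aspera", "Gadus morhua", "not a species")
match(obs, ref)  # 2 1 NA

# Hypothetical match.tbl usage (argument names are assumptions;
# see ?match.tbl):
# match.tbl(obs, ref, exact = FALSE)  # falls back to approximations
```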
On the simpler end of things, I use `lu` very often, although it (and `una`) are being replaced by `data.table::uniqueN` (a newer function in data.table 1.9.6/1.9.7, which are not compatible with this package at the moment). For gridding data, I use `ll2strat` and `roundGrid`.
Finally, if you use data.table a lot (which you will be with this package!), you'll notice that you often have a character vector that you need to evaluate in the middle of a complex `j=` expression. Yes, you can use `with=FALSE`, but then you lose a lot of functionality elsewhere. Enter `s2c`, which I often use in the form `dt[,eval(s2c(character_vector))]`. You'll get a list out of it, so be mindful of that! Anyway, very handy.
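A minimal, self-contained example (only the toy table is invented; the `s2c` usage is exactly the form above):

```r
library(data.table)
library(trawlData)

dt <- data.table(a = 1:3, b = letters[1:3], c = rnorm(3))
keep <- c("a", "c")

# s2c turns the character vector into something eval()-able in j;
# per the docs, you get a list out of it, so be mindful of that:
dt[, eval(s2c(keep))]

# The with=FALSE route works too, but composes less well in complex j:
dt[, keep, with = FALSE]
```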
## Plotting

To check the frequency with which each stratum was sampled, use `check_strat`. Forthcoming is `plot_check`, which will be useful for checking raw data. A fun one is `sppImg`!
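Hypothetical usage (the expected inputs are assumptions; see the help pages):

```r
library(trawlData)

# Tabulate how often each stratum was sampled
# (the expected input is an assumption):
# check_strat(clean.ebs)

# Draw a picture of a species; assumed to accept a species name:
# sppImg("Gadus morhua")
```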