knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(seaprocess)
This vignette is an overview of the seaprocess package and what it does to process data collected aboard the Sea Education Association's Sailing School Vessels, the Corwith Cramer and the Robert C. Seamans.
The goals of this package are to:
The processing functions in this package are built such that, in theory, you should be able to run a very general command that takes the raw data in any stream all the way through to final output. However, there are always odd data examples on any cruise, so the processing routines include ways to initialize these higher-level functions so as to change processing behavior at lower levels.
For example:
To process elg event files we use the function `process_elg()`. The only thing you have to specify is the location of the folder that contains the elg event files, i.e.
process_elg("data/elg")
Under the hood, `process_elg()` is calling:

- `read_elg_fold()`, the function that actually reads a list of elg files from a folder. It loops through all the elg files in the folder, in turn calling `read_elg()`, the function that reads an individual elg file, and then combines them all into one data file.
- `average_elg()`, which averages the data into some time window. This is a little akin to what hourlywork used to do, except you can specify the averaging window to be any number of minutes.
- `add_file_cruiseID()`, which adds an optional cruise identifier to output files so that you can keep them organized.
- `safely_write_csv()`, which writes the averaged data to a comma-separated-value file.
- `format_elg_odv()`, which formats the averaged data so that it can be read into Ocean Data View.
The complete list of settings you can pass to `process_elg()` is:
process_elg(
  elg_folder,
  cruiseID = NULL,
  csv_folder = "output/csv",
  csv_filename = "elg.csv",
  odv_folder = "output/odv/elg",
  odv_filename = "elg.txt",
  add_cruiseID = TRUE,
  average_window = 60,
  ...
)
So, `elg_folder` is the only argument you need to specify, but you can change any of the others to customize the behavior. See the help file for `process_elg()` for details of what each does.
SEA records its location and flowthrough data in one-minute-resolution text files with the .elg file extension. These are known as event files and are created by a piece of software called SCS. These data files combine locational data with data from the flowthrough system and other devices such as (in some cases) the CHIRP and weather system.
These data files are relatively simple to read in as they are in comma-separated-value (csv) format, with one header line of column titles and corresponding data lines with columns separated by commas. The column headings differ a little between ships and between eras on board, so the reading script is flexible to known variations in headings.
To read in an elg file you can use:
elg_data <- read_elg(system.file("extdata", "example_data","Event60sec_012.elg", package="seaprocess"))
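Once read in, the result is presumably a regular data frame (or tibble), so you can inspect it with the usual base R tools:

```r
# Quick look at the parsed event data
head(elg_data)  # first few rows
str(elg_data)   # column names and types
```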
Many data on SEA cruises are collected at discrete station locations. We stop (or partially stop) the ship at a location to, for example, take CTD measurements, water samples or tow nets. We will refer to the data collected during these activities as station data for the purposes of this document.
Station data is a combination of data recorded manually, data recorded by some device (e.g. CTD), and metadata on the time, location, and other environmental parameters. All of these data are recorded on a paper data sheet, but the goal of this portion of the computer data processing is to limit the amount of metadata we need to type into the computer when it is already being recorded digitally elsewhere.
SKETCH PROCESS
The starting point for all of this is to create an accurate station summary sheet. This will include information for each station including:
Once we've created this table, we can then use the metadata to populate specific data sheets for the individual deployments using only the bare minimum of hand-entered data. This might include sensor IDs for a CTD cast or the 100-count values for a neuston tow.
We begin creating a station summary sheet by using an Excel spreadsheet with the following column headers:
ADD SCREEN GRAB
NOTE: There is a new line on the sheet for every deployment type at every station so as to make use of the actual time of deployment in finding the location (see below). An example of this can be seen here.
knitr::kable(readxl::read_excel(system.file("extdata", "S285_station.xlsx", package="seaprocess"), col_types = "text"))
The next step is to match up the time of deployment with the location and environmental data recorded by SCS to populate the rest of the summary sheet with electronically recorded metadata.
This is achieved using the `create_summary()` function, which takes the following inputs:
read_elg()
With these inputs, the function reads in the elg information, looks at all the dates/times of the deployments, and finds the matching location and environmental metadata to add to the station summary. The result is a table that looks the same as the input station Excel sheet, but with additional columns of:
An example of this code in action and the output can be seen here.
summary_input <- system.file("extdata", "S285_station.xlsx", package="seaprocess")
elg_input <- system.file("extdata", "S285_elg", package="seaprocess")
summary <- create_summary(summary_input, elg_input)
knitr::kable(summary)
ADD IN CODE AND DESCRIPTION OF FAIL SAFES IF THE ELG DATA IS NOT AVAILABLE
To move on to populating specific datasheets with this station summary data, we need to save the station summary to a CSV file, which we can do by including a file path for the `csv_output` argument. In this case, you wouldn't need to assign the output of the function to an object (in our case, `summary`) unless you needed that object for something else.
create_summary(summary_input, elg_input, "<some-path-to-folder>/<C/SXXX>_station_summary.csv")
We can now employ a mostly universal workflow for any deployment (even a non-standard, cruise-specific one): create an Excel sheet with the station number and any hand-entered data, then add in the relevant station summary data from the CSV we've just created.
To do this we use the `create_datasheet()` function, which takes the following main arguments:
For example, with CTD data, our initial Excel datasheet looks like this:
knitr::kable(readxl::read_excel(system.file("extdata", "S285_ctd.xlsx", package="seaprocess"), col_types = "text"))
You can see that there is no info on location, timing, etc. but we have that all stored away elsewhere. All we need is to bring it together like this:
data_input <- system.file("extdata", "S285_ctd.xlsx", package="seaprocess")
summary_input <- system.file("extdata", "S285_station.csv", package="seaprocess")
data_type <- "CTD"
datasheet <- create_datasheet(data_input, summary_input, data_type)
knitr::kable(datasheet)
Behind the scenes, the code is simply reading in both files, matching up the deployment number and type, and copying the appropriate metadata to the datasheet.
To get a working copy of this data, you can export it as a csv file, just like in the station summary example above:
create_datasheet(data_input, summary_input, data_type, "<some-path-to-folder>/<C/SXXX>_ctd.csv")
For neuston tows there is one additional datasheet step (more may be added later): calculating the biodensity of the tow by dividing the biovolume of zooplankton by the tow length.
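The calculation itself is simple division. As a sketch with made-up example values (the variable names and units here are illustrative, not the package's actual ones):

```r
# Biodensity = biovolume of zooplankton / tow length
biovolume  <- 25    # example biovolume of zooplankton (mL)
tow_length <- 2.5   # example tow length
biodensity <- biovolume / tow_length
biodensity
#> 10
```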
This is included as its own section because of the complexity of what is involved: not only are there hand-entered data to combine with summary metadata, there is also chemical analysis of collected water that needs to be incorporated.
The solution right now is to have calculation sheets that look very similar to the traditional SEA ones, i.e. a series of sheets that you use to build up from calibration curves to final outcomes. The key to our process is that each Excel workbook has a sheet called "output", which is what our function will go looking for in order to incorporate the data.
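For instance, pulling the "output" sheet from a chemistry workbook can be done with readxl (the filename here is hypothetical; only the "output" sheet name comes from the description above):

```r
library(readxl)

# Read just the "output" sheet that the processing function goes looking for
chem_output <- read_excel("S285_chem.xlsx", sheet = "output")
```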