knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(seaprocess)

Introduction

This vignette is an overview of the seaprocess package and what it does to process data collected aboard the Sea Education Association's Sailing School Vessels, the Corwith Cramer and the Robert C. Seamans.

The goals of this package are to:

OVERVIEW PROCESS DIAGRAM

Processing Philosophy

The processing functions in this package are built such that, in theory, you should be able to run one very general command that takes the raw data in any stream all the way through to final output. However, there are always odd data examples on any cruise, so the processing routines are built with ways for these higher-level functions to pass options that change the processing at lower levels.

For example:

To process .elg event files we use the function process_elg(). The only thing you have to specify is the location of the folder that contains the .elg event files, i.e.

process_elg("data/elg")

Under the hood, process_elg() is calling:

The complete list of settings you can send to process_elg() are:

process_elg(
  elg_folder,
  cruiseID = NULL,
  csv_folder = "output/csv",
  csv_filename = "elg.csv",
  odv_folder = "output/odv/elg",
  odv_filename = "elg.txt",
  add_cruiseID = TRUE,
  average_window = 60,
  ...
)

So, elg_folder is the only argument you need to specify, but you can change any of the others to customize the processing. See the help file for process_elg() for details of what each one does.
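
For example, you might set a cruise ID and redirect the output folders like this (a sketch only; the cruise ID and paths are hypothetical, and all arguments come from the signature above):

process_elg(
  "data/elg",
  cruiseID = "S285",
  csv_folder = "processed/csv",
  odv_folder = "processed/odv/elg"
)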

Continuous data streams

Cruise Track and Surface data

SEA records its location and flowthrough data in one-minute-resolution text files with the .elg file extension. These are known as event files and they are created by a piece of software called SCS. These data files combine locational data with data from the flowthrough system and other devices such as (in some cases) the CHIRP and the weather system.

These data files are relatively simple to read in as they are in a comma-separated value (CSV) file format with one header line of column titles and corresponding data lines with columns separated by commas. The column headings differ a little between ships and between eras on board, so the reading script is flexible to known variations in headings.

To read in an elg file you can use:

elg_data <- read_elg(system.file("extdata", "example_data","Event60sec_012.elg", package="seaprocess"))
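
The object returned by read_elg() can be inspected and plotted like any other data frame. For example (a sketch only; the column names lon and lat are assumptions and may differ from the actual headings in your output):

head(elg_data)

# Quick cruise-track plot from the per-minute positions
plot(elg_data$lon, elg_data$lat, type = "l",
     xlab = "Longitude", ylab = "Latitude", main = "Cruise track")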

ADCP

Station data streams

Many data on SEA cruises are collected at discrete station locations. We stop (or partially stop) the ship at a location to, for example, take CTD measurements, collect water samples, or tow nets. We will refer to the data collected during these activities as station data for the purposes of this document.

Station data is a combination of data recorded manually, data recorded by some device (e.g. a CTD), and metadata of the time, location, and other environmental parameters. All of these data are recorded on a paper data sheet, but the goal of this portion of the computer data processing is to limit the amount of hand-entered metadata by reusing what is already being recorded digitally elsewhere.

Overall justification and flow

SKETCH PROCESS

Station summary sheet

The starting point for all of this is to create an accurate station summary sheet. This will include information for each station including:

Once we've created this table, we can then use the metadata to populate specific data sheets for the individual deployments using only the bare minimum of hand-entered data. This might include sensor IDs for a CTD cast or the 100-count values for a neuston tow.

Setting up the initial Excel Sheet

We begin creating a station summary sheet by using an Excel spreadsheet with the following column headers:

ADD SCREEN GRAB

NOTE: There is a new line on the sheet for every deployment type at every station so as to make use of the actual time of deployment in finding the location (see below). An example of this can be seen here.

knitr::kable(readxl::read_excel(system.file("extdata", "S285_station.xlsx", package="seaprocess"), col_types = "text"))

Combining with continuous data

The next step is to match up the time of deployment with the location and environmental data recorded by SCS to populate the rest of the summary sheet with electronically recorded metadata.
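
Conceptually, the matching step is a nearest-time lookup, something like this (a sketch only, not the package's actual code; the times are illustrative):

# Two hypothetical deployment times and a day of one-minute elg records
deploy_times <- as.POSIXct(c("2019-04-01 06:30", "2019-04-01 14:10"), tz = "UTC")
elg_times <- seq(as.POSIXct("2019-04-01 00:00", tz = "UTC"),
                 by = "1 min", length.out = 1440)

# For each deployment time, find the index of the nearest elg record
nearest <- vapply(deploy_times, function(t) {
  which.min(abs(as.numeric(elg_times) - as.numeric(t)))
}, integer(1))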

This is achieved using the create_summary() function, which takes in the following inputs:

With these inputs, the function reads in the elg information, looks at all the dates/times of the deployments, and finds the matching location and environmental metadata to add to the station summary. The result is a table that looks the same as the input station Excel sheet, but with additional columns of:

An example of this code in action and the output can be seen here.

summary_input <- system.file("extdata", "S285_station.xlsx", package="seaprocess")
elg_input <- system.file("extdata", "S285_elg", package="seaprocess")
summary <- create_summary(summary_input, elg_input)
knitr::kable(summary)

ADD IN CODE AND DESCRIPTION OF FAIL SAFES IF THE ELG DATA IS NOT AVAILABLE

To move on to populating specific datasheets with this station summary data, we need to save this station summary to a CSV file, which we can do by including a file path for the csv_output argument. In that case, you wouldn't need to assign the output of the function to an object (in our case summary) unless you needed it for something else.

create_summary(summary_input, elg_input, "<some-path-to-folder>/<C/SXXX>_station_summary.csv")

Other basic data sheets

We can now employ a mostly universal concept: take any deployment (even a non-standard, cruise-specific one), create an Excel sheet with station number and any hand-entered data, and then add in the relevant station summary data from the CSV we've just created.

To do this we use the create_datasheet() function which takes in the following main arguments:

For example, with CTD data, our initial Excel datasheet looks like this:

knitr::kable(readxl::read_excel(system.file("extdata", "S285_ctd.xlsx", package="seaprocess"), col_types = "text"))

You can see that there is no info on location, timing, etc., but we have all of that stored away elsewhere. All we need is to bring it together like this:

data_input <- system.file("extdata", "S285_ctd.xlsx", package="seaprocess")
summary_input <- system.file("extdata", "S285_station.csv", package="seaprocess")
data_type <- "CTD"
datasheet <- create_datasheet(data_input, summary_input, data_type)
knitr::kable(datasheet)

Behind the scenes, the code is simply reading in both files, matching up the deployment number and type and copying the appropriate metadata to the data sheet.
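
In other words, the operation is essentially a table join. A minimal sketch of that logic (not the package's actual implementation; the key column names station and deployment are assumptions for illustration):

library(dplyr)
library(readxl)
library(readr)

# Read the hand-entered datasheet and the saved station summary
hand_entered <- read_excel(data_input, col_types = "text")
summary_meta <- read_csv(summary_input)

# Match on station/deployment and copy across the metadata columns
datasheet <- left_join(hand_entered, summary_meta,
                       by = c("station", "deployment"))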

To get a working copy of this data, you can export it as a CSV file, just like in the station summary example above:

create_datasheet(data_input, summary_input, data_type, "<some-path-to-folder>/<C/SXXX>_ctd.csv")

Neuston datasheet

There is one additional step for the neuston datasheet (well, more right now), which is to calculate the biodensity of the tow by dividing the zooplankton biovolume by the tow length.
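
As a sketch, that calculation amounts to one extra column (the column names biovolume and tow_length, and the values and units, are hypothetical stand-ins for the real datasheet fields):

library(dplyr)

neuston <- tibble(
  station    = c("S285-001", "S285-002"),
  biovolume  = c(12.5, 8.0),   # zooplankton biovolume (illustrative)
  tow_length = c(1850, 2100)   # length of the tow (illustrative)
)

# biodensity = zooplankton biovolume / tow length
neuston <- mutate(neuston, biodensity = biovolume / tow_length)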

Bottle datasheet

This is included as its own section because of the complexity of what is involved. Not only are there hand-entered data to combine with summary metadata, there are also chemical analyses of collected water that need to be incorporated.

The solution right now is to have calculation sheets that look very similar to the traditional SEA ones, i.e. a series of sheets that you use to build up from calibration curves to final outcomes. The key to our process is that each Excel workbook has a sheet called "output", which is what our function will go looking for in order to incorporate the data.
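
So, to pull a finished analysis into the processing flow, only that one sheet needs to be read. A sketch of that read using readxl (the workbook path is a placeholder):

# Read just the "output" sheet from a chemical analysis workbook
chem <- readxl::read_excel("<some-path-to-folder>/<C/SXXX>_chl_a.xlsx",
                           sheet = "output")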


