
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
options(width=300)
library(strings)

1. PURPOSE

strings is an R package developed to help users compile, format, calculate, and visualize oceanographic data collected by the Centre for Marine Applied Research's (CMAR) Coastal Monitoring Program. The package can process temperature, dissolved oxygen, and salinity data measured by HOBO Pro V2, TidBiT, aquaMeasure DOT, aquaMeasure SAL, and/or VR2AR sensors from a single sensor string deployment.

The purpose of this vignette is to provide detailed instruction on how to compile data from a deployment into a single spreadsheet, assuming the data is structured as described in SOP009. Section 3.1 provides a general overview of the compile_*_data() family of functions. Section 3.2 is CMAR-specific, and shows how to compile CMP data using a template.

2. SCOPE/PRINCIPLE

  1. compile_*_data() family of functions
     * Individual compile functions
     * compile family wrapper function
  2. CMAR workflow: Compile_Template.R

3. PROCEDURE

3.1 The compile_*_data() family of functions

3.1.1 Individual compile functions

strings provides a separate compile_*_data() function for each type of sensor:

These functions are useful if you only want to compile data from one type of sensor. They compile all of the data in the corresponding folder into a single dataframe, with the option to export as a .csv file. The data is compiled in a "wide" format, with metadata in the first four rows indicating the deployment period, the sensor serial number, the variable and depth of the sensor, and the timezone of the timestamps. The first column is an index column, starting at -4 to account for the metadata rows. The remaining columns alternate between timestamp (in the format "Y-m-d H:M:S") and variable value (rounded to three decimal places). Note that because the sensors can be initialized at different times and record on different intervals, values in a single row do not necessarily correspond to the same timestamp.

See the help files for the arguments required to run these functions.

# Data compiled from the three Hobo sensors deployed at Birchy Head on May 2, 2019:
head(hobo_data, n = 10)
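
To make the layout described above concrete, the following minimal sketch builds a toy dataframe in the same wide shape using base R only. It is not output from strings: the sensor names, deployment period, and readings are made up, the exact text of the metadata rows is an assumption, and the index here simply counts up from -4 (the package's numbering of the data rows may differ).

```r
# Toy illustration only: every value below is made up.
meta_rows <- 4  # deployment period, serial number, variable and depth, timezone

# Two hypothetical sensors that start at different times and use different intervals
a_time  <- c("2019-05-02 12:00:00", "2019-05-02 13:00:00", "2019-05-02 14:00:00")
a_value <- c(4.1234, 4.2, 4.35)
b_time  <- c("2019-05-02 12:10:00", "2019-05-02 12:40:00", "2019-05-02 13:10:00")
b_value <- c(3.98765, 4.01, 4.0)

# Prepend the four metadata entries to a column of observations.
# Everything is stored as character so metadata and data can share a column.
with_meta <- function(serial, var_depth, obs) {
  c("2019-May-02 to 2019-Nov-22", serial, var_depth, "UTC", obs)
}

# Format values to three decimal places
fmt <- function(x) formatC(x, format = "f", digits = 3)

toy_wide <- data.frame(
  INDEX       = as.character(seq(-meta_rows, length.out = meta_rows + length(a_time))),
  A_TIMESTAMP = with_meta("SENSOR-A", "Temperature-2m", a_time),
  A_VALUE     = with_meta("SENSOR-A", "Temperature-2m", fmt(a_value)),
  B_TIMESTAMP = with_meta("SENSOR-B", "Temperature-5m", b_time),
  B_VALUE     = with_meta("SENSOR-B", "Temperature-5m", fmt(b_value)),
  stringsAsFactors = FALSE
)

print(toy_wide)
```

In the printed result, the two timestamp columns hold different times in the same row, which is why values in a single row should not be compared directly.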

This format is convenient for human readers, who can quickly scan the metadata to determine the number of sensors deployed, the depths of deployment, etc. However, this format is less convenient for analysis (e.g., to include the metadata, all values were converted to class character). Prior to analysis, the dataframe should be converted to a "long" ("tidy") format, and the values should be converted to the appropriate class (e.g., POSIXct for the timestamps and numeric for variable values). This can be done using the convert_to_tidydata() function, which returns a dataframe with the following columns:

hobo_tidy <- convert_to_tidydata(hobo_data[, -1]) # remove the INDEX column

head(hobo_tidy)
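
The reshaping can be pictured with the following base R sketch, which stacks each timestamp/value column pair of a toy wide dataframe into long format and converts the classes. This is an illustration under assumptions, not the implementation of convert_to_tidydata(): apart from VARIABLE, the output column names, the "variable-depth" metadata text, and the toy data are all hypothetical.

```r
# Toy wide data: 4 metadata rows, then observations, all stored as character.
toy_wide <- data.frame(
  V1 = c("period", "SENSOR-A", "Temperature-2", "UTC",
         "2019-05-02 12:00:00", "2019-05-02 13:00:00"),
  V2 = c("period", "SENSOR-A", "Temperature-2", "UTC", "4.123", "4.200"),
  V3 = c("period", "SENSOR-B", "Temperature-5", "UTC",
         "2019-05-02 12:10:00", "2019-05-02 12:40:00"),
  V4 = c("period", "SENSOR-B", "Temperature-5", "UTC", "3.988", "4.010"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: stack each (timestamp, value) pair into long format
wide_to_long <- function(wide, meta_rows = 4) {
  obs   <- seq(meta_rows + 1, nrow(wide))   # rows holding observations
  pairs <- seq(1, ncol(wide), by = 2)       # positions of the timestamp columns
  pieces <- lapply(pairs, function(j) {
    # Assumed metadata text "variable-depth" in the third row
    var_depth <- strsplit(wide[3, j], "-", fixed = TRUE)[[1]]
    data.frame(
      SENSOR    = wide[2, j],
      VARIABLE  = var_depth[1],
      DEPTH     = as.numeric(var_depth[2]),
      TIMESTAMP = as.POSIXct(wide[obs, j], tz = "UTC",
                             format = "%Y-%m-%d %H:%M:%S"),
      VALUE     = as.numeric(wide[obs, j + 1]),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, pieces)
}

toy_tidy <- wide_to_long(toy_wide)
str(toy_tidy)
```

Each observation now sits on its own row with its own timestamp, so sensors that record on different intervals no longer share a row.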

3.1.2 compile family wrapper function

strings also provides a function that will compile the data from all sensors on the string into a single dataframe, with the option to export as a .csv:

See the help file for more detail on this function, including the arguments required.

As with the other compile functions, compile_all_data() returns the data in a wide format, with metadata in the first four rows. The columns alternate between timestamp (in the format "Y-m-d H:M:S") and variable value (rounded to three decimal places). Note that because the sensors can be initialized at different times and record on different time intervals, values in a single row do not necessarily correspond to the same timestamp.

# Data compiled from all the sensors deployed at Birchy Head on May 2, 2019:
head(wide_data, n = 10)

As described above, the data will be easier to work with after it has been converted to a long format:

ALL_tidy <- convert_to_tidydata(wide_data) 

head(ALL_tidy)

3.2 CMAR Workflow: Compile_Template.R

Compiling data for a single deployment is straightforward using the Compile_Template.R saved in the "Y:/Coastal Monitoring Program/Strings Files/Templates" folder. You should only need to change one line, but we will walk through each of the sections here.

Before compiling:

  1. Make sure the folder structure is as described in Section 3.1 of SOP009.
  2. Fill out the STRING TRACKING spreadsheet to make sure all of the data is in the appropriate folders. You may need to add a new row for the deployment you are compiling.
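
The folder check can also be scripted before compiling. The sketch below is one possible way to do it in base R. check_deployment_folders() is a hypothetical helper, not part of strings, and the folder names are assumed from the Log, Hobo, aquaMeasure, and Vemco folders named in the template comments. SOP009 remains the authority on the required structure.

```r
# Hypothetical helper: report which expected sub-folders exist under `path`.
# The default folder names are an assumption; adjust them to match SOP009.
check_deployment_folders <- function(path,
                                     expected = c("Log", "Hobo", "aquaMeasure", "Vemco")) {
  found <- dir.exists(file.path(path, expected))
  names(found) <- expected
  found
}

# Demonstration on a temporary folder so the example runs anywhere
demo_path <- file.path(tempdir(), "demo_deployment")
dir.create(file.path(demo_path, "Log"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(demo_path, "Hobo"), showWarnings = FALSE)

folder_status <- check_deployment_folders(demo_path)
print(folder_status)
```

A FALSE entry is not necessarily a problem, since a string may not carry every sensor type, but it is worth confirming against the deployment log.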

Open Compile_Template.R.

Save the file into the deployment folder with the name structure: “Compile_Deployment_Location_yyyy-mm-dd.R”

Figure 1. Save Compile_Template.R into the deployment folder of the string you are processing and re-name.

At the top of Compile_Template.R, fill in the current date, your name, the version of the strings package you are using, and any additional notes you deem necessary for future data management.

# DATE: 2020-Sep-02
# NAME: DD  
# strings VERSION: 1.1.0
# NOTES:

The next comments describe the sections of the script:

# SECTION 1: Define the path and variables
# 
# SECTION 2: Extract deployment information from the log
#   Only modify this section if one type of sensor noted in the log is not included on the string.
#   Set the argument for this sensor to NULL
# 
# SECTION 3: Compile data
#   You should not have to modify anything in this section
#   A file will be exported to the path (_raw.csv)
# 
# SECTION 4: Visualize data
#   Import data and visualize

Load the necessary libraries

# libraries
library(dplyr)   # for piping and data manipulation functions
library(readr)   # to write csv file
library(strings) # to compile data

Section 1: Update the path to the deployment folder you are processing. This should be the only line you need to change.

### Section 1: Define the path

# path to Log, Hobo, aquaMeasure, and Vemco folders
path <- file.path("Y:/Coastal Monitoring Program/Data_Strings/Birchy Head/Birchy Head 2019-05-02")

Section 2: Run this section to extract information from the deployment log. This information will be used as arguments in the compile_all_data() function.

# SECTION 2: Extract deployment information from the log ------------------
# Only modify this section if one type of sensor noted in the log is not included on the string.
# Set the argument for this sensor to NULL

# Extract information from the deployment log
log_info <- read_deployment_log(path)

# Define station name based on the log
area <- log_info$area.info$station

# Define deployment start and end dates based on the log
deployment <- log_info$deployment.dates

# Create a table of HOBO sensors and deployment depths based on the log
serial.table.HOBO <- log_info$HOBO

# Create a table of aquaMeasure sensors and deployment depths based on the log
serial.table.aM <- log_info$aM

# Define the deployment depth of the VR2 sensor based on the log
depth.vemco <- log_info$vemco$DEPTH
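
To illustrate the instruction about sensors that are noted in the log but not on the string, the sketch below uses a mock list in place of the object returned by read_deployment_log(). Only the element names used in the template (area.info$station, deployment.dates, start.date, HOBO, aM, vemco$DEPTH) come from the script. The end.date name, the table columns, and all values are hypothetical.

```r
# Mock stand-in for the log information (values and table columns are made up)
mock_log <- list(
  area.info = list(station = "Birchy Head"),
  deployment.dates = data.frame(start.date = as.Date("2019-05-02"),
                                end.date   = as.Date("2019-11-22")),
  HOBO  = data.frame(SENSOR = c("HOBO-A", "HOBO-B"), DEPTH = c("2m", "5m")),
  aM    = data.frame(SENSOR = "aquaMeasure-A", DEPTH = "5m"),
  vemco = data.frame(SENSOR = "VR2AR-A", DEPTH = "15m")
)

# Same extraction pattern as the template, applied to the mock list
area              <- mock_log$area.info$station
deployment        <- mock_log$deployment.dates
serial.table.HOBO <- mock_log$HOBO
serial.table.aM   <- mock_log$aM
depth.vemco       <- mock_log$vemco$DEPTH

# Suppose the log lists a VR2AR that was not actually on the string:
# set its argument to NULL before compiling.
depth.vemco <- NULL
```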

Section 3: Run this section to compile the data. A note will be printed to the console when each sensor has been compiled. A note will be printed indicating which variables were found in each aquaMeasure file.

Parsing errors may occur from end-of-line rows in the Hobo files or the Text column in the aquaMeasure files. These errors can typically be ignored. To confirm that they were the source of the error, you can delete the offending rows or columns (these should not include useful data).

The function will automatically trim data to deployment and retrieval dates included in the log (supplied in the deployment.range argument).
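
The trimming step amounts to keeping only observations whose timestamps fall within the deployment period. A minimal base R sketch of that idea follows. It is not the package's code, and the column names, dates, and inclusive end points are assumptions made for illustration.

```r
# Toy observations (made up): the first and last fall outside the deployment
obs <- data.frame(
  TIMESTAMP = as.POSIXct(c("2019-05-01 18:00:00", "2019-05-02 12:00:00",
                           "2019-08-15 06:00:00", "2019-11-23 09:00:00"),
                         tz = "UTC"),
  VALUE = c(15.2, 4.1, 9.8, 14.7)
)

# Assumed deployment period (the retrieval date is hypothetical)
start <- as.POSIXct("2019-05-02 00:00:00", tz = "UTC")
end   <- as.POSIXct("2019-11-22 23:59:59", tz = "UTC")

# Keep rows with timestamps inside the period, end points included
trimmed <- obs[obs$TIMESTAMP >= start & obs$TIMESTAMP <= end, ]
print(trimmed)
```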

# Compile data from a single deployment
ALL_data <- compile_all_data(path = path,
                             deployment.range = deployment,
                             area.name = area,
                             # hobo
                             serial.table.HOBO = serial.table.HOBO,
                             # aquaMeasure
                             serial.table.aM = serial.table.aM,
                             # vemco
                             depth.vemco = depth.vemco)

# Name the file
file_name <- name_compiled_data(area.name = area,
                                deployment.start = deployment$start.date,
                                vars = unique(convert_to_tidydata(ALL_data)$VARIABLE))

# Write csv file with compiled data
write_csv(ALL_data, paste(path, "/", file_name, "_raw.csv", sep = ""), col_names = FALSE)

The naming convention for the csv is Station Name_Deployment-Date_Variable_raw, where "raw" indicates that the data has not been trimmed. The csv file will be saved to the deployment folder (Fig. 2).

Figure 2. The csv of compiled data will be saved to the deployment folder
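
The naming convention can be sketched in base R as follows. build_raw_file_name() is a hypothetical helper written for this illustration. The real name_compiled_data() may abbreviate, order, or separate the variables differently, so treat the result as an example of the pattern rather than the exact name the package produces.

```r
# Hypothetical helper: Station Name_Deployment-Date_Variable_raw.csv
build_raw_file_name <- function(station, start_date, vars) {
  paste0(
    station, "_",
    format(as.Date(start_date), "%Y-%m-%d"), "_",
    paste(vars, collapse = "_"),
    "_raw.csv"
  )
}

example_name <- build_raw_file_name(
  station    = "Birchy Head",
  start_date = "2019-05-02",
  vars       = c("Temperature", "Dissolved Oxygen")
)
print(example_name)

# file.path() joins folder and file name with the correct separator
example_path <- file.path("path/to/deployment/folder", example_name)
```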

Section 4: This section is meant as a check to ensure the data compiled properly (don't worry if the figure is not pretty). Compare the variables and depths in the figure to those recorded in the log.

# Import and plot the raw data
ALL_raw <- read_csv(paste(path, "/", file_name, "_raw.csv", sep = ""), col_names = FALSE)
ALL_raw_tidy <- convert_to_tidydata(ALL_raw)

# Update vars.to.plot and ylab.units as necessary. 
# Remove any variables and their associated units if they aren't present in your data.
plot_variables_at_depth(ALL_raw_tidy, plot.title = area)

You will get a note from ggplot if any dissolved oxygen values are > 130 % or < 60 % (the limits on the y-axis of the plot). If necessary, these values will be removed when the data is trimmed (see SOP011).
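
To see which observations triggered the note, the out-of-range values can be listed directly. The sketch below does this on a toy long-format dataframe in base R. The VALUE column name, the "Dissolved Oxygen" label, and the readings are assumptions made for illustration, so check the names in your own tidy data first.

```r
# Toy long-format data (made up)
toy_tidy <- data.frame(
  VARIABLE = c("Dissolved Oxygen", "Dissolved Oxygen",
               "Dissolved Oxygen", "Temperature"),
  VALUE    = c(95.2, 131.4, 58.9, 4.1),
  stringsAsFactors = FALSE
)

# Flag dissolved oxygen values outside the plot limits (60 % to 130 %)
is_do   <- toy_tidy$VARIABLE == "Dissolved Oxygen"
outside <- is_do & (toy_tidy$VALUE > 130 | toy_tidy$VALUE < 60)

n_outside <- sum(outside)
print(toy_tidy[outside, ])
```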

Note the outliers at the beginning and end of the figure. These values were recorded before and after the string was deployed and should be trimmed off prior to data analysis. This is the subject of the next vignette.

Mark on the STRING TRACKING sheet that the deployment has been compiled (you may need to ask Danielle or Nicole for access).



Centre-for-Marine-Applied-Research/strings documentation built on Aug. 21, 2023, 8 a.m.