
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
options(width=300)
library(strings)
library(dplyr)

1. PURPOSE

strings is an R package developed to help users compile, format, calculate, and visualize oceanographic data collected by the Centre for Marine Applied Research's (CMAR) Coastal Monitoring Program. The package can process temperature, dissolved oxygen, and salinity data measured by HOBO Pro V2, TidBiT, aquaMeasure DOT, aquaMeasure SAL, and/or VR2AR sensors from a single sensor string deployment.

The purpose of this vignette is to provide detailed instruction on how to trim compiled Coastal Monitoring Program data following CMAR's workflow. The trimmed data is exported to the deployment folder and also formatted and exported to the appropriate Open Data Portal county folder.

Detailed instruction on how to compile data is in SOP010.

Section 3.1 provides a general overview of the trim_data() function. Section 3.2 is CMAR-specific, and shows how to trim and export Coastal Monitoring Program data using a template.

2. SCOPE/PRINCIPLE

  1. trim_data() function
  2. CMAR workflow: Trim_Template.R

3. PROCEDURE

3.1 The trim_data() function

The trim_data() function trims the compiled data (in long/tidy format) to specified dates by filtering on the TIMESTAMP column. The function will only trim one variable at a time, so this function works best in a pipeline. You must also specify which sensors to trim.

See ?trim_data for more detail on function arguments.
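Conceptually, trim_data() keeps the rows of the chosen variable (from the chosen sensors) whose TIMESTAMP falls between the start and end datetimes, and passes all other rows through untouched. Below is a minimal sketch of that filtering, assuming the tidy data has TIMESTAMP, SENSOR, and VARIABLE columns; it is an illustration only, not the package's actual implementation (see the package source for that).

# sketch of the filtering performed by trim_data()
# (illustration only; not the package's actual code)
library(dplyr)
library(lubridate)

trim_sketch <- function(dat, var.to.trim,
                        start.datetime, end.datetime, sensors.to.trim) {
  
  # parse the copy/pasted timestamps, e.g. "2019-05-30  7:34:00 PM"
  start <- ymd_hms(start.datetime)
  end <- ymd_hms(end.datetime)
  
  dat %>%
    filter(
      # rows for other variables or sensors pass through untouched;
      # rows for the trimmed variable/sensors must fall inside the window
      # (grepl() because SENSOR entries may also include serial numbers)
      !(VARIABLE == var.to.trim &
          grepl(paste(sensors.to.trim, collapse = "|"), SENSOR)) |
        (TIMESTAMP >= start & TIMESTAMP <= end)
    )
}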

Start and end times of the raw data:

wide_data %>% 
  convert_to_tidydata() %>% 
  group_by(VARIABLE) %>% 
  summarise(MIN_TIMESTAMP = min(TIMESTAMP), MAX_TIMESTAMP = max(TIMESTAMP)) 

In this example, all variables are trimmed to the same start and end times. This does not have to be the case; you can provide different start/end times for each variable in the different sections of the pipe.

# specify the timestamp for the FIRST VALID measurement
# (this can be copy/pasted from the _raw.csv file)
start.timestamp <- "2019-05-30  7:34:00 PM"

# specify the timestamp for the LAST VALID measurement 
# (this can be copy/pasted from the _raw.csv file)
last.timestamp <- "2019-10-19  2:01:00 PM"

# sensors that recorded temperature data that should be trimmed
sensors.temp <- c("HOBO", "aquaMeasure", "VR2AR")

# sensors that recorded dissolved oxygen data that should be trimmed
sensors.DO <- c("aquaMeasure")

# sensors that recorded salinity data that should be trimmed
sensors.sal <- c("aquaMeasure")

# convert the wide data to long data and trim each variable
dat_trim <- wide_data %>% 
  convert_to_tidydata() %>% 
  # trim temperature data
  trim_data(var.to.trim = "Temperature",
            start.datetime = start.timestamp,
            end.datetime = last.timestamp,
            sensors.to.trim = sensors.temp) %>%
  # trim DO data
  trim_data(var.to.trim = "Dissolved Oxygen",
            start.datetime = start.timestamp,
            end.datetime = last.timestamp,
            sensors.to.trim = sensors.DO) %>% 
  # trim salinity data
  trim_data(var.to.trim = "Salinity",
            start.datetime = start.timestamp,
            end.datetime = last.timestamp,
            sensors.to.trim = sensors.sal)

Start and end times of the trimmed data:

dat_trim %>% 
  group_by(VARIABLE) %>% 
  summarise(MIN_TIMESTAMP = min(TIMESTAMP), MAX_TIMESTAMP = max(TIMESTAMP)) 

An error will be printed if you try to trim a variable from a sensor that did not measure that variable.
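For example, a call like the following would trigger the error, because the HOBO sensors in this deployment recorded temperature but not dissolved oxygen (illustrative call, left commented out; the exact message depends on the package version):

# this errors: the HOBO sensors did not measure dissolved oxygen
# trim_data(dat_trim,
#           var.to.trim = "Dissolved Oxygen",
#           start.datetime = start.timestamp,
#           end.datetime = last.timestamp,
#           sensors.to.trim = c("HOBO"))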

3.2 CMAR Workflow: Trim_Template.R

Before trimming, check the STRING TRACKER to determine which deployments are a priority for sending to Open Data.

Trimming data for a single deployment should be done by following Trim_Template.R saved in the "Y:/Coastal Monitoring Program/Strings Files/Templates" folder. This will export the trimmed data to the deployment folder and the Open Data folder. We will walk through each of the sections here.

Open Trim_Template.R.

Save the file into the deployment folder with the name structure: “Trim_Deployment_Location_yyyy-mm-dd.R”.

Figure 1. Save Trim_Template.R into the deployment folder of the string you are processing and rename it.

At the top of Trim_Template.R, fill in the current date, your name, the version of the strings package you are using, and any additional notes you deem necessary for future data management.

# DATE: 2020-Sep-02
# NAME: DD  
# strings VERSION: 1.1.0
# NOTES:

The next comments describe the sections of the script:

# Template for trimming data compiled from a sensor string deployment
# Returns trimmed data as a csv file in the final folder on path
# and in the appropriate county folder for transfer to Open Data Portal

# SECTION 1: Import and visualize raw data

# SECTION 2: Trim data

# SECTION 3: Visualize and Export trimmed data

# SECTION 4: Export for Open Data

Load the necessary libraries:

# libraries
library(dplyr)         # to pipe and manipulate data
library(readr)         # to read and write csv files
library(strings)       # for string data functions
library(ggplot2)       # for DO plot
library(lubridate)     # for DO plot
library(googlesheets4) # for county info

Section 1: Update the path variable to the deployment folder you are processing. You do not need to change the path.export variable.

# Section 1: Import and visualize raw data ---------------------------------------------------------------

# path to the deployment folder (which contains the raw data file)
path <- file.path("Y:/Coastal Monitoring Program/Data_Strings/Birchy Head/Birchy Head 2019-05-02")

# path to export the trimmed and formatted data (county name pasted on below)
path.export <- file.path("Y:/Coastal Monitoring Program/Open Data/")

Change the file name to the name of the raw data .csv file, without the "_raw" suffix.

# Raw data ----------------------------------------------------------------

# file name
file.name <- "Birchy Head_2019-05-02_TEMP_DO"

Import the raw data file.

# import raw data
dat_raw <- read_csv(paste(path, "/", file.name, "_raw.csv", sep = ""),
                    col_names = FALSE)

# (in this vignette, the example wide_data object stands in for the file read above)
dat_raw <- wide_data

Convert to long (tidy) format and plot to get a feel for where to trim the data. In our example, sensors measured temperature, dissolved oxygen, and salinity.

dat_raw <- convert_to_tidydata(dat_raw)

# plot raw data
plot_variables_at_depth(dat_raw)

Section 2: Determine where to trim the data for each variable. This can be an iterative process. Use the figure from the previous step to get a general idea of when to trim (e.g., the approximate value of the first reliable measurements), and the raw .csv file to determine the timestamps of the first and last reliable measurements.

In our Birchy Head example, the first reliable temperature values are less than 5 degrees. Looking at data from one of the HOBO sensors, we see that the temperature stabilizes around "2019-05-02 8:45:22 PM":

head(wide_data[,1:2], n = 20)

Check the temperature data from the other sensors to make sure this timestamp is appropriate, and adjust it if necessary.
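One way to do this is to inspect the first few temperature records from each sensor in the tidy data (a sketch; it assumes convert_to_tidydata() returns a SENSOR column):

# first few temperature records from each sensor
dat_raw %>% 
  filter(VARIABLE == "Temperature") %>% 
  group_by(SENSOR) %>% 
  slice_head(n = 5)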

Repeat this procedure to find the timestamp of the last reliable temperature measurement. For our Birchy Head data, we find this to be "2019-11-22 2:30:22 PM".
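For example, inspect the last rows of the same two columns of the raw file:

tail(wide_data[, 1:2], n = 20)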

Let's start by trimming all three variables to these start and end dates.

# info to trim the temperature data
sensors.temp <- c("HOBO", "aquaMeasure", "VR2AR")
start.temp <- "2019-05-02  8:45:22 PM"
end.temp <- "2019-11-22  2:30:22 PM"

# info to trim the DO data
sensors.DO <- c("aquaMeasure")
start.DO <- "2019-05-02  8:45:22 PM"
end.DO <- "2019-11-22  2:30:22 PM"

# info to trim the salinity data
sensors.sal <- c("aquaMeasure")
start.sal <- "2019-05-02  8:45:22 PM"
end.sal <- "2019-11-22  2:30:22 PM"

# trim each variable to its start and end timestamps
dat_trim <- dat_raw %>% 
  # trim temperature data
  trim_data(var.to.trim = "Temperature",
            start.datetime = start.temp,
            end.datetime = end.temp,
            sensors.to.trim = sensors.temp) %>%
  # trim DO data
  trim_data(var.to.trim = "Dissolved Oxygen",
            start.datetime = start.DO,
            end.datetime = end.DO,
            sensors.to.trim = sensors.DO) %>% 
  # trim salinity data (only needed if salinity was measured during the deployment)
  trim_data(var.to.trim = "Salinity",
            start.datetime = start.sal,
            end.datetime = end.sal,
            sensors.to.trim = sensors.sal)

Section 3: Use the plot_variables_at_depth() function to see if the outliers are removed. Modify the arguments as necessary.

# plot trimmed data
plot_variables_at_depth(dat_trim)

This looks pretty good for temperature and salinity. If there were still outliers, we would need to adjust the start.datetime or end.datetime arguments.

The dissolved oxygen signal is consistent with biofouling, so the data should be trimmed further.

Let's adjust the start.DO argument, but leave the other timestamps the same.

# info to trim the temperature data
sensors.temp <- c("HOBO", "aquaMeasure", "VR2AR")
start.temp <- "2019-05-02  8:45:22 PM"
end.temp <- "2019-11-22  2:30:22 PM"

# info to trim the DO data
sensors.DO <- c("aquaMeasure")
start.DO <- "2019-09-02  12:00:00 AM"
end.DO <- "2019-11-22  2:30:22 PM"

# info to trim the salinity data
sensors.sal <- c("aquaMeasure")
start.sal <- "2019-05-02  8:45:22 PM"
end.sal <- "2019-11-22  2:30:22 PM"

# trim each variable to its start and end timestamps
dat_trim <- dat_raw %>% 
  # trim temperature data
  trim_data(var.to.trim = "Temperature",
            start.datetime = start.temp,
            end.datetime = end.temp,
            sensors.to.trim = sensors.temp) %>%
  # trim DO data
  trim_data(var.to.trim = "Dissolved Oxygen",
            start.datetime = start.DO,
            end.datetime = end.DO,
            sensors.to.trim = sensors.DO) %>% 
  # trim salinity data (only needed if salinity was measured during the deployment)
  trim_data(var.to.trim = "Salinity",
            start.datetime = start.sal,
            end.datetime = end.sal,
            sensors.to.trim = sensors.sal)

We can plot all three trimmed variables again to check that there are no outliers:

# plot trimmed data
plot_variables_at_depth(dat_trim)

We can also plot dissolved oxygen and add a geom_vline to double-check the trim date:

# check where the DO cutoff is
DO <- dat_trim %>% 
  filter(VARIABLE == "Dissolved Oxygen") %>% 
  plot_variables_at_depth()

DO[[1]] + geom_vline(xintercept = as_datetime(start.DO))

Re-run Section 2 and the visualizations to tweak the start.datetime and end.datetime until the data for all sensors are appropriately trimmed, and then export the long (tidy) version of the data to the deployment folder:

# export trimmed data
write_csv(dat_trim, file = paste(path, "/", file.name, "_trimmed.csv", sep = ""))

Section 4: You should not need to change anything in this section. It formats the data for the Open Data Portal, and exports it to the appropriate county folder.

# SECTION 4: Export for Open Data ----------------------------------------------------

# read deployment log for the area info
log <- read_deployment_log(path)
location <- log$area.info

# allow access to the google sheet
googlesheets4::gs4_deauth()

# link to the "STRING TRACKING" google sheet
link <- "https://docs.google.com/spreadsheets/d/1a3QvJsvwr4dd64g3jxgewRtfutIpsKjT2yrMEAoxA3I/edit#gid=828367890"

# read in the "Area Info" tab of the STRING TRACKING sheet
Area_Info <- googlesheets4::read_sheet(link, sheet = "Area Info")

# look up the Station name in the Area Info tab and return the county
county <- Area_Info[which(Area_Info$Station == location$station), "County"]

# warnings if there is more than one entry OR no entries for this station in the Area Info tab
if (nrow(county) > 1) warning(paste("There is more than one station named", location$station, "in the Area Info tab"))
if (nrow(county) < 1) warning(paste("There is no station named", location$station, "in the Area Info tab"))

# finish the path.export
county <- county$County
path.export <- paste(path.export, county, "data", sep = "/")

# name for the file
open.data.name <- name_for_open_data(file.name)

# format for Open Data (add location columns)
dat_open <- format_for_opendata(dat_trim, location)

# write to county folder
write_csv(dat_open, file = paste(path.export, "/", open.data.name, ".csv", sep = ""))

Mark on the STRING TRACKING sheet that this deployment has been trimmed.


