```r
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
options(width = 300)
```
```r
library(strings)
library(dplyr)
```
`strings` is an R package developed to help users compile, format, calculate, and visualize oceanographic data collected by the Centre for Marine Applied REsearch's (CMAR) Coastal Monitoring Program. The package can process temperature, dissolved oxygen, and salinity data measured by HOBO Pro V2, TidBiT, aquaMeasure DOT, aquaMeasure SAL, and/or VR2AR sensors from a single sensor string deployment.
The purpose of this vignette is to provide detailed instructions on how to trim compiled Coastal Monitoring Program data following CMAR's workflow. The trimmed data is exported to the deployment folder, and a formatted copy is exported to the appropriate Open Data Portal county folder.
Detailed instructions on how to compile data are in SOP010.
Section 3.1 provides a general overview of the `trim_data()` function. Section 3.2 is CMAR-specific, and shows how to trim and export Coastal Monitoring Program data using a template.
## `trim_data()` function

The `trim_data()` function trims the compiled data (in long/tidy format) to specified dates by filtering on the `TIMESTAMP` column. The function will only trim one variable at a time, so it works best in a pipeline. You must also specify which sensors to trim.
See `?trim_data` for more detail on function arguments.
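As a minimal sketch, a single call trims one variable from the listed sensors (the data object, argument names, and timestamps here are taken from the examples below):

```r
# trim only the temperature record from the listed sensors
wide_data %>%
  convert_to_tidydata() %>%
  trim_data(
    var.to.trim = "Temperature",
    start.datetime = "2019-05-30 7:34:00 PM",
    end.datetime = "2019-10-19 2:01:00 PM",
    sensors.to.trim = c("HOBO", "aquaMeasure", "VR2AR")
  )
```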
Start and end times of the raw data:
```r
wide_data %>%
  convert_to_tidydata() %>%
  group_by(VARIABLE) %>%
  summarise(
    MIN_TIMESTAMP = min(TIMESTAMP),
    MAX_TIMESTAMP = max(TIMESTAMP)
  )
```
In this example, all variables are trimmed to the same start and end times. This does not have to be the case: you can provide different start and end times for each variable in the different sections of the pipe.
```r
# specify the timestamp for the FIRST VALID measurement
# (this can be copy/pasted from the _raw.csv file)
start.timestamp <- "2019-05-30 7:34:00 PM"

# specify the timestamp for the LAST VALID measurement
# (this can be copy/pasted from the _raw.csv file)
last.timestamp <- "2019-10-19 2:01:00 PM"

# sensors that recorded temperature data that should be trimmed
sensors.temp <- c("HOBO", "aquaMeasure", "VR2AR")

# sensors that recorded dissolved oxygen data that should be trimmed
sensors.DO <- c("aquaMeasure")

# info to trim the salinity data
sensors.sal <- c("aquaMeasure")

# convert the wide data to long data and trim
dat_trim <- wide_data %>%
  convert_to_tidydata() %>%
  # trim temperature data
  trim_data(
    var.to.trim = "Temperature",
    start.datetime = start.timestamp,
    end.datetime = last.timestamp,
    sensors.to.trim = sensors.temp
  ) %>%
  # trim DO data
  trim_data(
    var.to.trim = "Dissolved Oxygen",
    start.datetime = start.timestamp,
    end.datetime = last.timestamp,
    sensors.to.trim = sensors.DO
  ) # %>%
  # trim salinity data
  # trim_data(
  #   var.to.trim = "Salinity",
  #   start.datetime = start.timestamp,
  #   end.datetime = last.timestamp,
  #   sensors.to.trim = sensors.sal
  # )
```
Start and end times of the trimmed data:
```r
dat_trim %>%
  group_by(VARIABLE) %>%
  summarise(
    MIN_TIMESTAMP = min(TIMESTAMP),
    MAX_TIMESTAMP = max(TIMESTAMP)
  )
```
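As an optional sanity check (a sketch, not part of CMAR's template), the trimmed timestamps can be compared against the specified limits. `lubridate::parse_date_time()` with a 12-hour order string handles the AM/PM format used above; this assumes the limits and the `TIMESTAMP` column share a time zone:

```r
library(lubridate)

# parse the 12-hour AM/PM timestamps specified above
start_dt <- parse_date_time(start.timestamp, orders = "Ymd IMS p")
end_dt <- parse_date_time(last.timestamp, orders = "Ymd IMS p")

# every trimmed observation should fall inside the specified window
all(dat_trim$TIMESTAMP >= start_dt & dat_trim$TIMESTAMP <= end_dt)
```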
An error will be printed if you try to trim a variable from a sensor that did not measure that variable.
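For example, a call like the following should error, because the HOBO sensors in this deployment did not measure salinity (a hypothetical illustration; the exact message depends on the package version):

```r
# this should error: the HOBO sensors did not record salinity
wide_data %>%
  convert_to_tidydata() %>%
  trim_data(
    var.to.trim = "Salinity",
    start.datetime = start.timestamp,
    end.datetime = last.timestamp,
    sensors.to.trim = "HOBO"
  )
```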
## `Trim_Template.R`
Before trimming:

- Check the STRING TRACKER to determine which deployments are a priority for sending to Open Data.
Trimming data for a single deployment should be done by following `Trim_Template.R`, saved in the "Y:/Coastal Monitoring Program/Strings Files/Templates" folder. This will export the trimmed data to the deployment folder and the Open Data folder. We will walk through each of the sections here.
Open `Trim_Template.R`.
Save the file into the deployment folder with the name structure: “Trim_Deployment_Location_yyyy-mm-dd.R”.
At the top of `Trim_Template.R`, fill in the current date, your name, the version of the `strings` package you are using, and any additional notes you deem necessary for future data management.
```r
# DATE: 2020-Sep-02
# NAME: DD
# strings VERSION: 1.1.0
# NOTES:
```
The next comments describe the sections of the script:
```r
# Template for trimming data compiled from a sensor string deployment

# Returns trimmed data as a csv file in the final folder on path
# and in the appropriate county folder for transfer to Open Data Portal

# SECTION 1: Import and visualize raw data
# SECTION 2: Trim data
# SECTION 3: Visualize and Export trimmed data
# SECTION 4: Export for Open Data
```
Load the necessary libraries:
```r
# libraries
library(dplyr)         # to pipe and manipulate data
library(readr)         # to read and write csv files
library(strings)       # for string data functions
library(ggplot2)       # for DO plot
library(lubridate)     # for DO plot
library(googlesheets4) # for county info
```
### Section 1

Update the `path` variable to the deployment folder you are processing. You do not need to change the `path.export` variable.
```r
# Section 1: Import and visualize raw data -----------------------------------

# path to the raw data file
path <- file.path("Y:/Coastal Monitoring Program/Data_Strings/Birchy Head/Birchy Head 2019-05-02")

# path to export the trimmed and formatted data (county name pasted on below)
path.export <- file.path("Y:/Coastal Monitoring Program/Open Data/")
```
Change the file name to the name of the raw data .csv file, but without the "_raw".
```r
# Raw data --------------------------------------------------------------------

# file name
file.name <- "Birchy Head_2019-05-02_TEMP_DO"
```
Import the raw data file.
```r
# import raw data
dat_raw <- read_csv(
  paste(path, "/", file.name, "_raw.csv", sep = ""),
  col_names = FALSE
)
```
```r
# for this vignette, use the package's example wide_data object
# in place of the file imported above
dat_raw <- wide_data
```
Convert to long (tidy) format and plot to get a feel for where to trim the data. In our example, sensors measured temperature, dissolved oxygen, and salinity.
```r
dat_raw <- convert_to_tidydata(dat_raw)

# plot raw data
plot_variables_at_depth(dat_raw)
```
### Section 2

Determine where to trim the data for each variable. This can be an iterative process. Use the figure from the previous step to get a general idea of when to trim (e.g., the approximate value of the first reliable measurements), and the raw .csv file to determine the timestamps of the first and last reliable measurements.
In our Birchy Head example, the first reliable temperature values are less than 5 degrees. Looking at data from one of the HOBOs, we see that the temperature stabilizes around "2019-05-02 8:45:22 PM":
```r
head(wide_data[, 1:2], n = 20)
```
Check the temperature data from the other sensors to make sure this timestamp is appropriate, and adjust it if necessary.
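For example, the first readings from another sensor can be inspected the same way (a sketch; this assumes the next sensor's timestamp and temperature occupy columns 3 and 4 of `wide_data`):

```r
# inspect the first readings from another temperature sensor
head(wide_data[, 3:4], n = 20)
```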
Repeat this procedure to find the timestamp of the last reliable temperature measurement. For our Birchy Head data, we find this to be "2019-11-22 2:30:22 PM".
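The analogous check for the end of the record looks at the last rows, for example:

```r
# inspect the last readings from the first sensor
tail(wide_data[, 1:2], n = 20)
```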
Let's start by trimming all three variables to these start and end dates.
```r
# info to trim the temperature data
sensors.temp <- c("HOBO", "aquaMeasure", "VR2AR")
start.temp <- "2019-05-02 8:45:22 PM"
end.temp <- "2019-11-22 2:30:22 PM"

# info to trim the DO data
sensors.DO <- c("aquaMeasure")
start.DO <- "2019-05-02 8:45:22 PM"
end.DO <- "2019-11-22 2:30:22 PM"

# info to trim the salinity data
# sensors.sal <- c("aquaMeasure")
# start.sal <- "2019-05-02 8:45:22 PM"
# end.sal <- "2019-11-22 2:30:22 PM"

# trim the tidy data
dat_trim <- dat_raw %>%
  # trim temperature data
  trim_data(
    var.to.trim = "Temperature",
    start.datetime = start.temp,
    end.datetime = end.temp,
    sensors.to.trim = sensors.temp
  ) %>%
  # trim DO data
  trim_data(
    var.to.trim = "Dissolved Oxygen",
    start.datetime = start.DO,
    end.datetime = end.DO,
    sensors.to.trim = sensors.DO
  ) # %>%
  # trim salinity data (only needed if salinity was measured during the deployment)
  # trim_data(
  #   var.to.trim = "Salinity",
  #   start.datetime = start.sal,
  #   end.datetime = end.sal,
  #   sensors.to.trim = sensors.sal
  # )
```
### Section 3

Use the `plot_variables_at_depth()` function to see if the outliers are removed. Modify the arguments as necessary.
```r
# plot trimmed data
plot_variables_at_depth(dat_trim)
```
This looks pretty good for temperature and salinity. If there were still outliers, we would need to adjust the `start.datetime` or `end.datetime` arguments.
The dissolved oxygen signal is consistent with biofouling and should be trimmed further.
Let's adjust `start.DO`, but leave the other timestamps the same.
```r
# info to trim the temperature data
sensors.temp <- c("HOBO", "aquaMeasure", "VR2AR")
start.temp <- "2019-05-02 8:45:22 PM"
end.temp <- "2019-11-22 2:30:22 PM"

# info to trim the DO data
sensors.DO <- c("aquaMeasure")
start.DO <- "2019-09-02 12:00:00 AM"
end.DO <- "2019-11-22 2:30:22 PM"

# info to trim the salinity data
sensors.sal <- c("aquaMeasure")
start.sal <- "2019-05-02 8:45:22 PM"
end.sal <- "2019-11-22 2:30:22 PM"

# trim the tidy data
dat_trim <- dat_raw %>%
  # trim temperature data
  trim_data(
    var.to.trim = "Temperature",
    start.datetime = start.temp,
    end.datetime = end.temp,
    sensors.to.trim = sensors.temp
  ) %>%
  # trim DO data
  trim_data(
    var.to.trim = "Dissolved Oxygen",
    start.datetime = start.DO,
    end.datetime = end.DO,
    sensors.to.trim = sensors.DO
  ) # %>%
  # trim salinity data (only needed if salinity was measured during the deployment)
  # trim_data(
  #   var.to.trim = "Salinity",
  #   start.datetime = start.sal,
  #   end.datetime = end.sal,
  #   sensors.to.trim = sensors.sal
  # )
```
We can plot all three trimmed variables again to check that there are no outliers:
```r
# plot trimmed data
plot_variables_at_depth(dat_trim)
```
We can also plot dissolved oxygen and add a `geom_vline` to double-check the trim date:
```r
# check where DO cutoff is
DO <- dat_trim %>%
  filter(VARIABLE == "Dissolved Oxygen") %>%
  plot_variables_at_depth()

DO[[1]] + geom_vline(xintercept = as_datetime(start.DO))
```
Re-run Section 2 and the visualizations to tweak the `start.datetime` and `end.datetime` arguments until the data for all sensors are appropriately trimmed, and then export the long (tidy) version of the data to the deployment folder:
```r
# export trimmed data
write_csv(dat_trim, file = paste(path, "/", file.name, "_trimmed.csv", sep = ""))
```
### Section 4

You should not need to change anything in this section. It formats the data for the Open Data Portal and exports it to the appropriate county folder.
```r
# SECTION 4: Export for Open Data ---------------------------------------------

# read deployment log for the area info
log <- read_deployment_log(path)
location <- log$area.info

# allow access to the google sheet
googlesheets4::gs4_deauth()

# link to the "STRING TRACKING" google sheet
link <- "https://docs.google.com/spreadsheets/d/1a3QvJsvwr4dd64g3jxgewRtfutIpsKjT2yrMEAoxA3I/edit#gid=828367890"

# read in the "Area Info" tab of the STRING TRACKING sheet
Area_Info <- googlesheets4::read_sheet(link, sheet = "Area Info")

# look up the Station name in the Area Info tab and return the county
county <- Area_Info[which(Area_Info$Station == location$station), "County"]

# warnings if there is more than one entry OR no entries for this station
# in the Area Info tab
if (nrow(county) > 1) warning(paste("There is more than one station named", location$station, "in the Area Info tab"))
if (nrow(county) < 1) warning(paste("There is no station named", location$station, "in the Area Info tab"))

# finish the path.export
county <- county$County
path.export <- paste(path.export, county, "data", sep = "/")

# name for the file
open.data.name <- name_for_open_data(file.name)

# format for Open Data (add location columns)
dat_open <- format_for_opendata(dat_trim, location)

# write to county folder
write_csv(dat_open, file = paste(path.export, "/", open.data.name, ".csv", sep = ""))
```
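The station lookup and its checks could equivalently be written with dplyr (a sketch, assuming the Area Info tab has `Station` and `County` columns with one row per station):

```r
# equivalent station lookup using dplyr
county <- Area_Info %>%
  filter(Station == location$station) %>%
  pull(County)

# warn if the station is missing or duplicated in the Area Info tab
if (length(county) != 1) {
  warning(paste("Expected exactly one entry for station", location$station, "in the Area Info tab"))
}
```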
Mark on the STRING TRACKING sheet that this deployment has been trimmed.