knitr::opts_chunk$set(echo = TRUE)
library(insitu)
This package was developed to streamline QA/QC of hydrographic (CTD and current) timeseries data from the Arctic coast of Alaska within the R environment. Its functions calibrate data and flag suspicious data, either programmatically or manually. Many of them are useful for QA/QC of hydrographic timeseries regardless of location.
For example, calculate salinity from conductivity and temperature using the UNESCO algorithm:
calculate_salinity(25.50,1)
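The following is a minimal base-R sketch of the UNESCO Practical Salinity Scale 1978 (PSS-78) at the surface. It shows what a salinity calculation of this kind involves. It is not the insitu implementation, and the function name pss78_salinity is hypothetical. The sketch makes three assumptions: conductivity is in mS/cm, temperature goes directly into the PSS-78 polynomials (ignoring the small IPTS-68/ITS-90 difference), and the pressure correction is omitted, so it applies to near-surface readings only.

```r
# Hypothetical helper: PSS-78 practical salinity at zero pressure.
# Assumes conductivity in mS/cm and temperature in degrees C; pressure ignored.
pss78_salinity <- function(cond_mS_cm, temp_C) {
  # Conductivity ratio relative to standard seawater (S = 35, T = 15 C, p = 0)
  R <- cond_mS_cm / 42.914
  # rt(T): conductivity ratio of standard seawater at temperature T
  rt <- 0.6766097 + 2.00564e-2 * temp_C + 1.104259e-4 * temp_C^2 -
    6.9698e-7 * temp_C^3 + 1.0031e-9 * temp_C^4
  Rt <- R / rt                       # pressure term Rp taken as 1
  a <- c(0.0080, -0.1692, 25.3851, 14.0941, -7.0261, 2.7081)
  b <- c(0.0005, -0.0056, -0.0066, -0.0375, 0.0636, -0.0144)
  # Columns hold Rt^0, Rt^0.5, Rt^1, ..., Rt^2.5
  powers <- outer(sqrt(Rt), 0:5, "^")
  dT <- temp_C - 15
  drop(powers %*% a) + dT / (1 + 0.0162 * dT) * drop(powers %*% b)
}

pss78_salinity(c(20, 30, 40), c(0, 0, 0))
```

By construction, standard seawater (42.914 mS/cm at 15 °C) returns a salinity of 35. If your conductivity is in µS/cm, divide it by 1000 first.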
Easily read in files from RBR or Star-Oddi CTDs, or from Lowell Tilt Current Meters, with import_data().
The remainder of this vignette demonstrates a typical workflow for QA/QC of such datasets.
First, let's load some handy packages:
library(RCurl)     # for loading data directly from online repositories
library(tidyverse) # we're going to work in the tidyverse for data manipulation and plotting
library(cowplot)
Read in some data (this is Beaufort Lagoon Ecosystem LTER data from 2018-2019)
urlCSV <- getURL("https://portal-s.edirepository.org/nis/dataviewer?packageid=knb-lter-ble.3.7&entityid=534373640a661d8ed0bfedc52479a133", timeout = 200)
txtCSV <- textConnection(urlCSV)
tempsal_data <- read.csv(txtCSV, stringsAsFactors = FALSE) %>%
  filter(station == "KALD1") %>%
  mutate(date_time = as.POSIXct(date_time, format = "%m/%d/%Y %H:%M"))
close(txtCSV)
head(tempsal_data)
str(tempsal_data)
Simple timeseries plots
ggplot(tempsal_data, aes(date_time, temperature)) +
  geom_line()
ggplot(tempsal_data, aes(date_time, conductivity)) +
  geom_line()
Some of this conductivity data is clearly bad (look at that June-July weirdness)! We need to calibrate the data and remove "bad" points.
Read in data from YSI sonde measurements taken just after deployment and just before retrieval of the CTD:
urlCSV <- getURL("https://portal-s.edirepository.org/nis/dataviewer?packageid=knb-lter-ble.3.7&entityid=6d327300d4893e4f932220fecf49c507", timeout = 200)
txtCSV <- textConnection(urlCSV)
ysi <- read.csv(txtCSV, stringsAsFactors = FALSE) %>%
  filter(station == "KALD1") %>%
  # date_time is ISO 8601 but imports as character
  mutate(date_time = as.POSIXct(substr(date_time, 1, 16), format = "%Y-%m-%dT%H:%M"))
close(txtCSV)
head(ysi)
Calibrate the data based on the linear relationship between these two "anchor" points. This also works with more than two anchor points. In that case, the adjustment is based on the linear relationship between each sequential pair of anchor points. If data exist before the first anchor point or after the last one, the linear relationship uses the first (before) or last (after) reading from the raw data.
tempsal_data_calibrated <- calibrate_data(tempsal_data, ysi, raw = "temperature", cal_by = "temperature_C")
tempsal_data_calibrated <- calibrate_data(tempsal_data_calibrated, ysi, raw = "conductivity", cal_by = "conductivity_uS_cm")
head(tempsal_data_calibrated)

ggplot(tempsal_data_calibrated, aes(date_time, temperature)) +
  geom_line() +
  geom_line(aes(date_time, temperature_calibrated), color = "blue")
ggplot(tempsal_data_calibrated, aes(date_time, conductivity)) +
  geom_line() +
  geom_line(aes(date_time, conductivity_calibrated), color = "blue")
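Below is a minimal base-R sketch of anchor-point drift correction, to make the idea concrete. It rests on three assumptions: the correction is an additive offset, the offset changes linearly in time between sequential anchors, and beyond the first and last anchors it is held constant (approx() with rule = 2). The function name drift_correct is hypothetical, and calibrate_data() may handle the ends of the record differently.

```r
# Hypothetical helper: piecewise-linear offset correction between anchor points.
# Assumes at least two anchors, all falling inside the time span of the raw record.
drift_correct <- function(time, raw, anchor_time, anchor_ref) {
  t  <- as.numeric(time)          # POSIXct -> seconds since the epoch
  ta <- as.numeric(anchor_time)
  # Raw value at each anchor time, linearly interpolated from the timeseries
  raw_at_anchor <- approx(t, raw, xout = ta)$y
  # Offset that brings the raw series onto the reference (e.g. YSI) value
  offset <- anchor_ref - raw_at_anchor
  # Interpolate offsets between sequential anchors; rule = 2 holds the
  # first/last offset constant outside the anchor range
  raw + approx(ta, offset, xout = t, rule = 2)$y
}

# Usage: a sensor drifting upward by 0.1 per hour, checked against two references
tt  <- as.POSIXct("2018-08-01 00:00", tz = "UTC") + 3600 * (0:10)
raw <- 10 + 0.1 * (0:10)
drift_correct(tt, raw, tt[c(1, 11)], c(10, 10))
```

An additive offset suits sensor drift. A multiplicative (gain) correction would call for interpolating ratios instead.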
OK, now that the data are calibrated, let's calculate and plot salinity. We're going to do this the "tidyverse" way:
tempsal_data_calibrated <- tempsal_data_calibrated %>%
  mutate(salinity_calibrated = calculate_salinity(conductivity_calibrated, temperature_calibrated))
ggplot(tempsal_data_calibrated, aes(date_time, salinity_calibrated)) +
  geom_line()
Clearly some of that data is still bad, so let's flag it...
Temperature/salinity plots can easily tell you when something is amiss in your data. This often happens because the conductivity reading is off. The cause may be obstruction by sediment or fouling (for instruments with a conductivity cup), or nearby ice or other conductive materials (for RBR "donuts"). In cold regions, data below the freezing line are especially suspect.
plot_tempsal(tempsal_data_calibrated, "temperature_calibrated", "salinity_calibrated", "date_time", plottitle="example")
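A freezing line like the one in the temperature/salinity plot can be computed from the UNESCO 1983 freezing-point formula (Fofonoff and Millard). The sketch below is base R. The function name freezing_temp is hypothetical, and plot_tempsal() may compute its line differently. Salinity is practical salinity and pressure is in dbar.

```r
# Hypothetical helper: freezing temperature of seawater (degrees C),
# UNESCO 1983 formula; S = practical salinity, p = pressure in dbar.
freezing_temp <- function(S, p = 0) {
  -0.0575 * S + 1.710523e-3 * S^1.5 - 2.154996e-4 * S^2 - 7.53e-4 * p
}

# Freezing line over a salinity range, e.g. for overlaying on a T/S plot
sal_grid <- seq(0, 35, by = 0.5)
freeze_line <- data.frame(salinity = sal_grid, temperature = freezing_temp(sal_grid))
head(freeze_line)
```

Because the freezing point drops as salinity rises, a reading with erroneously high salinity can still plot above the line. Points below the line are therefore a conservative signal of error, not a complete one.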
The straight lines of points in the temperature vs. salinity plot indicate instrument error. We can flag the points below the freezing line programmatically. flag_salinity() creates a column, anomalous, to indicate flagged data:
tempsal_data_flagged <- flag_salinity(tempsal_data_calibrated,
                                      tempcol = "temperature_calibrated",
                                      condcol = "conductivity_calibrated",
                                      Terror = .1, Cerror = 2,
                                      flag_scheme = c("VALID", "INV"))
head(tempsal_data_flagged)
ggplot(tempsal_data_flagged, aes(date_time, salinity_calibrated)) +
  geom_point(aes(color = anomalous))
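To illustrate how a tolerance can enter a below-freezing test, here is a simplified, hypothetical rule in base R; it is not necessarily what flag_salinity() does. A reading is flagged if it is colder than the UNESCO 1983 freezing point by more than Terror, i.e. T + Terror < Tf(S). The function name flag_below_freezing is an assumption, and the conductivity tolerance (Cerror in flag_salinity()) is not modelled here.

```r
# Hypothetical rule: flag readings colder than the freezing point by more than Terror.
flag_below_freezing <- function(temp_C, sal, Terror = 0.1,
                                flag_scheme = c("VALID", "INV")) {
  # UNESCO 1983 freezing point at zero pressure
  tf <- -0.0575 * sal + 1.710523e-3 * sal^1.5 - 2.154996e-4 * sal^2
  below <- temp_C + Terror < tf     # NA where either input is missing
  ifelse(below, flag_scheme[2], flag_scheme[1])
}

flag_below_freezing(c(-1.5, -1.9, -1.7, NA), rep(30, 4))
```

Applied to this dataset, it could be used as `flag_below_freezing(tempsal_data_calibrated$temperature_calibrated, tempsal_data_calibrated$salinity_calibrated)`. Missing inputs yield NA rather than a silent "VALID".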
Clearly, flag_salinity() doesn't catch all of the suspicious data.
This function will open up an external window where you can manually click on suspicious points. Click "Stop" in the upper left of the pop-up when you are done.
bad_points <- id_outlier(tempsal_data_calibrated,
                         datecol = "date_time",
                         tempcol = "temperature_calibrated",
                         salcol = "salinity_calibrated")
bad_points
Note that you can incorporate this output into the flag column of your dataframe via indexing
tempsal_data_flagged$anomalous[bad_points] <- "Questionable"
ggplot(tempsal_data_flagged, aes(date_time, salinity_calibrated)) +
  geom_point(aes(color = anomalous))
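One pitfall with flagging by indexing: if the flag column is a factor, assigning a label that is not an existing level produces NA with a warning instead of the label. A small base-R sketch that works for both character and factor columns follows. The helper name set_manual_flag is hypothetical.

```r
# Hypothetical helper: write a manual flag into selected rows of a flag vector,
# adding the label as a new level first when the vector is a factor.
set_manual_flag <- function(flags, rows, label = "Questionable") {
  if (is.factor(flags)) {
    levels(flags) <- union(levels(flags), label)  # existing codes are unchanged
  }
  flags[rows] <- label
  flags
}

set_manual_flag(factor(c("VALID", "INV", "VALID")), 3)
```

Used here, it would read `tempsal_data_flagged$anomalous <- set_manual_flag(tempsal_data_flagged$anomalous, bad_points)`. This assumes tempsal_data_flagged has the same rows, in the same order, as tempsal_data_calibrated (the data frame that id_outlier() indexed).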