knitr::opts_chunk$set(echo = TRUE)
library(insitu)

This package was developed to streamline QA/QC of hydrographic (CTD and current) timeseries data from the Arctic coast of Alaska within the R environment. Specifically, functions in this package calibrate data and flag suspicious data either programatically or manually. Many of the functions within have general use for QA/QC of hydrographic timeseries regardless of location.

For example: Calculate salinity from conductivty and temperature ising the UNESCO algorithm

calculate_salinity(25.50,1)

Easily read in files from either RBR or starOddi CTDs, or Lowell Tilt Current Meters via import_data


How to Perform quality control on CTD datasets, with particular focus on Arctic nearshore systems

The remainder of this vignette demonstrates a typical workflow for QA/QC of such datasets.

First let's load in some handy packages

library(RCurl) #For loading in data directly from online repositories
library(tidyverse) #we're going to work in the tidyverse for data manipulation and plotting
library(cowplot)

Read in some data (this is Beaufort Lagoon Ecosystem LTER data from 2018-2019)

urlCSV <- getURL("https://portal-s.edirepository.org/nis/dataviewer?packageid=knb-lter-ble.3.7&entityid=534373640a661d8ed0bfedc52479a133", timeout = 200)
txtCSV <- textConnection(urlCSV)
tempsal_data <- read.csv(txtCSV, stringsAsFactors = F) %>% 
  filter(station=="KALD1") %>% 
  mutate(date_time=as.POSIXct(date_time, format="%m/%d/%Y %H:%M"))
close(txtCSV)

head(tempsal_data)
str(tempsal_data)

Simple timeseries plots

ggplot(tempsal_data, aes(date_time, temperature))+
  geom_line()

ggplot(tempsal_data, aes(date_time, conductivity))+
  geom_line()

Some of this conductivity data is clearly bad (look at that June-July weirdness)! We need to calibrate the data and remove "bad" points.

Calibrate the data based on readings from a more recently-calibrated sensor

Read in data from YSI sonde measurements after deployment and just before retrieval of the CTD

urlCSV <- getURL("https://portal-s.edirepository.org/nis/dataviewer?packageid=knb-lter-ble.3.7&entityid=6d327300d4893e4f932220fecf49c507", timeout = 200)
txtCSV <- textConnection(urlCSV)
ysi<- read.csv(txtCSV, stringsAsFactors = F) %>% 
  filter(station=="KALD1") %>% 
  mutate(date_time=as.POSIXct(substr(date_time,1, 16), format="%Y-%m-%dT%H:%M"))#datetime is in ISO 8601, but inports as character
close(txtCSV)

head(ysi)

Calibrate the data based on linear relationship between these two "anchor" points. Note that this will also work with >2 anchor points, for which the adjustment will be based on the linear relationship between each sequential pair of anchor points. If data exists before or after anchor points, the linear relationship will use the first (in the before case), or last (in the after case) reading from the raw data.

tempsal_data_calibrated <- calibrate_data(tempsal_data, ysi, raw= "temperature", cal_by = "temperature_C")

tempsal_data_calibrated  <- calibrate_data(tempsal_data_calibrated, ysi, raw= "conductivity", cal_by = "conductivity_uS_cm")

head(tempsal_data_calibrated)

ggplot(tempsal_data_calibrated, aes(date_time, temperature))+
  geom_line()+
  geom_line(aes(date_time, temperature_calibrated), color="blue")

ggplot(tempsal_data_calibrated, aes(date_time, conductivity))+
  geom_line()+
  geom_line(aes(date_time, conductivity_calibrated), color="blue")

Ok, now that the data is calibrated, let's calculate and plot salinity. We're going to to this the "tidyverse" way

tempsal_data_calibrated <- tempsal_data_calibrated %>%
  mutate(salinity_calibrated=calculate_salinity(conductivity_calibrated, temperature_calibrated))

ggplot(tempsal_data_calibrated, aes(date_time, salinity_calibrated))+
  geom_line()

Clearly some of that data is still bad, so let's flag it...

Identify "bad" data based on the freezing line from temperature/salinity plots

Temperature/salinity plots can easily tell you when something is amiss in your data. This often occurs because the conductivity reading if off due to obstruction by sediments or fouling (for instruments with a conductivity cup), or by nearby ice or other conductive materials (for RBR "donuts"). In cold regions, data below the freezing line is expecially suspect.

plot_tempsal(tempsal_data_calibrated, "temperature_calibrated", "salinity_calibrated", "date_time", plottitle="example")

The straight lines for points in the temp vs sal plot indicate intstrument error. We can flag the points below the freezing line programatically. This function creates a column anomalous to indicated flagged data

tempsal_data_flagged <- flag_salinity(tempsal_data_calibrated, tempcol="temperature_calibrated", condcol="conductivity_calibrated",Terror=.1, Cerror = 2, flag_scheme = c("VALID","INV"))

head(tempsal_data_flagged)

ggplot(tempsal_data_flagged, aes(date_time, salinity_calibrated))+
  geom_point(aes(color=anomalous))

Clearly, this doesn't take care of all the suspicous data.

Easily manually identify "bad" data that doesn't get picked up from the freezing line algorithm

This function will open up an external window where you can manually click on suspicious points. Click "Stop" in the upper left of the pop-up when you are done.

bad_points <- id_outlier(tempsal_data_calibrated, datecol="date_time", tempcol="temperature_calibrated", salcol="salinity_calibrated")

bad_points

Note that you can incorporate this output into the flag column of your dataframe via indexing

tempsal_data_flagged$anomalous[bad_points] <- "Questionable"

ggplot(tempsal_data_flagged, aes(date_time, salinity_calibrated))+
  geom_point(aes(color=anomalous))


BLE-LTER/insitu documentation built on Feb. 5, 2021, 5:28 p.m.