#' @title Create Seasonally Detrended Salinty Data Set
#'
#' @description This function creates a seasonally detrended salinity data set
#' for selected stations. The created data set is used to support application
#' of GAMs that include a hydrologic term as one of the independent variables.
#' The output from this function should be stored as an .rda file for repeated
#' use with baytrends.
#'
#' @param df.sal data frame with salinty data (required variables in data frame
#' are: station, date, layer, and salinity)
#' @param dvAvgWinSel Averaging window (days) selection for pooling data to
#' compute summary statistics
#' @param lowess.f lowess smoother span applied to computed standard deviation
#' (see Details). This gives the proportion of points which influence the
#' smooth at each value. Larger values give more smoothness.
#' @param minObs Minimum number of observations for performing analysis (default
#' is 40)
#' @param minObs.sd Minimum number of observations in averaging window for
#' calculation of the standard deviation (default is 10)
#'
#' @return Returns a list of seasonally detrended salinity data. You should save
#' the resulting list as salinity.detrended for use with baytrends. This
#' function also creates diagnostic plots that can be saved to a report when
#' this function is called from an .Rmd script.
#'
#' @details This function returns a list of seasonally detrended salinity and
#' companion statistics; and relies on a user supplied data frame that
#' contains the following variables: station, date, layer, and salinity. See
#' structure of sal data in example below.
#'
#' It is the user responsibility to save the resulting list as
#' \bold{salinity.detrended} for integration with baytrends.
#'
#' For the purposes of baytrends, it is expected that the user would identify
#' a data set with all salinity data that are expected to be evaluated so that
#' a single data file is created. The following computation steps are
#' performed:
#'
#' 1) Extract the list of stations, minimum year, and maximum year in data
#' set. Initialize the \bold{salinity.detrended} list with this information
#' along with meta data documenting the retrieval parameters.
#'
#' 2) Downselect the input data frame to only include data where the
#' layer is equal to 'S', 'AP', 'BP' or 'B'.
#'
#' 3) Average the 'S' and 'AP' salinity data; and the 'B' and 'BP salinity
#' data together to create average salinity values for SAP (surface and above
#' pycnocline) and BBP (bottom and below pycnocline), respectively. These
#' values are stored as the variables, \bold{salinity.SAP} and
#' \bold{salinity.BBP} together with the \bold{date} and day of year
#' (\bold{doy}) in a data frame corresponding to the station ID.
#'
#' 4) For each station/layer combination with atleast \bold{minObs}
#' observations, a seasonal GAM, i.e., gamoutput <- gam(salinity ~ s(doy,
#' bs='cc')) is evaluated and the predicted values stored in the above data
#' frame as \bold{salinity.SAP.gam} and \bold{salinity.BBP.gam}.
#'
#' 5) The GAM residuals, i.e., "residuals(gamoutput)" are extracted and stored
#' as the variable, \bold{SAP} or \bold{BBP} in the above data frame.
#' (These are the values that are used for GAMs that include salinity.)
#'
#' 6) After the above data frame is created and appended to the
#' list \bold{salinity.detrended}, the following four (4) additional
#' data frames are created for each station.
#'
#' \bold{mean} -- For each doy (i.e., 366 days of year), the mean across all
#' years for each value of d. Since samples are not collected on a daily basis
#' it is necessary to aggregate data from within a +/- one-half of
#' \bold{dvAvgWinSel}-day window around d. (This includes wrapping around the
#' calendar year. That is, the values near the beginning of the year, say
#' January 2, would include values from the last part of December and the
#' first part of January. The variables in the mean data frame are doy, SAP,
#' and BBP.
#'
#' \bold{sd} -- For each doy (i.e., 366 days of year), the standard deviation
#' across all years for each value of d. (See mean calculations for additional
#' details.)
#'
#' \bold{nobs} -- For each doy (i.e., 366 days of year), the number of
#' observations across all years for each value of d. (See mean calculations
#' for additional details.)
#'
#' \bold{lowess.sd} -- Lowess smoothed standard deviations. It is noted that
#' some stations do not include regular sampling in all months of the year or
#' for other reasons have few observations from which to compute standard
#' deviations. Through visual inspection of plots, we found that the standard
#' deviation could become unstable when the number of observations is small.
#' For this reason, when the number of observations is less than
#' \bold{minObs.sd}, the corresponding value of lowess.sd is removed and
#' interpolated from the remaining observations.
#'
#' The above four data frames (mean, sd, nobs, and lowess.sd) are created,
#' they are added to a list using a \bold{station.sum} naming convention and
#' appended to the list \bold{salinity.detrended}.
#'
#' @importFrom utils modifyList
#' @importFrom stats aggregate
#' @importFrom stats na.pass
#' @importFrom stats residuals
#' @importFrom stats sd
#' @importFrom stats lowess
#' @examples
#' \dontrun{
#' # Show Example Dataset (sal)
#' str(sal)
#'
#' # Define Function Inputs
#' df.sal <- sal
#' dvAvgWinSel <- 30
#' lowess.f <- 0.2
#' minObs <- 40
#' minObs.sd <- 10
#'
#' # Run Function
#' salinity.detrended <- detrended.salinity(df.sal, dvAvgWinSel,
#' lowess.f, minObs, minObs.sd)
#' }
#' @export
detrended.salinity <- function(df.sal, dvAvgWinSel=30, lowess.f=0.2,
minObs=40, minObs.sd=10) { ##FUNCTION.START
# Initialize settings and load input salinity data ----
{
# * set up flow retrieval and seasonal adjustment parameters ----
salinity.detrended <- list(analysisDate = Sys.time(),
stations = unique(df.sal$station),
yearStart = year(min(df.sal$date)),
yearEnd = year(max(df.sal$date)),
dvAvgWinSel = dvAvgWinSel,
lowess.f = lowess.f,
minObs = 40,
minObs.sd = 10 )
}
# Create SAP and BBP average salinity ----
{
df <- df.sal
# ** create SAP and BBP df's ----
df.SAP <- df[df$layer %in% c("S","AP"),]
df.BBP <- df[df$layer %in% c("B","BP"),]
# ** average salinity ----
df.SAP <- aggregate(salinity ~ station + date,
data=df.SAP, FUN=mean, na.action=na.pass, na.rm=TRUE)
df.BBP <- aggregate(salinity ~ station + date,
data=df.BBP, FUN=mean, na.action=na.pass, na.rm=TRUE)
df.SAP$layer <- 'SAP'
df.BBP$layer <- 'BBP'
# ** combine averaged data ----
salinity<-rbind(df.SAP,df.BBP)
rm(df.BBP,df.SAP, df.sal)
}
# station by station processing ----
for (i.station in salinity.detrended[["stations"]] ) {
# * Average salinity data ----
# ** set up variable names for raw and summary data ----
var <- i.station
var.sum <- paste0(i.station, ".sum")
# ** create df of SAP and BBP salinity for ith station ----
df <- salinity[salinity$station == i.station,]
df <- reshape (df, v.names=c("salinity"), idvar=c("station", "date"),
timevar=c("layer"), drop=c(""), direction = "wide")
attr(df,'reshapeWide') <-NULL
# ** test for presense of at least minObs obs ----
hasSAP <- if("salinity.SAP" %in% names(df)) TRUE else FALSE
if(hasSAP) {
hasSAP <- if(sum(!is.na(df[,'salinity.SAP'])) >= salinity.detrended$minObs) TRUE else FALSE
}
hasBBP <- if("salinity.BBP" %in% names(df)) TRUE else FALSE
if(hasBBP) {
hasBBP <- if(sum(!is.na(df[,'salinity.BBP'])) >= salinity.detrended$minObs) TRUE else FALSE
}
print(paste0(var,' (Surface: ',hasSAP,' / Bottom: ',hasBBP,")"))
# ** break out if nobs < minObs ----
if(!hasBBP & !hasSAP) {
tmp.list <- list( data.frame(NA))
names(tmp.list) <- var
salinity.detrended <- modifyList(salinity.detrended, tmp.list)
names(tmp.list) <- var.sum
salinity.detrended <- modifyList(salinity.detrended, tmp.list)
next
}
# * Perform gam modeling ----
# ** add doy; sort column order ----
df$doy <- as.numeric(baytrends::baseDay(df$date))
df<-df[,c('station', 'date', 'doy', setdiff(names(df),
c('station', 'date','doy')))]
# ** SAP gam model ----
if(hasSAP) {
# compute salinity gam models
gamSAP <- mgcv::gam(df[,"salinity.SAP"] ~ s(df[,"doy"],bs='cc'))
# compute/store predicted and residual salinities
df[!is.na(df$salinity.SAP),"salinity.SAP.gam"] <- predict(gamSAP)
df[!is.na(df$salinity.SAP),"SAP"] <- residuals(gamSAP)
} else {
df[,"salinity.SAP.gam"] <- NA_real_
df[,"SAP"] <- NA_real_
}
# ** BBP gam model ----
if(hasBBP) {
# compute salinity gam models
gamBBP <- mgcv::gam(df[,"salinity.BBP"] ~ s(df[,"doy"],bs='cc'))
# compute/store predicted and residual salinities
df[!is.na(df$salinity.BBP),"salinity.BBP.gam"] <- predict(gamBBP)
df[!is.na(df$salinity.BBP),"BBP"] <- residuals(gamBBP)
} else {
df[,"salinity.BBP.gam"] <- NA_real_
df[,"BBP"] <- NA_real_
}
# ** put seasonal averages into a list ----
# set embedded df to correspond to station id
# append list to overall list
tmp.list <- list(df[, !names(df) %in% c("station")])
names(tmp.list) <- var
salinity.detrended <- modifyList(salinity.detrended, tmp.list)
# * Compute doy based means and stdv ----
# ** create artificial data set ----
# append a +/- 1-year offset to allow for wrap around on data average window
df.pos <- df.neg <- df
df.neg$doy <- df.neg$doy - 366
df.pos$doy <- df.pos$doy + 366
df <- rbind( df.neg, df, df.pos); remove(df.neg, df.pos)
# ** initialize data frames for mean, sd, and nobs ----
df.mean <- data.frame(doy = as.numeric(c((1-366):(366+366))
, SAP=NA_real_, BBP=NA_real_))
df.sd <- data.frame(doy = as.numeric(c((1-366):(366+366))
, SAP=NA_real_, BBP=NA_real_))
df.nobs <- data.frame(doy = as.numeric(c((1-366):(366+366))
, SAP=NA_real_, BBP=NA_real_))
# ** compute mean, sd, and obs for doy from 1:366 ----
window <- salinity.detrended[["dvAvgWinSel"]]/2
for (j in 1:1098) {
j_day <- df.mean[j,"doy"]
tmp <- (df[df$doy >= j_day-window & df$doy <= j_day+window, ])
df.mean[j,"SAP"] <- mean (tmp$SAP, na.rm=TRUE)
df.sd [j,"SAP"] <- sd (tmp$SAP, na.rm=TRUE)
df.nobs[j,"SAP"] <- sum(!is.na(tmp$SAP))
df.mean[j,"BBP"] <- mean (tmp$BBP, na.rm=TRUE)
df.sd [j,"BBP"] <- sd (tmp$BBP, na.rm=TRUE)
df.nobs[j,"BBP"] <- sum(!is.na(tmp$BBP))
}
# ** compute lowess smooth ----
df.lowess.sd <- as.data.frame(lowess(x=df.sd[,"doy"],
y=df.sd[,"SAP"] ,
f= salinity.detrended$lowess.f))
names(df.lowess.sd) <- c("doy","SAP")
tmp <- as.data.frame(lowess(x=df.sd[,"doy"],
y=df.sd[,"BBP"] ,
f= salinity.detrended$lowess.f))
names(tmp) <- c("doy","BBP")
df.lowess.sd <- merge(df.lowess.sd, tmp, by='doy')
# ** set lowess.sd to NA where nobs <= salinity.detrended$minObs.sd, then
# use fillMissing to interpolate
df.lowess.sd[df.nobs$SAP<=salinity.detrended$minObs.sd,"SAP"] <- NA
df.lowess.sd$SAP <- fillMissing(df.lowess.sd$SAP, max.fill=9e9, span=1)
df.lowess.sd[df.nobs$SAP<=salinity.detrended$minObs.sd,"BBP"] <- NA
df.lowess.sd$BBP <- fillMissing(df.lowess.sd$BBP, max.fill=9e9, span=1)
# now trim down to doy from 1:366
df.mean <- df.mean[df.mean$doy %in% 1:366, ]
df.sd <- df.sd [df.sd$doy %in% 1:366, ]
df.nobs <- df.nobs[df.nobs$doy %in% 1:366, ]
df.lowess.sd <- df.lowess.sd [df.lowess.sd$doy %in% 1:366, ]
# ** put mean, sd, nobs, & lowess by doy into a list ----
# set embedded df to correspond to station.sum
# append list to overall list
tmp.list <- list(list(mean = df.mean, sd = df.sd, nobs = df.nobs
, lowess.sd = df.lowess.sd))
names(tmp.list) <- var.sum
salinity.detrended <- modifyList(salinity.detrended, tmp.list)
}
# Post processing check ----
# are the list of stations in salinity.detrended the same as what i stored in
# salinity.detrended$stations
names(salinity.detrended)[names(salinity.detrended) %in%
salinity.detrended[["stations"]] ] ==
salinity.detrended$stations
# Plot results ----
# plot mean, sd (and lowess.sd), and nobs per doy
# QC, 20180502
# figNum not defined before used.
if(!exists("figNum")){
figNum<-0
}
figNum <<- 0
for (station in salinity.detrended[["stations"]]) {
station.sum <- paste0 (station ,".sum" )
station.mean <- salinity.detrended[[station.sum]][["mean"]][["SAP"]]
station.sd <- salinity.detrended[[station.sum]][["sd"]][["SAP"]]
station.doy <- salinity.detrended[[station.sum]][["sd"]][["doy"]]
station.lowess.sd <- salinity.detrended[[station.sum]][["lowess.sd"]][["SAP"]]
station.nobs <- salinity.detrended[[station.sum]][["nobs"]][["SAP"]]
if(is.null(station.mean)) next
# plots to screen
par(mfrow = c(3, 1))
plot(station.doy, station.mean , ylim=c(-3.,3.),
xlab=NA, ylab="mean"); title(paste0(station,"--SAP"))
plot(station.doy, station.sd , ylim=c(0,6), col='grey',
xlab=NA, ylab="sd"); #title(station);
lines(station.doy, station.lowess.sd, col='red',lwd=2)
plot(station.doy, station.nobs , ylim=c(0,80) ,
xlab='Day of Year', ylab="Nobs."); #title(station);
#
figNum <<- figNum + 1
title <- paste0( "SAP mean, standard deviation, and number of observations ",
" as a Function of Day of Year.")
.F(title, figNum)
}
for (station in salinity.detrended[["stations"]]) {
station.sum <- paste0 (station ,".sum" )
station.mean <- salinity.detrended[[station.sum]][["mean"]][["BBP"]]
station.sd <- salinity.detrended[[station.sum]][["sd"]][["BBP"]]
station.doy <- salinity.detrended[[station.sum]][["sd"]][["doy"]]
station.lowess.sd <- salinity.detrended[[station.sum]][["lowess.sd"]][["BBP"]]
station.nobs <- salinity.detrended[[station.sum]][["nobs"]][["BBP"]]
if(is.null(station.mean)) next
# plots to screen
par(mfrow = c(3, 1))
plot(station.doy, station.mean , ylim=c(-3.,3.),
xlab=NA, ylab="mean"); title(paste0(station,"--BBP"))
plot(station.doy, station.sd , ylim=c(0,6), col='grey',
xlab=NA, ylab="sd"); #title(station);
lines(station.doy, station.lowess.sd, col='red',lwd=2)
plot(station.doy, station.nobs , ylim=c(0,80) ,
xlab='Day of Year', ylab="Nobs."); #title(station);
#
figNum <<- figNum + 1
title <- paste0( "BBP mean, standard deviation, and number of observations ",
"as a Function of Day of Year.")
.F(title, figNum)
}
return(salinity.detrended )
} ##FUNCTIOIN.END
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.