#' Calculate Long Term Estimate of SLR
#'
#' Given a series of (typically autocorrelated) observations of sea level or
#' mean sea level, and associated time coordinates (usually Dates), return a
#' named vector containing a summary of a generalized least squares linear trend,
#' optionally scaled to annual values. This function mimics the calculation of
#' long term estimates of sea level rise and associated uncertainties conducted
#' by NOAA. Where we have checked, results of this function, when based on
#' NOAA's seasonally adjusted mean trend data (See Details), agree with NOAA's
#' trend estimates within rounding error.
#'
#' @details
#' Sea level data tends to be highly autocorrelated over multiple time
#' scales. This function uses generalized least squares (GLS) to fit a linear
#' model with a correlation structure defined by fitting an AR1 process to model
#' residuals.
#'
#' Slope estimates from the AR1 models tend (in this context) to be
#' indistinguishable from estimates generated by ordinary least squares, but the
#' uncertainty of the estimates is substantially greater. As observations are
#' not independent, ordinary least squares (which assumes independence) will
#' overstate model precision.
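#'
#' As a rough sketch of that comparison (using the `prov_meantrend` example
#' data described below, with column names as in the examples):
#'
#' ```r
#' ols <- lm(MSL ~ MidDate, data = prov_meantrend)
#' ar1 <- nlme::gls(MSL ~ MidDate, data = prov_meantrend,
#'                  correlation = nlme::corAR1())
#' # Slopes are nearly identical; the GLS standard error is noticeably larger
#' summary(ols)$coefficients['MidDate', ]
#' summary(ar1)$tTable['MidDate', ]
#' ```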
#'
#' The AR1 process, an "autoregressive process of order 1", is a simple
#' assumption about correlation structures that reduces the risk of overstating
#' the precision of trend estimates. In an AR1 model, each observation is
#' regressed against the prior observation. Informally, an AR1 process
#' generates a correlation structure based on the lag 1 autocorrelation.
#' Modeled autocorrelations between observations are thus proportional to the
#' lag 1 autocorrelation raised to the power of the number of timesteps between
#' the two observations.
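#'
#' For example, with a lag 1 autocorrelation of 0.8 (an illustrative value,
#' not an estimate), the implied correlations decay geometrically with lag:
#'
#' ```r
#' phi <- 0.8     # lag 1 autocorrelation (illustrative)
#' phi ^ (0:5)    # 1.000 0.800 0.640 0.512 0.410 0.328
#' ```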
#'
#' The AR1 structure, however, may not be ideal for all tidal time series.
#' Actual tidal data series tend to have more complex autocorrelation structure.
#' Where regular seasonal patterns are evident, explicit modeling is probably
#' called for. Thus the function is principally intended for use with data
#' averaged over monthly or longer periods.
#'
#' This function is a convenience wrapper around `nlme::gls(..., correlation =
#' corAR1())`. For more control, or to examine model diagnostics, the user
#' should fit the generalized least squares model directly.
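#'
#' A minimal sketch of the direct fit, using the `prov_meantrend` data
#' described below:
#'
#' ```r
#' mod <- nlme::gls(MSL ~ MidDate, data = prov_meantrend,
#'                  correlation = nlme::corAR1())
#' summary(mod)                              # coefficients and AR1 estimate
#' plot(mod)                                 # standardized residuals vs fitted
#' acf(residuals(mod, type = 'normalized'))  # check remaining autocorrelation
#' ```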
#'
#' The function does not fit periodic terms. Thus it is generally
#' not suitable for use with "raw" sea level observations, especially those
#' collected at high resolution, like NOAA hourly data. Tidal data contains many
#' periodic components (the ~24 hr 50 min tidal day, spring tide / neap tide cycles,
#' annual cycles, etc.). Those periodic structures will affect estimates of
#' autocorrelation, ultimately making estimates of model uncertainty less
#' reliable. If working from high frequency data, short-term structure should be
#' removed before using this function, typically by calculating average sea
#' level over convenient and meaningful time periods, like months or years.
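#'
#' One sketch of that reduction, assuming a data frame `hourly` with columns
#' `datetime` (POSIXct) and `water_level` (neither is provided by this
#' package):
#'
#' ```r
#' # Tag each observation with a mid-month Date, then average by month
#' hourly$mid_month <- as.Date(format(hourly$datetime, '%Y-%m-15'))
#' monthly <- aggregate(water_level ~ mid_month, data = hourly,
#'                      FUN = mean, na.rm = TRUE)
#' ```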
#'
#' The function's original use case was based on secondary analysis of NOAA
#' "mean trend" data. NOAA calculates long-term sea level rise estimates based
#' on a modified monthly water level data series. The source data is not the
#' same as what one gets if one downloads monthly mean water level through the
#' standard NOAA API. NOAA's "mean trend" data contains "monthly mean sea level
#' without the regular seasonal fluctuations due to coastal ocean temperatures,
#' salinities, winds, atmospheric pressures, and ocean currents." This
#' seasonally adjusted data can be downloaded manually from NOAA water level
#' station web pages, but is apparently not available through the API. The
#' `prov_meantrend` data set included in `SLRSIM` is an example, for
#' Providence, Rhode Island.
#'
#' `SLRSIM` provides a function, `get_slr_trend()`, to access NOAA's long term
#' trend analysis result directly, including error terms and period of record.
#'
#' If `t_fit = FALSE`, the function fits an autocorrelation structure to
#' the sequential observations, ordered by `.dt`, without regard to how much
#' time has passed between subsequent observations. This approach works well
#' with evenly spaced dates and complete or nearly complete records, but is
#' less successful if there are many periods with missing observations.
#'
#' If `t_fit = TRUE`, the function fits an autocorrelation structure based on
#' the time coordinate (.dt). In this case, the time coordinate must be integer
#' valued. R's Date class is stored as a count of days under the hood. It is
#' easy to build Dates, either by converting POSIX times with `as.Date()` or
#' by building them up from year, month, and day information with
#' `as.Date(paste(year, month, 15, sep = '-'))`.
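#'
#' For example:
#'
#' ```r
#' year <- 2015; month <- 7                     # illustrative values
#' as.Date(paste(year, month, 15, sep = '-'))   # "2015-07-15"
#' ```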
#'
#' By default, analysis is conducted by years (`.mode == 'year'`). The slope
#' estimate and standard error are scaled from days (based on R Dates) or
#' seconds (based on POSIXt objects) to (approximate) annual values by
#' multiplying by 365.2422 (days / year) or 31556926 (seconds / year).
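#'
#' For example, a hypothetical Date-based trend of 0.0079 mm per day scales
#' to roughly 2.89 mm per year:
#'
#' ```r
#' slope_per_day <- 0.0079      # hypothetical trend, in mm / day
#' slope_per_day * 365.2422     # ~ 2.886 mm / year
#' ```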
#' @param .data Source data frame. Use NULL if no data frame is used and all
#'   data is passed from the enclosing environment.
#' @param .sl The data variable (usually found in the data frame) showing sea
#'   level or mean sea level. May be a bare variable name or an expression
#'   evaluated in the context of `.data`.
#' @param .dt Data variable containing corresponding midpoint dates for the
#'   period of averaging used to calculate .sl. May be a bare variable name or
#'   an expression. Midpoint dates for a given month can be approximated with
#'   `as.Date(paste(year, month, 15, sep = '-'))`.
#' @param .ci P value for two sided confidence intervals for the estimated
#'   slope (default 0.05, for nominal 95% confidence intervals), based on a
#'   (possibly naive) normal approximation.
#' @param t_fit Should the underlying generalized linear model's correlation
#' structure be fit based on the time coordinate (.dt), or only on the
#'   sequence of observations in the data? Setting this to TRUE is safer if
#'   you are uncertain of the order of observations in the source data, or if
#'   the data contains significant gaps, but it substantially slows model
#'   fitting. Results tend to be very similar for complete or nearly complete
#'   data.
#' @param .mode one of c('year', 'unscaled') indicating whether the
#' results should be scaled to years (the default) or left unscaled.
#' Other sea level rate functions accept other `.mode` values.
#' Scaling requires the time coordinate to inherit from R `Date` or
#' `POSIXt` objects. Rates based on `Dates` are converted to annual
#'   values by multiplying by 365.2422 days / year. Rates based on
#'   `POSIXt` objects are converted to annual values by multiplying by
#'   31556926 seconds / year (with a message about possible rounding error).
#' @return
#' A named vector, with the following components and attributes.
#' \describe{__Vector Components__
#' \item{Estimate}{The trend. Units depend on value of the `.mode` argument.
#' For `.mode == 'year'`, units are per year, otherwise per unit of the .dt
#' variable.}
#' \item{Std_Err}{Standard Error of the trend. Units as above.}
#' \item{P_Val}{P value of the trend.}
#' \item{Lower_CI}{Lower confidence interval for the estimate.}
#' \item{Upper_CI}{Upper confidence interval for the estimate.}
#' }
#' \describe{__Attributes__
#' \item{CI_P}{Nominal p value used to generate the confidence intervals.}
#' \item{span}{How many calendar years are represented in the data?}
#' \item{start}{First year included in the data.}
#' \item{end}{Last year included in the data.}
#' \item{cor_struct}{Were correlation parameters fit based on observation
#' order ('Order-based') or on the time coordinate ('Time-based')?}
#'}
#'
#' @family sea level rate functions
#' @export
#'
#' @examples
#'
#' prov_meantrend$MSL_mm <- prov_meantrend$MSL * 1000   # convert to mm
#'
#' # Basic Usage
#' slr_slope(prov_meantrend, MSL_mm, MidDate)
#'
#' # If not passed dates, results must be left unscaled
#' slr_slope(prov_meantrend, MSL_mm, unclass(MidDate), .mode = 'unscaled')
#'
#' # One can also fit the model to the time coordinate explicitly
#' slr_slope(prov_meantrend, MSL_mm, MidDate, t_fit = TRUE)
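#'
#' # Attributes record the period of record and how the correlation was fit
#' res <- slr_slope(prov_meantrend, MSL_mm, MidDate)
#' attributes(res)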
#'
slr_slope <- function(.data, .sl, .dt,
                      .ci = 0.05, t_fit = FALSE,
                      .mode = c('year', 'unscaled')) {
  # Minimal argument checks; `stopifnot()` does not produce friendly
  # error messages.
  stopifnot(is.data.frame(.data) || is.null(.data))
  stopifnot(inherits(t_fit, 'logical'))
  .mode <- match.arg(.mode)
  # Accept data variables as unquoted names, quoted names, or expressions.
  # `enquo()` captures the argument as a quosure; `eval_tidy()` evaluates
  # it in the context of `.data`.
  sl_sym <- rlang::enquo(.sl)
  date_sym <- rlang::enquo(.dt)
  sl <- rlang::eval_tidy(sl_sym, .data)
  the_date <- rlang::eval_tidy(date_sym, .data)
  have_dates <- inherits(the_date, c('Date', 'POSIXt'))
  if (.mode == 'year' && !have_dates) {
    stop('.mode == "year" requires .dt to be a Date or POSIXct time.\n',
         'You can convert integer years to dates with ',
         '`as.Date(paste(year, "06", "15", sep = "-"))`')
  }
  # If we got dates or times and `.mode == 'year'`, scale slopes to annual
  # values; otherwise apply no scaling, and leave that to the user.
  if (.mode == 'year' && inherits(the_date, 'Date')) {
    multiplier <- 365.2422
  } else if (.mode == 'year' && inherits(the_date, 'POSIXt')) {
    # 365 days, 5 hours, 48 minutes, and 46 seconds
    multiplier <- 31556926   # seconds per year, on average
    message("Annual trends based on POSIXct times may be affected by ",
            "rounding. Consider re-expressing time coordinates as R Dates.")
  } else {
    multiplier <- 1
  }
  # Reorder the data by time stamp. Only essential if t_fit is FALSE.
  ord <- order(the_date)
  sl <- sl[ord]
  the_date <- the_date[ord]
  year <- as.numeric(format(the_date, format = '%Y'))
  yearspan <- max(year) - min(year) + 1
  # Two-sided normal quantile; .ci is a p value, so .ci = 0.05 gives ~1.96
  q_val <- -qnorm(.ci / 2)
# We use generalized least squares so we can account for
# autocorrelation in estimating the error of the slope.
# The simpler model is for a complete or nearly complete time series. The
# model based on the time coordinates is slower, and sometimes
# crashes on large data sets.
  if (t_fit) {
    the_gls <- nlme::gls(sl ~ the_date,
                         correlation = nlme::corAR1(0.75, form = ~ the_date))
  } else {
    the_gls <- nlme::gls(sl ~ the_date,
                         correlation = nlme::corAR1())
  }
# TODO: Use direct computation rather than relying on `summary()$tTable`
# everything except the p value can be accessed via `coef()` and `vcov()`.
mod_sum <- as.data.frame(summary(the_gls)$tTable)
  EST <- mod_sum$Value[2] * multiplier   # scale slope; multiplier is 1 if unscaled
  SE  <- mod_sum$Std.Error[2] * multiplier
  P   <- mod_sum$`p-value`[2]
  low_CI  <- EST - q_val * SE
  high_CI <- EST + q_val * SE
  cor_struct <- c('Order-based', 'Time-based')[t_fit + 1]
  result <- structure(c(Estimate = EST, Std_Err = SE, P_Val = P,
                        Lower_CI = low_CI, Upper_CI = high_CI), CI_P = .ci,
                      span = yearspan, start = min(year), end = max(year),
                      cor_struct = cor_struct)
  return(result)
}