Nothing
#' Make a climatology from a daily time series.
#'
#' This is the fully data.table-based version of \code{\link{ts2clm}}. The function creates
#' a daily climatology from a time series of daily temperatures using a
#' user-specified sliding window for the mean and threshold calculation, followed
#' by an optional moving average smoother as used by Hobday et al. (2016).
#'
#' @import Rcpp
#'
#' @importFrom data.table .N
#' @useDynLib heatwaveR
#'
#' @param data A data frame with at least two columns. In the default setting
#' (i.e. omitting the arguments \code{x} and \code{y}; see immediately below),
#' the data set is expected to have the headers \code{t} and \code{temp}. The
#' \code{t} column is a vector of dates of class \code{Date}, while \code{temp}
#' is the measured variable (by default it is assumed to be temperature). Any
#' additional columns are not used in calculations but will be correctly
#' carried over into the output climatology. Such additional columns may be
#' site names, longitudes, or latitudes (etc.), but note that they are not meant
#' to specify a grouping structure of the time series. As such, grouping structures
#' are not handled by the function in its current form, and a time series is
#' therefore assumed to be located at a discrete point in space such as one
#' 'pixel' of longitude × latitude as one might find in gridded data products.
#' @param x This column is expected to contain a vector of dates. If a column
#' headed \code{t} is present in the data frame, this argument may be omitted;
#' otherwise, specify the name of the column with dates here.
#' @param y This is a column containing the measurement variable. If the column
#' name differs from the default (i.e. \code{temp}), specify the name here.
#' @param climatologyPeriod Required. To this argument should be passed two values
#' (see example below). The first value should be the chosen date for the start of
#' the climatology period, and the second value the end date of said period. This
#' chosen period (preferably 30 years in length) is then used to calculate the
#' seasonal cycle and the extreme value threshold.
#' @param maxPadLength Specifies the maximum length of days over which to apply
#' linear interpolation (padding) across the missing values (specified as \code{NA})
#' in the measured variable; i.e., any consecutive blocks of NAs with length
#' greater than \code{maxPadLength} will be left as \code{NA}. The default is
#' \code{FALSE}. Set as an integer to interpolate. Setting \code{maxPadLength}
#' to \code{TRUE} will return an error.
#' @param windowHalfWidth Width of sliding window about day-of-year (to one
#' side of the center day-of-year) used for the pooling of values and
#' calculation of climatology and threshold percentile. Default is \code{5}
#' days, which gives a window width of 11 days centred on the 6th day of the
#' series of 11 days.
#' @param pctile Threshold percentile (\%) for detection of events (MHWs).
#' Default is \code{90}th percentile. Should the intent be to use these
#' threshold data for MCSs, set \code{pctile = 10} or some other low value.
#' @param smoothPercentile Boolean. Select whether to smooth the climatology
#' and threshold percentile time series with a moving average of
#' \code{smoothPercentileWidth}. The default is \code{TRUE}.
#' @param smoothPercentileWidth Full width of moving average window for smoothing
#' the climatology and threshold. The default is \code{31} days.
#' @param clmOnly Boolean. Choose to calculate and return only the climatologies.
#' The default is \code{FALSE}.
#' @param var Boolean. This argument has been introduced to allow the user to
#' choose if the variance of the seasonal signal per doy should be calculated.
#' The default of \code{FALSE} will prevent the calculation. Setting it to
#' \code{TRUE} might potentially increase the speed of calculations on gridded
#' data and increase the size of the output. The variance was initially introduced
#' as part of the standard output from Hobday et al. (2016), but few researchers
#' use it and so it is generally regarded now as unnecessary.
#' @param roundClm This argument allows the user to choose how many decimal places
#' the \code{seas} and \code{thresh} outputs will be rounded to. Default is 4. To
#' prevent rounding set \code{roundClm = FALSE}. This argument may only be given
#' numeric values or FALSE.
#' @param ... Allows unused arguments to pass through the functions.
#'
#' @details
#' \enumerate{
#' \item This function assumes that the input time series consists of continuous
#' daily values with few missing values. Time ranges which start and end
#' part-way through the calendar year are supported.
#' \item It is recommended that a period of at least 30 years is specified in
#' order to produce a climatology that smooths out any decadal thermal
#' periodicities that may be present. When calculated over at least 30 years of
#' data, such a climatology is called a 'climatological normal.' It is further
#' advised that full the start and end dates for the climatology period result
#' in full years, e.g. "1982-01-01" to "2011-12-31" or "1982-07-01" to
#' "2012-06-30"; if not, this may result in an unequal weighting of data
#' belonging with certain months within a time series. A daily climatology will
#' be created; that is, the climatology will be comprised of one mean
#' temperature for each day of the year (365 or 366 days, depending on how
#' leap years are dealt with), and the mean will be based on a sample size that
#' is a function of the length of time determined by the start and end values
#' given to \code{climatologyPeriod} and the width of the sliding window
#' specified in \code{windowHalfWidth}.
#' \item This function supports leap years. This is done by ignoring Feb 29s
#' for the initial calculation of the climatology and threshold. The values
#' for Feb 29 are then linearly interpolated from the values for Feb 28 and
#' Mar 1.
#' \item Previous versions of \code{ts2clm()} tested to see if some rows are
#' duplicated, or if replicate temperature readings are present per day, but
#' this has now been disabled. Should the user be concerned about such repeated
#' measurements, we suggest that the necessary checks and fixes are implemented
#' prior to feeding the time series to \code{ts2clm()}.
#' }
#' The original Python algorithm was written by Eric Oliver, Institute for
#' Marine and Antarctic Studies, University of Tasmania, Feb 2015, and is
#' documented by Hobday et al. (2016).
#'
#' @return The function will return a data.table (see the \code{data.table}) with the
#' input time series and the newly calculated climatology. The climatology contains
#' the daily climatology and the threshold for calculating MHWs. The software was
#' designed for creating climatologies of daily temperatures, and the units
#' specified below reflect that intended purpose. However, various other kinds
#' of climatologies may be created, and if that is the case, the appropriate
#' units need to be determined by the user.
#' \item{doy}{Julian day (day-of-year) returned when \code{clmOnly = TRUE}. For non-leap
#' years it runs 1...59 and 61...366, while leap years run 1...366.}
#' \item{t}{The date vector in the original time series supplied in \code{data}. If
#' an alternate column was provided to the \code{x} argument, that name will rather
#' be used for this column.}
#' \item{temp}{The measurement vector as per the the original \code{data} supplied
#' to the function. If a different column was given to the \code{y} argument that
#' will be shown here.}
#' \item{seas}{Daily climatological cycle [deg. C].}
#' \item{thresh}{Daily varying threshold (e.g., 90th
#' percentile) [deg. C]. This is used in \code{\link{detect_event3}} for the
#' detection/calculation of events (MHWs).}
#' \item{var}{Daily varying variance (standard deviation) [deg. C]. This
#' column is not returned if \code{var = FALSE} (default).}
#' Should \code{clmOnly} be enabled, only the 365 or 366 day climatology will be
#' returned.
#'
#' @author Albertus J. Smit, Robert W. Schlegel, Eric C. J. Oliver
#'
#' @references Hobday, A.J. et al. (2016). A hierarchical approach to defining
#' marine heatwaves, Progress in Oceanography, 141, pp. 227-238,
#' doi:10.1016/j.pocean.2015.12.014
#'
#' @export
#'
#' @examples
#' data.table::setDTthreads(threads = 1) # optimise for your code and local computer
#' res <- ts2clm3(sst_WA, climatologyPeriod = c("1983-01-01", "2012-12-31"))
#' res[1:10, ]
#'
#' # Or if one only wants the 366 day climatology
#' res_clim <- ts2clm3(sst_WA, climatologyPeriod = c("1983-01-01", "2012-12-31"),
#' clmOnly = TRUE)
#' res_clim[1:10, ]
#'
#' # Or if one wants the variance column included in the results
#' res_var <- ts2clm3(sst_WA, climatologyPeriod = c("1983-01-01", "2012-12-31"),
#' var = TRUE)
#' res_var[1:10, ]
#'
ts2clm3 <- function(data,
x = t,
y = temp,
climatologyPeriod,
maxPadLength = FALSE,
windowHalfWidth = 5,
pctile = 90,
smoothPercentile = TRUE,
smoothPercentileWidth = 31,
clmOnly = FALSE,
var = FALSE,
roundClm = 4,
...) {
if (missing(climatologyPeriod))
stop("Oops! Please provide a period (two dates) for calculating the climatology.")
if (length(climatologyPeriod) != 2)
stop("Bummer! Please provide BOTH start and end dates for the climatology period.")
if (maxPadLength != FALSE & !is.numeric(maxPadLength))
stop("Please ensure that 'maxPadLength' is either FALSE or a numeric/integer value.")
if (!(is.numeric(pctile)))
stop("Please ensure that 'pctile' is a numeric/integer value.")
if (!(is.numeric(windowHalfWidth)))
stop("Please ensure that 'windowHalfWidth' is a numeric/integer value.")
if (!(is.logical(smoothPercentile)))
stop("Please ensure that 'smoothPercentile' is either TRUE or FALSE.")
if (!(is.numeric(smoothPercentileWidth)))
stop("Please ensure that 'smoothPercentileWidth' is a numeric/integer value.")
if (!(is.logical(clmOnly)))
stop("Please ensure that 'clmOnly' is either TRUE or FALSE.")
if (!(is.numeric(roundClm))) {
if (!roundClm == FALSE) {
stop("Please ensure that 'roundClm' is either a numeric value or FALSE.")
}
}
data.table::setDT(data)
clim_start <- fasttime::fastDate(climatologyPeriod[1])
clim_end <- fasttime::fastDate(climatologyPeriod[2])
mean_ts_y <- temp <- seas <- thresh <- .SD <- NULL
year <- doy <- ts_x <- NULL
seas_roll <- thresh_roll <- thresh_var_roll <- NULL
ts_x <- eval(substitute(x), data)
if (is.null(ts_x) | is.function(ts_x))
stop(
"Please ensure that a column named 't' is present in your data.table or that you have assigned a column to the 'x' argument."
)
ts_y <- eval(substitute(y), data)
if (is.null(ts_y) | is.function(ts_y))
stop(
"Please ensure that a column named 'temp' is present in your data.table or that you have assigned a column to the 'y' argument."
)
if (!inherits(ts_x[1], "Date"))
stop("Please ensure your date values are type 'Date'. This may be done with 'as.Date()'.")
if (!is.numeric(ts_y[1]))
stop("Please ensure the temperature values you are providing are type 'num' for numeric.")
ts_xy <-
data.table::data.table(ts_x = ts_x, ts_y = ts_y)[base::order(ts_x)]
rm(list = c("ts_x", "ts_y"))
# BEGIN INSERT make_whole_fast >>>
ts_full <- data.table::data.table(ts_x = seq.Date(ts_xy[1, ts_x],
ts_xy[.N, ts_x],
"day"))
feb28 <- 59
.is_leap_year <- function(year) {
return((year %% 4 == 0 & year %% 100 != 0) | (year %% 400 == 0))
}
ts_whole <-
data.table::merge.data.table(ts_full,
ts_xy,
by = "ts_x",
all.x = TRUE)[,
`:=`(year = as.integer(format(ts_x, "%Y")),
doy = as.integer(format(ts_x, "%j")))][,
doy := ifelse(!.is_leap_year(year) &
doy > 59, doy + 1, doy)][, year := NULL][, data.table::setcolorder(.SD, c("doy", "ts_x", "ts_y"))]
data.table::setkey(ts_whole, NULL)
rm(ts_full)
# END INSERT make_whole_fast <<<
# BEGIN INSERT na_interp >>>
if (sum(stats::complete.cases(ts_whole$ts_y)) < nrow(ts_whole) &
is.numeric(maxPadLength)) {
.na_pad <- function(x, fill, maxPadLength) {
if (maxPadLength <= 0)
return(x)
if (maxPadLength >= length(x))
return(fill)
naruns <- rle(is.na(x))
naruns$values[naruns$lengths > maxPadLength] <- FALSE
naok <- inverse.rle(naruns)
rm(naruns)
ifelse(naok, fill, x)
}
.na_fun <- function(x, y) {
na <- is.na(y)
yi <- stats::approx(x[!na], y[!na], x)$y
ygap <-
.na_pad(y, seq_along(y), maxPadLength = maxPadLength)
ix <- stats::approx(x, seq_along(y), x)$y
yx <-
ifelse(is.na(ygap[floor(ix)] + ygap[ceiling(ix)]), NA, yi)
yx
}
ts_whole[, ts_y := .na_fun(ts_x, ts_y)]
# END INSERT na_interp <<<
}
if (ts_whole$ts_x[1] > clim_start)
stop(
paste(
"The specified start date precedes the first day of series, which is",
ts_whole$ts_x[1]
)
)
if (clim_end > utils::tail(ts_whole$ts_x, 1))
stop(paste(
"The specified end date follows the last day of series, which is",
ts_whole$ts_x[nrow(ts_whole)]
))
if (clim_end - clim_start < 1095)
stop("The climatologyPeriod must be at least three years to calculate thresholds")
# BEGIN INSERT clim_spread >>>
.NA2mean <- function(x) {
z <- round(mean(x, na.rm = TRUE), 2)
x[is.na(x)] <- z
return(x)
}
ts_clim <- ts_whole[ts_x >= clim_start & ts_x <= clim_end,
list(ts_x = format(ts_x, "%Y"), doy, ts_y)]
ts_clim[, mean_ts_y := mean(ts_y, na.rm = TRUE), by = list(doy, ts_x)]
ts_spread <-
data.table::dcast(ts_clim, doy ~ ts_x, value.var = "mean_ts_y")
rm(ts_clim)
ts_spread_filled <-
data.table::data.table((sapply(ts_spread[59:61,],
function(x)
.NA2mean(x))))
ts_spread[60, (names(ts_spread)) := ts_spread_filled[2, .SD, .SDcols = names(ts_spread)]]
rm(ts_spread_filled)
l <- list(ts_spread[(.N - windowHalfWidth + 1):.N,],
ts_spread,
ts_spread[1:windowHalfWidth,])
ts_spread <- data.table::rbindlist(l)
rm(l)
len_yr <-
length(as.integer(substr(clim_start, 1, 4)):as.integer(substr(clim_end, 1, 4)))
ts_mat <- as.matrix(ts_spread)[, 2:(len_yr + 1)]
if (nrow(stats::na.omit(ts_mat)) < nrow(ts_mat)) {
plugs <- which(is.na(ts_mat), arr.ind = TRUE)
ts_mat[plugs] <- rowMeans(ts_mat, na.rm = TRUE)[plugs[, 1]]
}
# END INSERT clim_spread <<<
if (nrow(stats::na.omit(ts_mat)) < nrow(ts_mat) | var) {
ts_mat <- clim_calc(ts_mat, windowHalfWidth, pctile)
ts_mat[is.nan(ts_mat)] <- NA
} else {
ts_mat <- clim_calc_cpp(ts_mat, windowHalfWidth, pctile)
ts_mat <- data.table::data.table(ts_mat)
}
# BEGIN INSERT smooth_percentile >>>
if (smoothPercentile) {
prep <-
data.table::rbindlist(
list(
data.table::data.table(utils::tail(ts_mat, smoothPercentileWidth)),
data.table::data.table(ts_mat),
data.table::data.table(utils::head(ts_mat, smoothPercentileWidth))
)
)
prep[, `:=`(
doy = doy,
seas_roll = data.table::frollmean(
seas,
n = smoothPercentileWidth,
na.rm = FALSE,
align = "center"
),
thresh_roll = data.table::frollmean(
thresh,
n = smoothPercentileWidth,
na.rm = FALSE,
align = "center"
)
)]
ts_clim <-
prep[(smoothPercentileWidth + 1):(nrow(prep) - smoothPercentileWidth),
list(doy = doy,
seas = seas_roll,
thresh = thresh_roll)]
if (var) {
prep <-
data.table::rbindlist(
list(
data.table::data.table(utils::tail(ts_mat, smoothPercentileWidth)),
data.table::data.table(ts_mat),
data.table::data.table(utils::head(ts_mat, smoothPercentileWidth))
)
)
prep[, `:=`(
doy = doy,
seas_roll = data.table::frollmean(
seas,
n = smoothPercentileWidth,
na.rm = FALSE,
align = "center"
),
thresh_roll = data.table::frollmean(
thresh,
n = smoothPercentileWidth,
na.rm = FALSE,
align = "center"
),
thresh_var_roll = data.table::frollapply(
thresh,
n = smoothPercentileWidth,
var,
na.rm = FALSE,
align = "center"
)
)]
ts_clim <-
prep[(smoothPercentileWidth + 1):(nrow(prep) - smoothPercentileWidth),
list(
doy = doy,
seas = seas_roll,
thresh = thresh_roll,
var = thresh_var_roll
)]
}
rm(prep)
# END INSERT smooth_percentile <<<
} else {
ts_clim <- ts_mat
}
if (is.numeric(roundClm)) {
ts_clim[, (names(ts_clim)) := lapply(.SD, function(x)
round(x, digits = roundClm))]
}
rm(ts_mat)
if (clmOnly) {
return(ts_clim)
} else {
ts_res <-
ts_whole[ts_clim, on = list(doy), nomatch = 0][order(ts_x)]
rm(ts_whole)
rm(ts_clim)
names(ts_res)[2] <- paste(substitute(x))
names(ts_res)[3] <- paste(substitute(y))
# Implement a data.table method
if (ncol(data) > 2) {
merge_cols <- colnames(ts_res)[2:3]
ts_res <-
data.table::merge.data.table(data, ts_res, by = merge_cols, all = TRUE)
}
ts_res[, doy := NULL]
return(ts_res)
}
}
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.