linearTrend | R Documentation |
Computes linear trend slopes, their confidence intervals and some other related statistics for timeseries using Santer et al. (2008) method modified for dealing with missing data in time series.
linearTrend(grid, p = 0.9)
grid |
A climate4R predictor dataset |
p |
A numeric value. Confidence level for the uncertainty interval (0<p<1). |
## Uses a model y = a + b*x + e, where e is assumed to be an AR(1) process. Applies ordinary least squares (OLS) regression. Estimates the trend slope's standard error, confidence interval, and p-value using an AR(1)-based reduction in the number of degrees of freedom (DOF), as described by Santer et al. (2008) paper in Int. J. of Climatology (S2008 hereinafter). S2008 algorithm is modified here for the use with time series where some data points are missing and also is augmented here by provisions for two special cases: when the lag-1 autoregression coefficient is negative (0 is used) and when reduced DOF number falls below 3 or cannot be estimated at all ("irregularity" code returned for some parameters). Algorithm modification to accomodate the presence of missing data is as follows: (a) the slope and intercept of the regression line are computed by applying the OLS only to the subset of points where data values are not missing; (b) while S2008 compute the lag-1 autocorrelation coefficient of data residuals e as a sample correlation coefficient between e(1:N-1) and e(2:N), here it is computed for e(ind) and e(ind+1), where ind is the subset of indices i in 1:N such that both e(i) and e(i+1) are available.
Note that the same formula for reduced number of degrees of freedom (DOFr) is used as in S2008; in the presence of missing data this formula underestimates DOFr and thus results in wider (more conservative) confidence intervals and larger p-values (compared to the cases without missing data).
## Please note: (1) the time grid is expected to be uniform, and (2) missing data values contain NaN (3) if this function is called with 2 input parameters only, p is assumed to be NaN, and NaN is returned for cinthw; other output parameters are unaffected (4) this function uses an external function mklr.m for the OLS regression
Input: x - vector of time values (a uniform grid) y - vector of data values (NaN in place of missing values) p - confidence level for the uncertainty interval (0<p<1)
Output: b - estimated slope of the linear trend cinthw - halfwidth of the confidence interval sig - estimated standard error in b DOFr - reduced number of DOF (a.k.a. effective sample size) rho - lag-1 autocorrelation coefficient of data residuals (w.r.t. trend line) pval - p-value of the estimated b in the two-sided Student's t test for the null hypothesis of no trend irrc - "irregularity" code: irrc = 0, regular application of the algorithm irrc = 1, rho < 0 (rho=0 value is used in sig and cinthw calculations, but unmodified rho value is returned) irrc = 10, DOFr < 3; in this case results of the calculation are not recommended for use; when DOFr–>2-0, values of sig and cinthw tend to infinity and pval–>1 (Inf are returned for sig and cinthw and 1 for pval when DOFr <= 2) irrc = 100, rho could not be estimated (NaN are returned for rho, Inf for sig and cinthw, and 1 for pval) irrc = 1000, no calculation is done b/c number of available data points Na < 3 N - timeseries length a - intercept of the trend line Na - number of available data points Nc - sample size for calculating rho
Detailed algorithm:
0) output parameters are assigned missing values, corresponding to the case of no calculation; set irrc=0; N and Na are determined; if Na<3, set irrc = 1000, and no calculation is done; otherwise, vector of available data ya (length Na) and corresponding vector xa are formed; 1) Do the OLS regression of timeseries y on time x for the subset of available values of y (ya) and corresponding subset of x (xa): ya=a+b*xa; returns estimated a, b, sb (= estimated standard error for b) 2) compute data residuals e=y-a-b*x; 3) compute rho (= lag-1 autocorrelation coefficient for residuals) as the correlation coefficient between e(ind) and e(ind+1), where ind is the subset of indices i in 1:N such that both e(i) and e(i+1) are available; Nc = length(ind); if rho cannot be determined (e.g., Nc < 2), set irrc = 100 and no further calculation is done. 4) provision for the negative rho: rhop = max(rho,0); if rho<0, irrc = 1; 5) compute reduced DOF in the sample (a.k.a. effective sample size): DOFr=Na*(1-rhop)/(1+rhop); 6) provision for DOFr < 3 : irrc = 10; 7) if DOFr>2, compute the following:
(a) final standard error estimate for b sig=sb*sqrt((N-2)/(DOFr-2))
(b) p-value for b in the two-sided Student's t test: pval = 2*(1-tcdf(abs(b)/sig,DOFr-2))
(c) half-width of the (p*100) cinthw=sig*tinv(0.5+p/2,DOFr-2), where tinv(alpha,n) is the (alpha*100) t distribution with n degrees of freedom.
Return a multigrid C4R object with the 'p', 'pval' and 'irrc' parameters stacked as variables.
J. Baño-Medina, S. Herrera
library(transformeR)
library(visualizeR)
require(climate4R.datasets)
# Regular Grid
x <- get(data("CFS_Iberia_tas"))
r <- linearTrend(x,p = 0.9)
b <- subsetGrid(r,var = "b")
spatialPlot(b, backdrop.theme = "coastline", main = "Estimated slope of the linear trend (CFS)")
# Irregular Grid
x <- get(data("VALUE_Iberia_tas")) # irregular Grid
r <- linearTrend(x,p = 0.9)
b <- subsetGrid(r,var = "b")
spatialPlot(b, backdrop.theme = "coastline", main = "Estimated slope of the linear trend (VALUE)")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.