Nonparametric Test for Monotonic Trend Within Each Season Based on Kendall's Tau Statistic

Share:

Description

Perform a nonparametric test for a monotonic trend within each season based on Kendall's tau statistic, and optionally compute a confidence interval for the slope across all seasons.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
kendallSeasonalTrendTest(y, ...)

## S3 method for class 'formula'
kendallSeasonalTrendTest(y, data = NULL, subset, 
  na.action = na.pass, ...)

## Default S3 method:
kendallSeasonalTrendTest(y, season, year, 
  alternative = "two.sided", correct = TRUE, ci.slope = TRUE, conf.level = 0.95, 
  independent.obs = TRUE, data.name = NULL, season.name = NULL, year.name = NULL, 
  parent.of.data = NULL, subset.expression = NULL, ...)

## S3 method for class 'data.frame'
kendallSeasonalTrendTest(y, ...)

## S3 method for class 'matrix'
kendallSeasonalTrendTest(y, ...)

Arguments

y

an object containing data for the trend test. In the default method, the argument y must be numeric vector of observations. When y is a data frame, all columns must be numeric. When y is a matrix, it must be a numeric matrix. In the formula method, y must be a formula of the form y ~ season + year, where y, season, and year specify what variables to use for the these arguments in the call to kendallSeasonalTrendTest.default. Missing (NA), undefined (NaN), and infinite (Inf, -Inf) values are allowed but will be removed.

data

specifies an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which kendallTrendTest is called.

subset

specifies an optional vector specifying a subset of observations to be used.

na.action

specifies a function which indicates what should happen when the data contain NAs. The default is na.pass.

season

numeric or character vector or a factor indicating the seasons in which the observations in y were taken. The length of season must equal the length of y.

year

numeric vector indicating the years in which the observations in y were taken. The length of year must equal the length of y.

alternative

character string indicating the kind of alternative hypothesis. The possible values are "two.sided" (tau not equal to 0; the default), "less" (tau less than 0), and "greater" (tau greater than 0).

correct

logical scalar indicating whether to use the correction for continuity in computing the z-statistic that is based on the test statistic S'. The default value is TRUE.

ci.slope

logical scalar indicating whether to compute a confidence interval for the slope. The default value is TRUE.

conf.level

numeric scalar between 0 and 1 indicating the confidence level associated with the confidence interval for the slope. The default value is 0.95.

independent.obs

logical scalar indicating whether to assume the observations in y are seially independent. The default value is TRUE.

data.name

character string indicating the name of the data used for the trend test. The default value is deparse(substitute(y)).

season.name

character string indicating the name of the data used for the season. The default value is deparse(substitute(season)).

year.name

character string indicating the name of the data used for the year. The default value is deparse(substitute(year)).

parent.of.data

character string indicating the source of the data used for the trend test.

subset.expression

character string indicating the expression used to subset the data.

...

additional arguments affecting the test for trend.

Details

Hirsch et al. (1982) introduced a modification of Kendall's test for trend (see kendallTrendTest) that allows for seasonality in observations collected over time. They call this test the seasonal Kendall test. Their test is appropriate for testing for trend in each season when the trend is always in the same direction across all seasons. van Belle and Hughes (1984) introduced a heterogeneity test for trend which is appropriate for testing for trend in any direction in any season. Hirsch and Slack (1984) proposed an extension to the seasonal Kendall test that allows for serial dependence in the observations. The function kendallSeasonalTrendTest includes all of these tests, as well as an extension of the van Belle-Hughes heterogeneity test to the case of serial dependence.

Testing for Trend Assuming Serial Independence

The Model
Assume observations are taken over two or more years, and assume a single year can be divided into two or more seasons. Let p denote the number of seasons. Let X and Y denote two continuous random variables with some joint (bivariate) distribution (which may differ from season to season). Let N_j denote the number of bivariate observations taken in the j'th season (over two or more years) (j = 1, 2, …, p), so that

(X_{1j}, Y_{1j}), (X_{2j}, Y_{2j}), …, (X_{N_jj}, Y_{N_jj})

denote the N_j bivariate observations from this distribution for season j, assume these bivariate observations are mutually independent, and let

τ_j = \{ 2 Pr[(X_{2j} - X_{1j})(Y_{2j} - Y_{1j}) > 0] \} - 1 \;\;\;\;\;\; (1)

denote the value of Kendall's tau for that season (see kendallTrendTest). Also, assume all of the Y observations are independent.

The function kendallSeasonalTrendTest assumes that the X values always denote the year in which the observation was taken. Note that within any season, the X values need not be unique. That is, there may be more than one observation within the same year within the same season. In this case, the argument y must be a numeric vector, and you must supply the additional arguments season and year.

If there is only one observation per season per year (missing values allowed), it is usually easiest to supply the argument y as an n \times p matrix or data frame, where n denotes the number of years. In this case

N_1 = N_2 = \cdots = N_p = n \;\;\;\;\;\; (2)

and

X_{ij} = i \;\;\;\;\;\; (3)

for i = 1, 2, …, n and j = 1, 2, …, p, so if Y denotes the n \times p matrix of observed Y's and X denotes the n \times p matrix of the X's, then

Y_{11} Y_{12} \cdots Y_{1p}
Y_{21} Y_{22} \cdots Y_{2p}
\underline{Y} = . \;\;\;\;\;\; (4)
.
.
Y_{n1} Y_{n2} \cdots Y_{np}
1 1 \cdots 1
2 2 \cdots 2
\underline{X} = . \;\;\;\;\;\; (5)
.
.
n n \cdots n

The null hypothesis is that within each season the X and Y random variables are independent; that is, within each season there is no trend in the Y observations over time. This null hypothesis can be expressed as:

H_0: τ_1 = τ_2 = \cdots = τ_p = 0 \;\;\;\;\;\; (6)


The Seasonal Kendall Test for Trend
Hirsch et al.'s (1982) extension of Kendall's tau statistic to test the null hypothesis (6) is based on simply summing together the Kendall S-statistics for each season and computing the following statistic:

z = \frac{S'}{√{Var(S')}} \;\;\;\;\;\; (7)

or, using the correction for continuity,

z = \frac{S' - sign(S')}{√{Var(S')}} \;\;\;\;\;\; (8)

where

S' = ∑_{j=1}^p S_j \;\;\;\;\;\; (9)

S_j = ∑_{i=1}^{N_j-1} ∑_{k=i+1}^{N_j} sign[(X_{kj} - X_{ij})(Y_{kj} - Y_{ij})] \;\;\;\;\;\; (10)

and sign() denotes the sign function:

-1 x < 0
sign(x) = 0 x = 0 \;\;\;\;\;\; (11)
1 x > 0

Note that the quantity in Equation (10) is simply the Kendall S-statistic for season j (j = 1, 2, …, p) (see Equation (3) in the help file for kendallTrendTest).

For each season, if the predictor variables (the X's) are strictly increasing (e.g., Equation (3) above), then Equation (10) simplifies to

S_j = ∑_{i=1}^{N_j-1} ∑_{k=i+1}^{N_j} sign[(Y_{kj} - Y_{ij})] \;\;\;\;\;\; (12)

Under the null hypothesis (6), the quantity z defined in Equation (7) or (8) is approximately distributed as a standard normal random variable.

Note that there may be missing values in the observations, so let n_j denote the number of (X,Y) pairs without missing values for season j.

The statistic S' in Equation (9) has mean and variance given by:

E(S') = ∑_{j = 1}^p E(S_j) \;\;\;\;\;\; (13)

Var(S') = ∑_{j = 1}^p Var(S_j) + ∑_{g=1}^{p-1} ∑_{h=g+1}^{p} 2 Cov(S_g, S_h) \;\;\;\;\;\; (14)

Since all the observations are assumed to be mutually independent,

σ_{gh} = Cov(S_g, S_h) = 0, \;\; g \ne h, \;\; g,h = 1, 2, …, p \;\;\;\;\;\; (15)

Furthermore, under the null hypothesis (6),

E(S_j) = 0, \;\; j = 1, 2, …, p \;\;\;\;\;\; (16)

and, in the case of no tied observations,

Var(S_j) = \frac{n_j(n_j-1)(2n_j+5)}{18} \;\;\;\;\;\; (17)

for j=1,2, …, p (see equation (7) in the help file for kendallTrendTest).

In the case of tied observations,

Var(S_j) = \frac{n_j(n_j-1)(2n_j+5)}{18} -
\frac{∑_{i=1}^{g} t_i(t_i-1)(2t_i+5)}{18} -
\frac{∑_{k=1}^{h} u_k(u_k-1)(2u_k+5)}{18} +
\frac{[∑_{i=1}^{g} t_i(t_i-1)(t_i-2)][∑_{k=1}^{h} u_k(u_k-1)(u_k-2)]}{9n_k(n_k-1)(n_k-2)} +
\frac{[∑_{i=1}^{g} t_i(t_i-1)][∑_{k=1}^{h} u_k(u_k-1)]}{2n_k(n_k-1)} \;\;\;\;\;\; (18)

where g is the number of tied groups in the X observations for season j, t_i is the size of the i'th tied group in the X observations for season j, h is the number of tied groups in the Y observations for season j, and u_k is the size of the k'th tied group in the Y observations for season j (see Equation (9) in the help file for kendallTrendTest).

Estimating τ, Slope, and Intercept
The function kendall.SeasonalTrendTest returns estimated values of Kendall's τ, the slope, and the intercept for each season, as well as a single estimate for each of these three quantities combined over all seasons. The overall estimate of τ is the weighted average of the p seasonal τ's:

\hat{τ} = \frac{∑_{j=1}^p n_j \hat{τ}_j}{∑_{j=1}^p n_j} \;\;\;\;\;\; (19)

where

\hat{τ}_j = \frac{2S_j}{n_j(n_j-1)} \;\;\;\;\;\; (20)

(see Equation (2) in the help file for kendallTrendTest).

We can compute the estimated slope for season j as:

\hat{β}_{1_j} = Median(\frac{Y_{kj} - Y_{ij}}{X_{kj} - X_{ij}}); \;\; i < k; \;\; X_{kj} \ne X_{ik} \;\;\;\;\;\; (21)

for j = 1, 2, …, p. The overall estimate of slope, however, is not the median of these p estimates of slope; instead, following Hirsch et al. (1982, p.117), the overall estimate of slope is the median of all two-point slopes computed within each season:

\hat{β}_1 = Median(\frac{Y_{kj} - Y_{ij}}{X_{kj} - X_{ij}}); \;\; i < k; \;\; X_{kj} \ne X_{ik}; \;\; j = 1, 2, …, p \;\;\;\;\;\; (22)

(see Equation (15) in the help file for kendallTrendTest).

The overall estimate of intercept is the median of the p seasonal estimates of intercept:

\hat{β}_0 = Median(\hat{β}_{0_1}, \hat{β}_{0_2}, …, \hat{β}_{0_p}) \;\;\;\;\;\; (23)

where

\hat{β}_{0_j} = Y_{0.5_j} - \hat{β}_{1_j} X_{0.5_j} \;\;\;\;\;\; (24)

and X_{0.5_j} and Y_{0.5_j} denote the sample medians of the X's and Y's, respectively, for season j (see Equation (16) in the help file for kendallTrendTest).

Confidence Interval for the Slope
Gilbert (1987, p.227-228) extends his method of computing a confidence interval for the slope to the case of seasonal observations. Let N' denote the number of defined two-point estimated slopes that are used in Equation (22) above and let

\hat{β}_{1_{(1)}}, \hat{β}_{1_{(2)}}, …, \hat{β}_{1_{(N')}}

denote the N' ordered slopes. For Gilbert's (1987) method, a 100(1-α)\% two-sided confidence interval for the true over-all slope across all seasons is given by:

[\hat{β}_{1_{(M1)}}, \hat{β}_{1_{(M2+1)}}] \;\;\;\;\;\; (25)

where

M1 = \frac{N' - C_{α}}{2} \;\;\;\;\;\; (26)

M2 = \frac{N' + C_{α}}{2} \;\;\;\;\;\; (27)

C_α = z_{1 - α/2} √{Var(S')} \;\;\;\;\;\; (28)

Var(S') is defined in Equation (14), and z_p denotes the p'th quantile of the standard normal distribution. One-sided confidence intervals may computed in a similar fashion.

Usually the quantities M1 and M2 will not be integers. Gilbert (1987, p.219) suggests interpolating between adjacent values in this case, which is what the function kendallSeasonalTrendTest does.

The Van Belle-Hughes Heterogeneity Test for Trend
The seasonal Kendall test described above is appropriate for testing the null hypothesis (6) against the alternative hypothesis of a trend in at least one season. All of the trends in each season should be in the same direction.

The seasonal Kendall test is not appropriate for testing for trend when there are trends in a positive direction in one or more seasons and also negative trends in one or more seasons. For example, for the following set of observations, the seasonal Kendall statistic S' is 0 with an associated two-sided p-value of 1, even though there is clearly a positive trend in season 1 and a negative trend in season 2.

Year Season 1 Season 2
1 5 8
2 6 7
3 7 6
4 8 5

Van Belle and Hughes (1984) suggest using the following statistic to test for heterogeneity in trend prior to applying the seasonal Kendall test:

χ_{het}^2 = ∑_{j=1}^p Z_j^2 - p \bar{Z}^2 \;\;\;\;\;\; (29)

where

Z_j = \frac{S_j}{Var(S_j)} \;\;\;\;\;\; (30)

\bar{Z} = \frac{1}{p} ∑_{j=1}^p Z_j \;\;\;\;\;\; (31)

Under the null hypothesis (6), the statistic defined in Equation (29) is approximately distributed as a chi-square random variable with p-1 degrees of freedom. Note that the continuity correction is not used to compute the Z_j's defined in Equation (30) since using it results in an unacceptably conservative test (van Belle and Hughes, 1984, p.132). Van Belle and Hughes (1984) actually call the statistic in (29) a homogeneous chi-square statistic. Here it is called a heterogeneous chi-square statistic after the alternative hypothesis it is meant to test.

Van Belle and Hughes (1984) imply that the heterogeneity statistic defined in Equation (29) may be used to test the null hypothesis:

H_0: τ_1 = τ_2 = \cdots = τ_p = τ \;\;\;\;\;\; (32)

where τ is some arbitrary number between -1 and 1. For this case, however, the distribution of the test statistic in Equation (29) is unknown since it depends on the unknown value of τ (Equations (16)-(18) above assume τ = 0 and are not correct if τ \ne 0). The heterogeneity chi-square statistic of Equation (29) may be assumed to be approximately distributed as chi-square with p-1 degrees of freedom under the null hypothesis (32), but further study is needed to determine how well this approximation works.

Testing for Trend Assuming Serial Dependence

The Model
Assume the same model as for the case of serial independence, except now the observed Y's are not assumed to be independent of one another, but are allowed to be serially correlated over time. Furthermore, assume one observation per season per year (Equations (2)-(5) above).

The Seasonal Kendall Test for Trend Modified for Serial Dependence
Hirsch and Slack (1984) introduced a modification of the seasonal Kendall test that is robust against serial dependence (in terms of Type I error) except when the observations have a very strong long-term persistence (very large autocorrelation) or when the sample sizes are small (e.g., 5 years of monthly data). This modification is based on a multivariate test introduced by Dietz and Killeen (1981).

In the case of serial dependence, Equation (15) is no longer true, so an estimate of the correct value of σ_{gh} must be used to compute Var(S') in Equation (14). Let R denote the n \times p matrix of ranks for the Y observations (Equation (4) above), where the Y's are ranked within season:

R_{11} R_{12} \cdots R_{1p}
R_{21} R_{22} \cdots R_{2p}
\underline{R} = . \;\;\;\;\;\; (33)
.
.
R_{n1} R_{n2} \cdots R_{np}

where

R_{ij} = \frac{1}{2} [n_j + 1 ∑_{k=1}^{n_j} sign(Y_{ij} - Y_{kj})] \;\;\;\;\;\; (34)

the sign function is defined in Equation (11) above, and as before n_j denotes the number of (X,Y) pairs without missing values for season j. Note that by this definition, missing values are assigned the mid-rank of the non-missing values.

Hirsch and Slack (1984) suggest using the following formula, given by Dietz and Killeen (1981), in the case where there are no missing values:

\hat{σ}_{gh} = \frac{1}{3} [K_{gh} + 4 ∑_{i=1}^n R_{ig}R_{ih} - n(n+1)^2] \;\;\;\;\;\; (35)

where

K_{gh} = ∑_{i=1}^{n-1} ∑_{j=i+1}^n sign[(Y_{jg} - Y_{ig})(Y_{jh} - Y_{ih})] \;\;\;\;\;\; (36)

Note that the quantity defined in Equation (36) is Kendall's tau for season g vs. season h.

For the case of missing values, Hirsch and Slack (1984) derive the following modification of Equation (35):

\hat{σ}_{gh} = \frac{1}{3} [K_{gh} + 4 ∑_{i=1}^n R_{ig}R_{ih} - n(n_g + 1)(n_h + 1)] \;\;\;\;\;\; (37)

Technically, the estimates in Equations (35) and (37) are not correct estimators of covariance, and Equations (17) and (18) are not correct estimators of variance, because the model Dietz and Killeen (1981) use assumes that observations within the rows of Y (Equation (4) above) may be correlated, but observations between rows are independent. Serial dependence induces correlation between all of the Y's. In most cases, however, the serial dependence shows an exponential decay in correlation across time and so these estimates work fairly well (see more discussion in the BACKGROUND section below).

Estimates and Confidence Intervals
The seasonal and over-all estimates of τ, slope, and intercept are computed using the same methods as in the case of serial independence. Also, the method for computing the confidence interval for the slope is the same as in the case of serial independence. Note that the serial dependence is accounted for in the term Var(S') in Equation (28).

The Van Belle-Hughes Heterogeneity Test for Trend Modified for Serial Dependence
Like its counterpart in the case of serial independence, the seasonal Kendall test modified for serial dependence described above is appropriate for testing the null hypothesis (6) against the alternative hypothesis of a trend in at least one season. All of the trends in each season should be in the same direction.

The modified seasonal Kendall test is not appropriate for testing for trend when there are trends in a positive direction in one or more seasons and also negative trends in one or more seasons. This section describes a modification of the van Belle-Hughes heterogeneity test for trend in the presence of serial dependence.

Let \underline{S} denote the p \times 1 vector of Kendall S-statistics for each season:

S_1
S_2
\underline{S} = . \;\;\;\;\;\; (38)
.
.
S_p

The distribution of \underline{S} is approximately multivariate normal with

μ_1
μ_2
E(\underline{S}) = \underline{μ} = . \;\;\;\;\;\; (39)
.
.
μ_p
σ_1^2 σ_{12} \cdots σ_{1p}
σ_{21} σ_2^2 \cdots σ_{2p}
Var(\underline{S}) = Σ = . \;\;\;\;\;\; (40)
.
.
σ_{n1} σ_{n2} \cdots σ_n^2

where

μ_j = \frac{n_j(n_j - 1)}{2} τ_j, \;\; j = 1, 2, …, p \;\;\;\;\;\; (41)

Define the p \times p matrix \underline{m} as

\frac{2}{n_1(n_1 - 1)} 0 \cdots 0
0 \frac{2}{n_2(n_2 - 1)} \cdots 0
\underline{m} = . \;\;\;\;\;\; (42)
.
.
0 0 \cdots \frac{2}{n_p(n_p - 1)}

Then the vector of the seasonal estimates of τ can be written as:

\hat{τ}_1 2S_1/[n_1(n_1-1)]
\hat{τ}_2 2S_2/[n_2(n_2-1)]
\underline{\hat{τ}} = . = . = \underline{m} \; \underline{S} \;\;\;\;\; (43)
. .
. .
\hat{τ}_p 2S_p/[n_p(n_p-1)]

so the distribution of the vector in Equation (43) is approximately multivariate normal with

τ_1
τ_2
E(\underline{\hat{τ}}) = E(\underline{m} \underline{S}) = \underline{m} \underline{μ} = \underline{τ} = . \;\;\;\;\;\; (44)
.
.
τ_p

Var(\underline{\hat{τ}}) = Var(\underline{m} \; \underline{S}) = \underline{m} Σ \underline{m}^T = Σ_{\hat{τ}} \;\;\;\;\;\; (45)

where ^T denotes the transpose operator. Let \underline{C} denote the (p-1) \times p contrast matrix

\underline{C} = [\; \underline{1} \; | \; -I_p] \;\;\;\;\;\; (46)

where I_p denotes the p \times p identity matrix. That is,

1 -1 0 \cdots 0
1 0 -1 \cdots 0
\underline{C} = . .
. .
. .
1 0 0 \cdots -1

Then the null hypothesis (32) is equivalent to the null hypothesis:

H_0: \underline{C} \underline{τ} = 0 \;\;\;\;\;\; (47)

Based on theory for samples from a multivariate normal distribution (Johnson and Wichern, 2007), under the null hypothesis (47) the quantity

χ_{het}^2 = (\underline{C} \; \underline{\hat{τ}})^T (\underline{C} \hat{Σ}_{\hat{τ}} \underline{C}^T)^{-1} (\underline{C} \; \underline{\hat{τ}}) \;\;\;\;\;\; (48)

has approximately a chi-square distribution with p-1 degrees of freedom for “large” values of seasonal sample sizes, where

\hat{Σ}_{\hat{τ}} = \underline{m} \hat{Σ} \underline{m}^T \;\;\;\;\;\; (49)

The estimate of Σ in Equation (49) can be computed using the same formulas that are used for the modified seasonal Kendall test (i.e., Equation (35) or (37) for the off-diagonal elements and Equation (17) or (18) for the diagonal elements). As previously noted, the formulas for the variances are actually only valid if t = 0 and there is no correlation between the rows of Y. The same is true of the formulas for the covariances. More work is needed to determine the goodness of the chi-square approximation for the test statistic in (48). The pseudo-heterogeneity test statistic of Equation (48), however, should provide some guidance as to whether the null hypothesis (32) (or equivalently (47)) appears to be true.

Value

A list of class "htest" containing the results of the hypothesis test. See the help file for htest.object for details. In addition, the following components are part of the list returned by
kendallSeasonalTrendTest:

seasonal.S

numeric vector. The value of the Kendall S-statistic for each season.

var.seasonal.S

numeric vector. The variance of the Kendall S-statistic for each season. This component only appears when independent.obs=TRUE.

var.cov.seasonal.S

numeric matrix. The estimated variance-covariance matrix of the Kendall S-statistics for each season. This component only appears when
independent.obs=FALSE.

seasonal.estimates

numeric matrix. The estimated Kendall's tau, slope, and intercept for each season.

Note

Kendall's test for independence or trend is a nonparametric test. No assumptions are made about the distribution of the X and Y variables. Hirsch et al. (1982) introduced the seasonal Kendall test to test for trend within each season. They note that Kendall's test for trend is easy to compute, even in the presence of missing values, and can also be used with censored values.

van Belle and Hughes (1984) note that the seasonal Kendall test introduced by Hirsch et al. (1982) is similar to a multivariate extension of the sign test proposed by Jonckheere (1954). Jonckheeere's test statistic is based on the unweighted sum of the seasonal tau statistics, while Hirsch et al.'s test is based on the weighted sum (weighted by number of observations within a season) of the seasonal tau statistics.

van Belle and Hughes (1984) also note that Kendall's test for trend is slightly less powerful than the test based on Spearman's rho, but it converges to normality faster. Also, Bradley (1968, p.288) shows that for the case of a linear model with normal (Gaussian) errors, the asymptotic relative efficiency of Kendall's test for trend versus the parametric test for a zero slope is 0.98.

Based on the work of Dietz and Killeen (1981), Hirsch and Slack (1984) describe a modified version of the seasonal Kendall test that allows for serial dependence in the observations. They performed a Monte Carlo study to determine the empirical significance level and power of this modified test vs. the test that assumes independent observations and found a trade-off between power and the correct significance level. For p = 12 seasons, they found the modified test gave correct significance levels for n ≥q 10 as long as the lag-one autocorrelation was 0.6 or less, while the original test that assumes independent observations yielded highly inflated significance levels. On the other hand, if in fact the observations are serially independent, the original test is more powerful than the modified test.

Hirsch and Slack (1984) also looked at the performance of the test for trend introduced by Dietz and Killeen (1981), which is a weighted sums of squares of the seasonal Kendall S-statistics, where the matrix of weights is the inverse of the covariance matrix. The Dietz-Killeen test statistic, unlike the one proposed by Hirsh and Slack (1984), tests for trend in either direction in any season, and is asymptotically distributed as a chi-square random variable with p (number of seasons) degrees of freedom. Hirsch and Slack (1984), however, found that the test based on this statistic is quite conservative (i.e., the significance level is much smaller than the assumed significance level) and has poor power even for moderate sample sizes. The chi-square approximation becomes reasonably close only when n > 40 if p = 12, n > 30 if p = 4, and n > 20 if p = 2.

Lettenmaier (1988) notes the poor power of the test proposed by Dietz and Killeen (1981) and states the poor power apparently results from an upward bias in the estimated variance of the statistic, which can be traced to the inversion of the estimated covariance matrix. He suggests an alternative test statistic (to test trend in either direction in any season) that is the sum of the squares of the scaled seasonal Kendall S-statistics (scaled by their standard deviations). Note that this test statistic ignores information about the covariance between the seasonal Kendall S-statistics, although its distribution depends on these covariances. In the case of no serial dependence, Lettenmaier's test statistic is exactly the same as the Dietz-Killeen test statistic. In the case of serial dependence, Lettenmaier (1988) notes his test statistic is a quadratic form of a multivariate normal random variable and therefore all the moments of this random variable are easily computed. Lettenmaier (1988) approximates the distribution of his test statistic as a scaled non-central chi-square distribution (with fractional degrees of freedom). Based on extensive Monte Carlo studies, Lettenmaier (1988) shows that for the case when the trend is the same in all seasons, the seasonal Kendall's test of Hirsch and Slack (1984) is superior to his test and far superior to the Dietz-Killeen test. The power of Lettenmaier's test approached that of the seasonal Kendall test for large trend magnitudes.

Author(s)

Steven P. Millard (EnvStats@ProbStatInfo.com)

References

Bradley, J.V. (1968). Distribution-Free Statistical Tests. Prentice-Hall, Englewood Cliffs, NJ.

Conover, W.J. (1980). Practical Nonparametric Statistics. Second Edition. John Wiley and Sons, New York, pp.256-272.

Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring, Second Edition. John Wiley & Sons, Hoboken.

Gilbert, R.O. (1987). Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold, New York, NY, Chapter 16.

Helsel, D.R. and R.M. Hirsch. (1988). Discussion of Applicability of the t-test for Detecting Trends in Water Quality Variables. Water Resources Bulletin 24(1), 201-204.

Helsel, D.R., and R.M. Hirsch. (1992). Statistical Methods in Water Resources Research. Elsevier, NY.

Helsel, D.R., and R. M. Hirsch. (2002). Statistical Methods in Water Resources. Techniques of Water Resources Investigations, Book 4, chapter A3. U.S. Geological Survey. Available on-line at http://pubs.usgs.gov/twri/twri4a3/pdf/twri4a3-new.pdf.

Hirsch, R.M., J.R. Slack, and R.A. Smith. (1982). Techniques of Trend Analysis for Monthly Water Quality Data. Water Resources Research 18(1), 107-121.

Hirsch, R.M. and J.R. Slack. (1984). A Nonparametric Trend Test for Seasonal Data with Serial Dependence. Water Resources Research 20(6), 727-732.

Hirsch, R.M., R.B. Alexander, and R.A. Smith. (1991). Selection of Methods for the Detection and Estimation of Trends in Water Quality. Water Resources Research 27(5), 803-813.

Hollander, M., and D.A. Wolfe. (1999). Nonparametric Statistical Methods, Second Edition. John Wiley and Sons, New York.

Johnson, R.A., and D.W. Wichern. (2007). Applied Multivariate Statistical Analysis, Sixth Edition. Pearson Prentice Hall, Upper Saddle River, NJ.

Kendall, M.G. (1938). A New Measure of Rank Correlation. Biometrika 30, 81-93.

Kendall, M.G. (1975). Rank Correlation Methods. Charles Griffin, London.

Mann, H.B. (1945). Nonparametric Tests Against Trend. Econometrica 13, 245-259.

Millard, S.P., and Neerchal, N.K. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, Florida.

Sen, P.K. (1968). Estimates of the Regression Coefficient Based on Kendall's Tau. Journal of the American Statistical Association 63, 1379-1389.

Theil, H. (1950). A Rank-Invariant Method of Linear and Polynomial Regression Analysis, I-III. Proc. Kon. Ned. Akad. v. Wetensch. A. 53, 386-392, 521-525, 1397-1412.

USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C.

USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.

van Belle, G., and J.P. Hughes. (1984). Nonparametric Tests for Trend in Water Quality. Water Resources Research 20(1), 127-136.

See Also

kendallTrendTest, htest.object, cor.test.

Examples

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
  # Reproduce Example 14-10 on page 14-38 of USEPA (2009).  This example 
  # tests for trend in analyte concentrations (ppm) collected monthly  
  # between 1983 and 1985. 

  head(EPA.09.Ex.14.8.df)
  #     Month Year Unadj.Conc Adj.Conc
  #1  January 1983       1.99     2.11
  #2 February 1983       2.10     2.14
  #3    March 1983       2.12     2.10
  #4    April 1983       2.12     2.13
  #5      May 1983       2.11     2.12
  #6     June 1983       2.15     2.12

  tail(EPA.09.Ex.14.8.df)
  #       Month Year Unadj.Conc Adj.Conc
  #31      July 1985       2.31     2.23
  #32    August 1985       2.32     2.24
  #33 September 1985       2.28     2.23
  #34   October 1985       2.22     2.24
  #35  November 1985       2.19     2.25
  #36  December 1985       2.22     2.23


  # Plot the data
  #--------------
  Unadj.Conc <- EPA.09.Ex.14.8.df$Unadj.Conc
  Adj.Conc   <- EPA.09.Ex.14.8.df$Adj.Conc
  Month      <- EPA.09.Ex.14.8.df$Month
  Year       <- EPA.09.Ex.14.8.df$Year
  Time       <- paste(substring(Month, 1, 3), Year - 1900, sep = "-")
  n          <- length(Unadj.Conc)
  Three.Yr.Mean <- mean(Unadj.Conc)

  dev.new()
  par(mar = c(7, 4, 3, 1) + 0.1, cex.lab = 1.25)
  plot(1:n, Unadj.Conc, type = "n", xaxt = "n",
	xlab = "Time (Month)", 
	ylab = "ANALYTE CONCENTRATION (mg/L)", 
	main = "Figure 14-15. Seasonal Time Series Over a Three Year Period",
	cex.main = 1.1)
  axis(1, at = 1:n, labels = rep("", n))
  at <- rep(c(1, 5, 9), 3) + rep(c(0, 12, 24), each = 3)
  axis(1, at = at, labels = Time[at])
  points(1:n, Unadj.Conc, pch = 0, type = "o", lwd = 2)
  points(1:n, Adj.Conc, pch = 3, type = "o", col = 8, lwd = 2)
  abline(h = Three.Yr.Mean, lwd = 2)
  legend("topleft", c("Unadjusted", "Adjusted", "3-Year Mean"), bty = "n", 
    pch = c(0, 3, -1), lty = c(1, 1, 1), lwd = 2, col = c(1, 8, 1),
    inset = c(0.05, 0.01))


  # Perform the seasonal Kendall trend test
  #----------------------------------------

  kendallSeasonalTrendTest(Unadj.Conc ~ Month + Year, 
    data = EPA.09.Ex.14.8.df)

  #Results of Hypothesis Test
  #--------------------------
  #
  #Null Hypothesis:                 All 12 values of tau = 0
  #
  #Alternative Hypothesis:          The seasonal taus are not all equal
  #                                 (Chi-Square Heterogeneity Test)
  #                                 At least one seasonal tau != 0
  #                                 and all non-zero tau's have the
  #                                 same sign (z Trend Test)
  #
  #Test Name:                       Seasonal Kendall Test for Trend
  #                                 (with continuity correction)
  #
  #Estimated Parameter(s):          tau       =    0.9722222
  #                                 slope     =    0.0600000
  #                                 intercept = -131.7350000
  #
  #Estimation Method:               tau:        Weighted Average of
  #                                             Seasonal Estimates
  #                                 slope:      Hirsch et al.'s
  #                                             Modification of
  #                                             Thiel/Sen Estimator
  #                                 intercept:  Median of
  #                                             Seasonal Estimates
  #
  #Data:                            y      = Unadj.Conc
  #                                 season = Month     
  #                                 year   = Year      
  #
  #Data Source:                     EPA.09.Ex.14.8.df
  #
  #Sample Sizes:                    January   =  3
  #                                 February  =  3
  #                                 March     =  3
  #                                 April     =  3
  #                                 May       =  3
  #                                 June      =  3
  #                                 July      =  3
  #                                 August    =  3
  #                                 September =  3
  #                                 October   =  3
  #                                 November  =  3
  #                                 December  =  3
  #                                 Total     = 36
  #
  #Test Statistics:                 Chi-Square (Het) = 0.1071882
  #                                 z (Trend)        = 5.1849514
  #
  #Test Statistic Parameter:        df = 11
  #
  #P-values:                        Chi-Square (Het) = 1.000000e+00
  #                                 z (Trend)        = 2.160712e-07
  #
  #Confidence Interval for:         slope
  #
  #Confidence Interval Method:      Gilbert's Modification of
  #                                 Theil/Sen Method
  #
  #Confidence Interval Type:        two-sided
  #
  #Confidence Level:                95%
  #
  #Confidence Interval:             LCL = 0.05786914
  #                                 UCL = 0.07213086

  #==========

  # Clean up
  #---------

  rm(Unadj.Conc, Adj.Conc, Month, Year, Time, n, Three.Yr.Mean, at)
  graphics.off()

Want to suggest features or report bugs for rdrr.io? Use the GitHub issue tracker.