HOMTESTS: Homogeneity tests


Description

Homogeneity tests for Regional Frequency Analysis.

Usage

 ADbootstrap.test (x, cod, Nsim=500, index=2)
 HW.tests (x, cod, Nsim=500)
 DK.test (x, cod)

Arguments

x

vector containing the data of all samples; cod identifies the sample to which each value belongs

cod

array that defines the data subdivision among sites

Nsim

number of regions simulated with the bootstrap of the original region

index

if index=1 samples are divided by their average value; if index=2 (default) samples are divided by their median value

Details

The Hosking and Wallis heterogeneity measures

The idea underlying the Hosking and Wallis (1993) heterogeneity statistics is to measure the sample variability of the L-moment ratios and compare it with the variation that would be expected in a homogeneous region. The latter is estimated through repeated simulations of homogeneous regions with samples drawn from a four-parameter kappa distribution (see e.g., Hosking and Wallis, 1997, pp. 202-204). In more detail, the steps are the following. For the k samples belonging to the region under analysis, let Y(i,j), j = 1, ..., ni, denote the observations of the i-th site, normalized by the index value and arranged in ascending order. Find the sample L-moment ratios (see Hosking and Wallis, 1997) of the i-th site: these are the L-coefficient of variation (L-CV),

t^(i) = (1/ni ∑[j from 1 to ni](2(j - 1)/(ni - 1) - 1) Y(i,j)) / (1/ni ∑[j from 1 to ni] Y(i,j))

the coefficient of L-skewness,

t3^(i) = (1/ni ∑[j from 1 to ni](6(j-1)(j-2)/(ni-1)/(ni-2) - 6(j-1)/(ni-1) + 1) Y(i,j)) / (1/ni ∑[j from 1 to ni](2(j-1)/(ni-1) - 1) Y(i,j))

and the coefficient of L-kurtosis

t4^(i) = (1/ni ∑[j from 1 to ni](20(j-1)(j-2)(j-3)/(ni-1)/(ni-2)/(ni-3) - 30(j-1)(j-2)/(ni-1)/(ni-2) + 12(j-1)/(ni-1) - 1) Y(i,j)) / (1/ni ∑[j from 1 to ni](2(j-1)/(ni-1) - 1)Y(i,j))

Note that the L-moment ratios are not affected by the normalization by the index value, i.e. the three equations above give the same result whether the original data X(i,j) or the normalized data Y(i,j) are used.
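As an illustration, the three ratios above can be computed with a few lines of base R. This is a minimal sketch, not the package's implementation: `sample_lmom_ratios` is a hypothetical helper, the data vector is made up for the example, and the sketch assumes that Y(i,j) is the site sample sorted in ascending order with ni of at least 4. The last lines check numerically that dividing the sample by an index value leaves the ratios unchanged.

```r
# Hypothetical helper: sample L-CV, L-skewness and L-kurtosis of one site,
# using the weighted sums of the ordered sample given above.
sample_lmom_ratios <- function(y) {
  y <- sort(y)                      # the weights refer to the ordered sample
  n <- length(y)
  j <- seq_len(n)
  w2 <- 2 * (j - 1) / (n - 1) - 1
  w3 <- 6 * (j - 1) * (j - 2) / ((n - 1) * (n - 2)) -
        6 * (j - 1) / (n - 1) + 1
  w4 <- 20 * (j - 1) * (j - 2) * (j - 3) / ((n - 1) * (n - 2) * (n - 3)) -
        30 * (j - 1) * (j - 2) / ((n - 1) * (n - 2)) +
        12 * (j - 1) / (n - 1) - 1
  l1 <- mean(y)                     # denominator of the L-CV
  l2 <- mean(w2 * y)                # numerator of the L-CV
  c(lcv = l2 / l1, lca = mean(w3 * y) / l2, lkur = mean(w4 * y) / l2)
}

# Illustrative data (not taken from the package)
y <- c(845, 803, 746, 1036, 1160, 1038, 1285, 369, 1093, 732)
r_raw    <- sample_lmom_ratios(y)
r_scaled <- sample_lmom_ratios(y / median(y))   # index-value normalization
all.equal(r_raw, r_scaled)                      # ratios are scale invariant
```

Because numerator and denominator of each ratio are both linear in the data, any positive scaling factor cancels, which is what the final comparison shows.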

Define the regional averaged L-CV, L-skewness and L-kurtosis coefficients,

t^R = (∑[i from 1 to k] ni t^(i)) / (∑[i from 1 to k] ni)

t3^R = (∑[i from 1 to k] ni t3^(i)) / (∑[i from 1 to k] ni)

t4^R = (∑[i from 1 to k] ni t4^(i)) / (∑[i from 1 to k] ni)

and compute the statistic

V = {∑[i from 1 to k] ni (t^(i) - t^R)^2 / ∑[i from 1 to k] ni}^(1/2)

Fit the parameters of a four-parameter kappa distribution to the regional averaged L-moment ratios t^R, t3^R and t4^R, and then generate a large number Nsim of realizations of sets of k samples. The i-th site sample in each set has the fitted kappa distribution as its parent and record length equal to ni. For each simulated homogeneous set, calculate the statistic V, obtaining Nsim values. From these V values, determine the mean μV and the standard deviation σV, which relate to the hypothesis of homogeneity (strictly, to the composite hypothesis of homogeneity and kappa parent distribution).

A heterogeneity measure, here called HW1, is finally found as

θ(HW1) = (V - μV)/(σV)

θ(HW1) can be approximated by a normal distribution with zero mean and unit variance. Following Hosking and Wallis (1997), the region under analysis can be regarded as ‘acceptably homogeneous’ if θ(HW1) < 1, ‘possibly heterogeneous’ if 1 ≤ θ(HW1) < 2, and ‘definitely heterogeneous’ if θ(HW1) ≥ 2. Hosking and Wallis (1997) suggest that these limits should be treated as useful guidelines. Although θ(HW1) is constructed like a significance test, the significance levels obtained from it would be accurate only under special assumptions: that the data are independent both serially and between sites, and that the true regional distribution is kappa.
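The simulation logic can be sketched in base R as follows. This is only an outline under loud assumptions: the parent distribution is a lognormal used as a stand-in for the fitted kappa distribution (fitting a kappa distribution is outside the scope of this sketch), the "observed" region is itself simulated placeholder data, and `lcv`, `v_stat` and `draw_region` are hypothetical helpers rather than functions of the package.

```r
# L-CV of one sample, as in the t^(i) formula above
lcv <- function(y) {
  y <- sort(y); n <- length(y); j <- seq_len(n)
  mean((2 * (j - 1) / (n - 1) - 1) * y) / mean(y)
}

# Weighted dispersion V of the at-site L-CVs for a list of site samples
v_stat <- function(samples) {
  n  <- lengths(samples)
  ti <- vapply(samples, lcv, numeric(1))
  tR <- sum(n * ti) / sum(n)
  sqrt(sum(n * (ti - tR)^2) / sum(n))
}

set.seed(1)
n_sites <- c(29, 25, 22, 16, 12)            # record lengths n_i

# ASSUMPTION: lognormal parent instead of the fitted kappa distribution
draw_region <- function() lapply(n_sites, function(n) rlnorm(n, 0, 0.35))

observed <- draw_region()                   # placeholder for real site data
V_obs <- v_stat(observed)
V_sim <- replicate(500, v_stat(draw_region()))   # Nsim = 500 homogeneous regions

H1 <- (V_obs - mean(V_sim)) / sd(V_sim)     # standardized heterogeneity measure
H1
```

With real data, `observed` would hold the site samples and `draw_region` would draw from the kappa distribution fitted to the regional averaged L-moment ratios.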

Hosking and Wallis (1993) also give an alternative heterogeneity measure (that we call HW2), in which V is replaced by:

V2 = ∑[i from 1 to k] ni {(t^(i) - t^R)^2 + (t3^(i) - t3^R)^2}^(1/2) / ∑[i from 1 to k] ni

The test statistic in this case becomes

θ(HW2) = (V2 - μ(V2)) / (σ(V2))

with acceptability limits similar to those of the HW1 statistic. Hosking and Wallis (1997) judge θ(HW2) to be inferior to θ(HW1) and note that it rarely yields values larger than 2, even for grossly heterogeneous regions.
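For a concrete feel of V and V2, the following sketch evaluates both from at-site L-moment ratios. The numbers are the lcv and lca values printed for sites 34 to 38 in the example output of this page, and the record lengths are the sample sizes of those sites; the variable names are chosen for this illustration only.

```r
# At-site values for sites 34-38, copied from the example output
n   <- c(29, 25, 22, 16, 12)                                    # record lengths
lcv <- c(0.1801338, 0.1831449, 0.2139718,  0.19768436, 0.2461868)
lca <- c(0.1246570, 0.1913101, 0.3252284, -0.01093174, 0.1775494)

# Regional averages weighted by record length
tR  <- sum(n * lcv) / sum(n)
t3R <- sum(n * lca) / sum(n)

# Dispersion of L-CV only (V) and of L-CV and L-skewness jointly (V2)
V  <- sqrt(sum(n * (lcv - tR)^2) / sum(n))
V2 <- sum(n * sqrt((lcv - tR)^2 + (lca - t3R)^2)) / sum(n)

round(c(tR = tR, t3R = t3R), 7)
# agrees with lcvR = 0.1983372 and lcaR = 0.1683511 in the example output
```

V and V2 computed this way are the observed values that are then standardized with the simulated means and standard deviations.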

The bootstrap Anderson-Darling test

A test that does not make any assumption on the parent distribution is the Anderson-Darling (AD) rank test (Scholz and Stephens, 1987). The AD test is the generalization of the classical Anderson-Darling goodness of fit test (e.g., D'Agostino and Stephens, 1986), and it is used to test the hypothesis that k independent samples belong to the same population without specifying their common distribution function.

The test is based on the comparison between local and regional empirical distribution functions. The empirical distribution function, or sample distribution function, is defined by F(x) = j/η, x(j) ≤ x < x(j+1), where η is the size of the sample and x(j) are the order statistics, i.e. the observations arranged in ascending order. Denote the empirical distribution function of the i-th sample (local) by F̂i(x), and that of the pooled sample of all N = n1 + ... + nk observations (regional) by HN(x). The k-sample Anderson-Darling test statistic is then defined as

θ(AD) = ∑[i from 1 to k] ni integral[all x] ((F̂i(x) - HN(x))^2) / (HN(x) (1 - HN(x))) dHN(x)

If the pooled ordered sample is Z1 < ... < ZN, the computational formula to evaluate θ(AD) is:

θ(AD) = 1/N ∑[i from 1 to k] 1/ni ∑[j from 1 to N-1] ((N M(ij) - j ni)^2) / (j(N-j))

where M(ij) is the number of observations in the i-th sample that are not greater than Zj. The homogeneity test can be carried out by comparing the obtained θ(AD) value to the tabulated percentage points reported by Scholz and Stephens (1987) for different significance levels.
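The computational formula translates almost literally into base R. The sketch below is a hypothetical helper (`ad_ksample` is not a function of the package) and assumes, as the formula does, that the pooled sample contains no ties.

```r
# k-sample Anderson-Darling statistic from the computational formula above.
# x: pooled data vector; cod: sample label of each value. Assumes no ties.
ad_ksample <- function(x, cod) {
  N <- length(x)
  z <- sort(x)                       # pooled ordered sample Z_1 < ... < Z_N
  j <- seq_len(N - 1)
  total <- 0
  for (xi in split(x, cod)) {
    ni  <- length(xi)
    # M_ij: number of observations of sample i not greater than Z_j
    Mij <- findInterval(z[j], sort(xi))
    total <- total + sum((N * Mij - j * ni)^2 / (j * (N - j))) / ni
  }
  total / N
}

# Two interleaved samples of size 2: the statistic works out to 2/3
ad_ksample(c(1, 2, 3, 4), c(1, 2, 1, 2))

# The statistic depends on the data only through the ranks
x   <- c(5.1, 2.3, 9.7, 4.4, 7.2, 1.8, 6.0)
cod <- c(1, 1, 1, 2, 2, 2, 2)
all.equal(ad_ksample(x, cod), ad_ksample(log(x), cod))
```

`findInterval` counts, for each Z_j, how many sorted values of the sample are less than or equal to it, which is exactly M(ij).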

The statistic θ(AD) depends on the sample values only through their ranks. This guarantees that the test statistic remains unchanged when the samples undergo monotonic transformations, an important stability property not possessed by the HW heterogeneity measures. However, problems arise when this test is applied within an index value procedure. The index value procedure corresponds to dividing each site sample by a different value, which modifies the ranks in the pooled sample. In particular, it makes the local empirical distribution functions much more similar to one another, giving an impression of homogeneity even when the samples are highly heterogeneous. The effect is analogous to that encountered when applying goodness-of-fit tests to distributions whose parameters are estimated from the same sample used for the test (e.g., D'Agostino and Stephens, 1986; Laio, 2004). In both cases, the percentage points for the test must be recomputed.

This can be done with a nonparametric bootstrap approach consisting of the following steps. Build the pooled sample S of the observed non-dimensional data. Sample with replacement from S to generate k artificial local samples of size n1, ..., nk. Divide each sample by its index value, and calculate θ^(1)(AD). Repeat the procedure Nsim times to obtain a sample of values θ^(j)(AD), j = 1, ..., Nsim, whose empirical distribution function can be used as an approximation of G(H0)(θ(AD)), the distribution of θ(AD) under the null hypothesis of homogeneity. The acceptance limits for the test, corresponding to any significance level α, are then determined as the quantiles of G(H0)(θ(AD)) corresponding to a probability (1-α).

We will call the test obtained with the above procedure the bootstrap Anderson-Darling test, hereafter referred to as AD.
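The bootstrap steps described above can be sketched as follows. This is an illustration under stated assumptions, not the code of ADbootstrap.test: `ad_stat`, `index_scale` and `ad_bootstrap` are hypothetical helpers, the data are synthetic, and ties created by resampling with replacement are passed to the no-ties formula without any correction.

```r
# k-sample AD statistic (computational formula; ties are not corrected for)
ad_stat <- function(x, cod) {
  N <- length(x); z <- sort(x); j <- seq_len(N - 1)
  total <- 0
  for (xi in split(x, cod)) {
    Mij <- findInterval(z[j], sort(xi))
    total <- total + sum((N * Mij - j * length(xi))^2 / (j * (N - j))) / length(xi)
  }
  total / N
}

# Divide each site sample by its index value (1 = mean, 2 = median)
index_scale <- function(x, cod, index = 2) {
  f <- if (index == 1) mean else median
  x / ave(x, cod, FUN = f)
}

ad_bootstrap <- function(x, cod, Nsim = 500, index = 2) {
  y     <- index_scale(x, cod, index)        # pooled non-dimensional sample S
  A_obs <- ad_stat(y, cod)
  A_sim <- replicate(Nsim, {
    yb <- sample(y, length(y), replace = TRUE)   # k artificial samples, sizes n_i
    ad_stat(index_scale(yb, cod, index), cod)    # rescale, then recompute
  })
  # P: empirical distribution of the bootstrap values at the observed statistic
  c(A = A_obs, P = mean(A_sim <= A_obs))
}

set.seed(42)
cod <- rep(1:3, times = c(20, 15, 25))           # synthetic site labels
x   <- rlnorm(length(cod), meanlog = 7, sdlog = 0.3)
res <- ad_bootstrap(x, cod, Nsim = 200)
res
```

Assigning the resampled values to the original labels in `cod` is what produces k artificial samples with the original record lengths.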

Durbin and Knott test

The last homogeneity test considered derives from a goodness-of-fit statistic originally proposed by Durbin and Knott (1971). The test is formulated to measure discrepancies in the dispersion of the samples, without accounting for possible discrepancies in the mean or skewness of the data. In this respect the test is similar to the HW1 test, while it is analogous to the AD test in that it is a rank test. The original goodness-of-fit test is very simple: suppose we have a sample Xi, i = 1, ..., n, with hypothetical distribution F(x); under the null hypothesis the random variable F(Xi) has a uniform distribution in the (0,1) interval, and the statistic D = (2/n)^(1/2) ∑[i from 1 to n] cos(2 π F(Xi)) is approximately normally distributed with mean 0 and variance 1 (Durbin and Knott, 1971). The factor (2/n)^(1/2) gives D unit variance, because cos(2 π U) has variance 1/2 when U is uniform on (0,1). D serves the purpose of detecting discrepancies in data dispersion: if the variance of Xi is greater than that of the hypothetical distribution F(x), D is significantly greater than 0, while D is significantly below 0 in the reverse case. Differences between the mean (or the median) of Xi and F(x) are instead not detected by D, which guarantees that the normalization by the index value does not affect the test.

The extension of the Durbin and Knott (DK) statistic to homogeneity testing is straightforward: we substitute the empirical distribution function obtained with the pooled observed data, HN(x), for F(x) in D, obtaining at each site a statistic

Di = (2/ni)^(1/2) ∑[j from 1 to ni] cos(2 π HN(Xj))

where Xj, j = 1, ..., ni, are the observations of the i-th site. Di is approximately standard normal under the hypothesis of homogeneity. The statistic θ(DK) = ∑[i from 1 to k] Di^2 then has a chi-squared distribution with k-1 degrees of freedom, which allows one to determine the acceptability limits for the test, corresponding to any significance level α.
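A compact sketch of the DK statistic in base R is given below. It is an illustration, not the code of DK.test: `dk_stat` is a hypothetical helper, the data are synthetic, HN is taken to be the ordinary empirical distribution function of the pooled sample, and each site sum is scaled by sqrt(2/ni) so that Di has unit variance when the HN values are uniform (an assumption of this sketch).

```r
# Durbin-Knott type homogeneity statistic.
# x: pooled data vector; cod: site label of each value.
dk_stat <- function(x, cod) {
  HN <- ecdf(x)                                   # pooled empirical distribution
  Di <- tapply(x, cod, function(xi) {
    sqrt(2 / length(xi)) * sum(cos(2 * pi * HN(xi)))   # at-site statistic
  })
  Ak <- sum(Di^2)
  k  <- length(Di)
  # P: chi-squared distribution function with k - 1 degrees of freedom
  c(Ak = Ak, P = pchisq(Ak, df = k - 1))
}

set.seed(7)
cod <- rep(1:4, times = c(30, 25, 20, 35))        # synthetic site labels
x   <- rlnorm(length(cod), meanlog = 0, sdlog = 0.3)
res_dk <- dk_stat(x, cod)
res_dk
```

A site whose values concentrate in both tails of the pooled distribution produces a large positive Di, and one concentrated around the pooled median a large negative Di; both inflate the sum of squares.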

Comparison among tests

The comparison (Viglione et al., 2007) shows that the Hosking and Wallis heterogeneity measure HW1 (based only on L-CV) is preferable when skewness is low, while the bootstrap Anderson-Darling test should be used for more skewed regions. As for HW2, the Hosking and Wallis heterogeneity measure based on L-CV and L-CA (L-skewness), the comparison confirms that it lacks power.

Our suggestion is to guide the choice of the test according to a compromise between power and Type I error of the HW1 and AD tests. The L-moment space is divided into two regions: if the t3^R coefficient for the region under analysis is lower than 0.23, we propose to use the Hosking and Wallis heterogeneity measure HW1; if t3^R > 0.23, the bootstrap Anderson-Darling test is preferable.
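The selection rule can be written as a one-line helper. `choose_test` is a hypothetical name used only for this illustration, and assigning the boundary case t3^R = 0.23 to the Anderson-Darling test is an assumption of the sketch, since the text does not cover it.

```r
# Pick the homogeneity test from the regional L-skewness t3R.
# ASSUMPTION: a value exactly equal to the threshold goes to the AD test.
choose_test <- function(t3R, threshold = 0.23) {
  if (t3R < threshold) "HW1" else "bootstrap AD"
}

choose_test(0.1683511)   # lcaR of sites 34-38 in the example output
choose_test(0.30)
```

For sites 34 to 38 the regional L-skewness is below 0.23, so the rule points to HW1.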

Value

ADbootstrap.test and DK.test each return the test statistic and its distribution value P. If P is, for example, 0.92, the samples should not be considered heterogeneous at significance levels lower than 8%.

HW.tests gives the two Hosking and Wallis heterogeneity measures HW1 and HW2; following Hosking and Wallis (1997), the region under analysis can therefore be regarded as ‘acceptably homogeneous’ if HW < 1, ‘possibly heterogeneous’ if 1 ≤ HW < 2, and ‘definitely heterogeneous’ if HW ≥ 2.
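The returned values can be interpreted programmatically along these lines. Both helpers are hypothetical and written only for this illustration; rejecting homogeneity when P is strictly greater than 1 - α is an assumption about how the boundary is handled.

```r
# Hosking and Wallis guideline classes for a heterogeneity measure
classify_hw <- function(H) {
  cut(H, breaks = c(-Inf, 1, 2, Inf), right = FALSE,
      labels = c("acceptably homogeneous",
                 "possibly heterogeneous",
                 "definitely heterogeneous"))
}

# Decision from the distribution value P at significance level alpha
# ASSUMPTION: homogeneity is rejected when P > 1 - alpha
rejects_homogeneity <- function(P, alpha = 0.05) P > 1 - alpha

# H1 and H2 for sites 34-38, taken from the example output
as.character(classify_hw(c(H1 = -0.7677048, H2 = -0.4166196)))

rejects_homogeneity(0.92, alpha = 0.05)   # FALSE: 0.92 is below 0.95
rejects_homogeneity(0.92, alpha = 0.10)   # TRUE:  0.92 is above 0.90
```

The two calls with P = 0.92 reproduce the statement above: heterogeneity is not declared at levels below 8%, but it is at a 10% level.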

Author(s)

Alberto Viglione, e-mail: alviglio@tiscali.it.

References

D'Agostino R., Stephens M. (1986) Goodness-of-Fit Techniques, chapter Tests based on EDF statistics. Marcel Dekker, New York.

Durbin J., Knott M. (1971) Components of Cramer-von Mises statistics. London School of Economics and Political Science, pp. 290-307.

Hosking J., Wallis J. (1993) Some statistics useful in regional frequency analysis. Water Resources Research, 29 (2), pp. 271-281.

Hosking, J.R.M. and Wallis, J.R. (1997) Regional Frequency Analysis: an approach based on L-moments, Cambridge University Press, Cambridge, UK.

Laio F. (2004) Cramer-von Mises and Anderson-Darling goodness of fit tests for extreme value distributions with unknown parameters. Water Resources Research, 40, W09308, doi:10.1029/2004WR003204.

Scholz F., Stephens M. (1987) K-sample Anderson-Darling tests. Journal of the American Statistical Association, 82 (399), pp. 918-924.

Viglione A., Laio F., Claps P. (2007) “A comparison of homogeneity tests for regional frequency analysis”, Water Resources Research, 43, W03428, doi:10.1029/2006WR005095.

Viglione A. (2007) Metodi statistici non-supervised per la stima di grandezze idrologiche in siti non strumentati, PhD thesis, Politecnico di Torino.

See Also

KAPPA, Lmoments.

Examples

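library(homtest)

# Annual flows dataset: site code (cod), year (anno) and flow value (dato)
data(annualflows)
annualflows[1:10,]
summary(annualflows)
x <- annualflows["dato"][,]          # pooled data vector
cod <- annualflows["cod"][,]         # site code of each value
split(x,cod)                         # data grouped by site

# Tests on the whole dataset
#ADbootstrap.test(x,cod,Nsim=100)   # it takes some time
#HW.tests(x,cod)                    # it takes some time
DK.test(x,cod)

# Restrict the analysis to sites 34 to 38
fac <- factor(annualflows["cod"][,],levels=c(34:38))
x2 <- annualflows[!is.na(fac),"dato"]
cod2 <- annualflows[!is.na(fac),"cod"]
split(x2,cod2)
sapply(split(x2,cod2),Lmoments)      # at-site L-moments and ratios
regionalLmoments(x2,cod2)            # regional averaged L-moments and ratios

ADbootstrap.test(x2,cod2)            # index value = median (default)
ADbootstrap.test(x2,cod2,index=1)    # index value = mean
HW.tests(x2,cod2)
DK.test(x2,cod2)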

Example output

   cod anno dato
1    1 1956 1494
2    1 1957 1309
3    1 1958 1699
4    1 1959 1467
5    1 1960 1918
6    1 1961 1469
7    1 1962 1267
8    1 1963 1523
9    1 1964 1338
10   1 1965 1438
      cod            anno           dato       
 Min.   : 1.0   Min.   :1921   Min.   : 172.0  
 1st Qu.:13.0   1st Qu.:1940   1st Qu.: 725.2  
 Median :22.0   Median :1951   Median : 981.0  
 Mean   :23.7   Mean   :1951   Mean   :1041.4  
 3rd Qu.:34.0   3rd Qu.:1960   3rd Qu.:1308.8  
 Max.   :49.0   Max.   :1985   Max.   :3045.0  
$`1`
 [1] 1494 1309 1699 1467 1918 1469 1267 1523 1338 1438 1788 1591 1697 1780 1769

$`2`
 [1] 1144 1652 1807 1881 1741 1124 2064 1434 1678 1239  921  983 1093 1744 1213
[16] 1590  956 1124 2181 1077 1345 1219  988 1325 1277 1479 1307 2053 1232  973
[31] 1407  912

$`3`
 [1] 2596  954 1115 1248  867 1280 1588 1055 1764 3045

$`4`
 [1]  871 1238 1505 1636 1553 1936 1739 1867 1184 1630 1311 1520 1201 1614 1971
[16] 1829 1781 1093 1996 1328 1662 1199  860  961  949 1536 1016 1386  820 1023
[31] 2329 1209 1305 1334 1024 1364 1310 1410 1247 2393 1317  909 1808 1020 1181
[46] 1365 1218 1644 1160 1002 1243 1332 1033 1170 1685 1478 2434 1600 1369 1215
[61] 1614 1449 1518 1490 1191

$`7`
 [1] 1481 1758 1774 1625 1607 2826 1488  928 2379 1173 1801 1824 1309 2220 1733

$`8`
 [1] 1086 1810 2244 2138 2028 1308 1947 1528 2244 1594  861 1378 1795 1344 1558
[16]  696  724 2497  660 1388 1484  952 1987 2646 1689 1443 2688 1249 1145 2392
[31] 1001 1380

$`9`
 [1] 2075 1607 1717 1261 1824 1330  963 1313 2276  682 1440 1304 1193

$`10`
 [1] 1096 1387 1289 1461 1054 1474 1137 1256  981 1696 1468 1850 1644 1248 1498
[16] 1317 1500 1109  859  931 1020 1493  954 1133 1144 1056

$`11`
 [1] 1320 1706  948 1643  944 1402 1202 1788 1665 1833 1679 1166 1833 1661 1938
[16] 1457  830 1221 1398 1674 1311 1611 1003 1021

$`12`
 [1]  890 1247 1040 1047  875 1060  913  968  749 1218 1104 1489 1300  833  994
[16] 1002 1134  854  826  695  939 1230  830 1096  876  704 1111  780  791  709
[31]  812  686  812  755  802 1098  868  735  829  750  635  887  711  753  935
[46]  862  830  924  735  766  930  783 1623 1359 1015  922  963  848  975  760
[61]  766

$`13`
 [1] 1288  854 1324  741 1043  756 1477 1160 1426 1360 1109 1211 1094 1666 1002
[16]  772 1124  997  649 1436  762 1293  930  721  838 1063  710 1002 1625 1002
[31]  848 1104  869  823  992  588  894 1073  675 1181 1568  817 1068  978

$`14`
 [1] 1505  928 1223  805 1449 1084 1588 1509 1137 1014 1181 1394  922  811 1428
[16] 1137 1240 1034  581 1501  700 1263  962  780  919 1068  855 1198 1569 1134
[31] 1007 1205  973  871 1188  581 1027 1192  578  875 1553  774  958 1187 2152
[46]  836  834  753 1110

$`15`
 [1]  969  811 1107  769  567  925  508  598  818  495

$`16`
 [1]  957  625  625  658 1022  555  496  625  593 1115  718  957  707  332  821
[16]  469  913  663  418  523  799  469 1000 1104  761  598 1033  707  469  614
[31]  270  609 1017  367

$`17`
 [1]  595  718  518  548  389  567  506  985  530 1097  934  675  614  587  722
[16]  499  459 1087  550  860  648  296  658

$`18`
 [1]  686  863  488  937  453  621  484  851  599 1161  894  598  645  606  772
[16]  449  486  510  559  829  545  898  529  392  856  625  773  651  674  432

$`19`
 [1]  589  715  479  696  394  533  430  845  519 1012  805  559  569  580  725
[16]  448  412  411  407  638  506  729  538  350  736  513  787

$`20`
 [1] 1237 1908 1263 1066 1401 1263 1134  799  919  971 1057 1710 1555 1667 1212
[16]  799 1366  962 1779 1504  808 1031 1186 1031 1796 1882 1487  945 1710 1194
[31]  919 1418  722 1160 1409  894 1279 1884 1307

$`21`
 [1] 489 704 310 665 259 501 428 820 551 994 658 425 423 409 736 440 401 398 342
[20] 658 449 665 535 247 584 338 580 569 311 412 565 403 846 917 525 411 717 526
[39] 248 451 185 356 564 256

$`22`
 [1] 1197  863 1382 1104  649  745  615 1116  618  739  761  720 1147  838 1057
[16]  739  529  962  470  881  495  417  553  819  711 1410 1472  727  671 1163
[31]  751  476  819  399  612  860  507  844 1245  953  976

$`23`
 [1]  835 1345 1085 1655 1291  838  974  862 1106  699  854  721  699 1033  892
[16] 1213  631  554  833  911  796  721  727

$`24`
 [1] 1795 1761 1962 1541 1007 1276 1144 1302  947 1210 1113 1532  764  849 1412
[16] 1105 1048  843 1048 1157

$`25`
 [1] 1498  880 1028 1046  589 1088 1179 1471  761 1106 2017  649 1129 1149 1355
[16] 1107

$`26`
 [1] 1634 1300 1715 1643 1295 1459 1020 1531  919 1095  876  857 1534 1183 1405
[16] 1051 1159 1478 1472 1364 1140 1126 1007

$`27`
 [1] 1157 1759 1245  842 1056  800 1244  806  925  839  782 1236 1601  886  768
[16] 1109  722  440

$`28`
 [1] 1121 1488 1158 1287 1210 1468 1445 1304 1967 1408

$`29`
 [1] 1121 1482 1163 1378 1201 1677 1360 2230 1117 1093 1647 1358

$`30`
 [1]  395  342  463  649  400  703  388  570  292  490  440  885  671 1035  729
[16]  360  467  351  765  418  455  339  311  493  432  686  353  337  449  513
[31]  374  475  628  496  844  974  375  419  651  441  226  438  218  461  504
[46]  309  543  870  433  724  604  712  865  395  324  436  607  399

$`31`
 [1]  754 1025  829 1428 1828 1472  771 1144  980 1728  720  850  995  901 1138
[16]  678  805 1509  616  629  716  848  767  720 1426 1370 1826 1046 1172  869
[31]  793 1008  571 1161

$`32`
 [1]  920 1674 1153 1512 1226  647  945  822 1665  632  746  705  759  932  617
[16]  632 1259  506  590  743  598  747  988  855 1229 1461  458  804  867  652
[31]  580

$`33`
 [1]  684  701  486  792  727 1086  564  624 1205  463  846  894  707  733  892
[16]  869 1283 1444  474  798  935  719  445  749  428  772  854  545 1002  939
[31]  643  603  785  775 1025  584

$`34`
 [1]  636  998 1014 1965 1333 1730 1330  825 1112  851 1423  960 1031  976  561
[16] 1055 1076 1224  658  707 1453  445  966  930  939  862 1115 1158 1573

$`35`
 [1]  845  803  746 1036 1160 1038 1285  369 1093  732  613  620  863  579  765
[16]  819  505  594  667  651  950 1583  688  622 1068

$`36`
 [1]  924 1676 1765  841  796  745 1363  663  714  382  771  796  956 1153  669
[16]  796 1879  643  796  994  733 1185

$`37`
 [1]  597  833  902 1207  793  598 1328  323  561  726  663  919 1139 1040 1264
[16] 1214

$`38`
 [1]  492  608  368  393 1123  172  281  539  424  585  632  528

$`39`
 [1]  339  929  560  727  490  684  979 1466  404  865  533  462  287  767  653
[16] 1176 1906  883

$`40`
 [1]  755.00  871.00  938.00 1175.00 1218.00  621.00  432.25  913.20  840.15
[10]  827.97  919.29  724.48  602.72

$`41`
[1] 1449 1449 1546 1516 1254 1382

$`42`
[1]  895 1006 1351 1215 1215 1279 1006 1156  821

$`43`
 [1]  948 1308 1185  801  848  926  932  755  764  891  677  835 1112  918  742
[16]  685  927

$`44`
 [1] 1607 1275 1613 1484 1487 1205 1367 1158 1583 1342 1848 1640 1225 1320 1202
[16] 1476 1190 1435  894 1326 1230 1042 1127

$`45`
 [1] 1953 1939 1677 1692 2051 2371 2022 1521 1448 1825 1363 1760 1672 1603 1244
[16] 1521 1783 1560 1357 1673 1625 1425 1688 1577 1736 1640 1584 1293 1277 1742
[31] 1491

$`46`
 [1] 1223 1077  671 1063  969  842 1037  903 1407 1153 1107 1293  813  834 1118
[16]  901  981

$`47`
 [1]  986  996 1335  964 1018  821  945  844 1133  975 1082 1252 1031  940 1078
[16]  933  709  923  899  747 1010  873  962  965  674  763  915 1029 1452 1486

$`48`
 [1]  872 1528 1062 1345 1158  998 1197 1234 1469 1343 2103 1745 1084 1717 1131
[16]  990 1186  884 1118 1383  877 1072 1906  830

$`49`
 [1]  808 1088 1435 1265 1065  911  992 1273 1031 1100  769  865  781 1019 1761

      Ak        P 
307.7723   1.0000 
$`34`
 [1]  636  998 1014 1965 1333 1730 1330  825 1112  851 1423  960 1031  976  561
[16] 1055 1076 1224  658  707 1453  445  966  930  939  862 1115 1158 1573

$`35`
 [1]  845  803  746 1036 1160 1038 1285  369 1093  732  613  620  863  579  765
[16]  819  505  594  667  651  950 1583  688  622 1068

$`36`
 [1]  924 1676 1765  841  796  745 1363  663  714  382  771  796  956 1153  669
[16]  796 1879  643  796  994  733 1185

$`37`
 [1]  597  833  902 1207  793  598 1328  323  561  726  663  919 1139 1040 1264
[16] 1214

$`38`
 [1]  492  608  368  393 1123  172  281  539  424  585  632  528

               34          35          36           37          38
l1   1065.7241379 827.7600000 965.4545455 881.68750000 512.0833333
l2    191.9729064 151.6000000 206.5800866 174.29583333 126.0681818
lcv     0.1801338   0.1831449   0.2139718   0.19768436   0.2461868
lca     0.1246570   0.1913101   0.3252284  -0.01093174   0.1775494
lkur    0.2105167   0.1536444   0.2173088   0.01341899   0.3616169
        l1R         l2R        lcvR        lcaR       lkurR 
895.1153846 175.0339202   0.1983372   0.1683511   0.1853942 
    A2kN        P 
2.641827 0.658000 
    A2kN        P 
1.933665 0.258000 
        H1         H2 
-0.7677048 -0.4166196 
Warning messages:
1: In fn(par, ...) : value out of range in 'gammafn'
2: In fn(par, ...) : value out of range in 'gammafn'
        Ak          P 
14.1152348  0.9930638 

homtest documentation built on May 2, 2019, 1:45 p.m.
