Homogeneity tests for Regional Frequency Analysis.
ADbootstrap.test (x, cod, Nsim=500, index=2)
HW.tests (x, cod, Nsim=500)
DK.test (x, cod)
x: vector representing data from many samples defined with cod
cod: array that defines the data subdivision among sites
Nsim: number of regions simulated with the bootstrap of the original region
index: if
The Hosking and Wallis heterogeneity measures
The idea underlying the Hosking and Wallis (1993) heterogeneity statistics is to measure the sample variability of the L-moment ratios and compare it to the variation that would be expected in a homogeneous region. The latter is estimated through repeated simulations of homogeneous regions with samples drawn from a four-parameter kappa distribution (see e.g., Hosking and Wallis, 1997, pp. 202-204). In more detail, the steps are the following: for the k samples belonging to the region under analysis, find the sample L-moment ratios (see Hosking and Wallis, 1997) pertaining to the i-th site, with the ni observations Y(i,j) of the site arranged in ascending order: these are the L-coefficient of variation (L-CV),
t^(i) = (1/ni ∑[j from 1 to ni](2(j - 1)/(ni - 1) - 1) Y(i,j)) / (1/ni ∑[j from 1 to ni] Y(i,j))
the coefficient of L-skewness,
t3^(i) = (1/ni ∑[j from 1 to ni](6(j-1)(j-2)/(ni-1)/(ni-2) - 6(j-1)/(ni-1) + 1) Y(i,j)) / (1/ni ∑[j from 1 to ni](2(j-1)/(ni-1) - 1) Y(i,j))
and the coefficient of L-kurtosis
t4^(i) = (1/ni ∑[j from 1 to ni](20(j-1)(j-2)(j-3)/(ni-1)/(ni-2)/(ni-3) - 30(j-1)(j-2)/(ni-1)/(ni-2) + 12(j-1)/(ni-1) - 1) Y(i,j)) / (1/ni ∑[j from 1 to ni](2(j-1)/(ni-1) - 1)Y(i,j))
Note that the L-moment ratios are not affected by the normalization by the index value, i.e. the equations above give the same result whether X(i,j) or Y(i,j) is used.
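As a minimal sketch of the three formulas above, the following base R function computes the sample L-moment ratios of one site from its ordered sample. The name `lmom_ratios` is hypothetical (it is not one of the functions documented on this page), and at least four observations are assumed so that the L-kurtosis weights are defined. The data are the first ten annual flows of site 1 shown in the example output.

```r
# Sample L-moment ratios of a single site (illustrative helper, not a package function).
lmom_ratios <- function(y) {
  y <- sort(y)                      # observations in ascending order
  n <- length(y)
  stopifnot(n >= 4)                 # assumption: needed for the L-kurtosis weights
  j <- seq_len(n)
  # Weights of the second, third and fourth sample L-moments
  w2 <- 2 * (j - 1) / (n - 1) - 1
  w3 <- 6 * (j - 1) * (j - 2) / ((n - 1) * (n - 2)) - 6 * (j - 1) / (n - 1) + 1
  w4 <- 20 * (j - 1) * (j - 2) * (j - 3) / ((n - 1) * (n - 2) * (n - 3)) -
    30 * (j - 1) * (j - 2) / ((n - 1) * (n - 2)) + 12 * (j - 1) / (n - 1) - 1
  l1 <- mean(y)
  l2 <- mean(w2 * y)
  l3 <- mean(w3 * y)
  l4 <- mean(w4 * y)
  c(lcv = l2 / l1, lca = l3 / l2, lkur = l4 / l2)
}

y <- c(1494, 1309, 1699, 1467, 1918, 1469, 1267, 1523, 1338, 1438)
r_raw    <- lmom_ratios(y)
r_scaled <- lmom_ratios(y / median(y))   # same sample divided by an index value
```

Comparing `r_raw` and `r_scaled` (for instance with `all.equal`) illustrates the note on normalization: dividing a sample by a positive constant leaves all three ratios unchanged.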
Define the regional averaged L-CV, L-skewness and L-kurtosis coefficients,
t^R = (∑[i from 1 to k] ni t^(i)) / (∑[i from 1 to k] ni)
t3^R = (∑[i from 1 to k] ni t3^(i)) / (∑[i from 1 to k] ni)
t4^R = (∑[i from 1 to k] ni t4^(i)) / (∑[i from 1 to k] ni)
and compute the statistic
V = {∑[i from 1 to k] ni (t^(i) - t^R)^2 / ∑[i from 1 to k] ni}^(1/2)
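The regional averages and the statistic V can be sketched in a few lines of base R. The function name `regional_V` is hypothetical; the at-site L-moment ratios and record lengths below are copied from the example output for sites 34 to 38, so the weighted averages can be compared with the regional values printed there.

```r
# Record-length-weighted regional averages and the dispersion statistic V.
# 'ratios' has one row per site (columns lcv, lca, lkur); 'n' holds the record lengths.
regional_V <- function(ratios, n) {
  tR <- colSums(ratios * n) / sum(n)     # t^R, t3^R, t4^R (row i is weighted by n[i])
  V  <- sqrt(sum(n * (ratios[, "lcv"] - tR["lcv"])^2) / sum(n))
  list(tR = tR, V = unname(V))
}

# At-site values for sites 34 to 38, taken from the example output
ratios <- cbind(
  lcv  = c(0.1801338, 0.1831449, 0.2139718, 0.19768436, 0.2461868),
  lca  = c(0.1246570, 0.1913101, 0.3252284, -0.01093174, 0.1775494),
  lkur = c(0.2105167, 0.1536444, 0.2173088, 0.01341899, 0.3616169)
)
n <- c(29, 25, 22, 16, 12)

res <- regional_V(ratios, n)
res$tR   # regional averages, to compare with lcvR, lcaR and lkurR in the output
res$V    # observed between-site dispersion of the L-CV
```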
Fit the parameters of a four-parameter kappa distribution to the regional averaged L-moment ratios t^R, t3^R and t4^R, and then generate a large number Nsim of realizations of sets of k samples. The i-th site sample in each set has a kappa distribution as its parent and record length equal to ni. For each simulated homogeneous set, calculate the statistic V, obtaining Nsim values. From this vector of V values determine the mean μV and standard deviation σV that relate to the hypothesis of homogeneity (actually, to the composite hypothesis of homogeneity and kappa parent distribution).
A heterogeneity measure, which is called here HW1, is finally found as
θ(HW1) = (V - μV)/(σV)
θ(HW1) can be approximated by a normal distribution with zero mean and unit variance: following Hosking and Wallis (1997), the region under analysis can therefore be regarded as ‘acceptably homogeneous’ if θ(HW1) < 1, ‘possibly heterogeneous’ if 1 ≤ θ(HW1) < 2, and ‘definitely heterogeneous’ if θ(HW1) ≥ 2. Hosking and Wallis (1997) suggest that these limits should be treated as useful guidelines. Even if the θ(HW1) statistic is constructed like a significance test, significance levels obtained from such a test would in fact be accurate only under special assumptions: the data must be independent both serially and between sites, and the true regional distribution must be kappa.
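The last step, standardizing the observed V and reading it against the guideline limits, might look as follows. This is only a sketch: `hw_measure` is a hypothetical name, and the vector `Vsim` is a made-up placeholder standing in for the Nsim values that would come from the kappa simulations, which are not reproduced here.

```r
# Standardize an observed V against simulated values and apply the guideline limits.
hw_measure <- function(V, Vsim) {
  theta <- (V - mean(Vsim)) / sd(Vsim)
  verdict <- if (theta < 1) {
    "acceptably homogeneous"
  } else if (theta < 2) {
    "possibly heterogeneous"
  } else {
    "definitely heterogeneous"
  }
  list(theta = theta, verdict = verdict)
}

# Placeholder numbers for illustration only (not simulated from a kappa distribution)
Vsim <- c(0.02, 0.03, 0.04)
out  <- hw_measure(V = 0.045, Vsim = Vsim)
out$theta
out$verdict
```

A negative value, such as the H1 reported for sites 34 to 38 in the example output, falls in the ‘acceptably homogeneous’ range.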
Hosking and Wallis (1993) also give an alternative heterogeneity measure (that we call HW2), in which V is replaced by:
V2 = ∑[i from 1 to k] ni {(t^(i) - t^R)^2 + (t3^(i) - t3^R)^2}^(1/2) / ∑[i from 1 to k] ni
The test statistic in this case becomes
θ(HW2) = (V2 - μ(V2)) / (σ(V2))
with similar acceptability limits as the HW1 statistic. Hosking and Wallis (1997) judge θ(HW2) to be inferior to θ(HW1) and say that it rarely yields values larger than 2 even for grossly heterogeneous regions.
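For comparison with V, a minimal sketch of the V2 statistic is given below. The function name `v2_stat` is hypothetical; it takes the at-site L-CV and L-skewness values and the record lengths, and returns the weighted average distance of the sites from the regional point in the (L-CV, L-skewness) plane. The input values are those printed for sites 34 to 38 in the example output.

```r
# V2: weighted mean distance of each site from the regional (L-CV, L-skewness) point.
v2_stat <- function(t, t3, n) {
  tR  <- sum(n * t)  / sum(n)    # regional L-CV
  t3R <- sum(n * t3) / sum(n)    # regional L-skewness
  sum(n * sqrt((t - tR)^2 + (t3 - t3R)^2)) / sum(n)
}

lcv <- c(0.1801338, 0.1831449, 0.2139718, 0.19768436, 0.2461868)
lca <- c(0.1246570, 0.1913101, 0.3252284, -0.01093174, 0.1775494)
n   <- c(29, 25, 22, 16, 12)
V2  <- v2_stat(lcv, lca, n)
```

As with V, the observed V2 would then be standardized with the mean and standard deviation of the values obtained from the simulated homogeneous regions.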
The bootstrap Anderson-Darling test
A test that does not make any assumption on the parent distribution is the Anderson-Darling (AD) rank test (Scholz and Stephens, 1987). The AD test is the generalization of the classical Anderson-Darling goodness of fit test (e.g., D'Agostino and Stephens, 1986), and it is used to test the hypothesis that k independent samples belong to the same population without specifying their common distribution function.
The test is based on the comparison between local and regional empirical distribution functions. The empirical distribution function, or sample distribution function, is defined by F(x) = j/η, x(j) ≤ x < x(j+1), where η is the size of the sample and x(j) are the order statistics, i.e. the observations arranged in ascending order. Denote the empirical distribution function of the i-th sample (local) by F̂i(x), and that of the pooled sample of all N = n1 + ... + nk observations (regional) by HN(x). The k-sample Anderson-Darling test statistic is then defined as
θ(AD) = ∑[i from 1 to k] ni integral[all x] ((F̂i(x) - HN(x))^2) / (HN(x) (1 - HN(x))) dHN(x)
If the pooled ordered sample is Z1 < ... < ZN, the computational formula to evaluate θ(AD) is:
θ(AD) = 1/N ∑[i from 1 to k] 1/ni ∑[j from 1 to N-1] ((N M(ij) - j ni)^2) / (j(N-j))
where M(ij) is the number of observations in the i-th sample that are not greater than Zj. The homogeneity test can be carried out by comparing the obtained θ(AD) value to the tabulated percentage points reported by Scholz and Stephens (1987) for different significance levels.
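The computational formula translates almost directly into base R. The sketch below assumes a pooled sample without ties, as the formula does; `ad_ksample` is a hypothetical name and is not the package implementation. `findInterval` is used to count, for each Zj, the observations of a site that are not greater than Zj.

```r
# k-sample Anderson-Darling statistic from the computational formula (no ties assumed).
ad_ksample <- function(x, cod) {
  N <- length(x)
  z <- sort(x)                 # pooled ordered sample Z1 < ... < ZN
  j <- seq_len(N - 1)
  total <- 0
  for (xi in split(x, cod)) {
    ni  <- length(xi)
    # M(ij): number of observations in the i-th sample not greater than Zj
    Mij <- findInterval(z[j], sort(xi))
    total <- total + sum((N * Mij - j * ni)^2 / (j * (N - j))) / ni
  }
  total / N
}

x   <- c(1, 2, 3, 4)
cod <- c(1, 1, 2, 2)
a_separated   <- ad_ksample(x, cod)             # samples fully separated
a_interleaved <- ad_ksample(x, c(1, 2, 1, 2))   # samples interleaved
a_transformed <- ad_ksample(exp(x), cod)        # monotonic transformation of the data
```

For this toy input the separated samples give the larger statistic, and the value is unchanged by the monotonic transformation because only ranks enter the formula.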
The statistic θ(AD) depends on the sample values only through their ranks. This guarantees that the test statistic remains unchanged when the samples undergo monotonic transformations, an important stability property not possessed by the HW heterogeneity measures. However, problems arise in applying this test in a common index value procedure. In fact, the index value procedure corresponds to dividing each site sample by a different value, thus modifying the ranks in the pooled sample. In particular, this has the effect of making the local empirical distribution functions much more similar to each other, providing an impression of homogeneity even when the samples are highly heterogeneous. The effect is analogous to that encountered when applying goodness-of-fit tests to distributions whose parameters are estimated from the same sample used for the test (e.g., D'Agostino and Stephens, 1986; Laio, 2004). In both cases, the percentage points for the test must be appropriately redetermined. This can be done with a nonparametric bootstrap approach consisting of the following steps: build up the pooled sample S of the observed non-dimensional data. Sample with replacement from S and generate k artificial local samples, of size n1, ..., nk. Divide each sample by its index value, and calculate θ^(1)(AD). Repeat the procedure Nsim times and obtain a sample of values θ^(j)(AD), j = 1, ..., Nsim, whose empirical distribution function can be used as an approximation of G(H0)(θ(AD)), the distribution of θ(AD) under the null hypothesis of homogeneity. The acceptance limits for the test, corresponding to any significance level α, are then easily determined as the quantiles of G(H0)(θ(AD)) corresponding to a probability (1-α).
We will call the test obtained with the above procedure the bootstrap Anderson-Darling test, hereafter referred to as AD.
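The bootstrap steps can be sketched as follows. This is an illustration under stated assumptions rather than the code behind ADbootstrap.test: the function names are hypothetical, the index value is taken to be the site median, and ties created by resampling are handled naively by the same computational formula.

```r
# k-sample Anderson-Darling statistic (computational formula; ties handled naively).
ad_stat <- function(x, cod) {
  N <- length(x); z <- sort(x); j <- seq_len(N - 1)
  total <- 0
  for (xi in split(x, cod)) {
    Mij <- findInterval(z[j], sort(xi))
    total <- total + sum((N * Mij - j * length(xi))^2 / (j * (N - j))) / length(xi)
  }
  total / N
}

# Nonparametric bootstrap of the null distribution of the statistic.
ad_bootstrap <- function(x, cod, Nsim = 500) {
  # Divide every observation by the index value of its site (assumed: the median)
  scale_by_site <- function(v, g) v / ave(v, g, FUN = median)
  S <- scale_by_site(x, cod)          # pooled sample of non-dimensional data
  observed <- ad_stat(S, cod)
  sims <- replicate(Nsim, {
    # k artificial samples of sizes n1, ..., nk drawn with replacement from S
    xb <- sample(S, length(S), replace = TRUE)
    ad_stat(scale_by_site(xb, cod), cod)
  })
  # P: empirical distribution function of the simulated values at the observed one
  c(A2kN = observed, P = mean(sims <= observed))
}

set.seed(1)
x   <- c(runif(20, 5, 15), runif(15, 5, 15))
cod <- rep(1:2, c(20, 15))
res <- ad_bootstrap(x, cod, Nsim = 50)
```

Homogeneity would be rejected at significance level α when P exceeds 1-α, in line with the acceptance limits described above.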
Durbin and Knott test
The last homogeneity test considered derives from a goodness-of-fit statistic originally proposed by Durbin and Knott (1971). The test is formulated to measure discrepancies in the dispersion of the samples, without accounting for the possible presence of discrepancies in the mean or skewness of the data. In this respect, the test is similar to the HW1 test, while it is analogous to the AD test in that it is a rank test. The original goodness-of-fit test is very simple: suppose we have a sample Xi, i = 1, ..., n, with hypothetical distribution F(x); under the null hypothesis the random variable F(Xi) has a uniform distribution in the (0,1) interval, and the statistic D = (2/n)^(1/2) ∑[i from 1 to n] cos(2 π F(Xi)) is approximately normally distributed with mean 0 and variance 1 (Durbin and Knott, 1971); the factor (2/n)^(1/2) is what gives unit variance, since cos(2 π U) has variance 1/2 for a uniform U. D serves the purpose of detecting discrepancy in data dispersion: if the variance of Xi is greater than that of the hypothetical distribution F(x), D is significantly greater than 0, while D is significantly below 0 in the reverse case. Differences between the mean (or the median) of Xi and F(x) are instead not detected by D, which guarantees that the normalization by the index value does not affect the test.
The extension to homogeneity testing of the Durbin and Knott (DK) statistic is straightforward: we substitute the empirical distribution function obtained with the pooled observed data, HN(x), for F(x) in D, obtaining at each site a statistic
Di = (2/ni)^(1/2) ∑[j from 1 to ni] cos(2 π HN(Xj))
which is approximately standard normal under the hypothesis of homogeneity. The statistic θ(DK) = ∑[i from 1 to k] Di^2 then has a chi-squared distribution with k-1 degrees of freedom, which allows one to determine the acceptability limits for the test, corresponding to any significance level α.
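A minimal sketch of the DK statistic in base R is given below. The name `dk_stat` is hypothetical and this is not the code behind DK.test. Each site sum is scaled by sqrt(2/ni), an assumption made here so that Di has unit variance when HN(Xj) is uniform, and the input is assumed to be already divided by the index value of each site.

```r
# Durbin and Knott homogeneity statistic and its chi-squared distribution value.
dk_stat <- function(x, cod) {
  HN <- ecdf(x)                        # empirical distribution of the pooled sample
  Di <- sapply(split(x, cod), function(xi) {
    sqrt(2 / length(xi)) * sum(cos(2 * pi * HN(xi)))
  })
  theta <- sum(Di^2)
  k <- length(Di)
  c(theta = theta, P = pchisq(theta, df = k - 1))
}

# Toy input: two sites with two observations each
x   <- c(1, 2, 3, 4)
cod <- c(1, 1, 2, 2)
res <- dk_stat(x, cod)
```

Here P is the chi-squared distribution function evaluated at the statistic, so values of P close to 1 point towards heterogeneity in dispersion.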
Comparison among tests
The comparison (Viglione et al., 2007) shows that the Hosking and Wallis heterogeneity measure HW1 (based only on L-CV) is preferable when skewness is low, while the bootstrap Anderson-Darling test should be used for more skewed regions. As for HW2, the Hosking and Wallis heterogeneity measure based on L-CV and L-CA (L-skewness), the comparison confirms once more its lack of power.
Our suggestion is to guide the choice of the test according to a compromise between power and Type I error of the HW1 and AD tests. The L-moment space is divided into two regions: if the t3^R coefficient for the region under analysis is lower than 0.23, we propose to use the Hosking and Wallis heterogeneity measure HW1; if t3^R > 0.23, the bootstrap Anderson-Darling test is preferable.
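The selection rule can be written as a one-line helper. The name `choose_test` is hypothetical, and sending the boundary case t3^R = 0.23 to the bootstrap AD test is an assumption of this sketch, since the text only covers values below and above the threshold.

```r
# Pick the homogeneity test from the regional L-skewness (threshold from the text).
choose_test <- function(t3R, threshold = 0.23) {
  # Assumption: a value exactly equal to the threshold is sent to the AD test
  if (t3R < threshold) "HW1" else "bootstrap AD"
}

# Regional L-skewness (lcaR) of sites 34 to 38, from the example output
choice <- choose_test(0.1683511)
```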
ADbootstrap.test and DK.test each give the test statistic and its distribution value P. If P is, for example, 0.92, the samples should not be considered heterogeneous at any significance level smaller than 8%.
HW.tests gives the two Hosking and Wallis heterogeneity measures HW1 and HW2; following Hosking and Wallis (1997), the region under analysis can be regarded as ‘acceptably homogeneous’ if HW < 1, ‘possibly heterogeneous’ if 1 ≤ HW < 2, and ‘definitely heterogeneous’ if HW ≥ 2.
Alberto Viglione, e-mail: alviglio@tiscali.it.
D'Agostino R., Stephens M. (1986) Goodness-of-Fit Techniques, chapter Tests based on EDF statistics. Marcel Dekker, New York.
Durbin J., Knott M. (1971) Components of Cramer-von Mises statistics. London School of Economics and Political Science, pp. 290-307.
Hosking J., Wallis J. (1993) Some statistics useful in regional frequency analysis. Water Resources Research, 29 (2), pp. 271-281.
Hosking J., Wallis J. (1997) Regional Frequency Analysis: an approach based on L-moments. Cambridge University Press, Cambridge, UK.
Laio F. (2004) Cramer-von Mises and Anderson-Darling goodness of fit tests for extreme value distributions with unknown parameters. Water Resources Research, 40, W09308, doi:10.1029/2004WR003204.
Scholz F., Stephens M. (1987) K-sample Anderson-Darling tests. Journal of the American Statistical Association, 82 (399), pp. 918-924.
Viglione A., Laio F., Claps P. (2007) A comparison of homogeneity tests for regional frequency analysis. Water Resources Research, 43, W03428, doi:10.1029/2006WR005095.
Viglione A. (2007) Metodi statistici non-supervised per la stima di grandezze idrologiche in siti non strumentati, PhD thesis, Politecnico di Torino.
# Load the annualflows data set and inspect it
data(annualflows)
annualflows[1:10,]
summary(annualflows)
# Extract the flow values and the site codes as plain vectors
x <- annualflows["dato"][,]
cod <- annualflows["cod"][,]
split(x,cod)
#ADbootstrap.test(x,cod,Nsim=100) # it takes some time
#HW.tests(x,cod) # it takes some time
DK.test(x,cod)
# Restrict the analysis to sites 34 to 38
fac <- factor(annualflows["cod"][,],levels=c(34:38))
x2 <- annualflows[!is.na(fac),"dato"]
cod2 <- annualflows[!is.na(fac),"cod"]
split(x2,cod2)
# At-site and regional L-moments of the subset
sapply(split(x2,cod2),Lmoments)
regionalLmoments(x2,cod2)
# Homogeneity tests on the subset
ADbootstrap.test(x2,cod2)
ADbootstrap.test(x2,cod2,index=1)
HW.tests(x2,cod2)
DK.test(x2,cod2)
cod anno dato
1 1 1956 1494
2 1 1957 1309
3 1 1958 1699
4 1 1959 1467
5 1 1960 1918
6 1 1961 1469
7 1 1962 1267
8 1 1963 1523
9 1 1964 1338
10 1 1965 1438
cod anno dato
Min. : 1.0 Min. :1921 Min. : 172.0
1st Qu.:13.0 1st Qu.:1940 1st Qu.: 725.2
Median :22.0 Median :1951 Median : 981.0
Mean :23.7 Mean :1951 Mean :1041.4
3rd Qu.:34.0 3rd Qu.:1960 3rd Qu.:1308.8
Max. :49.0 Max. :1985 Max. :3045.0
$`1`
[1] 1494 1309 1699 1467 1918 1469 1267 1523 1338 1438 1788 1591 1697 1780 1769
$`2`
[1] 1144 1652 1807 1881 1741 1124 2064 1434 1678 1239 921 983 1093 1744 1213
[16] 1590 956 1124 2181 1077 1345 1219 988 1325 1277 1479 1307 2053 1232 973
[31] 1407 912
$`3`
[1] 2596 954 1115 1248 867 1280 1588 1055 1764 3045
$`4`
[1] 871 1238 1505 1636 1553 1936 1739 1867 1184 1630 1311 1520 1201 1614 1971
[16] 1829 1781 1093 1996 1328 1662 1199 860 961 949 1536 1016 1386 820 1023
[31] 2329 1209 1305 1334 1024 1364 1310 1410 1247 2393 1317 909 1808 1020 1181
[46] 1365 1218 1644 1160 1002 1243 1332 1033 1170 1685 1478 2434 1600 1369 1215
[61] 1614 1449 1518 1490 1191
$`7`
[1] 1481 1758 1774 1625 1607 2826 1488 928 2379 1173 1801 1824 1309 2220 1733
$`8`
[1] 1086 1810 2244 2138 2028 1308 1947 1528 2244 1594 861 1378 1795 1344 1558
[16] 696 724 2497 660 1388 1484 952 1987 2646 1689 1443 2688 1249 1145 2392
[31] 1001 1380
$`9`
[1] 2075 1607 1717 1261 1824 1330 963 1313 2276 682 1440 1304 1193
$`10`
[1] 1096 1387 1289 1461 1054 1474 1137 1256 981 1696 1468 1850 1644 1248 1498
[16] 1317 1500 1109 859 931 1020 1493 954 1133 1144 1056
$`11`
[1] 1320 1706 948 1643 944 1402 1202 1788 1665 1833 1679 1166 1833 1661 1938
[16] 1457 830 1221 1398 1674 1311 1611 1003 1021
$`12`
[1] 890 1247 1040 1047 875 1060 913 968 749 1218 1104 1489 1300 833 994
[16] 1002 1134 854 826 695 939 1230 830 1096 876 704 1111 780 791 709
[31] 812 686 812 755 802 1098 868 735 829 750 635 887 711 753 935
[46] 862 830 924 735 766 930 783 1623 1359 1015 922 963 848 975 760
[61] 766
$`13`
[1] 1288 854 1324 741 1043 756 1477 1160 1426 1360 1109 1211 1094 1666 1002
[16] 772 1124 997 649 1436 762 1293 930 721 838 1063 710 1002 1625 1002
[31] 848 1104 869 823 992 588 894 1073 675 1181 1568 817 1068 978
$`14`
[1] 1505 928 1223 805 1449 1084 1588 1509 1137 1014 1181 1394 922 811 1428
[16] 1137 1240 1034 581 1501 700 1263 962 780 919 1068 855 1198 1569 1134
[31] 1007 1205 973 871 1188 581 1027 1192 578 875 1553 774 958 1187 2152
[46] 836 834 753 1110
$`15`
[1] 969 811 1107 769 567 925 508 598 818 495
$`16`
[1] 957 625 625 658 1022 555 496 625 593 1115 718 957 707 332 821
[16] 469 913 663 418 523 799 469 1000 1104 761 598 1033 707 469 614
[31] 270 609 1017 367
$`17`
[1] 595 718 518 548 389 567 506 985 530 1097 934 675 614 587 722
[16] 499 459 1087 550 860 648 296 658
$`18`
[1] 686 863 488 937 453 621 484 851 599 1161 894 598 645 606 772
[16] 449 486 510 559 829 545 898 529 392 856 625 773 651 674 432
$`19`
[1] 589 715 479 696 394 533 430 845 519 1012 805 559 569 580 725
[16] 448 412 411 407 638 506 729 538 350 736 513 787
$`20`
[1] 1237 1908 1263 1066 1401 1263 1134 799 919 971 1057 1710 1555 1667 1212
[16] 799 1366 962 1779 1504 808 1031 1186 1031 1796 1882 1487 945 1710 1194
[31] 919 1418 722 1160 1409 894 1279 1884 1307
$`21`
[1] 489 704 310 665 259 501 428 820 551 994 658 425 423 409 736 440 401 398 342
[20] 658 449 665 535 247 584 338 580 569 311 412 565 403 846 917 525 411 717 526
[39] 248 451 185 356 564 256
$`22`
[1] 1197 863 1382 1104 649 745 615 1116 618 739 761 720 1147 838 1057
[16] 739 529 962 470 881 495 417 553 819 711 1410 1472 727 671 1163
[31] 751 476 819 399 612 860 507 844 1245 953 976
$`23`
[1] 835 1345 1085 1655 1291 838 974 862 1106 699 854 721 699 1033 892
[16] 1213 631 554 833 911 796 721 727
$`24`
[1] 1795 1761 1962 1541 1007 1276 1144 1302 947 1210 1113 1532 764 849 1412
[16] 1105 1048 843 1048 1157
$`25`
[1] 1498 880 1028 1046 589 1088 1179 1471 761 1106 2017 649 1129 1149 1355
[16] 1107
$`26`
[1] 1634 1300 1715 1643 1295 1459 1020 1531 919 1095 876 857 1534 1183 1405
[16] 1051 1159 1478 1472 1364 1140 1126 1007
$`27`
[1] 1157 1759 1245 842 1056 800 1244 806 925 839 782 1236 1601 886 768
[16] 1109 722 440
$`28`
[1] 1121 1488 1158 1287 1210 1468 1445 1304 1967 1408
$`29`
[1] 1121 1482 1163 1378 1201 1677 1360 2230 1117 1093 1647 1358
$`30`
[1] 395 342 463 649 400 703 388 570 292 490 440 885 671 1035 729
[16] 360 467 351 765 418 455 339 311 493 432 686 353 337 449 513
[31] 374 475 628 496 844 974 375 419 651 441 226 438 218 461 504
[46] 309 543 870 433 724 604 712 865 395 324 436 607 399
$`31`
[1] 754 1025 829 1428 1828 1472 771 1144 980 1728 720 850 995 901 1138
[16] 678 805 1509 616 629 716 848 767 720 1426 1370 1826 1046 1172 869
[31] 793 1008 571 1161
$`32`
[1] 920 1674 1153 1512 1226 647 945 822 1665 632 746 705 759 932 617
[16] 632 1259 506 590 743 598 747 988 855 1229 1461 458 804 867 652
[31] 580
$`33`
[1] 684 701 486 792 727 1086 564 624 1205 463 846 894 707 733 892
[16] 869 1283 1444 474 798 935 719 445 749 428 772 854 545 1002 939
[31] 643 603 785 775 1025 584
$`34`
[1] 636 998 1014 1965 1333 1730 1330 825 1112 851 1423 960 1031 976 561
[16] 1055 1076 1224 658 707 1453 445 966 930 939 862 1115 1158 1573
$`35`
[1] 845 803 746 1036 1160 1038 1285 369 1093 732 613 620 863 579 765
[16] 819 505 594 667 651 950 1583 688 622 1068
$`36`
[1] 924 1676 1765 841 796 745 1363 663 714 382 771 796 956 1153 669
[16] 796 1879 643 796 994 733 1185
$`37`
[1] 597 833 902 1207 793 598 1328 323 561 726 663 919 1139 1040 1264
[16] 1214
$`38`
[1] 492 608 368 393 1123 172 281 539 424 585 632 528
$`39`
[1] 339 929 560 727 490 684 979 1466 404 865 533 462 287 767 653
[16] 1176 1906 883
$`40`
[1] 755.00 871.00 938.00 1175.00 1218.00 621.00 432.25 913.20 840.15
[10] 827.97 919.29 724.48 602.72
$`41`
[1] 1449 1449 1546 1516 1254 1382
$`42`
[1] 895 1006 1351 1215 1215 1279 1006 1156 821
$`43`
[1] 948 1308 1185 801 848 926 932 755 764 891 677 835 1112 918 742
[16] 685 927
$`44`
[1] 1607 1275 1613 1484 1487 1205 1367 1158 1583 1342 1848 1640 1225 1320 1202
[16] 1476 1190 1435 894 1326 1230 1042 1127
$`45`
[1] 1953 1939 1677 1692 2051 2371 2022 1521 1448 1825 1363 1760 1672 1603 1244
[16] 1521 1783 1560 1357 1673 1625 1425 1688 1577 1736 1640 1584 1293 1277 1742
[31] 1491
$`46`
[1] 1223 1077 671 1063 969 842 1037 903 1407 1153 1107 1293 813 834 1118
[16] 901 981
$`47`
[1] 986 996 1335 964 1018 821 945 844 1133 975 1082 1252 1031 940 1078
[16] 933 709 923 899 747 1010 873 962 965 674 763 915 1029 1452 1486
$`48`
[1] 872 1528 1062 1345 1158 998 1197 1234 1469 1343 2103 1745 1084 1717 1131
[16] 990 1186 884 1118 1383 877 1072 1906 830
$`49`
[1] 808 1088 1435 1265 1065 911 992 1273 1031 1100 769 865 781 1019 1761
Ak P
307.7723 1.0000
$`34`
[1] 636 998 1014 1965 1333 1730 1330 825 1112 851 1423 960 1031 976 561
[16] 1055 1076 1224 658 707 1453 445 966 930 939 862 1115 1158 1573
$`35`
[1] 845 803 746 1036 1160 1038 1285 369 1093 732 613 620 863 579 765
[16] 819 505 594 667 651 950 1583 688 622 1068
$`36`
[1] 924 1676 1765 841 796 745 1363 663 714 382 771 796 956 1153 669
[16] 796 1879 643 796 994 733 1185
$`37`
[1] 597 833 902 1207 793 598 1328 323 561 726 663 919 1139 1040 1264
[16] 1214
$`38`
[1] 492 608 368 393 1123 172 281 539 424 585 632 528
34 35 36 37 38
l1 1065.7241379 827.7600000 965.4545455 881.68750000 512.0833333
l2 191.9729064 151.6000000 206.5800866 174.29583333 126.0681818
lcv 0.1801338 0.1831449 0.2139718 0.19768436 0.2461868
lca 0.1246570 0.1913101 0.3252284 -0.01093174 0.1775494
lkur 0.2105167 0.1536444 0.2173088 0.01341899 0.3616169
l1R l2R lcvR lcaR lkurR
895.1153846 175.0339202 0.1983372 0.1683511 0.1853942
A2kN P
2.641827 0.658000
A2kN P
1.933665 0.258000
H1 H2
-0.7677048 -0.4166196
Warning messages:
1: In fn(par, ...) : value out of range in 'gammafn'
2: In fn(par, ...) : value out of range in 'gammafn'
Ak P
14.1152348 0.9930638