# HOMTESTS: Homogeneity tests In homtest: Homogeneity tests for Regional Frequency Analysis

## Description

Homogeneity tests for Regional Frequency Analysis.

## Usage

    ADbootstrap.test(x, cod, Nsim=500, index=2)
    HW.tests(x, cod, Nsim=500)
    DK.test(x, cod)

## Arguments

| Argument | Description |
|---|---|
| x | vector representing data from many samples defined with cod |
| cod | array that defines the data subdivision among sites |
| Nsim | number of regions simulated with the bootstrap of the original region |
| index | if index=1 samples are divided by their average value; if index=2 (default) samples are divided by their median value |

## Details

The Hosking and Wallis heterogeneity measures

The idea underlying the Hosking and Wallis (1993) heterogeneity statistics is to measure the sample variability of the L-moment ratios and compare it to the variation that would be expected in a homogeneous region. The latter is estimated through repeated simulations of homogeneous regions with samples drawn from a four-parameter kappa distribution (see e.g., Hosking and Wallis, 1997, pp. 202-204). In more detail, the steps are as follows. For each of the k samples belonging to the region under analysis, find the sample L-moment ratios (see Hosking and Wallis, 1997) pertaining to the i-th site, where Y(i,j) denotes the j-th smallest of the ni observations at site i after normalization by the index value. These are the L-coefficient of variation (L-CV),

t^(i) = (1/ni ∑[j from 1 to ni](2(j - 1)/(ni - 1) - 1) Y(i,j)) / (1/ni ∑[j from 1 to ni] Y(i,j))

the coefficient of L-skewness,

t3^(i) = (1/ni ∑[j from 1 to ni](6(j-1)(j-2)/(ni-1)/(ni-2) - 6(j-1)/(ni-1) + 1) Y(i,j)) / (1/ni ∑[j from 1 to ni](2(j-1)/(ni-1) - 1) Y(i,j))

and the coefficient of L-kurtosis

t4^(i) = (1/ni ∑[j from 1 to ni](20(j-1)(j-2)(j-3)/(ni-1)/(ni-2)/(ni-3) - 30(j-1)(j-2)/(ni-1)/(ni-2) + 12(j-1)/(ni-1) - 1) Y(i,j)) / (1/ni ∑[j from 1 to ni](2(j-1)/(ni-1) - 1)Y(i,j))

Note that the L-moment ratios are not affected by the normalization by the index value, i.e. it makes no difference whether the original data X(i,j) or the normalized data Y(i,j) are used in the equations above.
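The three ratios can be computed directly from the weighted sums above. The following is a minimal sketch, not the package's own code: the name `lmom_ratios` is hypothetical, and the sample is sorted first on the assumption that Y(i,j) is the j-th smallest observation. The data are the site 38 values printed in the example output of this page.

```r
# Sample L-moment ratios of one site, using the weighted sums given above.
# `lmom_ratios` is an illustrative name, not a function of the homtest package.
lmom_ratios <- function(y) {
  y <- sort(y)            # the weights depend on the rank j of each observation
  n <- length(y)          # at least 4 observations are needed for t4
  j <- seq_len(n)
  l1 <- mean(y)
  l2 <- mean((2 * (j - 1) / (n - 1) - 1) * y)
  l3 <- mean((6 * (j - 1) * (j - 2) / ((n - 1) * (n - 2)) -
              6 * (j - 1) / (n - 1) + 1) * y)
  l4 <- mean((20 * (j - 1) * (j - 2) * (j - 3) / ((n - 1) * (n - 2) * (n - 3)) -
              30 * (j - 1) * (j - 2) / ((n - 1) * (n - 2)) +
              12 * (j - 1) / (n - 1) - 1) * y)
  c(t = l2 / l1, t3 = l3 / l2, t4 = l4 / l2)
}

# Site 38 of the annualflows data, as printed in the example output
site38 <- c(492, 608, 368, 393, 1123, 172, 281, 539, 424, 585, 632, 528)
r_raw  <- lmom_ratios(site38)
r_norm <- lmom_ratios(site38 / median(site38))  # normalized by the index value
```

For this site the sketch gives an L-CV of about 0.246 and an L-skewness of about 0.178, in line with the lcv and lca rows of the example output, and `r_norm` equals `r_raw` because a common scale factor cancels in every ratio.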

Define the regional averaged L-CV, L-skewness and L-kurtosis coefficients,

t^R = (∑[i from 1 to k] ni t^(i)) / (∑[i from 1 to k] ni)

t3^R = (∑[i from 1 to k] ni t3^(i)) / (∑[i from 1 to k] ni)

t4^R = (∑[i from 1 to k] ni t4^(i)) / (∑[i from 1 to k] ni)

and compute the statistic

V = {∑[i from 1 to k] ni (t^(i) - t^R)^2 / ∑[i from 1 to k] ni}^(1/2)
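As a rough illustration of the regional averages and of V, the sketch below uses the at-site L-CV values of sites 34 to 38 from the example output of this page. The function names `regional_mean` and `V_stat` are hypothetical, and the record lengths were counted from the printed `split(x2,cod2)` output.

```r
# Record-length-weighted regional average of an at-site statistic.
regional_mean <- function(ti, ni) sum(ni * ti) / sum(ni)

# V: weighted standard deviation of the at-site L-CVs around the regional L-CV.
V_stat <- function(t, ni) {
  tR <- regional_mean(t, ni)
  sqrt(sum(ni * (t - tR)^2) / sum(ni))
}

# Sites 34 to 38: record lengths and L-CV values from the example output
ni  <- c(29, 25, 22, 16, 12)
lcv <- c(0.1801338, 0.1831449, 0.2139718, 0.19768436, 0.2461868)

tR <- regional_mean(lcv, ni)   # close to the lcvR value in the example output
V  <- V_stat(lcv, ni)
```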

Fit the parameters of a four-parameter kappa distribution to the regional averaged L-moment ratios t^R, t3^R and t4^R, and then generate a large number Nsim of realizations of sets of k samples. The i-th site sample in each set has a kappa distribution as its parent and record length equal to ni. For each simulated homogeneous set, calculate the statistic V, obtaining Nsim values. From this vector of V values determine the mean μV and standard deviation σV that relate to the hypothesis of homogeneity (actually, to the composite hypothesis of homogeneity and kappa parent distribution).

A heterogeneity measure, here called HW1, is finally found as

θ(HW1) = (V - μV)/(σV)

θ(HW1) can be approximated by a normal distribution with zero mean and unit variance. Following Hosking and Wallis (1997), the region under analysis can therefore be regarded as ‘acceptably homogeneous’ if θ(HW1) < 1, ‘possibly heterogeneous’ if 1 ≤ θ(HW1) < 2, and ‘definitely heterogeneous’ if θ(HW1) ≥ 2. Hosking and Wallis (1997) suggest that these limits should be treated as useful guidelines. Even though the θ(HW1) statistic is constructed like a significance test, significance levels obtained from such a test would in fact be accurate only under special assumptions: that the data are independent both serially and between sites, and that the true regional distribution is kappa.
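The standardization and the three-way classification can be sketched as follows. This is only an illustration: `hw_measure` and `hw_class` are hypothetical names, and the vector of simulated V values is assumed to come from a kappa simulation that is not shown here. The two values classified at the end are the H1 and H2 entries of the example output.

```r
# Standardize an observed statistic against its simulated values.
# `V_sim` is assumed to hold the Nsim values obtained from simulated
# homogeneous regions; the simulation itself is not part of this sketch.
hw_measure <- function(V_obs, V_sim) (V_obs - mean(V_sim)) / sd(V_sim)

# Guideline limits of Hosking and Wallis (1997).
hw_class <- function(theta) {
  if (theta < 1) {
    "acceptably homogeneous"
  } else if (theta < 2) {
    "possibly heterogeneous"
  } else {
    "definitely heterogeneous"
  }
}

# H1 and H2 as printed in the example output for sites 34 to 38
classes <- sapply(c(H1 = -0.7677048, H2 = -0.4166196), hw_class)
```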

Hosking and Wallis (1993) also give an alternative heterogeneity measure (that we call HW2), in which V is replaced by:

V2 = ∑[i from 1 to k] ni {(t^(i) - t^R)^2 + (t3^(i) - t3^R)^2}^(1/2) / ∑[i from 1 to k] ni

The test statistic in this case becomes

θ(HW2) = (V2 - μ(V2)) / (σ(V2))

with similar acceptability limits as the HW1 statistic. Hosking and Wallis (1997) judge θ(HW2) to be inferior to θ(HW1) and say that it rarely yields values larger than 2 even for grossly heterogeneous regions.
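A minimal sketch of V2 is given below. The name `V2_stat` is hypothetical, and the numbers in the small check are arbitrary toy values chosen so that the result can be verified by hand.

```r
# V2: weighted mean distance of each site from the regional average
# in the (L-CV, L-skewness) plane.
V2_stat <- function(t, t3, ni) {
  tR  <- sum(ni * t)  / sum(ni)
  t3R <- sum(ni * t3) / sum(ni)
  sum(ni * sqrt((t - tR)^2 + (t3 - t3R)^2)) / sum(ni)
}

# Toy check: two equally long sites, each at distance 1 from the average
v2_toy <- V2_stat(t = c(0, 2), t3 = c(0, 0), ni = c(1, 1))
```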

The bootstrap Anderson-Darling test

A test that does not make any assumption on the parent distribution is the Anderson-Darling (AD) rank test (Scholz and Stephens, 1987). The AD test is the generalization of the classical Anderson-Darling goodness of fit test (e.g., D'Agostino and Stephens, 1986), and it is used to test the hypothesis that k independent samples belong to the same population without specifying their common distribution function.

The test is based on the comparison between local and regional empirical distribution functions. The empirical distribution function, or sample distribution function, is defined by F(x) = j/η, x(j) ≤ x < x(j+1), where η is the size of the sample and x(j) are the order statistics, i.e. the observations arranged in ascending order. Denote the empirical distribution function of the i-th sample (local) by F̂i(x), and that of the pooled sample of all N = n1 + ... + nk observations (regional) by HN(x). The k-sample Anderson-Darling test statistic is then defined as

θ(AD) = ∑[i from 1 to k] ni integral[all x] ((F̂i(x) - HN(x))^2) / (HN(x) (1 - HN(x))) dHN(x)

If the pooled ordered sample is Z1 < ... < ZN, the computational formula to evaluate θ(AD) is:

θ(AD) = 1/N ∑[i from 1 to k] 1/ni ∑[j from 1 to N-1] ((N M(ij) - j ni)^2) / (j(N-j))

where M(ij) is the number of observations in the i-th sample that are not greater than Zj. The homogeneity test can be carried out by comparing the obtained θ(AD) value to the tabulated percentage points reported by Scholz and Stephens (1987) for different significance levels.
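The computational formula translates almost literally into code. The sketch below assumes that there are no ties in the pooled sample; `ad_ksample` is a hypothetical name and not the package's implementation.

```r
# k-sample Anderson-Darling statistic from the computational formula.
# Assumes no ties in the pooled sample (Z1 < ... < ZN).
ad_ksample <- function(x, cod) {
  N <- length(x)
  z <- sort(x)               # pooled ordered sample
  j <- seq_len(N - 1)
  total <- 0
  for (xi in split(x, cod)) {
    ni <- length(xi)
    # M[j]: number of observations of this sample not greater than Z_j
    M <- findInterval(z[j], sort(xi))
    total <- total + sum((N * M - j * ni)^2 / (j * (N - j))) / ni
  }
  total / N
}

# Two fully separated samples versus two interleaved samples
a_sep <- ad_ksample(c(1, 2, 3, 4), c(1, 1, 2, 2))
a_mix <- ad_ksample(c(1, 3, 2, 4), c(1, 1, 2, 2))
```

The separated samples give a larger statistic than the interleaved ones, as expected for a measure of disagreement between local and pooled distribution functions.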

The statistic θ(AD) depends on the sample values only through their ranks. This guarantees that the test statistic remains unchanged when the samples undergo monotonic transformations, an important stability property not possessed by the HW heterogeneity measures. However, problems arise when this test is applied within an index value procedure. The index value procedure corresponds to dividing each site sample by a different value, which modifies the ranks in the pooled sample. In particular, this makes the local empirical distribution functions much more similar to each other, giving an impression of homogeneity even when the samples are highly heterogeneous. The effect is analogous to that encountered when applying goodness-of-fit tests to distributions whose parameters are estimated from the same sample used for the test (e.g., D'Agostino and Stephens, 1986; Laio, 2004). In both cases, the percentage points for the test must be redetermined.

This can be done with a nonparametric bootstrap approach consisting of the following steps. Build the pooled sample S of the observed non-dimensional data. Sample with replacement from S to generate k artificial local samples of size n1, ..., nk. Divide each sample by its index value, and calculate θ^(1)(AD). Repeat the procedure Nsim times to obtain a sample of values θ^(j)(AD), j = 1, ..., Nsim, whose empirical distribution function can be used as an approximation of G(H0)(θ(AD)), the distribution of θ(AD) under the null hypothesis of homogeneity. The acceptance limits for the test, corresponding to any significance level α, are then easily determined as the quantiles of G(H0)(θ(AD)) corresponding to a probability (1-α).

We will call the test obtained with the above procedure the bootstrap Anderson-Darling test, hereafter referred to as AD.
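The bootstrap steps can be sketched as below. This is an illustration under stated assumptions rather than the code of `ADbootstrap.test`: the names `ad_stat` and `ad_bootstrap_sketch` are hypothetical, the index value is passed as a function (median by default), and ties created by resampling are handled only by counting observations not greater than each pooled value.

```r
# Compact k-sample AD statistic, included so that the sketch runs on its own.
ad_stat <- function(x, cod) {
  N <- length(x); z <- sort(x); j <- seq_len(N - 1)
  parts <- sapply(split(x, cod), function(xi) {
    M <- findInterval(z[j], sort(xi))
    sum((N * M - j * length(xi))^2 / (j * (N - j))) / length(xi)
  })
  sum(parts) / N
}

ad_bootstrap_sketch <- function(x, cod, Nsim = 500, index_fun = median) {
  cod <- factor(cod)
  ni  <- as.vector(table(cod))
  # Pooled sample S of non-dimensional data: each site divided by its index value
  S   <- x / ave(x, cod, FUN = index_fun)
  obs <- ad_stat(S, cod)
  sim_cod <- rep(levels(cod), times = ni)   # artificial sites of size n1, ..., nk
  sim <- replicate(Nsim, {
    xb <- sample(S, length(S), replace = TRUE)
    xb <- xb / ave(xb, sim_cod, FUN = index_fun)
    ad_stat(xb, sim_cod)
  })
  # P: empirical distribution function of the simulated statistics at `obs`
  c(A2kN = obs, P = mean(sim <= obs))
}

set.seed(1)
x   <- rexp(60)                  # synthetic data, for illustration only
cod <- rep(1:3, each = 20)
res <- ad_bootstrap_sketch(x, cod, Nsim = 50)
```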

Durbin and Knott test

The last homogeneity test considered derives from a goodness-of-fit statistic originally proposed by Durbin and Knott (1971). The test is formulated to measure discrepancies in the dispersion of the samples, without accounting for possible discrepancies in the mean or skewness of the data. In this respect the test is similar to the HW1 test, while it is analogous to the AD test in that it is a rank test. The original goodness-of-fit test is very simple: suppose we have a sample Xi, i = 1, ..., n, with hypothetical distribution F(x); under the null hypothesis the random variable F(Xi) has a uniform distribution in the (0,1) interval, and the statistic D = (2/n)^(1/2) ∑[i from 1 to n] cos(2 π F(Xi)) is approximately normally distributed with mean 0 and variance 1 (Durbin and Knott, 1971). The factor (2/n)^(1/2) is needed because cos(2 π U) has mean 0 and variance 1/2 when U is uniform on (0,1). D serves the purpose of detecting discrepancy in data dispersion: if the variance of Xi is greater than that of the hypothetical distribution F(x), D is significantly greater than 0, while D is significantly below 0 in the reverse case. Differences between the mean (or the median) of Xi and F(x) are instead not detected by D, which guarantees that the normalization by the index value does not affect the test.

The extension to homogeneity testing of the Durbin and Knott (DK) statistic is straightforward: we substitute the empirical distribution function obtained with the pooled observed data, HN(x), for F(x) in D, obtaining at each site a statistic

Di = (2/ni)^(1/2) ∑[j from 1 to ni] cos(2 π HN(X(i,j)))

which is approximately standard normal under the hypothesis of homogeneity. The statistic θ(DK) = ∑[i from 1 to k] Di^2 then has a chi-squared distribution with k-1 degrees of freedom, which allows one to determine the acceptability limits for the test, corresponding to any significance level α.
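A rough sketch of the DK statistic follows. It is not the code of `DK.test`: the name `dk_sketch` is hypothetical, the input `y` is assumed to be already non-dimensional, the pooled empirical distribution function is taken as rank/N, and each site sum is scaled by (2/ni)^(1/2) on the assumption that this is what gives Di unit variance (cos(2 π U) has variance 1/2 for uniform U).

```r
# Durbin-Knott type homogeneity statistic from non-dimensional data `y`.
dk_sketch <- function(y, cod) {
  cod <- factor(cod)
  HN  <- rank(y) / length(y)      # pooled empirical distribution function
  Di  <- tapply(HN, cod, function(h) sqrt(2 / length(h)) * sum(cos(2 * pi * h)))
  Ak  <- sum(Di^2)
  # P: chi-squared distribution function with k - 1 degrees of freedom
  c(Ak = Ak, P = pchisq(Ak, df = nlevels(cod) - 1))
}

# Hand-checkable toy case: HN = 0.25, 0.5, 0.75, 1 gives D1 = -1 and D2 = 1
toy <- dk_sketch(c(1, 2, 3, 4), c(1, 1, 2, 2))
```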

Comparison among tests

The comparison (Viglione et al., 2007) shows that the Hosking and Wallis heterogeneity measure HW1 (based only on L-CV) is preferable when skewness is low, while the bootstrap Anderson-Darling test should be used for more skewed regions. As for HW2, the Hosking and Wallis heterogeneity measure based on L-CV and L-skewness (L-CA), the comparison confirms its lack of power.

Our suggestion is to guide the choice of the test according to a compromise between power and Type I error of the HW1 and AD tests. The L-moment space is divided into two regions: if the t3^R coefficient for the region under analysis is lower than 0.23, we propose to use the Hosking and Wallis heterogeneity measure HW1; if t3^R > 0.23, the bootstrap Anderson-Darling test is preferable.
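The selection rule can be written as a one-line helper. The name `choose_test` is hypothetical, and the behaviour at exactly 0.23 (here assigned to the AD test) is an assumption, since the text does not cover that case. The value used is the regional L-skewness (lcaR) of sites 34 to 38 from the example output.

```r
# Suggested test as a function of the regional L-skewness t3^R.
choose_test <- function(t3R, threshold = 0.23) {
  if (t3R < threshold) "HW1" else "AD"
}

suggested <- choose_test(0.1683511)   # lcaR from the example output
```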

## Value

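ADbootstrap.test and DK.test each return the test statistic and its distribution value P, i.e. the value of the distribution function of the statistic under the hypothesis of homogeneity. If P is, for example, 0.92, the samples should not be considered heterogeneous at any significance level lower than 8%.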
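Read this way, homogeneity is rejected at significance level α when P exceeds 1-α. A minimal sketch of that reading, with the hypothetical helper name `is_heterogeneous` and the P values printed in the example output for sites 34 to 38:

```r
# TRUE when homogeneity is rejected at significance level alpha.
is_heterogeneous <- function(P, alpha = 0.05) P > 1 - alpha

# P values from the example output (AD with index=2, AD with index=1, DK)
decisions <- is_heterogeneous(c(AD_median = 0.658, AD_mean = 0.258, DK = 0.9930638))
```

At the 5% level only the DK result would indicate heterogeneity for these five sites.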

HW.tests gives the two Hosking and Wallis heterogeneity measures HW1 and HW2; following Hosking and Wallis (1997), the region under analysis can therefore be regarded as ‘acceptably homogeneous’ if HW < 1, ‘possibly heterogeneous’ if 1 ≤ HW < 2, and ‘definitely heterogeneous’ if HW ≥ 2.

## Author(s)

Alberto Viglione, e-mail: alviglio@tiscali.it.

## References

D'Agostino R., Stephens M. (1986) Goodness-of-Fit Techniques, chapter Tests based on EDF statistics. Marcel Dekker, New York.

Durbin J., Knott M. (1971) Components of Cramer-von Mises statistics. London School of Economics and Political Science, pp. 290-307.

Hosking J., Wallis J. (1993) Some statistics useful in regional frequency analysis. Water Resources Research, 29 (2), pp. 271-281.

Hosking, J.R.M. and Wallis, J.R. (1997) Regional Frequency Analysis: an approach based on L-moments, Cambridge University Press, Cambridge, UK.

Laio F. (2004) Cramer-von Mises and Anderson-Darling goodness of fit tests for extreme value distributions with unknown parameters. Water Resources Research, 40, W09308, doi:10.1029/2004WR003204.

Scholz F., Stephens M. (1987) K-sample Anderson-Darling tests. Journal of American Statistical Association, 82 (399), pp. 918-924.

Viglione A., Laio F., Claps P. (2007) “A comparison of homogeneity tests for regional frequency analysis”, Water Resources Research, 43, W03428, doi:10.1029/2006WR005095.

Viglione A. (2007) Metodi statistici non-supervised per la stima di grandezze idrologiche in siti non strumentati, PhD thesis, Politecnico di Torino.

## See Also

KAPPA, Lmoments.

## Examples

    library(homtest)

    data(annualflows)
    annualflows[1:10,]
    summary(annualflows)
    x <- annualflows["dato"][,]
    cod <- annualflows["cod"][,]
    split(x,cod)

    #ADbootstrap.test(x,cod,Nsim=100)   # it takes some time
    #HW.tests(x,cod)                    # it takes some time
    DK.test(x,cod)

    fac <- factor(annualflows["cod"][,],levels=c(34:38))
    x2 <- annualflows[!is.na(fac),"dato"]
    cod2 <- annualflows[!is.na(fac),"cod"]
    split(x2,cod2)
    sapply(split(x2,cod2),Lmoments)
    regionalLmoments(x2,cod2)

    ADbootstrap.test(x2,cod2)
    ADbootstrap.test(x2,cod2,index=1)
    HW.tests(x2,cod2)
    DK.test(x2,cod2)

### Example output

   cod anno dato
1    1 1956 1494
2    1 1957 1309
3    1 1958 1699
4    1 1959 1467
5    1 1960 1918
6    1 1961 1469
7    1 1962 1267
8    1 1963 1523
9    1 1964 1338
10   1 1965 1438
cod            anno           dato
Min.   : 1.0   Min.   :1921   Min.   : 172.0
1st Qu.:13.0   1st Qu.:1940   1st Qu.: 725.2
Median :22.0   Median :1951   Median : 981.0
Mean   :23.7   Mean   :1951   Mean   :1041.4
3rd Qu.:34.0   3rd Qu.:1960   3rd Qu.:1308.8
Max.   :49.0   Max.   :1985   Max.   :3045.0
$1
 1494 1309 1699 1467 1918 1469 1267 1523 1338 1438 1788 1591 1697 1780 1769

$2
 1144 1652 1807 1881 1741 1124 2064 1434 1678 1239  921  983 1093 1744 1213
 1590  956 1124 2181 1077 1345 1219  988 1325 1277 1479 1307 2053 1232  973
 1407  912

$3
 2596 954 1115 1248 867 1280 1588 1055 1764 3045

$4
  871 1238 1505 1636 1553 1936 1739 1867 1184 1630 1311 1520 1201 1614 1971
 1829 1781 1093 1996 1328 1662 1199  860  961  949 1536 1016 1386  820 1023
 2329 1209 1305 1334 1024 1364 1310 1410 1247 2393 1317  909 1808 1020 1181
 1365 1218 1644 1160 1002 1243 1332 1033 1170 1685 1478 2434 1600 1369 1215
 1614 1449 1518 1490 1191

$7
 1481 1758 1774 1625 1607 2826 1488 928 2379 1173 1801 1824 1309 2220 1733

$8
 1086 1810 2244 2138 2028 1308 1947 1528 2244 1594  861 1378 1795 1344 1558
  696  724 2497  660 1388 1484  952 1987 2646 1689 1443 2688 1249 1145 2392
 1001 1380

$9
 2075 1607 1717 1261 1824 1330 963 1313 2276 682 1440 1304 1193

$10
 1096 1387 1289 1461 1054 1474 1137 1256  981 1696 1468 1850 1644 1248 1498
 1317 1500 1109  859  931 1020 1493  954 1133 1144 1056

$11
 1320 1706 948 1643 944 1402 1202 1788 1665 1833 1679 1166 1833 1661 1938
 1457 830 1221 1398 1674 1311 1611 1003 1021

$12
  890 1247 1040 1047  875 1060  913  968  749 1218 1104 1489 1300  833  994
 1002 1134  854  826  695  939 1230  830 1096  876  704 1111  780  791  709
  812  686  812  755  802 1098  868  735  829  750  635  887  711  753  935
  862  830  924  735  766  930  783 1623 1359 1015  922  963  848  975  760
  766

$13
 1288 854 1324 741 1043 756 1477 1160 1426 1360 1109 1211 1094 1666 1002
 772 1124 997 649 1436 762 1293 930 721 838 1063 710 1002 1625 1002
 848 1104 869 823 992 588 894 1073 675 1181 1568 817 1068 978

$14
 1505  928 1223  805 1449 1084 1588 1509 1137 1014 1181 1394  922  811 1428
 1137 1240 1034  581 1501  700 1263  962  780  919 1068  855 1198 1569 1134
 1007 1205  973  871 1188  581 1027 1192  578  875 1553  774  958 1187 2152
  836  834  753 1110

$15
 969 811 1107 769 567 925 508 598 818 495

$16
  957  625  625  658 1022  555  496  625  593 1115  718  957  707  332  821
  469  913  663  418  523  799  469 1000 1104  761  598 1033  707  469  614
  270  609 1017  367

$17
 595 718 518 548 389 567 506 985 530 1097 934 675 614 587 722
 499 459 1087 550 860 648 296 658

$18
  686  863  488  937  453  621  484  851  599 1161  894  598  645  606  772
  449  486  510  559  829  545  898  529  392  856  625  773  651  674  432

$19
 589 715 479 696 394 533 430 845 519 1012 805 559 569 580 725
 448 412 411 407 638 506 729 538 350 736 513 787

$20
 1237 1908 1263 1066 1401 1263 1134  799  919  971 1057 1710 1555 1667 1212
  799 1366  962 1779 1504  808 1031 1186 1031 1796 1882 1487  945 1710 1194
  919 1418  722 1160 1409  894 1279 1884 1307

$21
 489 704 310 665 259 501 428 820 551 994 658 425 423 409 736 440 401 398 342
 658 449 665 535 247 584 338 580 569 311 412 565 403 846 917 525 411 717 526
 248 451 185 356 564 256

$22
 1197  863 1382 1104  649  745  615 1116  618  739  761  720 1147  838 1057
  739  529  962  470  881  495  417  553  819  711 1410 1472  727  671 1163
  751  476  819  399  612  860  507  844 1245  953  976

$23
 835 1345 1085 1655 1291 838 974 862 1106 699 854 721 699 1033 892
 1213 631 554 833 911 796 721 727

$24
 1795 1761 1962 1541 1007 1276 1144 1302  947 1210 1113 1532  764  849 1412
 1105 1048  843 1048 1157

$25
 1498 880 1028 1046 589 1088 1179 1471 761 1106 2017 649 1129 1149 1355
 1107

$26
 1634 1300 1715 1643 1295 1459 1020 1531  919 1095  876  857 1534 1183 1405
 1051 1159 1478 1472 1364 1140 1126 1007

$27
 1157 1759 1245 842 1056 800 1244 806 925 839 782 1236 1601 886 768
 1109 722 440

$28
 1121 1488 1158 1287 1210 1468 1445 1304 1967 1408

$29
 1121 1482 1163 1378 1201 1677 1360 2230 1117 1093 1647 1358

$30
  395  342  463  649  400  703  388  570  292  490  440  885  671 1035  729
  360  467  351  765  418  455  339  311  493  432  686  353  337  449  513
  374  475  628  496  844  974  375  419  651  441  226  438  218  461  504
  309  543  870  433  724  604  712  865  395  324  436  607  399

$31
 754 1025 829 1428 1828 1472 771 1144 980 1728 720 850 995 901 1138
 678 805 1509 616 629 716 848 767 720 1426 1370 1826 1046 1172 869
 793 1008 571 1161

$32
  920 1674 1153 1512 1226  647  945  822 1665  632  746  705  759  932  617
  632 1259  506  590  743  598  747  988  855 1229 1461  458  804  867  652
  580

$33
 684 701 486 792 727 1086 564 624 1205 463 846 894 707 733 892
 869 1283 1444 474 798 935 719 445 749 428 772 854 545 1002 939
 643 603 785 775 1025 584

$34
  636  998 1014 1965 1333 1730 1330  825 1112  851 1423  960 1031  976  561
 1055 1076 1224  658  707 1453  445  966  930  939  862 1115 1158 1573

$35
 845 803 746 1036 1160 1038 1285 369 1093 732 613 620 863 579 765
 819 505 594 667 651 950 1583 688 622 1068

$36
  924 1676 1765  841  796  745 1363  663  714  382  771  796  956 1153  669
  796 1879  643  796  994  733 1185

$37
 597 833 902 1207 793 598 1328 323 561 726 663 919 1139 1040 1264
 1214

$38
  492  608  368  393 1123  172  281  539  424  585  632  528

$39
 339 929 560 727 490 684 979 1466 404 865 533 462 287 767 653
 1176 1906 883

$40
  755.00  871.00  938.00 1175.00 1218.00  621.00  432.25  913.20  840.15
  827.97  919.29  724.48  602.72

$41
 1449 1449 1546 1516 1254 1382

$42
  895 1006 1351 1215 1215 1279 1006 1156  821

$43
 948 1308 1185 801 848 926 932 755 764 891 677 835 1112 918 742
 685 927

$44
 1607 1275 1613 1484 1487 1205 1367 1158 1583 1342 1848 1640 1225 1320 1202
 1476 1190 1435  894 1326 1230 1042 1127

$45
 1953 1939 1677 1692 2051 2371 2022 1521 1448 1825 1363 1760 1672 1603 1244
 1521 1783 1560 1357 1673 1625 1425 1688 1577 1736 1640 1584 1293 1277 1742
 1491

$46
 1223 1077  671 1063  969  842 1037  903 1407 1153 1107 1293  813  834 1118
  901  981

$47
 986 996 1335 964 1018 821 945 844 1133 975 1082 1252 1031 940 1078
 933 709 923 899 747 1010 873 962 965 674 763 915 1029 1452 1486

$48
  872 1528 1062 1345 1158  998 1197 1234 1469 1343 2103 1745 1084 1717 1131
  990 1186  884 1118 1383  877 1072 1906  830

$49
 808 1088 1435 1265 1065 911 992 1273 1031 1100 769 865 781 1019 1761

      Ak        P
307.7723   1.0000

$34
  636  998 1014 1965 1333 1730 1330  825 1112  851 1423  960 1031  976  561
 1055 1076 1224  658  707 1453  445  966  930  939  862 1115 1158 1573

$35
 845 803 746 1036 1160 1038 1285 369 1093 732 613 620 863 579 765
 819 505 594 667 651 950 1583 688 622 1068

$36
  924 1676 1765  841  796  745 1363  663  714  382  771  796  956 1153  669
  796 1879  643  796  994  733 1185

$37
 597 833 902 1207 793 598 1328 323 561 726 663 919 1139 1040 1264
 1214

$38
  492  608  368  393 1123  172  281  539  424  585  632  528

               34          35          36           37          38
l1   1065.7241379 827.7600000 965.4545455 881.68750000 512.0833333
l2    191.9729064 151.6000000 206.5800866 174.29583333 126.0681818
lcv     0.1801338   0.1831449   0.2139718   0.19768436   0.2461868
lca     0.1246570   0.1913101   0.3252284  -0.01093174   0.1775494
lkur    0.2105167   0.1536444   0.2173088   0.01341899   0.3616169

        l1R         l2R        lcvR        lcaR       lkurR
895.1153846 175.0339202   0.1983372   0.1683511   0.1853942

    A2kN        P
2.641827 0.658000

    A2kN        P
1.933665 0.258000

        H1         H2
-0.7677048 -0.4166196
Warning messages:
1: In fn(par, ...) : value out of range in 'gammafn'
2: In fn(par, ...) : value out of range in 'gammafn'

        Ak          P
14.1152348  0.9930638


homtest documentation built on May 2, 2019, 1:45 p.m.