akqdecay: Asquith-Knight Discharge Decay Analyses

Description Usage Arguments Details Value Note Author(s) References See Also Examples

Description

Compute statistics associated with the today-to-tomorrow or N-day-over-day base-10 logarithm decline of daily streamflows. The preferred expression for this package is the inverse of the base-10 logarithm per day to form the days per unit log-cycle decline in streamflow.

The critical mathematical operation to understand for the akqdecay package is the differencing, and the entire akqdecay package is oriented around this single line of code. Letting logQ be the base-10 logarithms of streamflow for which streamflows of zero are treated as NA, the differences are computed using the built-in R diff() function:

1
  delQ <- diff(logQ, lag=lag, differences=1)/lag

where lag=1 implies a day (“today”) to the following day (“tomorrow”) difference, differences=1 represents first-order differentiation, and the division by the lag standardizes the output to a log10-cycle change per day. The first example in Examples demonstrates this (default) computation. The vector of delQ will become shortened in length relative to logQ by the count lag—thus, shortened by one value (one index of the vector) when the default differences and lag are used.

A setting of differences greater than 1 is applied recursively and is difficult to visualize, and gaps in record might not be properly handled. The differences is retained mostly for experimental reasons. A setting of lag greater than 1 means that skip overs are involved. The difference in number of days between indices of the flow vector are consulted in their relation to lag to handle unpadded gaps in record. This means that the diff() call on the logarithms of flow is mimicked by a diff() call on the dates, and inappropriate gaps in record not equal to the lag have the difference of the logarithms changed to NA. As a result, the total of the number of NAs encountered might not reflect number of zero-flow days (in context of settings of differences and lag). The handling of zero-flow days currently is an open research question for this package.

The akqdecay package inverts delQ prior to any statistical processing. Let us refer to 1/delQ as Ψ(l,d) where l is lag and d is differences. For brevity however, days per log-cycle may simply be denoted as Ψ. The inversion has the effect of changing the unit system to units of time per log-cycle change. The reasoning is that Ψ is in readily interpreted units of time. Asquith and Knight suggest that units of time provide a straightforward unit system to describe these extension computations to broad audiences (e.g. nontechnical stakeholders).

Further, the choice of days per unit log10-cycle change (decline) was chosen because Ψ values are in the same units as the persistent recession slope previously studied in Alabama (Bingham, 1982) and then again in Tennessee (Bingham, 1986). That slope was referred to as the “G_b-factor” (the “Bingham geologic factor”). Here in the akqdecay package, the gfactor (parametrical) or gfactor_emp (empirical) are used as monikers under which a specific definition of such a “factor” is entwined with a probability level. The association of Ψ-like values to probability was not a subject of Bingham's studies.

Usage

1
2
3
4
akqdecay(akdvtable, as.list=TRUE, as.ra7=FALSE, f=0.90,
                    method=c("decreasing", "increasing", "nochange"),
                    lag=1, differences=1, site="", type="gpa",
                    notable=FALSE, setgfna=FALSE, ...)

Arguments

akdvtable

A USGS daily-value of streamflow table with some extensions unique to this package as returned by dvget;

as.list

Return an extensive list of computed tables of statistics;

as.ra7

Compute the “RA7” statistic (like the ra7() function of the EflowStats package by the USGS that is not a dependency of the akqdecay package). For numerical comparison, it is critically important that as.ra7 returns the median natural logarithm, where as all other meaning of logarithms in the akqdecay package are base10 to be consistent with Bingham's earlier work. Also, this option, if set, trumps, the lag and differences values and assigns unity to each. This argument returns only the median of the decreasing or increasing days and the no change is by definition zero. As a final note, the EflowStats package substitutes 0.01 cubic feet per second for zero flow days—the akqdecay package tracks these as missing for purposes of computation, and thus in presence of zero flows, numerical differences between the two “RA7” methods would result;

f

A nonexceedance probability (F \in [0,1]) on which to compute a quantile of the recession or rise estimate (see Note labeled as gfactor and gfactor_emp), and the default is the 90th percentile;

method

Categories controlling the direction of “decay” computation. The default is to look at statistics of the recession (decreasing to acquire \checkΨ). But rising (increasing to acquire \hatΨ) limbs of the hydrograph or those periods without change (nochange to acquire \ddotΨ) can be computed. To avoid user concern, users are alerted to not be concerned with the tables (many NA values) produced with nochange because logarithms are involved and day-over-day with no changes are thus degenerate—the various subtables (data frames) produced by the akqdecay function are thus mostly unpopulated for nochange;

lag

The argument of the same name for the diff() function that is called internally;

differences

The argument of the same name for the diff() function that is called internally;

site

An optional streamgage identification number or other character string that will be used in replication in subordinate tables generated by akqdecay. It is deliberate that this function does not pickup the site_no from the daily-value table akdvtable. This permits the user to potentially intercept the site number or name and replace with their own for flexibility in categorical statistical analyses in their own custom scripts;

type

Distribution type of the lmomco package to estimate the Asquith–Knight
G_f(F;Θ)-factor for the nonexceedance probability (F) passed by f and the distribution parameters in Θ. If the type=NA, then the computation of the fitted distribution is short circuited and this is useful for speed during testing (for instance);

notable

If set, then the table element of the returned list when as.list is set is simply set to a table (data frame) with the site and "not_requested". This is done mostly as a means to shorten the returned list and functionally this option has compatibility with akq_table. However, recomputation (gfredo) of empirical Gfactors would not be possible if notable is set. This occurs because access to the underlying sample of Ψ has been lost—the sample thrown away. Keep this in mind that perhaps notable is mostly an option for the developers;

setgfna

This option is the same as for gfredo, meaning if setgfna is set, then type is internally set to NA; and

...

Additional arguments to pass to control lmrdf2gfactor.

Details

The time units are important for further discernment because of the influence of lag. If the weekly change were computed using lag=7, the delQ would be interpretable as the “log10-cycle change per day based on a weekly-based time difference.” Thus, Ψ(7,1) for lag=7 would be the weeks for a log-cycle change.

1
2
3
4
5
  logQ <- log10(c(130,122,121,100,95,93,92,91)) # 8 declining flow days
  median(1/abs(diff(logQ, lag=1)))     # days per log-cycle change [7 values]
  [1] 73.67673 # median days based on one  day of lag
  median(1/abs(diff(logQ, lag=7)) * 7) # days per log-cycle change [1 value ]
  [1] 45.18987 # median days based on one week of lag

What does days per log-cycle mean? Some users are anticipated to be less comfortable with the idea of log-cycle than others. A log-cycle, as used herein for base-10 logarithms, is a power of ten or order of magnitude change. For example and for a hypothetical streamgage, if the median Ψ or Ψ_{\mathrm{med}} computed from declining days over many decades was 73 days, then one might infer (statistically) that today's flow will drop by a power of ten in about 2.5 months. Suppose then that another nearby streamgage with similar climate and watershed size had Ψ_{\mathrm{med}} = 34 days for a similar time period. The analyst might then conclude that different hydrologic processes exist between the two watershed to cause the first watershed to have about 1.5 months longer for streamflow to drop by a power of ten.

How might one interpret Ψ? There are very many subtleties when interpreting statistics of Ψ, and hence the motivation for the akqdecay package to compute an immense array of statistics for later evaluation. But in general for method="decreasing" and the other defaults, small values of Ψ represent the collapse of streamflow following rainfall-runoff (stormflow) events and reside in the left-tail of the distribution of Ψ, whereas large values of Ψ reside in the right-tail of the distribution and inherently are mostly associated with baseflow stability and hence potential groundwater and surface water interactions.

Value

Various types of tables and statistics can be returned. If as.ra7, then the median of the declines or inclines (decreasing=FALSE) is returned—just a simple single value. If as.list=FALSE, then the time series of decreasing or increasing or nochange as days per log-cycle is returned. If as.list, then the following are returned:

table

An R data.frame containing columns for the site identifier, water year, calendar year, month, decade, date (forward bias, “tomorrow's” streamflow), probability on the flow-duration curve (full period of record used), the respective daily flow for that day is returned (forward bias), and days per unit log-cycle change (decay) of N-day-over-day increments;

counts

A named R vector containing the total number of daily values (same as
total_count discussed with summary below), the number of decreases, the number of increases, the number of no changes (nochanges), and number of missing values (NAs). The meaning of nochanges is retricted to the N-day-over-day changes after logarithmic transformation whose differences are zero; this means that no changes in streamflow from zero flow to zero flow are not counted in this alogrithm;

summary

An R data.frame with a summary of the period of record results absent the L-moments: total_count (inclusive of the unchanging increments in streamflow), count, kendall_tau and spearman_rho, median, L1L2, gfactor, and gfactor_emp;

lmoments

An R list of data frames of the L-moments for the period-of-record (por), L-moments by_year (without regard to whether a given calendar year had the streamgage active for the whole year), including the median and sample size (count of decreases [or increases]), and L-moments by_decade (without regard to completeness [is truly a full decade of record available]); and

ifail

A number code indicate the failure mode (0, no problems encountered; 1, no values for any processing; 2, no values after NA check on the days_per_log [the Ψ values].

Note

The authors (Asquith and Knight) suggest that particular quantiles of the distribution of Ψ might be more germane than others for statistical regionalization studies. In particular, far into the right tail of the distribution of Ψ is of primary interest. Though the sample median is computed by akqdecay, a setting f=0.5 would be the median of a fitted three-parameter (3p) generalized Pareto distribution (GPA), if type="gpa", fit to the data, and this “50th percentile” is thus emplaced as a gfactor. The choice of the GPA is the default; though this choice might not be optimal. The mathematics of the GPA are listed under lmrdf2gfactor. The choice for the GPA is based on experimental processing of more than 1,600 streamgages and review of the distribution of L-skew (τ_3) and L-kurtosis (τ_4) plotted on an L-moment ratio diagram (Asquith, 2011a,b).

A nonparametric estimate for the f is emplaced in gfactor_emp. The default of f=0.90 is thus the 90th percentile. Optimal choice of f, should it materially exist as a constant or nearly so, is an open research problem.

Correlation between “Tomorrow's” Streamflow and Ψ: Both Kendall Tau and Spearman Rho correlation coefficients (Helsel and Hirsch, 2002) are computed between “tomorrow's” streamflow (units of cubic feet per second) and Ψ (units of days). These correlations are stored in the summary as explained above. The correlations are computed before the absolute value of Ψ is made—recall that these are always positive on return from this function with respect to the method choice of decreasing or increasing. A negative correlation coefficient, thus, means that Ψ decreases as streamflow increases.

Attributes: The attributes of “method,” “lag,” “differences,” “probability,” and “distribution_type” for the function arguments method, lag, differences, probability, and distribution_type are set by the built-in R attr() function unless as.ra7=TRUE. This permits the user to probe the returned value for these settings if ever desired. If as.list=FALSE, then only the attributes of “method,” “lag,” and “differences” are set.

Parametric Gfactor Computations: This package is intended for large data mining computations. To this end, a special trap for incomplete L-moments is made for protection against failure within the lmomco package. If the period-of-record L-moments do not test as valid by two checks. First, the L-moments must be valid through the logic of lmomco::are.lmom.valid(). Second, if either L-skew (τ_3) or L-kurtosis (τ_4) is NA, then the Gfactor is computed as NA. (This is mentioned because by design, it is easy to leak missing L-skew into lmomco's distribution fitting and breakage occurs there and not within akqdecay. The function lmrdf2gfactor does this protection for yearly and decadal statistics.)

Author(s)

W.H. Asquith

References

Asquith, W.H., 2011a, Distributional analysis with L-moment statistics using the R environment for statistical computing: Ph.D. dissertation, Texas Tech University.

Asquith, W.H., 2011b, Distributional analysis with L-moment statistics using the R environment for statistical computing: CreateSpace, [print-on-demand], ISBN 978–146350841–8, https://www.amazon.com/dp/1463508417. [reprinting of Asquith (2011a) with errata]

Asquith, W.H., 2018, lmomco—L-moments, trimmed L-moments, L-comoments, censored
L-moments, and many distributions: R package version 2.3.2 at https://cran.r-project.org/package=lmomco.

Bingham, R.H., 1982, Low-flow characteristics of Alabama streams, USGS Water Supply Paper 2083, 27 p., https://doi.org/10.3133/wsp2083.

Bingham, R.H., 1986, Regionalization of low-flow characteristics of Tennessee streams: U.S. Geological Survey Water-Resources Investigations Report 85–4191, 63 p., https://doi.org/10.3133/wri854191.

Helsel, D.R. and Hirsch, R.M., 2002, Statistical Methods in Water Resources: U.S. Geological Survey Techniques of Water Resources Investigations, book 4, chapter A3, 522 p., https://doi.org/10.3133/twri04A3.

See Also

dvget, fill_akqenv

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
# USGS 07030392 Wolf River at Lagrange, Tennessee
DVs <- dvget("07030392", sdate="1995-10-01", edate="1995-12-31")
head(DVs) # We will look at the first decreasing day change.
#  agency_cd  site_no       Date Flow Flow_cd     site year month decade wyear
#4      USGS 07030392 1995-10-04   79       A 07030392 1995    10   1990  1996
#5      USGS 07030392 1995-10-05   78       A 07030392 1995    10   1990  1996
# Let us define "tomorrow" as Oct5 and "today" as Oct4. LAG = unity (1 day)
#   (log10(78) - log10(79))/LAG = -5.532E-3 --> abs(-5.532E-3) = 5.532E-3
#   and then invert to days per log-cycle change --> 1/5.532E-3 --> 180.75 days
AK <- akqdecay(DVs) # Perhaps Asquith--Knight discharge decay analyses
head(AK$table) # We can see the foundational computations:
#  site wyear year month decade       date        fdc fqc days_per_log
#        1996 1995    10   1990 1995-10-05 0.18279570  78    180.75049
# (The last column for pp_days_per_log is not shown for brevity.)
# Notice that a AK$table will shorten by one row because
#  log10(tomorrow) - log10(today) reduces the length by the LAG.
# Thus, we can only track one of the two flows entering into the computation
# Let us retain a "forward bias" and report "tomorrow," hence AK$table
# shows 78 cfs as the first row and retains it on the actual date 10/05/1995.


# Continuing the above example, let us look at then a statistic of
# the 52 declining days. Let us look first at the general computations
head(AK$lmoments$por) # The period of record statistics are here.
# site yr_range_str count  median     L1L2  gfactor gfactor_emp
#         1995--1995    52 51.4158 162.5754 194.3716    180.7505
#       L1       L2        T3      T4         T5         T6
# 83.71642 44.49143 0.3981854 0.17382 0.07210696 0.02502416
# We see the median declining days for one log10-cycle decline is about 51.4, and
# the sample size is 52. The 90th percentile (default) of the empirical
# distribution of the values is about 180.8 days. Finally, we can compare the
# median here to the "RA7" statistic.
RA7 <- akqdecay(DVs, as.ra7=TRUE)
print(RA7)
# natural_logarithm_median_decreasing
#                         0.04484609
names(RA7) <- NULL
1/log10(exp(RA7))
# [1] 51.34416 which approximately matches the 51.4 reported previously.


# USGS 08167000 Guadalupe River at Comfort, Texas
site <- "08167000"; my.data <- dvget("08167000")
recession_limb <- akqdecay(my.data, as.list=TRUE, method="decreasing", site=site)
   rising_limb <- akqdecay(my.data, as.list=TRUE, method="increasing", site=site)
   stable_limb <- akqdecay(my.data, as.list=TRUE, method="nochange",   site=site)
boxplot(days_per_log~year,   log="y", data=recession_limb$table)
boxplot(days_per_log~decade, log="y", data=   rising_limb$table,
        ylim=c(1,2000)); mtext("RISING LIMB")
boxplot(days_per_log~decade, log="y", data=recession_limb$table,
        ylim=c(1,2000)); mtext("RECESSION LIMB")
plot(count~year,   log="y", data=stable_limb$lmoments$by_year  )
plot(count~decade, log="y", data=stable_limb$lmoments$by_decade) #

wasquith-usgs/akqdecay documentation built on Nov. 9, 2020, 1:13 p.m.