runningquantiles: Compute approximate quantiles over a sliding window

Description

Computes cumulants up to some given order, then employs the Cornish-Fisher approximation to compute approximate quantiles using a Gaussian basis.

Usage

running_apx_quantiles(v, p, window = NULL, wts = NULL, max_order = 5L,
  na_rm = FALSE, min_df = 0L, used_df = 0, restart_period = 100L,
  check_wts = FALSE, normalize_wts = TRUE)

running_apx_median(v, window = NULL, wts = NULL, max_order = 5L,
  na_rm = FALSE, min_df = 0L, used_df = 0, restart_period = 100L,
  check_wts = FALSE, normalize_wts = TRUE)

Arguments

v

a vector

p

the probability points at which to compute the quantiles. Should be in the range (0,1).

window

the window size. If given as a finite integer or double, it is passed through. If NULL, NA_integer_, NA_real_ or Inf is given, the window is treated as infinite. If negative, an error is thrown.

wts

an optional vector of weights. Weights are ‘replication’ weights, meaning a value of 2 is shorthand for having two observations with the corresponding v value. If NULL, corresponds to equal unit weights, the default. Note that weights are typically only meaningfully defined up to a multiplicative constant, meaning the units of weights are immaterial, with the exception that methods which check for minimum df will, in the weighted case, check against the sum of weights. For this reason, weights less than 1 could cause NA to be returned unexpectedly due to the minimum condition. When weights are NA, the same rules for checking v are applied. That is, the observation will not contribute to the moment if the weight is NA when na_rm is true. When there is no checking, an NA value will cause the output to be NA.

max_order

the maximum order of the centered moment to be computed.

na_rm

whether to remove NA, false by default.

min_df

the minimum df to return a value, otherwise NaN is returned. This can be used to prevent moments from being computed on too few observations. Defaults to zero, meaning no restriction.

used_df

the number of degrees of freedom consumed, used in the denominator of the centered moments computation. These are subtracted from the number of observations.

restart_period

the recompute period. Because subtraction of elements can cause loss of precision, the computation of moments is restarted periodically based on this parameter. Larger values mean fewer restarts and faster, though potentially less accurate, results.

check_wts

a boolean for whether the code shall check for negative weights, and throw an error when they are found. Default false for speed.

normalize_wts

a boolean for whether the weights should be renormalized to have a mean value of 1. This mean is computed over elements which contribute to the moments, so if na_rm is set, that means non-NA elements of wts that correspond to non-NA elements of the data vector.
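
The idea behind restart_period can be illustrated with a much simpler statistic. The following is a minimal base R sketch, assuming a running mean kept up to date by adding the entering element and subtracting the departing one; the helper running_mean_restart is hypothetical and is not part of fromo, whose internals track higher-order moments and may count restarts differently.

```r
# Hypothetical helper, not part of fromo: running mean over a trailing window,
# updated by adding the new element and subtracting the departing one.
running_mean_restart <- function(v, window, restart_period = 100L) {
  n <- length(v)
  out <- numeric(n)
  total <- 0
  subtractions <- 0L
  for (i in seq_len(n)) {
    total <- total + v[i]
    lo <- max(1L, i - window + 1L)
    if (i > window) {
      # the element leaving the window is removed by subtraction,
      # which is where roundoff can accumulate
      total <- total - v[i - window]
      subtractions <- subtractions + 1L
    }
    if (subtractions >= restart_period) {
      # restart: recompute the sum from scratch over the current window
      total <- sum(v[lo:i])
      subtractions <- 0L
    }
    out[i] <- total / (i - lo + 1L)
  }
  out
}

# one large value followed by small ones, so subtraction loses low-order digits
v <- c(1e6, rep(0.1, 20))
direct <- sapply(seq_along(v), function(i) mean(v[max(1, i - 4):i]))
restarted <- running_mean_restart(v, window = 5L, restart_period = 3L)
```

A smaller restart_period recomputes from scratch more often, which costs time but limits how long roundoff from the subtractions can persist.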

Details

Computes the cumulants, then approximates quantiles using AS269 of Lee & Lin.
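
To give a feel for the second step, here is a minimal base R sketch of the textbook fourth-order Cornish-Fisher expansion, which adjusts a Gaussian quantile using skewness and excess kurtosis. The helper cf_quantile and its argument names are hypothetical; this is not the AS269 routine used via PDQutils, which handles cumulants of higher order.

```r
# Hypothetical helper, not the fromo or PDQutils implementation:
# fourth-order Cornish-Fisher quantile approximation.
cf_quantile <- function(p, mu, sigma, skew = 0, exkurt = 0) {
  z <- qnorm(p)                          # quantile of the Gaussian basis
  w <- z +
    (z^2 - 1) * skew / 6 +               # skewness correction
    (z^3 - 3 * z) * exkurt / 24 -        # excess kurtosis correction
    (2 * z^3 - 5 * z) * skew^2 / 36      # second-order skewness term
  mu + sigma * w
}

# with zero skewness and excess kurtosis this reduces to the Gaussian quantile
probs <- c(0.1, 0.5, 0.9)
gauss <- cf_quantile(probs, mu = 0, sigma = 1)

# positive skewness pulls the approximate median below the mean
skewed_median <- cf_quantile(0.5, mu = 0, sigma = 1, skew = 0.6)
```

At p = 0.5 the Gaussian quantile is zero, so only the first correction term survives and the approximate median is mu - sigma * skew / 6.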

Value

A matrix, with one row for each element of v, and one column for each element of p.

Note

The current implementation is not as space-efficient as it could be, as it first computes the cumulants for each row, then performs the Cornish-Fisher approximation on a row-by-row basis. In the future, this computation may be moved earlier into the pipeline to be more space efficient. File an issue if the memory footprint is an issue for you.

The moment computations provided by fromo are numerically robust, but will often not provide the same results as the 'standard' implementations, due to differences in roundoff. We make every attempt to balance speed and robustness. User assumes all risk from using the fromo package.

Note that when weights are given, they are treated as replication weights. This can have subtle effects on computations which require minimum degrees of freedom, since the sum of weights will be compared to that minimum, not the number of data points. Weight values (much) less than 1 can cause computations to return NA somewhat unexpectedly due to this condition, while values greater than one might cause the computation to spuriously return a value with little precision.
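
The replication interpretation can be checked in base R without the package. This is a small sketch under stated assumptions: it uses a plain weighted mean rather than fromo's moment code, it ignores normalize_wts, and the min_df comparison shown is only a guess at the form of the check described above.

```r
v   <- c(1, 2, 3)
wts <- c(1, 2, 1)

# a weight of 2 on the second observation is shorthand for repeating it twice
expanded        <- rep(v, times = wts)
weighted_mean   <- sum(wts * v) / sum(wts)
replicated_mean <- mean(expanded)

# Assumed form of the check: the sum of weights, not the number of
# data points, is compared to min_df.
min_df    <- 2
small_wts <- c(0.5, 0.5, 0.5)            # three observations, weights sum to 1.5
passes_by_count   <- length(v) >= min_df
passes_by_weights <- sum(small_wts) >= min_df
```

Here the three observations would satisfy min_df by count, but not by sum of weights, which is the situation in which an NA can appear unexpectedly.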

Author(s)

Steven E. Pav shabbychef@gmail.com

References

Terriberry, T. "Computing Higher-Order Moments Online." http://people.xiph.org/~tterribe/notes/homs.html

J. Bennett, et al., "Numerically Stable, Single-Pass, Parallel Statistics Algorithms," Proceedings of IEEE International Conference on Cluster Computing, 2009. https://www.semanticscholar.org/paper/Numerically-stable-single-pass-parallel-statistics-Bennett-Grout/a83ed72a5ba86622d5eb6395299b46d51c901265

Cook, J. D. "Accurately computing running variance." http://www.johndcook.com/standard_deviation.html

Cook, J. D. "Comparing three methods of computing standard deviation." http://www.johndcook.com/blog/2008/09/26/comparing-three-methods-of-computing-standard-deviation

See Also

t_running_apx_quantiles, running_cumulants, PDQutils::qapx_cf, PDQutils::AS269.

Examples

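library(fromo)
x <- rnorm(1e5)
# approximate running quantiles at five probability points (infinite window by default)
xq <- running_apx_quantiles(x, c(0.1, 0.25, 0.5, 0.75, 0.9))
# approximate running median
xm <- running_apx_median(x)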

fromo documentation built on May 2, 2019, 5:07 a.m.