run_var | R Documentation |
Calculate the trailing mean and variance of streaming time series of data using an online recursive formula.
run_var(tseries, lambda)
tseries | A time series or a matrix of data. |
lambda | A decay factor which multiplies past estimates. |
The function run_var() calculates the trailing mean and variance of streaming time series of data r_t, by recursively weighting the past variance estimates \sigma^2_{t-1} with the squared differences of the data minus its trailing means (r_t - \bar{r}_t)^2, using the decay factor \lambda:
\bar{r}_t = \lambda \bar{r}_{t-1} + (1-\lambda) r_t
\sigma^2_t = \lambda \sigma^2_{t-1} + (1-\lambda) (r_t - \bar{r}_t)^2
Where r_t are the streaming data, \bar{r}_t are the trailing means, and \sigma^2_t are the trailing variance estimates.
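The two recursions above can be sketched in plain R as follows. This is only an illustrative reference loop, not the code behind run_var(): the name run_var_sketch is hypothetical, it handles a single series only, and the starting values (first mean equal to the first observation, first variance equal to zero) are an assumption, since the text does not say how the recursion is initialized.

```r
# Reference sketch of the recursions in plain R (not the package's own code).
# Assumption: the mean starts at the first observation and the variance at zero.
run_var_sketch <- function(tseries, lambda) {
  stopifnot(lambda >= 0, lambda <= 1, length(tseries) >= 1)
  x <- as.numeric(tseries)
  n <- length(x)
  means <- numeric(n)
  vars <- numeric(n)
  means[1] <- x[1]
  vars[1] <- 0
  for (t in seq_len(n)[-1]) {
    # Update the trailing mean first, then use it in the variance update.
    means[t] <- lambda * means[t - 1] + (1 - lambda) * x[t]
    vars[t] <- lambda * vars[t - 1] + (1 - lambda) * (x[t] - means[t])^2
  }
  # Two columns: trailing means and trailing variances.
  cbind(mean = means, variance = vars)
}

out <- run_var_sketch(c(1, 2, 3, 4), lambda = 0.5)
out
```

Each step uses only the previous mean, the previous variance and the new data point, so nothing else has to be stored between updates.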
The above online recursive formulas are convenient for processing live streaming data because they don't require maintaining a buffer of past data. The formulas are equivalent to a convolution with exponentially decaying weights, but they're much faster to calculate. Using exponentially decaying weights is more natural than using a sliding look-back interval, because it gradually "forgets" about the past data.
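The equivalence between the recursion and exponentially decaying weights can be checked numerically for the trailing mean. The sketch below assumes the mean is initialized to the first observation (an assumption, not stated in the text), in which case the explicit weighted sum carries an extra term \lambda^{t-1} r_1 for the starting value.

```r
lambda <- 0.9
set.seed(42)
x <- rnorm(50)
n <- length(x)

# Recursive form: one multiply-add per new observation, no buffer of past data.
rec <- numeric(n)
rec[1] <- x[1]  # assumed initialization
for (t in 2:n) rec[t] <- lambda * rec[t - 1] + (1 - lambda) * x[t]

# Explicit form: weighted sum over past data with weights (1 - lambda) * lambda^k.
conv <- numeric(n)
conv[1] <- x[1]
for (t in 2:n) {
  k <- 0:(t - 2)  # lags of the observations x[t], ..., x[2]
  conv[t] <- (1 - lambda) * sum(lambda^k * x[t - k]) + lambda^(t - 1) * x[1]
}

# The two forms agree up to floating-point rounding.
max(abs(rec - conv))
```

The explicit form revisits all past data at every step, while the recursive form does a constant amount of work per observation.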
The value of the decay factor \lambda must be in the range between 0 and 1.
If \lambda is close to 1 then the decay is weak and past values have a greater weight, and the trailing variance values have a stronger dependence on past data. This is equivalent to a long look-back interval.
If \lambda is much less than 1 then the decay is strong and past values have a smaller weight, and the trailing variance values have a weaker dependence on past data. This is equivalent to a short look-back interval.
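One way to put numbers on the look-back interval is through the weights themselves: ignoring the start-up effect, the trailing mean places a weight of (1-\lambda) \lambda^k on the observation k steps in the past. The helper names decay_weights and half_life below are hypothetical, and the half-life is just one possible measure of the effective look-back, not one defined by the text above.

```r
# Weight placed on the observation k steps in the past, for k = 0, ..., nlags.
decay_weights <- function(lambda, nlags) (1 - lambda) * lambda^(0:nlags)

# Number of steps after which a weight has dropped to half its initial value.
half_life <- function(lambda) log(0.5) / log(lambda)

lambdas <- c(0.99, 0.9, 0.5)

# A larger lambda gives a longer half-life (longer effective look-back).
sapply(lambdas, half_life)

# Share of the total weight carried by the 10 most recent observations.
sapply(lambdas, function(l) sum(decay_weights(l, 9)))
```

With \lambda = 0.5 the weight halves at every step, so almost all of the weight sits on the last few observations, while with \lambda = 0.99 the ten most recent observations carry only a small share of it.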
The function run_var() performs the same calculation as the standard R function stats::filter(x=series, filter=weightv, method="recursive"), but it's several times faster.
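As a rough illustration of how the same recursions can be expressed with stats::filter(), the sketch below scales the input by (1-\lambda) and passes \lambda as the single recursive coefficient. This setup is an assumption about how the comparison is meant (the text does not define weightv), and the initial values are assumed as well: the first mean equals the first observation and the first variance is zero.

```r
lambda <- 0.9
set.seed(1)
x <- rnorm(100)

# Trailing mean: y[t] = (1 - lambda) * x[t] + lambda * y[t - 1].
# init = x[1] makes the first mean equal to the first observation (assumed).
means <- stats::filter((1 - lambda) * x, filter = lambda,
                       method = "recursive", init = x[1])

# Trailing variance: the same recursion applied to the squared deviations,
# starting from the default initial value of zero.
vars <- stats::filter((1 - lambda) * (x - means)^2, filter = lambda,
                      method = "recursive")

# Cross-check the last values against an explicit loop.
m <- x[1]
v <- 0
for (t in 2:length(x)) {
  m <- lambda * m + (1 - lambda) * x[t]
  v <- lambda * v + (1 - lambda) * (x[t] - m)^2
}
c(mean_diff = as.numeric(means)[100] - m, var_diff = as.numeric(vars)[100] - v)
```

Two passes are needed here because the variance recursion depends on the trailing means, which have to be computed first.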
The function run_var() returns a matrix with two columns and the same number of rows as the input argument tseries. The first column contains the trailing means and the second contains the variance.