run_scale | R Documentation |
Standardize (center and scale) the columns of a time series of data over time and in place, without copying the data in memory, using RcppArmadillo.
run_scale(tseries, lambda, center = TRUE, scale = TRUE)
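tseries | A time series or matrix of data. |
lambda | A decay factor which multiplies past estimates (between 0 and 1). |
center | A Boolean argument: if TRUE then center the columns by subtracting their trailing means (the default is TRUE). |
scale | A Boolean argument: if TRUE then scale the columns by dividing them by their trailing standard deviations (the default is TRUE). |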
The function run_scale() performs a trailing standardization (centering and scaling) of the columns of the tseries argument using RcppArmadillo.
The function run_scale() accepts a pointer to the argument tseries, and it overwrites the old data with the standardized data. It performs the calculation in place, without copying the data in memory, which can significantly increase the computation speed for large time series.
The function run_scale() performs a loop over the rows of tseries and standardizes the data using its trailing means and standard deviations.
The function run_scale() calculates the trailing mean and variance of streaming time series data r_t by recursively weighting the past estimates with the new data, using the decay factor \lambda:
\bar{r}_t = \lambda \bar{r}_{t-1} + (1-\lambda) r_t
\sigma^2_t = \lambda \sigma^2_{t-1} + (1-\lambda) (r_t - \bar{r}_t)^2
where \bar{r}_t is the trailing mean and \sigma^2_t is the trailing variance.
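The two recursions can be sketched in plain R for a single numeric vector. This is a minimal illustration, not HighFreq code: the helper name trail_meanvar() is hypothetical, and the initial values (the first observation for the mean and its square for the variance) are assumed from the R loop in the examples.

```r
# Hypothetical helper (not part of HighFreq): trailing mean and variance of a
# numeric vector using the recursive formulas with decay factor lambda.
# Assumed initialization, as in the R loop in the examples:
# mean_1 = r_1 and variance_1 = r_1^2.
trail_meanvar <- function(r, lambda) {
  nrows <- length(r)
  meanv <- numeric(nrows)
  varv <- numeric(nrows)
  meanv[1] <- r[1]
  varv[1] <- r[1]^2
  if (nrows > 1) {
    for (it in 2:nrows) {
      # Weight the past mean by lambda and the new data by (1 - lambda)
      meanv[it] <- lambda*meanv[it-1] + (1-lambda)*r[it]
      # The variance update uses the deviation from the updated mean
      varv[it] <- lambda*varv[it-1] + (1-lambda)*(r[it] - meanv[it])^2
    }
  }
  list(mean = meanv, var = varv)
}

# Usage: a short series with decay factor 0.5
trail <- trail_meanvar(c(1, 3, 5), lambda = 0.5)
trail$mean  # 1.0 2.0 3.5
trail$var   # 1.000 1.000 1.625
```

Each update only needs the previous estimate and the new observation, which is what makes the recursion suitable for streaming data.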
It then calculates the standardized data as follows:
r^{\prime}_t = \frac{r_t - \bar{r}_t}{\sigma_t}
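As a worked check of the formulas, the following R lines carry out one recursive step with \lambda = 0.5, starting from r_1 = 1 (trailing mean 1 and an assumed initial variance r_1^2 = 1, as in the R loop in the examples) and a new observation r_2 = 3:

```r
# One recursive step with lambda = 0.5, starting from r_1 = 1
lambda <- 0.5
meanv <- 1   # trailing mean after r_1
varv <- 1    # trailing variance after r_1 (assumed initialized to r_1^2)
r2 <- 3      # new observation
meanv <- lambda*meanv + (1-lambda)*r2            # 0.5*1 + 0.5*3 = 2
varv <- lambda*varv + (1-lambda)*(r2 - meanv)^2  # 0.5*1 + 0.5*(3-2)^2 = 1
rstd <- (r2 - meanv)/sqrt(varv)                  # (3 - 2)/sqrt(1) = 1
```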
If the arguments center and scale are both TRUE (the defaults), then run_scale() standardizes the data.
If the argument center is FALSE, then run_scale() only scales the data (divides it by the standard deviations).
If the argument scale is FALSE, then run_scale() only demeans the data (subtracts the means).
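A pure-R sketch of the row loop with the center and scale flags may help clarify the three cases. The function run_scale_r() is hypothetical and is not the HighFreq implementation. It assumes that the trailing mean and variance are always updated with the recursive formulas, that the flags only select whether the mean is subtracted and whether the result is divided by the trailing standard deviation, and that the first row is left unchanged, as in the R loop in the examples. Unlike run_scale(), it returns a new matrix instead of modifying its input in place.

```r
# Hypothetical pure-R sketch (not HighFreq code). Assumptions: the trailing
# mean and variance are always updated recursively; center and scale only
# control the output transformation; the first row is left unchanged.
# Returns a new matrix rather than modifying the input in place.
run_scale_r <- function(tseries, lambda, center = TRUE, scale = TRUE) {
  tseries <- as.matrix(tseries)
  nrows <- NROW(tseries)
  if (nrows < 2) return(tseries)
  # Initialize the trailing mean and variance with the first row
  meanm <- tseries[1, ]
  vars <- tseries[1, ]^2
  for (it in 2:nrows) {
    datav <- tseries[it, ]
    # Update the trailing mean and variance of each column
    meanm <- lambda*meanm + (1-lambda)*datav
    vars <- lambda*vars + (1-lambda)*(datav - meanm)^2
    if (center) datav <- datav - meanm    # subtract the trailing mean
    if (scale) datav <- datav/sqrt(vars)  # divide by the trailing std dev
    tseries[it, ] <- datav
  }
  tseries
}

# Usage: two columns, the second one constant
x <- matrix(c(1, 3, 5, 2, 2, 2), ncol = 2)
xstd <- run_scale_r(x, lambda = 0.5)                   # center and scale
xdemean <- run_scale_r(x, lambda = 0.5, scale = FALSE) # only demean
```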
The value of the decay factor \lambda must be in the range between 0 and 1.
If \lambda is close to 1, then the decay is weak and past values have a greater weight, so the trailing mean and variance depend more strongly on past data. This is equivalent to a long look-back interval.
If \lambda is much less than 1, then the decay is strong and past values have a smaller weight, so the trailing mean and variance depend less on past data. This is equivalent to a short look-back interval.
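Unrolling the mean recursion shows that an observation k periods in the past enters the trailing mean with weight (1-\lambda) \lambda^k (ignoring the contribution of the first observation). The short R comparison below uses a hypothetical helper decay_weights() to contrast weak and strong decay; the average age of the weights, \lambda/(1-\lambda), gives a rough sense of the effective look-back interval.

```r
# Weight of the observation k periods in the past in the trailing mean,
# ignoring the initial observation: (1 - lambda)*lambda^k, k = 0, 1, ...
decay_weights <- function(lambda, nlags) (1 - lambda)*lambda^(0:(nlags - 1))

weights_long <- decay_weights(0.9, 10)   # weak decay (lambda close to 1)
weights_short <- decay_weights(0.5, 10)  # strong decay (lambda much less than 1)

# Share of the total weight carried by the 10 most recent observations,
# which equals 1 - lambda^10
sum(weights_long)   # about 0.651
sum(weights_short)  # about 0.999

# Average age (in periods) of the exponential weights: lambda/(1 - lambda)
mean_age <- function(lambda) lambda/(1 - lambda)
mean_age(0.9)  # about 9
mean_age(0.5)  # 1
```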
The above online recursive formulas are convenient for processing live streaming data because they do not require maintaining a buffer of past data. The formulas are equivalent to a convolution with exponentially decaying weights, but they are much faster to calculate, because each update needs only the previous estimate and the new data point instead of a weighted sum over the whole history. Using exponentially decaying weights is more natural than using a sliding look-back interval, because it gradually "forgets" the past data instead of dropping observations abruptly when they leave the interval.
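The equivalence between the recursion and the convolution can be checked numerically in R. Unrolling \bar{r}_t = \lambda \bar{r}_{t-1} + (1-\lambda) r_t with \bar{r}_1 = r_1 (the initialization assumed from the R loop in the examples) gives \bar{r}_t = \sum_{k=0}^{t-2} (1-\lambda) \lambda^k r_{t-k} + \lambda^{t-1} r_1. The sketch below computes the trailing mean both ways on simulated data; the convolution needs the whole history at every step, while the recursion needs only the previous estimate.

```r
# Simulated data and decay factor (illustrative values)
lambda <- 0.8
set.seed(1)
r <- rnorm(50)

# Recursive trailing mean: O(1) work per new observation
meanrec <- numeric(length(r))
meanrec[1] <- r[1]
for (it in 2:length(r)) meanrec[it] <- lambda*meanrec[it-1] + (1-lambda)*r[it]

# The same mean as an explicit convolution over all past data: O(t) work per step
meanconv <- sapply(seq_along(r), function(it) {
  if (it == 1) return(r[1])
  lags <- 0:(it - 2)
  sum((1 - lambda)*lambda^lags*r[it - lags]) + lambda^(it - 1)*r[1]
})

all.equal(meanrec, meanconv)  # TRUE
```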
The function run_scale() uses RcppArmadillo C++ code, so it can be over 100 times faster than the equivalent R code.
Void (no return value; the function modifies the data in place).
## Not run:
# Calculate historical returns
retp <- na.omit(rutils::etfenv$returns[, c("XLF", "VTI")])
# Calculate the trailing standardized returns using R code
lambdaf <- 0.9 # Decay factor
lambda1 <- 1 - lambdaf
scaled <- zoo::coredata(retp)
# Initialize the trailing mean and variance with the first row
meanm <- scaled[1, ]
vars <- scaled[1, ]^2
for (it in 2:NROW(retp)) {
  meanm <- lambdaf*meanm + lambda1*scaled[it, ]
  vars <- lambdaf*vars + lambda1*(scaled[it, ] - meanm)^2
  scaled[it, ] <- (scaled[it, ] - meanm)/sqrt(vars)
} # end for
# Calculate the trailing standardized returns using C++ code
HighFreq::run_scale(retp, lambda=lambdaf)
all.equal(zoo::coredata(retp), scaled, check.attributes=FALSE)
# Compare the speed of RcppArmadillo with R code
library(microbenchmark)
summary(microbenchmark(
  Rcpp=HighFreq::run_scale(retp, lambda=lambdaf),
  Rcode={for (it in 2:NROW(retp)) {
    meanm <- lambdaf*meanm + lambda1*scaled[it, ]
    vars <- lambdaf*vars + lambda1*(scaled[it, ] - meanm)^2
    scaled[it, ] <- (scaled[it, ] - meanm)/sqrt(vars)
  }}, # end for
  times=10))[, c(1, 4, 5)] # end microbenchmark summary
## End(Not run)