run_scale: Standardize (center and scale) the columns of a time series...

View source: R/RcppExports.R

run_scale    R Documentation

Description

Standardize (center and scale) the columns of a time series of data over time and in place, without copying the data in memory, using RcppArmadillo.

Usage

run_scale(tseries, lambda, center = TRUE, scale = TRUE)

Arguments

tseries

A time series or matrix of data.

lambda

A decay factor which multiplies past estimates.

center

A Boolean argument: if TRUE then center the columns by subtracting their trailing means (the default is TRUE).

scale

A Boolean argument: if TRUE then scale the columns by dividing them by their trailing standard deviations (the default is TRUE).

Details

The function run_scale() performs a trailing standardization (centering and scaling) of the columns of the tseries argument using RcppArmadillo.

The function run_scale() accepts a pointer to the argument tseries, and it overwrites the old data with the standardized data. It performs the calculation in place, without copying the data in memory, which can significantly increase the computation speed for large time series.

The function run_scale() performs a loop over the rows of tseries, and standardizes the data using its trailing means and standard deviations.

The function run_scale() calculates the trailing mean and variance of streaming time series data r_t, by recursively weighting the past estimates with the new data, using the decay factor \lambda:

\bar{r}_t = \lambda \bar{r}_{t-1} + (1-\lambda) r_t

\sigma^2_t = \lambda \sigma^2_{t-1} + (1-\lambda) (r_t - \bar{r}_t)^2

Where \bar{r}_t is the trailing mean and \sigma^2_t is the trailing variance.

It then calculates the standardized data as follows:

r^{\prime}_t = \frac{r_t - \bar{r}_t}{\sigma_t}

If the arguments center and scale are both TRUE (the defaults), then run_scale() standardizes the data. If the argument center is FALSE then run_scale() only scales the data (divides it by the trailing standard deviations). If the argument scale is FALSE then run_scale() only demeans the data (subtracts the trailing means).
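
The recursion and the two flags described above can be sketched in plain R. This is only an illustration, not the package's C++ code: the helper name run_scale_sketch() is hypothetical, it returns a copy instead of modifying its input in place, and it assumes the same initialization as the R loop in the Examples section (first row as the mean, its square as the variance, first row left unscaled).

```r
# Hypothetical helper (not part of HighFreq): plain-R sketch of the
# trailing standardization of the columns of a numeric matrix.
run_scale_sketch <- function(tseries, lambda, center = TRUE, scale = TRUE) {
  stopifnot(is.matrix(tseries), lambda > 0, lambda < 1)
  lambda1 <- 1 - lambda
  scaled <- tseries
  # Assumed initialization, taken from the R loop in the Examples section
  meanm <- tseries[1, ]
  vars <- tseries[1, ]^2
  # Loop over the rows, starting from the second one
  for (it in seq_len(NROW(tseries))[-1]) {
    # Update the trailing mean and variance with the new row
    meanm <- lambda * meanm + lambda1 * tseries[it, ]
    vars <- lambda * vars + lambda1 * (tseries[it, ] - meanm)^2
    rowv <- tseries[it, ]
    # Subtract the trailing mean only if center is TRUE
    if (center) rowv <- rowv - meanm
    # Divide by the trailing standard deviation only if scale is TRUE
    if (scale) rowv <- rowv / sqrt(vars)
    scaled[it, ] <- rowv
  }
  scaled
}

# Usage on simulated data
set.seed(1)
retm <- matrix(rnorm(200), ncol = 2)
scaled <- run_scale_sketch(retm, lambda = 0.9)
demeaned <- run_scale_sketch(retm, lambda = 0.9, scale = FALSE)
```

With both flags set to FALSE the sketch returns the input unchanged, which is a quick way to check the flag handling.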

The value of the decay factor \lambda must be in the range between 0 and 1. If \lambda is close to 1 then the decay is weak and past values have a greater weight, and the trailing variance values have a stronger dependence on past data. This is equivalent to a long look-back interval. If \lambda is much less than 1 then the decay is strong and past values have a smaller weight, and the trailing variance values have a weaker dependence on past data. This is equivalent to a short look-back interval.
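
To make the link between \lambda and the look-back interval concrete, the short sketch below computes two rough measures for a few values of \lambda. Neither measure is part of run_scale(); the half-life follows from the weights decaying as \lambda^j, and 1/(1-\lambda) is a commonly used rule of thumb for the effective look-back, not a definition taken from the package.

```r
# Decay factors to compare (illustrative values)
lambdav <- c(0.5, 0.9, 0.99)
# Number of steps after which the weight of a past observation has halved
halflife <- log(0.5) / log(lambdav)
# Rule-of-thumb effective look-back interval
lookback <- 1 / (1 - lambdav)
print(data.frame(lambda = lambdav, halflife = halflife, lookback = lookback))
```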

The above online recursive formulas are convenient for processing live streaming data because they don't require maintaining a buffer of past data. The formulas are equivalent to a convolution with exponentially decaying weights, but they're much faster to calculate. Using exponentially decaying weights is more natural than using a sliding look-back interval, because it gradually "forgets" about the past data.
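
The equivalence with exponentially decaying weights can be seen by unrolling the recursion for the trailing mean back to an initial estimate, written here as \bar{r}_0 (a symbol introduced only for this illustration):

\bar{r}_t = (1-\lambda) \sum_{j=0}^{t-1} \lambda^j r_{t-j} + \lambda^t \bar{r}_0

The observation j steps in the past receives the weight (1-\lambda) \lambda^j, and the contribution of the initial estimate decays as \lambda^t.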

The function run_scale() uses RcppArmadillo C++ code, so it can be over 100 times faster than the equivalent R code.

Value

Void (no return value - modifies the data in place).

Examples

## Not run: 
# Calculate historical returns
retp <- na.omit(rutils::etfenv$returns[, c("XLF", "VTI")])
# Calculate the trailing standardized returns using R code
lambdaf <- 0.9
lambda1 <- 1 - lambdaf
scaled <- zoo::coredata(retp)
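# Initialize the trailing mean and variance from the first row
meanm <- scaled[1, ]
vars <- scaled[1, ]^2
for (it in 2:NROW(retp)) {
  # Update the trailing mean and variance, then standardize the row
  meanm <- lambdaf*meanm + lambda1*scaled[it, ]
  vars <- lambdaf*vars + lambda1*(scaled[it, ] - meanm)^2
  scaled[it, ] <- (scaled[it, ] - meanm)/sqrt(vars)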
}  # end for
# Calculate the trailing standardized returns using C++ code
HighFreq::run_scale(retp, lambda=lambdaf)
all.equal(zoo::coredata(retp), scaled, check.attributes=FALSE)
# Compare the speed of RcppArmadillo with R code
library(microbenchmark)
summary(microbenchmark(
  Rcpp=HighFreq::run_scale(retp, lambda=lambdaf),
  Rcode={for (it in 2:NROW(retp)) {
    meanm <- lambdaf*meanm + lambda1*scaled[it, ]
    vars <- lambdaf*vars + lambda1*(scaled[it, ] - meanm)^2
    scaled[it, ] <- (scaled[it, ] - meanm)/sqrt(vars)
  }},  # end for
  times=10))[, c(1, 4, 5)]  # end microbenchmark summary

## End(Not run)


algoquant/HighFreq documentation built on Feb. 9, 2024, 8:15 p.m.