run_scale: Standardize (center and scale) the columns of a _time series_...
In algoquant/HighFreq: High Frequency Time Series Management

run_scale

R Documentation

Standardize (center and scale) the columns of a time series of data over time and in place, without copying the data in memory, using `RcppArmadillo`.

Description

Standardize (center and scale) the columns of a time series of data over time and in place, without copying the data in memory, using RcppArmadillo.

Usage

run_scale(timeser, lambdaf, center = TRUE, scalit = TRUE)

Arguments

`timeser`	A time series or matrix of data.
`lambdaf`	A decay factor which multiplies past estimates.
`center`	A Boolean argument: if `TRUE` then center the columns by subtracting their trailing means (the default is `TRUE`).
`scalit`	A Boolean argument: if `TRUE` then scale the columns by dividing them by their trailing standard deviations (the default is `TRUE`).

Details

The function run_scale() performs a trailing standardization (centering and scaling) of the columns of the timeser argument using RcppArmadillo.

The function run_scale() accepts a pointer to the argument timeser, and it overwrites the old data with the standardized data. It performs the calculation in place, without copying the data in memory, which can significantly increase the computation speed for large time series.

The function run_scale() performs a loop over the rows of timeser, and standardizes the data using its trailing means and standard deviations.

The function run_scale() calculates the trailing mean and variance of streaming time series data r_t, by recursively weighting the past estimates with the new data, using the decay factor \lambda:

\bar{r}_t = \lambda \bar{r}_{t-1} + (1 - \lambda) r_t

\sigma^2_t = \lambda^2 \sigma^2_{t-1} + (1 - \lambda^2) (r_t - \bar{r}_t)^2

Where \bar{r}_t is the trailing mean and \sigma^2_t is the trailing variance.

It then calculates the standardized data as follows:

r^{\prime}_t = \frac{r_t - \bar{r}_t}{\sigma_t}

If the arguments center and scalit are both TRUE (the defaults), then run_scale() standardizes the data. If the argument center is FALSE then run_scale() only scales the data (divides it by the trailing standard deviations). If the argument scalit is FALSE then run_scale() only demeans the data (subtracts the trailing means).
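
The recursion and the two flags can be sketched in plain R. This is a minimal illustration and not the package's C++ implementation: the function name `run_scale_sketch` and its `vardecay` argument are hypothetical. It initializes the mean with the first row and the variance with the squared first row, and by default it applies the same decay factor to the variance, as the R code in the Examples section does. The variance formula above is written with \lambda^2 instead, which can be reproduced by passing `vardecay = lambdaf^2`. How the package treats the first row and the flags internally is an assumption here.

```r
# Hypothetical sketch of a trailing standardization in plain R.
# Unlike run_scale(), it returns a new matrix instead of overwriting its input.
run_scale_sketch <- function(timeser, lambdaf, center = TRUE, scalit = TRUE,
                             vardecay = lambdaf) {
  stopifnot(lambdaf > 0, lambdaf < 1)
  datav <- as.matrix(timeser)
  scaled <- datav
  # Assumed initial values: first row for the mean, its square for the variance
  meanm <- datav[1, ]
  vars <- datav[1, ]^2
  for (it in seq_len(NROW(datav))[-1]) {
    # Update the trailing mean, then the trailing variance around it
    meanm <- lambdaf*meanm + (1 - lambdaf)*datav[it, ]
    vars <- vardecay*vars + (1 - vardecay)*(datav[it, ] - meanm)^2
    rowv <- datav[it, ]
    if (center) rowv <- rowv - meanm     # Subtract the trailing mean
    if (scalit) rowv <- rowv/sqrt(vars)  # Divide by the trailing volatility
    scaled[it, ] <- rowv
  }  # end for
  scaled
}  # end run_scale_sketch

# Simulated data, used only for illustration
set.seed(1121)
datav <- matrix(rnorm(200), ncol = 2)
scaled <- run_scale_sketch(datav, lambdaf = 0.9)
demeaned <- run_scale_sketch(datav, lambdaf = 0.9, scalit = FALSE)
```

With `scalit = FALSE` the second row reduces to `lambdaf*(datav[2, ] - datav[1, ])`, which gives a quick sanity check of the mean update.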

The value of the decay factor \lambda must be in the range between 0 and 1. If \lambda is close to 1 then the decay is weak and past values have a greater weight, and the trailing variance values have a stronger dependence on past data. This is equivalent to a long look-back interval. If \lambda is much less than 1 then the decay is strong and past values have a smaller weight, and the trailing variance values have a weaker dependence on past data. This is equivalent to a short look-back interval.
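
One way to make the link between \lambda and the look-back interval concrete is to unroll the mean recursion: the observation k steps in the past then enters the trailing mean with the weight (1 - \lambda) \lambda^k. The sketch below tabulates these weights and computes the number of steps over which a weight decays by half. The helper `half_life` is a hypothetical name introduced for illustration, and it is not part of the package.

```r
# Weight of the observation k steps in the past in the trailing mean
lambdaf <- 0.9
lagv <- 0:50
weightv <- (1 - lambdaf)*lambdaf^lagv
# Number of steps over which the weight decays by half (hypothetical helper)
half_life <- function(lambdaf) log(0.5)/log(lambdaf)
# The half-life grows quickly as the decay factor approaches 1
halfv <- sapply(c(0.5, 0.9, 0.99), half_life)
```

For a decay factor of 0.9 the half-life is about 6.6 steps, while for 0.99 it is about 69 steps, which illustrates why values close to 1 behave like a long look-back interval.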

The above online recursive formulas are convenient for processing live streaming data because they don't require maintaining a buffer of past data. The formulas are equivalent to a convolution with exponentially decaying weights, but they're much faster to calculate. Using exponentially decaying weights is more natural than using a sliding look-back interval, because it gradually "forgets" about the past data.
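
The equivalence with exponentially decaying weights can be checked numerically for the trailing mean. The sketch below uses simulated data and assumes that the recursion starts from the first observation, as in the R code of the Examples section. It compares the online recursion with an explicit weighted sum over all the past values.

```r
# Simulated data, used only for illustration
set.seed(1121)
datav <- rnorm(100)
lambdaf <- 0.9
nrows <- NROW(datav)
# Trailing mean from the online recursion, initialized with the first value
meanr <- numeric(nrows)
meanr[1] <- datav[1]
for (it in 2:nrows) {
  meanr[it] <- lambdaf*meanr[it-1] + (1 - lambdaf)*datav[it]
}  # end for
# The same mean from explicit exponentially decaying weights
meanw <- sapply(1:nrows, function(it) {
  lagv <- 0:(it-1)
  weightv <- (1 - lambdaf)*lambdaf^lagv
  weightv[it] <- lambdaf^(it-1)  # The first value carries the remaining weight
  sum(weightv*datav[it - lagv])
})  # end sapply
all.equal(meanr, meanw)
```

The explicit sum revisits the whole history at every step, so its cost grows with the square of the number of rows, while the recursion needs only the previous estimate.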

The function run_scale() uses RcppArmadillo C++ code, so it can be over 100 times faster than the equivalent R code.

Value

Void (no return value - modifies the data in place).

Examples

## Not run: 
# Calculate historical returns
retp <- na.omit(rutils::etfenv$returns[, c("XLF", "VTI")])
# Calculate the trailing standardized returns using R code
lambdaf <- 0.9 # Decay factor
lambda1 <- 1 - lambdaf
scaled <- zoo::coredata(retp)
# Initialize the trailing mean and variance with the first row
meanm <- scaled[1, ]
vars <- scaled[1, ]^2
for (it in 2:NROW(retp)) {
  # Update the trailing mean and variance, then standardize the row
  meanm <- lambdaf*meanm + lambda1*scaled[it, ]
  vars <- lambdaf*vars + lambda1*(scaled[it, ] - meanm)^2
  scaled[it, ] <- (scaled[it, ] - meanm)/sqrt(vars)
}  # end for
# Calculate the trailing standardized returns using C++ code
HighFreq::run_scale(retp, lambdaf=lambdaf)
all.equal(zoo::coredata(retp), scaled, check.attributes=FALSE)
# Compare the speed of RcppArmadillo with R code
library(microbenchmark)
summary(microbenchmark(
  Rcpp=HighFreq::run_scale(retp, lambdaf=lambdaf),
  Rcode={for (it in 2:NROW(retp)) {
    meanm <- lambdaf*meanm + lambda1*scaled[it, ]
    vars <- lambdaf*vars + lambda1*(scaled[it, ] - meanm)^2
    scaled[it, ] <- (scaled[it, ] - meanm)/sqrt(vars)
  }},  # end for
  times=10))[, c(1, 4, 5)]  # end microbenchmark summary

## End(Not run)  # end dontrun

algoquant/HighFreq documentation built on June 10, 2025, 3:54 p.m.
