calc_scale: Standardize (center and scale) the columns of a time series...

View source: R/RcppExports.R

calc_scale R Documentation

Description

Standardize (center and scale) the columns of a time series of data in place, without copying the data in memory, using RcppArmadillo.

Usage

calc_scale(tseries, center = TRUE, scale = TRUE, use_median = FALSE)

Arguments

tseries

A time series or matrix of data.

center

A Boolean argument: if TRUE then center the columns so that they have zero mean or median (the default is TRUE).

scale

A Boolean argument: if TRUE then scale the columns so that they have unit standard deviation or MAD (the default is TRUE).

use_median

A Boolean argument: if TRUE then the centrality (central tendency) is calculated as the median and the dispersion is calculated as the median absolute deviation (MAD) (the default is FALSE). If use_median = FALSE then the centrality is calculated as the mean and the dispersion is calculated as the standard deviation.

Details

The function calc_scale() standardizes (centers and scales) the columns of a time series of data in place, without copying the data in memory, using RcppArmadillo.

If the arguments center and scale are both TRUE and use_median is FALSE (the defaults), then calc_scale() performs the same calculation as the standard R function scale(), and it calculates the centrality (central tendency) as the mean and the dispersion as the standard deviation.
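
The default calculation described above can be sketched in base R. This is only an illustration of the arithmetic, not the RcppArmadillo code of calc_scale(): the simulated matrix and the variable names meanv and sdv are assumptions made for the example, and the result is a new matrix rather than an in-place update.

```r
# Simulated returns stand in for a real time series (assumption for illustration)
set.seed(1121)
retp <- matrix(rnorm(200, sd = 0.01), ncol = 2,
  dimnames = list(NULL, c("IEF", "VTI")))
# Calculate the column means and standard deviations
meanv <- colMeans(retp)
sdv <- apply(retp, 2, sd)
# Subtract the means, then divide by the standard deviations
retss <- sweep(sweep(retp, 2, meanv, "-"), 2, sdv, "/")
# Compare with the base R function scale()
all.equal(retss, scale(retp), check.attributes = FALSE)
```

The standardized columns have zero mean and unit standard deviation, which is the result that scale() produces with its default arguments.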

If the arguments center and scale are both TRUE (the defaults), then calc_scale() standardizes the data. If the argument center is FALSE then calc_scale() only scales the data (divides each column by its standard deviation). If the argument scale is FALSE then calc_scale() only demeans the data (subtracts the mean from each column).
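
The scale-only and demean-only cases can also be sketched in base R. The sketch below follows the description above (division by the standard deviations) and uses simulated data and illustrative variable names, so it is an assumption about the arithmetic, not the package's code. It also shows a point worth keeping in mind when comparing against base R: scale() with center = FALSE divides by the root mean square of each column, not by the standard deviation.

```r
# Simulated data with a nonzero mean, so that the standard deviation
# and the root mean square differ (assumption for illustration)
set.seed(1121)
retp <- matrix(rnorm(200, mean = 0.05, sd = 0.01), ncol = 2)
# Scale only: divide the columns by their standard deviations
sdv <- apply(retp, 2, sd)
scaled <- sweep(retp, 2, sdv, "/")
# Demean only: subtract the column means
demeaned <- sweep(retp, 2, colMeans(retp), "-")
all.equal(demeaned, scale(retp, scale = FALSE), check.attributes = FALSE)
# Base R scale() without centering divides by the root mean square instead
rmsv <- sqrt(colSums(retp^2) / (nrow(retp) - 1))
all.equal(scale(retp, center = FALSE), sweep(retp, 2, rmsv, "/"),
  check.attributes = FALSE)
```

Because of this difference, a scale-only result that divides by the standard deviations will generally not match scale(retp, center = FALSE) unless the column means are zero.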

If the argument use_median is TRUE, then calc_scale() calculates the centrality as the median and the dispersion as the median absolute deviation (MAD).
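
A robust standardization with the median and the MAD can be sketched in base R as follows. The data and variable names are illustrative. The sketch uses the raw MAD (constant = 1 in stats::mad()); this page does not say whether calc_scale() applies a consistency constant such as the base R default of 1.4826, so that choice is an assumption.

```r
# Simulated returns (assumption for illustration)
set.seed(1121)
retp <- matrix(rnorm(200, sd = 0.01), ncol = 2)
# Calculate the column medians
medv <- apply(retp, 2, median)
# Calculate the raw MAD: the median of absolute deviations from the median
# constant = 1 switches off the default 1.4826 rescaling in stats::mad()
madv <- apply(retp, 2, mad, constant = 1)
# Subtract the medians, then divide by the MADs
robust <- sweep(sweep(retp, 2, medv, "-"), 2, madv, "/")
# The columns now have zero median and unit raw MAD (up to rounding)
apply(robust, 2, median)
apply(robust, 2, mad, constant = 1)
```

The median and the MAD are less sensitive to outliers than the mean and the standard deviation, which is the usual reason for choosing them.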

If tseries has fewer than 3 rows then calc_scale() does nothing and leaves tseries unchanged.
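
The argument handling and the row-count guard described above can be put together in a short base R sketch. The function scale_sketch() is hypothetical and is not part of HighFreq: it returns a standardized copy instead of modifying its argument in place, and it uses the raw MAD (constant = 1), which is an assumption.

```r
# Hypothetical helper (not part of HighFreq): returns a standardized copy
scale_sketch <- function(tseries, center = TRUE, scale = TRUE,
                         use_median = FALSE) {
  # Too few rows: return the data unchanged
  if (nrow(tseries) < 3) return(tseries)
  # Choose the centrality and dispersion functions
  centerfun <- if (use_median) median else mean
  dispfun <- if (use_median) function(x) mad(x, constant = 1) else sd
  # The dispersion does not depend on the centering, so calculate it first
  dispv <- apply(tseries, 2, dispfun)
  if (center) tseries <- sweep(tseries, 2, apply(tseries, 2, centerfun), "-")
  if (scale) tseries <- sweep(tseries, 2, dispv, "/")
  tseries
}  # end scale_sketch

# A matrix with two rows is returned unchanged
small <- matrix(c(1, 2, 3, 4), nrow = 2)
identical(scale_sketch(small), small)
# A longer matrix is standardized like scale()
set.seed(1121)
retp <- matrix(rnorm(200, sd = 0.01), ncol = 2)
all.equal(scale_sketch(retp), scale(retp), check.attributes = FALSE)
```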

The function calc_scale() accepts a reference to the argument tseries, and it overwrites the old data with the standardized data. It performs the calculation in place, without copying the data in memory, which can significantly increase the computation speed for large time series.

The function calc_scale() uses RcppArmadillo C++ code, so on a typical time series it can be over 10 times faster than the function scale().

Value

Void (no return value - modifies the data in place).

Examples

## Not run: 
# Calculate a time series of returns
retp <- zoo::coredata(na.omit(rutils::etfenv$returns[, c("IEF", "VTI")]))
# Demean the returns
demeaned <- apply(retp, 2, function(x) (x-mean(x)))
HighFreq::calc_scale(retp, scale=FALSE)
all.equal(demeaned, retp, check.attributes=FALSE)
# Reload the returns, because calc_scale() overwrote retp in place
retp <- zoo::coredata(na.omit(rutils::etfenv$returns[, c("IEF", "VTI")]))
# Standardize the returns
retss <- scale(retp)
HighFreq::calc_scale(retp)
all.equal(retss, retp, check.attributes=FALSE)
# Compare the speed of Rcpp with R code
library(microbenchmark)
summary(microbenchmark(
  Rcode=scale(retp),
  Rcpp=HighFreq::calc_scale(retp),
  times=100))[, c(1, 4, 5)]  # end microbenchmark summary

## End(Not run)


algoquant/HighFreq documentation built on Oct. 26, 2024, 9:20 p.m.