run_reg | R Documentation |
Perform regressions on the streaming time series of response and predictor data, and calculate the regression coefficients, the residuals, and the forecasts, using online recursive formulas.
run_reg(respv, predm, lambda, controlv)
respv |
A single-column time series or a single-column matrix of response data. |
predm |
A time series or a matrix of predictor data. |
lambda |
A decay factor which multiplies past estimates. |
controlv |
A list of model parameters (see Details). |
The function run_reg()
performs regressions on the streaming time
series of response r_t
and predictor p_t
data:
r_t = \beta_t p_t + \epsilon_t
Where \beta_t
are the trailing regression coefficients and
\epsilon_t
are the residuals.
It recursively updates the covariance matrix {cov}_t
between the
response and the predictor data, and the covariance matrix
{cov}_{pt}
between the predictors, using the decay factor
\lambda
:
{cov}_t = \lambda {cov}_{t-1} + (1-\lambda) r^T_t p_t
{cov}_{p t} = \lambda {cov}_{p (t-1)} + (1-\lambda) p^T_t p_t
It calculates the regression coefficients \beta_t
as equal to the
covariance matrix between the response and the predictor data
{cov}_t
, divided by the covariance matrix between the predictors
{cov}_{pt}
:
\beta_t = {cov}_t \, {cov}^{-1}_{p t}
It calculates the residuals \epsilon_t
as the difference between the
response r_t
minus the fitted values \beta_t p_t
:
\epsilon_t = r_t - \beta_t p_t
And the residual variance \sigma^2_t
as:
\bar{\epsilon}_t = \lambda \bar{\epsilon}_{t-1} + (1-\lambda) \epsilon_t
\sigma^2_t = \lambda \sigma^2_{t-1} + (1-\lambda) (\epsilon_t - \bar{\epsilon}_t)^2
It then calculates the regression forecasts f_t
, as equal to the
past regression coefficients \beta_{t-1}
times the current predictor
data p_t
:
f_t = \beta_{t-1} p_t
It finally calculates the forecast errors as the difference between the
response minus the regression forecasts: r_t - f_t
.
The coefficient matrix \beta
and the residuals \epsilon
have
the same number of rows as the predictor argument predm
.
The function run_reg()
accepts a list of regression model
parameters through the argument controlv
.
The argument controlv
contains the parameters regmod
and
residscale
.
Below is a description of how these parameters work.
The list of model parameters can be created using the function
param_reg()
.
The number of regression coefficients is equal to the number of columns of
the predictor matrix n
.
If the predictor matrix contains a unit intercept column then the first
regression coefficient is equal to the alpha value \alpha
.
If regmod = "least_squares"
(the default) then it performs the
standard least squares regression. This is currently the only option.
The residuals and the the forecast errors may be scaled by
their volatilities to obtain the z-scores.
The default is residscale = "none"
- no scaling.
If the argument residscale = "scale"
then the residuals
\epsilon_t
are divided by their volatilities \sigma_t
without subtracting their means:
\epsilon_t = \frac{\epsilon_t}{\sigma_t}
If the argument residscale = "standardize"
then the residual means
\bar{\epsilon}
are subtracted from the residuals, and then
they are divided by their volatilities \sigma_t
:
\epsilon_t = \frac{\epsilon_t - \bar{\epsilon}}{\sigma_t}
Which are equal to the z-scores.
The forecast errors are also scaled in the same way as the
residuals, according to the argumentresidscale
.
The above online recursive formulas are convenient for processing live streaming data because they don't require maintaining a buffer of past data. The above recursive formulas are equivalent to a convolution with exponentially decaying weights, but they're much faster to calculate. Using exponentially decaying weights is more natural than using a sliding look-back interval, because it gradually "forgets" about the past data.
The value of the decay factor \lambda
must be in the range between
0
and 1
.
If \lambda
is close to 1
then the decay is weak and past
values have a greater weight, so the trailing values have a greater
dependence on past data. This is equivalent to a long look-back
interval.
If \lambda
is much less than 1
then the decay is strong and
past values have a smaller weight, so the trailing values have a weaker
dependence on past data. This is equivalent to a short look-back
interval.
The function run_reg()
returns multiple columns of data, with the
same number of rows as the predictor matrix predm
. If the predictor
matrix predm
has n
columns then run_reg()
returns a
matrix with n+2
columns.
The first n
columns contain the regression coefficients (with the
first column equal to the alpha value \alpha
).
The last 2
columns are the regression residuals and the forecast
errors.
A matrix with the regression coefficients, the residuals, and
the forecasts (in that order - see details), with the same number of rows
as the predictor argument predm
.
## Not run:
# Calculate historical returns
retp <- na.omit(rutils::etfenv$returns[, c("XLF", "VTI", "IEF")])
# Response equals XLF returns
respv <- retp[, 1]
# Predictor matrix equals VTI and IEF returns
predm <- retp[, -1]
# Add unit intercept column to the predictor matrix
predm <- cbind(rep(1, NROW(predm)), predm)
# Calculate the trailing regressions
lambdaf <- 0.9 # Decay factor
# Create a list of regression parameters
controlv <- HighFreq::param_reg(residscale="standardize")
regs <- HighFreq::run_reg(respv=respv, predm=predm, lambda=lambda, controlv=controlv)
# Plot the trailing residuals
datav <- cbind(cumsum(respv), regs[, NCOL(regs)])
colnames(datav) <- c("XLF", "residuals")
colnamev <- colnames(datav)
dygraphs::dygraph(datav["2008/2009"], main="Residuals of XLF Versus VTI and IEF") %>%
dyAxis("y", label=colnamev[1], independentTicks=TRUE) %>%
dyAxis("y2", label=colnamev[2], independentTicks=TRUE) %>%
dySeries(axis="y", strokeWidth=2, col="blue") %>%
dySeries(axis="y2", strokeWidth=2, col="red") %>%
dyLegend(show="always", width=300)
# Calculate the trailing regressions using R code
lambda1 <- (1-lambdaf)
respv <- zoo::coredata(respv)
predm <- zoo::coredata(predm)
nrows <- NROW(predm)
ncols <- NCOL(predm)
covrespred <- respv[1, ]*predm[1, ]
covpred <- outer(predm[1, ], predm[1, ])
betas <- matrix(numeric(nrows*ncols), nc=ncols)
betas[1, ] <- covrespred %*% MASS::ginv(covpred)
resids <- numeric(nrows)
residm <- 0
residv <- 0
for (it in 2:nrows) {
covrespred <- lambdaf*covrespred + lambda1*respv[it, ]*predm[it, ]
covpred <- lambdaf*covpred + lambda1*outer(predm[it, ], predm[it, ])
betas[it, ] <- covrespred %*% MASS::ginv(covpred)
resids[it] <- respv[it, ] - (betas[it, ] %*% predm[it, ])
residm <- lambdaf*residm + lambda1*resids[it]
residv <- lambdaf*residv + lambda1*(resids[it] - residm)^2
resids[it] <- (resids[it] - residm)/sqrt(residv)
} # end for
# Compare values, excluding warmup period
all.equal(regs[-(1:1e3), ], cbind(betas, resids)[-(1:1e3), ], check.attributes=FALSE)
## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.