msmooth: Data-driven Nonparametric Regression for the Trend in...

msmoothR Documentation

Data-driven Nonparametric Regression for the Trend in Equidistant Time Series

Description

This function runs an iterative plug-in algorithm to find the optimal bandwidth for the estimation of the nonparametric trend in equidistant time series (with short memory errors) and then employs the resulting bandwidth via either local polynomial or kernel regression.

Usage

msmooth(
  y,
  p = c(1, 3),
  mu = c(0, 1, 2, 3),
  bStart = 0.15,
  alg = c("A", "B", "N", "NA", "NAM", "NM", "O", "OA", "OAM", "OM"),
  method = c("lpr", "kr")
)

Arguments

y

a numeric vector that contains the input time series ordered from past to present.

p

an integer 1 (local linear regression) or 3 (local cubic regression); represents the order of polynomial within the local polynomial regression (see also the 'Details' section); is set to 1 by default; is automatically set to 1 if method = "kr".

mu

an integer 0, ..., 3 that represents the smoothness parameter of the kernel weighting function and thus defines the kernel function that will be used within the local polynomial regression; is set to 1 by default.

Number Kernel
0 Uniform Kernel
1 Epanechnikov Kernel
2 Bisquare Kernel
3 Triweight Kernel
bStart

a numeric object that indicates the starting value of the bandwidth for the iterative process; should be > 0; is set to 0.15 by default.

alg

a control parameter (as character) that indicates the corresponding algorithm used (set to "A" by default for p = 1 and to "B" for p = 3).

Algorithm Description
"A" Nonparametric estimation of the variance factor with an enlarged bandwidth, optimal inflation rate
"B" Nonparametric estimation of the variance factor with an enlarged bandwidth, naive inflation rate
"O" Nonparametric estimation of the variance factor, optimal inflation rate
"N" Nonparametric estimation of the variance factor, naive inflation rate
"OAM" Estimation of the variance factor with ARMA(p,q)-models, optimal inflation rate
"NAM" Estimation of the variance factor with ARMA(p,q)-models, naive inflation rate
"OA" Estimation of the variance factor with AR(p)-models, optimal inflation rate
"NA" Estimation of the variance factor with AR(p)-models, naive inflation rate
"OM" Estimation of the variance factor with MA(q)-models, optimal inflation rate
"NM" Estimation of the variance factor with MA(q)-models, naive inflation rate

It is proposed to use alg = "A" in combination with p = 1. If the user finds that the chosen bandwidth by algorithm "A" is too small, alg = "B" with preferably p = 3 is suggested. For more information on the components of the different algorithms, please consult tsmooth.

method

the smoothing approach; "lpr" represents the local polynomial regression, whereas "kr" implements a kernel regression; is set to "lpr" by default.

Details

The trend is estimated based on the additive nonparametric regression model for an equidistant time series

y_t = m(x_t) + \epsilon_t,

where y_t is the observed time series, x_t is the rescaled time on the interval [0, 1], m(x_t) is a smooth and deterministic trend function and \epsilon_t are stationary errors with E(\epsilon_t) = 0 and short-range dependence (see also Beran and Feng, 2002). With this function m(x_t) can be estimated without a parametric model assumption for the error series. Thus, after estimating and removing the trend, any suitable parametric model, e.g. an ARMA(p,q) model, can be fitted to the residuals (see arima).

The iterative-plug-in (IPI) algorithm, which numerically minimizes the Asymptotic Mean Squared Error (AMISE), was proposed by Feng, Gries and Fritz (2020).

Define I[m^{(k)}] = \int_{c_b}^{d_b} [m^{(k)}(x)]^2 dx, \beta_{(\nu, k)} = \int_{-1}^{1} u^k K_{(\nu, k)}(u) du and R(K) = \int_{-1}^{1} K_{(\nu, k)}^{2}(u) du, where p is the order of the polynomial, k = p + 1 is the order of the asymptotically equivalent kernel, \nu is the order of the trend function's derivative, 0 \leq c_{b} < d_{b} \leq 1, c_f is the variance factor and K_{(\nu, k)}(u) the k-th order equivalent kernel obtained for the estimation of m^{(\nu)} in the interior. m^{(\nu)} is the \nu-th order derivative (\nu = 0, 1, 2, ...) of the nonparametric trend.

Furthermore, we define

C_{1} = \frac{I[m^{(k)}] \beta_{(\nu, k)}^2}{(k!)^2}

and

C_{2} = \frac{2 \pi c_{f} (d_b - c_b) R(K)}{nh^{2 \nu + 1}}

with h being the bandwidth and n being the number of observations. The AMISE is then

AMISE(h) = h^{2(k-\nu)}C_{1} + C_{2}.

The function calculates suitable estimates for c_f, the variance factor, and I[m^{(k)}] over different iterations. In each iteration, a bandwidth is obtained in accordance with the AMISE that once more serves as an input for the following iteration. The process repeats until either convergence or the 40th iteration is reached. For further details on the asymptotic theory or the algorithm, please consult Feng, Gries and Fritz (2020) or Feng et al. (2019).

To apply the function, only few arguments are needed: a data input y, an order of polynomial p, a kernel function defined by the smoothness parameter mu, a starting value for the relative bandwidth bStart and a final smoothing method method. In fact, aside from the input vector y, every argument has a default setting that can be adjusted for the individual case. It is recommended to initially use the default values for p, alg and bStart and adjust them in the rare case of the resulting optimal bandwidth being either too small or too large. Theoretically, the initial bandwidth does not affect the selected optimal bandwidth. However, in practice local minima of the AMISE might exist and influence the selected bandwidth. Therefore, the default setting is bStart = 0.15, which is a compromise between the starting values bStart = 0.1 for p = 1 and bStart = 0.2 for p = 3 that were proposed by Feng, Gries and Fritz (2020). In the rare case of a clearly unsuitable optimal bandwidth, a starting bandwidth that differs from the default value is a first possible approach to obtain a better result. Other argument adjustments can be tried as well. For more specific information on the input arguments consult the section Arguments.

When applying the function, an optimal bandwidth is obtained based on the IPI algorithm proposed by Feng, Gries and Fritz (2020). In a second step, the nonparametric trend of the series is calculated with respect to the chosen bandwidth and the selected regression method (lpf or kr). It is notable that p is automatically set to 1 for method = "kr". The output object is then a list that contains, among other components, the original time series, the estimated trend values and the series without the trend.

The default print method for this function delivers key numbers such as the iteration steps and the generated optimal bandwidth rounded to the fourth decimal. The exact numbers and results such as the estimated nonparametric trend series are saved within the output object and can be addressed via the $ sign.

NOTE:

With package version 1.1.0, this function implements C++ code by means of the Rcpp and RcppArmadillo packages for better performance.

Value

The function returns a list with different components:

AR.BIC

the Bayesian Information Criterion of the optimal AR(p) model when estimating the variance factor via autoregressive models (if calculated; calculated for alg = "OA" and alg = "NA").

ARMA.BIC

the Bayesian Information Criterion of the optimal ARMA(p,q) model when estimating the variance factor via autoregressive-moving-average models (if calculated; calculated for alg = "OAM" and alg = "NAM").

cb

the percentage of omitted observations on each side of the observation period; always equal to 0.05.

b0

the optimal bandwidth chosen by the IPI-algorithm.

bb

the boundary bandwidth method used within the IPI; always equal to 1.

bStart

the starting value of the (relative) bandwidth; input argument.

bvc

indicates whether an enlarged bandwidth was used for the variance factor estimation or not; depends on the chosen algorithm.

cf0

the estimated variance factor; in contrast to the definitions given in the Details section, this object actually contains an estimated value of 2\pi c_f, i.e. it corresponds to the estimated sum of autocovariances.

cf0.AR

the estimated variance factor obtained by estimation of autoregressive models (if calculated; alg = "OA" or "NA").

cf0.ARMA

the estimated variance factor obtained by estimation of autoregressive-moving-average models (if calculated; calculated for alg = "OAM" and alg = "NAM").

cf0.LW

the estimated variance factor obtained by Lag-Window Spectral Density Estimation following Bühlmann (1996) (if calculated; calculated for algorithms "A", "B", "O" and "N").

cf0.MA

the estimated variance factor obtained by estimation of moving-average models (if calculated; calculated for alg = "OM" and alg = "NM").

I2

the estimated value of I[m^{(k)}].

InfR

the setting for the inflation rate according to the chosen algorithm.

iterations

the bandwidths of the single iterations steps

L0.opt

the optimal bandwidth for the lag-window spectral density estimation (if calculated; calculated for algorithms "A", "B", "O" and "N").

MA.BIC

the Bayesian Information Criterion of the optimal MA(q) model when estimating the variance factor via moving-average models (if calculated; calculated for alg = "OM" and alg = "NM").

Mcf

the estimation method for the variance factor estimation; depends on the chosen algorithm.

mu

the smoothness parameter of the second order kernel; input argument.

n

the number of observations.

niterations

the total number of iterations until convergence.

orig

the original input series; input argument.

p.BIC

the order p of the optimal AR(p) or ARMA(p,q) model when estimating the variance factor via autoregressive or autoregressive-moving average models (if calculated; calculated for alg = "OA", alg = "NA", alg = "OAM" and alg = "NAM").

p

the order of polynomial used in the IPI-algorithm; also used for the final smoothing, if method = "lpr"; input argument.

q.BIC

the order q of the optimal MA(q) or ARMA(p,q) model when estimating the variance factor via moving-average or autoregressive-moving average models (if calculated; calculated for alg = "OM", alg = "NM", alg = "OAM" and alg = "NAM").

res

the estimated residual series.

v

the considered order of derivative of the trend; is always zero for this function.

ws

the weighting system matrix used within the local polynomial regression; this matrix is a condensed version of a complete weighting system matrix; in each row of ws, the weights for conducting the smoothing procedure at a specific observation time point can be found; the first [nb + 0.5] rows, where n corresponds to the number of observations, b is the bandwidth considered for smoothing and [.] denotes the integer part, contain the weights at the [nb + 0.5] left-hand boundary points; the weights in row [nb + 0.5] + 1 are representative for the estimation at all interior points and the remaining rows contain the weights for the right-hand boundary points; each row has exactly 2[nb + 0.5] + 1 elements, more specifically the weights for observations of the nearest 2[nb + 0.5] + 1 time points; moreover, the weights are normalized, i.e. the weights are obtained under consideration of the time points x_t = t/n, where t = 1, 2, ..., n.

ye

the nonparametric estimates of the trend.

Author(s)

  • Yuanhua Feng (Department of Economics, Paderborn University),
    Author of the Algorithms
    Website: https://wiwi.uni-paderborn.de/en/dep4/feng/

  • Dominik Schulz (Research Assistant) (Department of Economics, Paderborn University),
    Package Creator and Maintainer

References

Beran, J. and Feng, Y. (2002). Local polynomial fitting with long-memory, short-memory and antipersistent errors. Annals of the Institute of Statistical Mathematics, 54(2), 291-311.

Bühlmann, P. (1996). Locally adaptive lag-window spectral estimation. Journal of Time Series Analysis, 17(3), 247-270.

Feng, Y., Gries, T. and Fritz, M. (2020). Data-driven local polynomial for the trend and its derivatives in economic time series. Journal of Nonparametric Statistics, 32:2, 510-533.

Feng, Y., Gries, T., Letmathe, S. and Schulz, D. (2019). The smoots package in R for semiparametric modeling of trend stationary time series. Discussion Paper. Paderborn University. Unpublished.

Examples

### Example 1: US-GDP ###

# Logarithm of test data
# -> the logarithm of the data is assumed to follow the additive model
test_data <- gdpUS
y <- log(test_data$GDP)

# Applied msmooth function for the trend
results <- msmooth(y, p = 1, mu = 1, bStart = 0.1, alg = "A", method = "lpr")
res <- results$res
ye <- results$ye

# Plot of the results
t <- seq(from = 1947, to = 2019.25, by = 0.25)
matplot(t, cbind(y, ye), type = "ll", lty = c(3, 1), col = c(1, "red"),
 xlab = "Years", ylab = "Log-Quartlery US-GDP",
 main = "Log-Quarterly US-GDP vs. Trend, Q1 1947 - Q2 2019")
legend("bottomright", legend = c("Original series", "Estimated trend"),
 fill = c(1, "red"), cex = 0.7)
results

## Not run: 
### Example 2: German Stock Index ###

# The following procedure can be considered, if (log-)returns are assumed
# to follow a model from the general class of semiparametric GARCH-type
# models (including Semi-GARCH, Semi-Log-GARCH and Semi-APARCH models among
# others) with a slowly changing variance over time due to a deterministic,
# nonparametric scale function.

# Obtain the logarithm of the squared returns
returns <- diff(log(dax$Close))   # (log-)returns
rt <- returns - mean(returns)     # demeaned (log-)returns
yt <- log(rt^2)                   # logarithm of the squared returns

# Apply 'smoots' function to the log-data, because the logarithm of
# the squared returns follows an additive model with a nonparametric trend
# function, if the returns are assumed to follow a semiparametric GARCH-type
# model.

# In this case, the setting 'alg = "A"' is used in combination with p = 3, as
# the resulting estimates appear to be more suitable than for 'alg = "B"'.
est <- msmooth(yt, p = 3, alg = "A")
m_xt <- est$ye                    # estimated trend values

# Obtain the standardized returns 'eps' and the scale function 'scale.f'
res <- est$res                    # the detrended log-data
C <- -log(mean(exp(res)))         # an estimate of a constant value needed
                                  # for the retransformation
scale.f <- exp((m_xt - C) / 2)    # estimated values of the scale function in
                                  # the returns
eps <- rt / scale.f               # the estimated standardized returns

# -> 'eps' can now be analyzed by any suitable GARCH-type model.
#    The total volatilities are then the product of the conditional
#    volatilities obtained from 'eps' and the scale function 'scale.f'.

## End(Not run)

smoots documentation built on Sept. 11, 2023, 9:07 a.m.