# R/dsmooth.R In smoots: Nonparametric Estimation of the Trend and Its Derivatives in Time Series


#' Data-driven Local Polynomial for the Trend's Derivatives in Equidistant Time
#' Series
#'
#'  This function runs through an iterative process in order to find the
#'  optimal bandwidth for the nonparametric estimation of the first or second
#'  derivative of the trend in an equidistant time series (with short-memory
#'  errors) and subsequently employs the obtained bandwidth via local
#'  polynomial regression.
#'
#'@param y a numeric vector that contains the time series ordered from past to
#'present.
#'@param d an integer \code{1} or \code{2} that defines the order of
#'derivative; the default is \code{d = 1}.
#'@param mu an integer \code{0}, ..., \code{3} that represents the smoothness
#'parameter of the kernel weighting function and thus defines the kernel
#'function that will be used within the local polynomial regression; is set to
#'\code{1} by default.
#'
#'\tabular{cl}{
#'\strong{Number} \tab \strong{Kernel}\cr
#'\code{0} \tab Uniform Kernel\cr
#'\code{1} \tab Epanechnikov Kernel\cr
#'\code{2} \tab Bisquare Kernel\cr
#'\code{3} \tab Triweight Kernel
#'}
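#'
#'For illustration, the four kernels above belong to the polynomial family
#'\eqn{K_{\mu}(u) = C_{\mu}(1 - u^2)^{\mu}} on \eqn{[-1, 1]}. The following
#'sketch shows their common form; \code{kernel_mu} is illustrative only and
#'not a function of the package:
#'```r
#'# Second-order kernels selected by 'mu'; C_mu normalizes each kernel so
#'# that it integrates to one over [-1, 1]
#'kernel_mu <- function(u, mu = 1) {
#'  C <- c(1 / 2, 3 / 4, 15 / 16, 35 / 32)[mu + 1]
#'  ifelse(abs(u) <= 1, C * (1 - u^2)^mu, 0)
#'}
#'```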
#'@param pp an integer \code{1} (local linear regression) or \code{3} (local
#'cubic regression) that indicates the order of polynomial upon which
#'\eqn{c_f}, i.e. the variance factor, will be calculated by
#'\code{\link{msmooth}}; the default is \code{pp = 1}.
#'@param bStart.p a numeric object that indicates the starting value of the
#'bandwidth for the iterative process for the calculation of \eqn{c_f}; should
#'be \eqn{> 0}; is set to \code{0.15} by default.
#'@param bStart a numeric object that indicates the starting value of the
#'bandwidth for the iterative process; should be \eqn{> 0}; is set to
#'\code{0.15} by default.
#'
#'@details
#'The trend's derivative is estimated based on the additive
#'nonparametric regression model for an equidistant time series
#'\deqn{y_t = m(x_t) + \epsilon_t,}
#'where \eqn{y_t} is the observed time series, \eqn{x_t} is the rescaled time
#'on the interval \eqn{[0, 1]}, \eqn{m(x_t)} is a smooth and deterministic
#'trend function and \eqn{\epsilon_t} are stationary errors with
#'\eqn{E(\epsilon_t) = 0} and short-range dependence (see also Beran and Feng,
#'2002). With this function, the first or second derivative of \eqn{m(x_t)}
#'can be estimated without a parametric model assumption for the error series.
#'
#'The iterative plug-in (IPI) algorithm, which numerically minimizes the
#'Asymptotic Mean Integrated Squared Error (AMISE), was proposed by Feng,
#'Gries and Fritz (2020).
#'
#'Define \eqn{I[m^{(k)}] = \int_{c_b}^{d_b} [m^{(k)}(x)]^2 dx}{I[m^(k)] =
#'int_[c_b]^[d_b] \{m^(k)\}^2 dx}, \eqn{\beta_{(\nu, k)} = \int_{-1}^{1} u^k
#'K_{(\nu, k)}(u) du}{\beta_(\nu, k) = int_[-1]^[1] u^k K_(\nu, k)(u) du}
#'and \eqn{R(K) = \int_{-1}^{1} K_{(\nu, k)}^{2}(u) du}{R(K) = int_[-1]^[1]
#'\{K_(\nu, k)\}^2 du}, where \eqn{p} is the order of the polynomial,
#'\eqn{k = p + 1} is the order of the asymptotically equivalent kernel,
#'\eqn{\nu} is the order of the trend function's derivative, \eqn{0 \leq c_{b}
#'< d_{b} \leq 1}{0 \le c_b < d_b \le 1}, \eqn{c_f} is the variance factor and
#'\eqn{K_{(\nu, k)}(u)}{K_(\nu, k)(u)} the \eqn{k}-th order equivalent kernel
#'obtained for the estimation of \eqn{m^{(\nu)}}{m^(\nu)} in the interior.
#'\eqn{m^{(\nu)}}{m^(\nu)} is the \eqn{\nu}-th order derivative (\eqn{\nu = 0,
#'1, 2, ...}) of the nonparametric trend.
#'
#'Furthermore, we define
#'\deqn{C_{1} = \frac{I[m^{(k)}] \beta_{(\nu, k)}^2}{(k!)^2}}{C_1 =
#'[I[m^(k)]\{\beta_(\nu, k)\}^2] / (k!)^2}
#'and
#'\deqn{C_{2} = \frac{2 \pi c_{f} (d_b - c_b) R(K)}{nh^{2 \nu + 1}}}{C_2 =
#'2\pi(d_b - c_b)R(K)c_f / (nh^[2\nu + 1])}
#'with \eqn{h} being the bandwidth and \eqn{n} being the number of
#'observations. The AMISE is then
#'\deqn{AMISE(h) = h^{2(k-\nu)}C_{1} + C_{2}.}{AMISE(h) = h^[2(k - \nu)] C_1 +
#'C_2.}
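#'
#'Minimizing the AMISE with respect to \eqn{h} then yields the asymptotically
#'optimal bandwidth (a routine minimization step, spelled out here for
#'convenience):
#'\deqn{h_{A} = \left(\frac{(2\nu + 1) 2 \pi c_{f} (d_b - c_b)
#'R(K)}{2(k - \nu) C_{1} n}\right)^{1/(2k + 1)},}{h_A = [(2\nu + 1) 2\pi c_f
#'(d_b - c_b) R(K) / \{2(k - \nu) C_1 n\}]^[1 / (2k + 1)],}
#'which is the quantity the IPI algorithm approximates by plugging in
#'estimates of \eqn{c_f} and \eqn{I[m^{(k)}]}.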
#'
#'The variance factor \eqn{c_f} is first obtained from a pilot-estimation of
#'the time series' nonparametric trend (\eqn{\nu = 0}) with polynomial order
#'\eqn{p_p}. The estimate is then plugged into the iterative procedure for
#'estimating the first or second derivative (\eqn{\nu = 1} or \eqn{\nu = 2}).
#'For further details on the asymptotic theory or the algorithm, we refer the
#'user to Feng, Gries and Fritz (2020) and Feng et al. (2019).
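#'
#'Schematically, the IPI iteration can be sketched as follows;
#'\code{inflate}, \code{estimate_mk} and \code{C.hat} are placeholder names
#'for the inflation step, the pilot derivative estimator and the collected
#'constants (including \eqn{c_f}), not functions of the package:
#'```r
#'b <- bStart
#'repeat {
#'  bd <- inflate(b)                  # inflated bandwidth for estimating m^(k)
#'  I2 <- mean(estimate_mk(y, bd)^2)  # estimate of I[m^(k)] on the interior
#'  b.new <- (C.hat / (I2 * n))^(1 / (2 * k + 1))  # AMISE-minimizing bandwidth
#'  if (abs(b - b.new) / b.new < 1 / n) break      # relative change < 1 / n
#'  b <- b.new
#'}
#'```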
#'
#'The function itself is applicable in the following way: Based on a data input
#'\code{y}, an order of polynomial \code{pp} for the variance factor estimation
#'procedure, a starting value for the relative bandwidth \code{bStart.p} in the
#'variance factor estimation procedure, a kernel function defined by the
#'smoothness parameter \code{mu} and a starting value for the relative
#'bandwidth \code{bStart} in the bandwidth estimation procedure, an optimal
#'bandwidth is numerically calculated for the trend's derivative of order
#'\code{d}. In fact, aside from the input vector \code{y}, every argument has a
#'default setting that can be adjusted for the individual case. However, it is
#'recommended to initially use the default values for the estimation of the
#'first derivative and adjust the argument \code{d} to \code{d = 2} for the
#'estimation of the second derivative. Following Feng, Gries and Fritz (2020),
#'the initial bandwidth does not affect the resulting optimal bandwidth in
#'theory. However, in practice, local minima of the AMISE can influence the
#'results. Therefore, the default starting bandwidth is set to \code{0.15}, the
#'suggested starting bandwidth by Feng, Gries and Fritz (2020) for the
#'data-driven estimation of the first derivative. The recommended initial
#'bandwidth for the second derivative, however, is \code{0.2} and not
#'\code{0.15}. Thus, if the algorithm does not give suitable results
#'(especially for \code{d = 2}), the adjustment of the initial bandwidth might
#'be a good starting point. Analogously, the default starting bandwidth for the
#'trend estimation for the variance factor is \code{bStart.p = 0.15}, although
#'according to Feng, Gries and Fritz (2020), \code{bStart.p = 0.1} is suggested
#'for \code{pp = 1} and \code{bStart.p = 0.2} for \code{pp = 3}. The default is
#'therefore a compromise between the two suggested values. For more specific
#'information on the input arguments consult the section \emph{Arguments}.
#'
#'After the bandwidth estimation, the nonparametric derivative of the series
#'is calculated with respect to the obtained optimal bandwidth by means of a
#'local polynomial regression. The output object is then a list that contains,
#'among other components, the original time series, the estimates of the
#'derivative and the estimated optimal bandwidth.
#'
#'The default print method for this function delivers key numbers such as
#'the iteration steps and the generated optimal bandwidth rounded to the fourth
#'decimal. The exact numbers and results such as the estimated nonparametric
#'trend series are saved within the output object and can be addressed via the
#'\code{$} sign.
#'
#'NOTE:
#'
#'The estimates are obtained for the rescaled time points on the interval
#'\eqn{[0, 1]}. Therefore, the estimated derivatives might not reflect the
#'derivatives for the actual time points. To rescale them, we refer the
#'user to the \code{\link{rescale}} function of the \code{smoots} package.
#'
#'With package version 1.1.0, this function implements C++ code by means
#'of the \code{\link[Rcpp:Rcpp-package]{Rcpp}} and
#'\code{\link[RcppArmadillo:RcppArmadillo-package]{RcppArmadillo}} packages for
#'better performance.
#'
#'@return The function returns a list with different components:
#'
#'\describe{
#'\item{b0}{the optimal bandwidth chosen by the IPI-algorithm.}
#'\item{bStart}{the starting bandwidth for the local polynomial
#'regression based derivative estimation procedure; input argument.}
#'\item{bStart.p}{the starting bandwidth for the nonparametric trend estimation
#'that leads to the variance factor estimate; input argument.}
#'\item{bvc}{indicates whether an enlarged bandwidth was used for the variance
#'factor estimation or not; it is always set to \code{"Y"} (yes) for this
#'function.}
#'\item{cf0}{the estimated variance factor; in contrast to the definitions
#'given in the \emph{Details} section, this object actually contains an
#'estimated value of \eqn{2\pi c_f}, i.e. it corresponds to the estimated sum
#'of autocovariances.}
#'\item{InfR}{the inflation rate setting.}
#'\item{iterations}{the bandwidths of the single iteration steps.}
#'\item{Mcf}{the estimation method for the variance factor estimation; it is
#'always estimated nonparametrically (\code{"NP"}) within this function.}
#'\item{mu}{the smoothness parameter of the second order kernel; input
#'argument.}
#'\item{n}{the number of observations.}
#'\item{niterations}{the total number of iterations until convergence.}
#'\item{orig}{the original input series; input argument.}
#'\item{p}{the order of polynomial for the local polynomial
#'regression used within the derivative estimation procedure.}
#'\item{pp}{the order of polynomial for the local polynomial
#'regression used in the variance factor estimation; input argument.}
#'\item{v}{the considered order of the trend's derivative; input argument
#'\code{d}.}
#'\item{ws}{the weighting system matrix used within the local polynomial
#'regression; this matrix is a condensed version of a complete weighting system
#'matrix; in each row of \code{ws}, the weights for conducting the smoothing
#'procedure at a specific observation time point can be found; the first
#'\eqn{[nb + 0.5]} rows, where \eqn{n} corresponds to the number of
#'observations, \eqn{b} is the bandwidth considered for smoothing and
#'\eqn{[.]} denotes the integer part, contain the weights at the
#'\eqn{[nb + 0.5]} left-hand boundary points; the weights in row
#'\eqn{[nb + 0.5] + 1} are representative for the estimation at all
#'interior points and the remaining rows contain the weights for the right-hand
#'boundary points; each row has exactly \eqn{2[nb + 0.5] + 1} elements,
#'more specifically the weights for observations of the nearest
#'\eqn{2[nb + 0.5] + 1} time points; moreover, the weights are normalized,
#'i.e. the weights are obtained under consideration of the time points
#'\eqn{x_t = t/n}, where \eqn{t = 1, 2, ..., n}.}
#'\item{ye}{the nonparametric estimates of the derivative for the rescaled
#'time points on the interval \eqn{[0, 1]}.}
#'}
#'
#'@md
#'
#'@export
#'
#'@references
#' Feng, Y., Gries, T. and Fritz, M. (2020). Data-driven
#' local polynomial for the trend and its derivatives in economic time
#' series. Journal of Nonparametric Statistics, 32:2, 510-533.
#'
#' Feng, Y., Gries, T., Letmathe, S. and Schulz, D. (2019). The smoots package
#' in R for semiparametric modeling of trend stationary time series. Discussion
#' Paper. Paderborn University. Unpublished.
#'
#'@author
#'\itemize{
#'\item Yuanhua Feng (Department of Economics, Paderborn University), \cr
#'Author of the Algorithms \cr
#'Website: \url{https://wiwi.uni-paderborn.de/en/dep4/feng/}
#'\item Dominik Schulz (Research Assistant) (Department of Economics, Paderborn
#'University), \cr
#'Package Creator and Maintainer
#'}
#'
#'@examples
#'# Logarithm of test data
#'test_data <- gdpUS
#'y <- log(test_data$GDP)
#'t <- seq(from = 1947, to = 2019.25, by = 0.25)
#'
#'# Applied dsmooth function for the trend's first derivative
#'result_d <- dsmooth(y, d = 1, mu = 1, pp = 1, bStart.p = 0.1, bStart = 0.15)
#'estim <- result_d$ye
#'
#'# Plot of the results
#'plot(t, estim, xlab = "Year", ylab = "First derivative", type = "l",
#' main = paste0("Estimated first derivative of the trend for log-quarterly ",
#' "US-GDP, Q1 1947 - Q2 2019"), cex.axis = 0.8, cex.main = 0.8,
#' cex.lab = 0.8, bty = "n")
#'
#'# Print result
#'result_d

# The main function "dsmooth"------------------------------------------------

# d = 1 or 2 indicates the iterative plug-in algorithm for m' or m'',
# respectively
dsmooth <- function(y, d = c(1, 2), mu = c(0, 1, 2, 3), pp = c(1, 3),
                    bStart.p = 0.15, bStart = 0.15) {

  if (length(y) <= 1 || !all(!is.na(y)) || !is.numeric(y)) {
    stop("The argument 'y' must be a numeric vector with length > 1 and ",
      "without NAs.")
  }
  if (!(length(d) %in% c(1, 2)) || !all(!is.na(d)) || !is.numeric(d)) {
    stop("The argument 'd' must be a single integer value (1 or 2).")
  }
  d <- floor(d)
  if (!(length(mu) %in% c(1, 4)) || !all(!is.na(mu)) || !is.numeric(mu)) {
    stop("The argument 'mu' must be a single integer value (0, 1, 2 or 3).")
  }
  mu <- floor(mu)
  if (!(length(pp) %in% c(1, 2)) || !all(!is.na(pp)) || !is.numeric(pp)) {
    stop("The argument 'pp' must be a single integer value (either 1 or 3).")
  }
  pp <- floor(pp)
  # Note: the original check erroneously tested 'bStart' here; 'bStart.p'
  # is the value being validated
  if (length(bStart.p) != 1 || is.na(bStart.p) || !is.numeric(bStart.p) ||
      (bStart.p <= 0)) {
    stop("The argument 'bStart.p' must be a single non-NA double value with ",
      "bStart.p > 0.")
  }
  if (length(bStart) != 1 || is.na(bStart) || !is.numeric(bStart) ||
      (bStart <= 0)) {
    stop("The argument 'bStart' must be a single non-NA double value with ",
      "bStart > 0.")
  }
  if (all(d == c(1, 2))) d <- 1
  if (all(mu == c(0, 1, 2, 3))) mu <- 1
  if (all(pp == c(1, 3))) pp <- 1
  if (!(d %in% c(1, 2))) {
    stop("Input of argument 'd' incorrect. It must be set to either 1 or 2.")
  }
  if (!(pp %in% c(1, 3))) {
    stop("Input of argument 'pp' incorrect. It must be set to either 1 or 3.")
  }

  # Pilot estimation of the trend (nu = 0) to obtain the variance factor cf0
  if (pp == 1) {result.p <- msmoothCalc(y, 1, mu, bStart.p, "A", "lpr")}
  if (pp == 3) {result.p <- msmoothCalc(y, 3, mu, bStart.p, "B", "lpr")}
  cf0 <- result.p$cf0

  # Using the knn idea, bb = 1, or not
  bb <- 1  # default
  if (d == 1) {InfR <- "Nai"} else {InfR <- "Var"}  # default
  cb <- 0.05
  # In this code ccf and bStart will be provided by the user
  # Input parameters
  n <- length(y)
  p <- d + 1
  k <- p + 1
  pd <- p + 2
  runc <- 1
  n1 <- trunc(n * cb)

  # New method for the kernel constants with p = 1 or 3------------------------

  # Kernel constants-----------------------------------------------------------
  m <- 1000000  # grid size for the numerical integration
  u <- (-m:m) / (m + 0.5)
  # For d = 1, the four 3rd-order kernels for m' (Table 5.7, Mueller, 1988)
  wkp <- lookup$lookup_3kerns[[(mu + 1), d]](u)
  Rp <- sum(wkp ** 2) / m          # R(K) via Riemann sum
  mukp <- sum((u ** k) * wkp) / m  # kernel moment beta_(nu, k) via Riemann sum
  # Two constants in the bandwidth formula
  c1 <- factorial(k) ** 2 * (2 * d + 1) / (2 * (k - d))  # for d = 1 or 2
  c2 <- (1 - 2 * cb) * Rp / (mukp) ** 2
  steps <- rep(NA, 40)
  bd2_func <- lookup$InfR2_lookup[[d, InfR]]
  expoDer1 <- lookup$exponDer1[[d]]
  expoDer2 <- lookup$exponDer2[[d]]

  # The main iteration---------------------------------------------------------

  noi <- 40  # maximal number of iterations

  for (i in 1:noi) {
    if (runc == 1) {
      if (i > 1) {bold1 <- bold}
      if (i == 1) {bold <- bStart} else {bold <- bopt}

      # Look up the EIM inflation rate from the lookup table
      bd <- bd2_func(bold)
      if (bd >= 0.49) {bd <- 0.49}
      yed <- c(gsmoothCalcCpp(y, k, pd, mu, bd, bb))
      I2 <- sum(yed[(n1 + 1):(n - n1)] ** 2) / (n - 2 * n1)
      c3 <- cf0 / I2

      bopt <- (c1 * c2 * c3)^expoDer1 * n^(-expoDer1)
      if (bopt < n^expoDer2) bopt <- n^expoDer2
      if (bopt > 0.49) {bopt <- 0.49}
      steps[i] <- bopt

      # Stopping criteria: stop when the relative change between iterations
      # falls below 1 / n; average in case of oscillation between two values
      if (i > 2 && abs(bold - bopt) / bopt < 1 / n) {runc <- 0}
      if (i > 3 && abs(bold1 - bopt) / bopt < 1 / n) {
        bopt <- (bold + bopt) / 2
        runc <- 0
      }
    }
  }

  # Smooth with the selected bandwidth-----------------------------------------

  if (bopt < n^expoDer2) bopt <- n^expoDer2
  if (bopt > 0.49) {bopt <- 0.49}
  est.opt <- gsmoothCalc2Cpp(y, d, p, mu, bopt, bb)
  ye <- c(est.opt$ye)
  ws <- est.opt$ws

  # Final results
  results <- list(ye = ye, orig = y, b0 = bopt, ws = ws,
                  cf0 = cf0, v = d, n = length(y),
                  iterations = steps[!is.na(steps)],
                  niterations = length(steps[!is.na(steps)]),
                  pp = pp, bStart.p = bStart.p, bStart = bStart,
                  InfR = InfR, mu = mu, p = d + 1, bvc = "Y", Mcf = "NP")

  class(results) <- "smoots"
  attr(results, "function") <- "dsmooth"

  results
}
# End of the function
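The returned object is a list of class `"smoots"`. As a quick sketch (using the `gdpUS` data set shipped with the package, as in the examples above), its components can be accessed with `$`:

```r
res <- dsmooth(log(gdpUS$GDP), d = 1)
res$b0           # estimated optimal bandwidth
head(res$ye)     # derivative estimates on the rescaled interval [0, 1]
res$niterations  # number of IPI iterations until convergence
```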



smoots documentation built on Oct. 10, 2021, 1:09 a.m.