knitr::opts_chunk$set(cache = TRUE, fig.width = 7)
The package as it presently stands contains functions that are being used
within Fall 2016 courses at the University of Illinois at Urbana-Champaign (UIUC).
Specifically, students are using these functions within
STAT 578 (Special Topic): Time Series Forecasting and
STAT 429: Time Series Analysis. Presently, the functions are meant to provide
students with the ability to interact with time series data. All functions
will eventually end up either moving to a different package (perhaps gmwm
)
or may end up being removed.
Outside of being used as an educational and practical tool, the package also
serves to provide tenets of pipable data within a time series context. Thus, the
package is primarily designed with the notion of the piping operator (%>%
)
fully in play as well as other R related tools like dplyr
and tidyr
.
As this package is experimental, the implementations may shift significantly.
Presently, the package can only be obtained through GitHub. Plans for a CRAN-based release are tentative. At some point, this may be added to SMAC's package mirror for the install.
For users who are interested in having the latest and greatest developments within time series, this option is ideal. Though, there is considerably more work that a user must do to have a stable version of the package. The setup to obtain the development version is platform dependent.
Specifically, one must have a compiler installed on your system that is compatible with R.
For help on obtaining a compiler consult:
Generally speaking, Linux users should have a compiler that is compatible with R already installed on their system.
With the system dependency taken care of, we continue on by installing the R specific package dependencies and finally the package itself by doing the following in an R session:
# Install dependencies # install.packages("devtools") # Install the package from GitHub without Vignettes/User Guides devtools::install_github("SMAC-Group/exts") # Install the package from GitHub with Vignettes/User Guides # Note: This will be a longer install as the vignettes must be built. devtools::install_github("SMAC-Group/exts", build_vignettes = TRUE)
To use the package simply load it into R via:
library("exts")
Presently, gmwm
and magrittr
will autoload allowing you to access
ts.model
, gts
, and the pipe (%>%
) operator.
One of the key features of time series resolves around understanding the dependency that exists between observations using autocorrelation. There exists four built in functions that enable the computation of the (Robust) Autocorrelation Function [(R)ACF] and Partial Autocorrelation Function (PACF). These functions come in two flavors:
emp_
theo_
The prior function handles .
The later is restricted to only analyzing autoregressive moving average processes
through the ARMA()
object found in gmwm
.
To tell what kind of function was graphed, the graph heading will contain either "Empirical" or "Theoretical" along side the type of computation (e.g. (R)ACF or PACF).
emp_
) (R)ACF and PACFEmpirical graphs for (R)ACF and PACF are great for analyzing sample time series data to determine the dependency structure.
# Set seed for reproducibility set.seed(8672) # Generate an AR1 xt = gen_gts(1000, AR1(phi = 0.6, sigma2 = 1)) # Obtain the empirical ACF vals = emp_acf(xt) autoplot(vals) # Obtain the empirical RACF (Robust ACF) vals = emp_acf(xt, robust = TRUE) autoplot(vals) # Obtain the covariance vals = emp_acf(xt, type = "covariance") autoplot(vals) # Obtain the empirical PACF vals = emp_pacf(xt) autoplot(vals) # Obtain both the empirical ACF & PACF vals = emp_corr(xt) autoplot(vals)
theo_
) ACF and PACF graphs for ARMA ModelsTheoretical graphs for ACF and PACF are also available separately via
theo_acf()
and theo_pacf()
or combined into one via theo_corr()
.
model = ARMA(ar = c(.50, -0.25), ma = .20) # Theoretical ACF vals = theo_acf(model) autoplot(vals) # Theoretical PACF vals = theo_pacf(model) autoplot(vals) # Create a theoretical graph vals = theo_corr(model) autoplot(vals)
eda_()
)Exploratory Data Analysis allows for visually and quantitatively obtaining graphing information.
Often times, it is helpful to observe the time series, ACF, and PACF graphs together. To do so, use:
# Bring in time series data xt = gts(log(lynx), start = 1821, freq = 1) # Obtain Empirical ACF & PACF alongside data vals = eda_ts(xt) # Graph Data, Empirical (R)ACF, and Empirical PACF autoplot(vals)
To observe difference between the classical and robust ACF, one can use compare_acf()
# Bring in time series data xt = gts(log(lynx), start = 1821, freq = 1) # Obtain the ACF values vals = compare_acf(xt) # Graph data autoplot(vals)
Often times, it is helpful to transform the data to attempt to make the time series appear stationary.
# Default transforms vals = eda_change(gnp) autoplot(vals) # Default transforms but applied together in the second case vals = eda_change(gnp, both = TRUE) autoplot(vals) # Different Transforms and applied separately in the second case vals = eda_change(gnp, t1 = sqrt, t2 = log10) autoplot(vals)
The style for diagnostic features has all functions being prefixed by diag_
.
Each function has its own graphing utility that can be called with either
autoplot
or plot
as before.
diag_resid
)Residuals are able to be extract from arima
and lm
classes in addition to
manipulated to be standardized via resid / sd(resid)
through a std = TRUE
flag.
# Extract and standardize residuals model = arima(sunspot.year, c(9,0,0)) resids = diag_resid(model, std = TRUE) # Graph Histogram autoplot(resids, type = "hist") # Graph Histogram autoplot(resids, type = "resid") # Graph Histogram autoplot(resids, type = "both")
diag_qq
)The diag_qq
plot creates a quantil-quantile plot that contains a fitted
normal line. Typically, this plot is used to evaluate the residuals of a model.
# Observing residuals of an Arima process model = arima(sunspot.year, c(9,0,0)) vals = diag_qq(model, std = TRUE) autoplot(vals) # Observe time series vals = diag_qq(sunspot.year, std = TRUE) autoplot(vals)
diag_wv
)Wavelet Variance provides an estimation technique to display whether the observed process is a White Noise. If the orange line follows the dark blue line closely or remains within the blue confidence interval, then the process is likely to be a white noise.
# White Noise Process model = arima(sunspot.year, c(9,0,0)) vals = diag_wv(model) autoplot(vals)
The package provides a convient wrapper into R's Box.test()
function that
separates the different tests, 'Ljung-Box' and 'Box-Pierce', to assess the null
hypothesis of independence in a given time series. The function makes available
a data.frame
containing: lag
, p_value
, statistic
for each of the tests
run that can then be used to achieve a diagnostic plot like so:
# Ljung-Box test model = arima(sunspot.year, c(9,0,0)) vals = diag_ljungbox(model) autoplot(vals) # Box-Pierce Test vals = diag_boxpierce(model) autoplot(vals)
Warning: The inputs for this function will change.
One further useful plot is to compare the residuals to the fitted values.
xt = sunspot.year model = arima(xt, c(9,0,0)) # Note additional manipulation is required outside the function # to obtain the two parameters are required resids = resid(model) fit = xt - resids vals = diag_fitted(fit, resids) autoplot(vals)
Warning: This function will change in the future to adhere more to a chained argument
Each of the diagnostic plots are available standalone. However, there is benefits
to plotting all of the diagnostic information simultaneously. Hence, the diag_ts()
function that will calculate all of the above values and provide a diagnostic plot.
# Observing residuals of an Arima process xt = sunspot.year model = arima(xt, c(9,0,0)) # Note two parameters are required vals = diag_ts(model, xt) autoplot(vals)
There are many ways to go about selecting an ideal time series model. Within
the package there are select_arima()
, select_arma()
, select_ar()
,
and select_ma()
functions that build all of the time series models alongside
multiple Information Criterion (IC) such as:
where $L_{\max }$ is the log-likelihood, $k$ is the number of parameters, and $n$ is sample size.
To acess the best model according to one of the criteria, use: best_model(x, ic="aic")
select_ar()
and select_ma()
# Bring in time series data xt = gts(diff(log(gnp)), start = 1700, freq = 1) model_info = select_ar(xt, p.min = 1, p.max = 8) # Obtain a plot of the model selection criteria autoplot(model_info) # Obtain the best model (AIC is default) best_model(model_info) # Obtain the best model with BIC bmodel = best_model(model_info, ic = "bic") bmodel # See the ACF of the residuals for the model vals = emp_corr(bmodel) autoplot(vals)
select_arima()
and select_arma()
Warning: These procedures have been known to have issues with the initial
range of parameters to attempt. You may wish to change default *.min
and *.max
when applicable. At a later time, support for functional errors should resolve
this issue.
# Bring in time series data xt = gts(diff(log(gnp)), start = 1700, freq = 1) model_info = select_arma(xt, p.max = 4, q.max = 3) # Obtain a plot of the model selection criteria autoplot(model_info) # Obtain the best model (AIC is default) best_model(model_info) # Obtain the best model with BIC bmodel = best_model(model_info, ic = "bic") bmodel # See the ACF of the residuals for the model vals = emp_corr(bmodel) autoplot(vals)
The textbook "Time Series Analysis and Its Applications: With Examples in R" (ASTSA)
provides the astsa
package. This package is able to provide better forecasting
when it comes to model fitting temporarily. The code below is found on page
165 of the ASTSA text.
library("astsa") pred = sarima.for(prodn, 12, 2, 1, 1, 0, 1, 3, 12) # forecast
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.