Fit an ARIMA model to a table that contains time series data. The table must have two columns: one for the time series values and one for the time stamps. The time stamp can be of any type that can be ordered. It is required because the rows of a database table have no inherent order, so they must be ordered by the time stamp column.
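The role of the time stamp column can be illustrated locally in plain R, without a database. This is only a sketch: the data frame and its column names `id` and `val` are made up for the illustration, and shuffling the rows stands in for the unordered storage of a database table.

```r
## Hypothetical toy series: id is the time stamp, val the series value
toy <- data.frame(id = c(0.0, 0.5, 1.0, 1.5, 2.0),
                  val = c(10, 11, 12, 13, 14))

## Simulate a table whose rows come back in arbitrary order
set.seed(1)
shuffled <- toy[sample(nrow(toy)), ]

## Ordering by the time stamp recovers the series
restored <- shuffled[order(shuffled$id), ]
identical(restored$val, toy$val)
```

Any column type for which `order` is defined (numbers, dates, strings) could play the role of `id` here.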
## S4 method for signature 'db.Rquery,db.Rquery'
madlib.arima(x, ts, by = NULL,
order=c(1,1,1), seasonal = list(order = c(0,0,0), period = NA),
include.mean = TRUE, method = "CSS", optim.method = "LM",
optim.control = list(), ...)
## S4 method for signature 'formula,db.obj'
madlib.arima(x, ts, order=c(1,1,1),
seasonal = list(order = c(0,0,0), period = NA), include.mean = TRUE,
method = "CSS", optim.method = "LM", optim.control = list(), ...)
x |
A formula with the format of We must specify the time stamp because a table in the database has no inherent row order, and the rows have to be ordered according to the given time stamps. |
ts |
If |
by |
A list of |
order |
A vector of 3 integers, default is |
seasonal |
A list of |
include.mean |
A logical value, default is |
method |
A string, the fitting method. The default is "CSS", which uses conditional-sum-of-squares to fit the time series. Right now, only "CSS" is supported. |
optim.method |
A string, the optimization method. The default is "LM", the Levenberg-Marquardt algorithm. Right now, only "LM" is supported. |
optim.control |
A list, default is
- max_iter: Maximum number of iterations to run learning algorithm (Default = 100)
- tau: Computes the initial step size for gradient algorithm (Default = 0.001)
- e1: Algorithm-specific threshold for convergence (Default = 1e-15)
- e2: Algorithm-specific threshold for convergence (Default = 1e-15)
- e3: Algorithm-specific threshold for convergence (Default = 1e-15)
- hessian_delta: Delta parameter to compute a numerical approximation of the Hessian matrix (Default = 1e-6) |
... |
Other optional parameters. Not implemented. |
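As a sketch of how the optimizer settings might be passed, the list below uses only the parameter names documented for optim.control. The values are arbitrary illustrations, not recommendations, and `x` is assumed to be a table object with columns `id` and `val` as in the Examples.

```r
## Illustrative values only; names are those listed under optim.control
ctrl <- list(max_iter = 200,        # allow more LM iterations than the default 100
             tau = 0.001,           # initial step size parameter (default value)
             e1 = 1e-10,            # looser convergence thresholds than 1e-15
             e2 = 1e-10,
             e3 = 1e-10,
             hessian_delta = 1e-6)  # default delta for the numerical Hessian

## Not run:
## x is assumed to exist in the database, as in the Examples
## s <- madlib.arima(val ~ id, x, order = c(2,0,1), optim.control = ctrl)
## End(Not run)
```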
Given a time series of data X, the Autoregressive Integrated Moving Average (ARIMA) model is a tool for understanding and, perhaps, predicting future values in the series. The model consists of three parts: an autoregressive (AR) part, a moving average (MA) part, and an integrated (I) part in which an initial differencing step can be applied to remove non-stationarity in the signal. The model is generally referred to as an ARIMA(p, d, q) model, where the non-negative integers p, d, and q are the orders of the autoregressive, integrated, and moving average parts of the model, respectively.
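For reference, one common way to write an ARIMA(p, d, q) model is sketched below. The notation is an assumption for illustration and is not taken from the MADlib documentation: B is the backshift operator, Y_t the series after d differences, \mu its mean (present when include.mean is TRUE), \phi_i and \theta_j the AR and MA coefficients, and \varepsilon_t the innovations. Sign conventions for the MA terms differ between texts.

```latex
% Assumed notation; a standard textbook form, not necessarily MADlib's parameterization
\begin{align*}
Y_t &= (1 - B)^d X_t, \qquad B X_t = X_{t-1}, \\
Y_t - \mu &= \sum_{i=1}^{p} \phi_i \,(Y_{t-i} - \mu)
            + \varepsilon_t
            + \sum_{j=1}^{q} \theta_j \,\varepsilon_{t-j}.
\end{align*}
```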
MADlib's ARIMA function implements a parallel version of the LM algorithm to maximize the conditional log-likelihood, which makes the fitting suitable for big data.
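To make the conditional-sum-of-squares objective concrete, here is a minimal serial sketch in plain R for an ARMA(p, q) model on an already differenced series. It is not MADlib's parallel implementation: the function name `css_arma`, the zero initialisation of early residuals, and the Gaussian form of the conditional log-likelihood are assumptions made for the illustration.

```r
## Conditional sum of squares for an ARMA(p, q) model with mean mu.
## Residuals at times 1..p are fixed at zero (the "conditional" part).
css_arma <- function(y, ar = numeric(0), ma = numeric(0), mu = 0) {
    p <- length(ar)
    q <- length(ma)
    n <- length(y)
    z <- y - mu                  # centered series
    e <- numeric(n)              # residuals, initialised to zero
    for (t in seq_len(n)) {
        if (t <= p) next         # not enough history; residual stays zero
        pred <- 0
        for (i in seq_len(p)) pred <- pred + ar[i] * z[t - i]
        for (j in seq_len(q)) if (t - j >= 1) pred <- pred + ma[j] * e[t - j]
        e[t] <- z[t] - pred
    }
    n_used <- n - p
    sigma2 <- sum(e^2) / n_used  # innovations variance estimate
    ## Gaussian conditional log-likelihood evaluated at sigma2
    loglik <- -0.5 * n_used * (log(2 * pi * sigma2) + 1)
    list(residuals = e, css = sum(e^2), sigma2 = sigma2, loglik = loglik)
}

## Compare the generating parameters against an all-zero model
set.seed(42)
y <- as.numeric(arima.sim(list(ar = c(0.7, -0.3), ma = 0.2), n = 500)) + 3.2
fit_true <- css_arma(y, ar = c(0.7, -0.3), ma = 0.2, mu = 3.2)
fit_zero <- css_arma(y, ar = c(0, 0), ma = 0, mu = 3.2)
c(true = fit_true$css, zero = fit_zero$css)
```

An optimizer such as Levenberg-Marquardt would search over `ar`, `ma` and `mu` to minimise `css`, which for a fixed number of residuals is equivalent to maximising the conditional log-likelihood above.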
Returns an arima.css.madlib object, which is a list that contains the following items:
coef |
A vector of double values. The fitting coefficients of AR, MA and
mean value (if |
s.e. |
A vector of double values. The standard errors of the fitting coefficients. |
series |
A string, the data source table or SQL query. |
time.stamp |
A string, the name of the time stamp column. |
time.series |
A string, the name of the time series column. |
sigma2 |
the MLE of the innovations variance. |
loglik |
the maximized conditional log-likelihood (of the differenced data). |
iter.num |
An integer, the number of iterations of the LM algorithm used to fit the ARIMA model to the time series. |
exec.time |
The time spent on the MADlib ARIMA fitting. |
residuals |
A |
model |
A |
statistics |
A |
call |
A language object. The matched function call. |
Author: Predictive Analytics Team at Pivotal Inc.
Maintainer: Frank McQuillan, Pivotal Inc. fmcquillan@pivotal.io
[1] Rob J Hyndman and George Athanasopoulos: Forecasting: principles and practice, https://otexts.com/fpp/
[2] Robert H. Shumway, David S. Stoffer: Time Series Analysis and Its Applications With R Examples, Third edition Springer Texts in Statistics, 2010
[3] Henri Gavin: The Levenberg-Marquardt method for nonlinear least squares curve-fitting problems, 2011
madlib.lm, madlib.glm and madlib.summary are MADlib wrapper functions.
delete deletes the result of this function together with the model, residual and statistics tables.
print.arima.css.madlib, show.arima.css.madlib and summary.arima.css.madlib print the result in a pretty format.
predict.arima.css.madlib makes forecasts of the time series based upon the result of this function.
## Not run:
library(PivotalR)
## set up the database connection
## Assume that .port is port number and .dbname is the database name
cid <- db.connect(port = .port, dbname = .dbname, verbose = FALSE)
## use double values as the time stamp
## Any values that can be ordered will work
## n is the length of the simulated series; the time stamps need one value per observation
n <- 1000000
example_time_series <- data.frame(
id = seq(0, 1000, length.out = n),
val = arima.sim(list(order = c(2,0,1), ar = c(0.7, -0.3), ma = 0.2), n = n) + 3.2)
x <- as.db.data.frame(example_time_series, field.types = list(id = "double precision",
val = "double precision"), conn.id = cid)
dim(x)
names(x)
## use formula
s <- madlib.arima(val ~ id, x, order = c(2,0,1))
s
## delete s and the 3 tables: model, residuals and statistics
delete(s)
s # s does not exist any more
## do not use formula
s <- madlib.arima(x$val, x$id, order = c(2,0,1))
s
lookat(sort(s$residuals, FALSE, s$residuals$tstamp), 10)
lookat(s$model)
lookat(s$statistics)
## 10 forecasts
pred <- predict(s, n.ahead = 10)
lookat(sort(pred, FALSE, pred$step_ahead), "all")
## Use expressions
s <- madlib.arima(val+2 ~ I(id + 1), x, order = c(2,0,1))
db.disconnect(cid, verbose = FALSE)
## End(Not run)