Guide to Time series data and ARIMA(X) model visualization by plotly [TSplotly]

TSplotly R Package Installation

N.B.: Please download the source files from our GitHub repository and install the package (either in R or in RStudio) for testing. The same source is also available on the GitHub page of the TCIU project of the SOCR team.

# Installation from the Windows binary (recommended for Windows systems)
## Set the working directory to the folder that contains "TSplotly_1.1.0.tar.gz"
library(devtools)
install("TSplotly",build_vignettes = TRUE)
## Or you can use the command line to achieve this
## Set the working directory to the folder that contains "TSplotly_1.1.0.tar.gz"
system("R CMD INSTALL TSplotly")
# Installation from the source (recommended for Mac and Linux systems)
install.packages("~/TSplotly_1.1.0.tar.gz", repos = NULL, type = "source")

Once published on CRAN, the installation can be done by the following command:

install.packages("TSplotly") 
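
Whichever installation route is used, it can be worth confirming that the package is visible to the current R session before running the examples. The following is a minimal sketch that relies only on base R; the variable names \code{pkg} and \code{installed} are illustrative and not part of the package.

```r
# Check whether TSplotly can be loaded, without attaching it
pkg <- "TSplotly"
installed <- requireNamespace(pkg, quietly = TRUE)

if (installed) {
  message(pkg, " version ", as.character(packageVersion(pkg)), " is available")
} else {
  message(pkg, " is not installed yet; see the installation commands above")
}
```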

Background

This document describes the package \code{TSplotly}, which is developed in the R environment. The package provides a portable plot_ly style interactive display of longitudinal (time series) data. It is mainly based on the packages \code{ggplot2} and \code{plotly}. Its functions mainly deal with time series data (objects created by function \code{ts}) or with results from ARIMA(X) models (models generated by function \code{auto.arima}). Data passed to this package are typically preprocessed with package \code{forecast}.

The main functions of this package were tested under the SOCR project Data Science: Time Complexity and Inferential Uncertainty (TCIU) by Ivo D. Dinov, Milen V. Velev, Yongkai Qiu, and Zhe Yin, University of Michigan, Ann Arbor. Most of the examples for this package can be found in the last part of Chapter 5.

Introduction of functions with examples

The TSplotly package comprises 4 functions.

Function TSplot

This function takes a fitted ARIMA(X) model created by function \code{auto.arima} from package \code{forecast}. From the fitted model it generates the predicted future time series together with its 80% and 95% confidence intervals. The original training time series data are also drawn on the plot, and the number of training periods shown is controlled by a parameter. If the original model contains a matrix of external regressors (i.e. the model is an ARIMAX model), then this matrix must also be passed to this function.

Below are parameters in this function:

This function returns a \code{plot_ly} style plot. It can be saved as a variable, and more elements can be added to it using the pipe operator \code{%>%}. More details can be found on the plotly for R homepage.

Note that this function only works on ARIMA(X) models whose time format contains year and month (e.g. "2017-02-14"). Because it uses function \code{as.yearmon} from package \code{zoo}, the time format must also be accepted by \code{as.yearmon}. This function can be very helpful when dealing with financial time series or log data with a standard time format. If an error occurs, you may wish to use \code{TSplot_gen} instead, which accepts a more flexible time format.
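
To make the time format requirement concrete, the sketch below shows what \code{as.yearmon} does with a monthly \code{ts} index and with a date string. It is only an illustration of the \code{zoo} conversion, not of the internals of \code{TSplot}; the toy series \code{monthly} is an assumption made for the example.

```r
library(zoo)

# A toy monthly series (illustrative values only)
monthly <- ts(seq_len(24), start = c(2017, 1), frequency = 12)

# The time index of a monthly ts converts cleanly to year-month values
ym <- as.yearmon(time(monthly))
format(ym[1:3], "%Y-%m")

# A full date string is reduced to its year and month
single <- as.yearmon("2017-02-14", "%Y-%m-%d")
format(single, "%Y-%m")
```

A series whose time index cannot be mapped to year and month in this way is a candidate for \code{TSplot_gen}.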

Example of TSplot function

require(TSplotly)
require(zoo)
require(ggplot2)
require(plotly)
require(forecast)

# Creating time series data
MCSI_Data_monthAvg_ts_Y <- ts(Y, start=c(1978,1), end=c(2018, 12), frequency = 12)

# Applying ARIMAX model
modArima <- auto.arima(MCSI_Data_monthAvg_ts_Y, xreg=X)

# Creating plot_ly results
## 48 means that the last 48 periods of the original
## time series are included in the plot.
## You can also change this to "all" to show the whole original series in a single plot.
TSplot(48,modArima,X_new,title_size = 8,ts_original = "Original time series",
ts_forecast = "Predicted time series")

This example is based on TCIU Figure 1.6 of Chapter 1. 48 periods of training time series data have been chosen so that the plotted training data cover 4 years, from 2015 to 2019.
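
The objects \code{Y}, \code{X} and \code{X_new} used above come from the TCIU data and are not defined in this guide. As a rough, self-contained sketch of the ingredients that \code{TSplot} works from, the code below fits an ARIMAX model to simulated data and extracts the point forecast and the 80% and 95% interval bounds with package \code{forecast}. All data here are simulated, and the names \code{y}, \code{fit}, \code{fc} and \code{shown} are assumptions made for illustration.

```r
library(forecast)

set.seed(1)
n <- 120

# Simulated external regressor and response (not the MCSI data)
X <- matrix(rnorm(n), ncol = 1, dimnames = list(NULL, "x1"))
noise <- as.numeric(arima.sim(list(ar = 0.5), n = n))
y <- ts(10 + 2 * X[, 1] + noise, start = c(2009, 1), frequency = 12)

# Fit an ARIMAX model: ARIMA errors plus the external regressor
fit <- auto.arima(y, xreg = X)

# Future regressor values, one row per forecast period
X_new <- matrix(rnorm(12), ncol = 1, dimnames = list(NULL, "x1"))
fc <- forecast(fit, xreg = X_new)  # default interval levels are 80 and 95

# Pieces that a forecast plot is built from
point_forecast <- fc$mean   # predicted series
lower_bounds   <- fc$lower  # one column per interval level
upper_bounds   <- fc$upper
shown          <- tail(y, 48)  # last 48 training values
```

The number of rows of \code{X_new} determines the forecast horizon, which is why an ARIMAX model cannot be forecast without new regressor values.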

Function TSplot_gen

A more general version of \code{TSplot}. It takes a fitted ARIMA(X) model and plots both the training time series data and the predicted time series along with its 80% and 95% confidence intervals. The biggest advantage of this function is that it does not require the time format in the model to be consistent with the format accepted by function \code{as.yearmon} (i.e. time series data that carry year and month information). Instead, it accepts any time format. However, if you wish to include a label for each time point, a vector of time labels must be supplied. Another advantage is that you can pass a list of other time series to this function so that additional time lines are drawn together with the result of the ARIMA(X) model. Note that to achieve this with \code{TSplot}, function \code{ADDline} must also be used.
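
As an illustration of what such a label vector might look like for data that are not monthly, the sketch below builds character labels for a weekly series in base R. It assumes, following the pattern of the examples in this guide, that the labels cover the displayed training periods followed by the forecast periods; the names \code{n_shown}, \code{horizon} and \code{plot_labels_sketch} are hypothetical.

```r
# Assumed sizes: 48 displayed training periods and 12 forecast periods
n_shown <- 48
horizon <- 12

# Weekly dates covering both parts (illustrative start date)
dates <- seq(as.Date("2018-01-01"), by = "week",
             length.out = n_shown + horizon)

# Labels for the training part, then for the forecast part
train_labels    <- format(dates[seq_len(n_shown)], "%Y-%m-%d")
forecast_labels <- format(dates[n_shown + seq_len(horizon)], "%Y-%m-%d")

# One character vector, in plotting order
plot_labels_sketch <- c(train_labels, forecast_labels)
```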

Below are parameters in this function:

Examples of TSplot_gen function

Example one

This example generates the same result as the example of function \code{TSplot}. Notice that you must supply a vector of labels to get year and month labels similar to the previous example. (This means that when dealing with a time series dataset in a year and month time format, \code{TSplot} may be the better choice.)

#Create labels for training time series data and ARIMAX result (48 periods of training data included)
require(zoo)
#Time labels for training data
time_label1<-as.yearmon(time(MCSI_Data_monthAvg_ts_Y))[(length(MCSI_Data_monthAvg_ts_Y)-48+1):length(MCSI_Data_monthAvg_ts_Y)]
#Time labels for ARIMAX model(need to fit model first)
time_pred<-forecast(modArima,xreg = X_new)
time_label2<-as.yearmon(time(time_pred$mean))

time_label<-as.character(c(time_label1,time_label2))

TSplot_gen(48,modArima,X_new,title_size = 8,ts_original = "Original time series",
ts_forecast = "Predicted time series", #include labels inside
plot_labels = time_label)

Example two

A major advantage of \code{TSplot_gen} is that it can add new time lines to the plot directly, without calling the separate function \code{ADDline}. Here another plot of TCIU Figure 1.6 of Chapter 1 is shown as an example:

# Step 1: creating the base plot
## Creating time labels
tl1<-as.yearmon(time(modArima_train$x))[(length(modArima_train$x)-48+1):length(modArima_train$x)]
tl2<-as.yearmon(time(forecast(modArima_train,xreg = as.matrix(X_test))$mean))
tl<-as.character(c(tl1,tl2))

Tempplot<-TSplot_gen(48,modArima_train,as.matrix(X_test),title_size = 8,ts_original = "Original time series",
ts_forecast = "Predicted time series")

# Show the base plot when no other elements (labels, new time lines, etc.) are included
Tempplot
# Step 2: including new lines and labels
## Creating list and other information for new lines
TSlist<-list(MCSI_Data_monthAvg_ts_Y_test)
TSlabel<-list(as.character(as.yearmon(time(TSlist[[1]]))))
TSname<-c("Original result")

## Put them into related parameters
TSplot_gen(48,modArima_train,as.matrix(X_test),title_size = 8,ts_original = "Original time series",
           ts_forecast = "Predicted time series",plot_labels = tl, #labels of original plot
            ts_list = TSlist,ts_names = TSname,ts_labels = TSlabel,COLO = "black")
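
The objects \code{modArima_train}, \code{X_test} and \code{MCSI_Data_monthAvg_ts_Y_test} are not defined in this guide. The following is a minimal base R sketch of how a training and test split of this kind could be prepared, assuming a monthly series and a regressor matrix with one row per period. The toy data and the names \code{y_train}, \code{y_test}, \code{X_train} and \code{X_test} are illustrative only.

```r
# Toy monthly series from 2009-01 to 2018-12 and a two-column regressor matrix
y <- ts(seq_len(120), start = c(2009, 1), frequency = 12)
X <- matrix(seq_len(240), ncol = 2, dimnames = list(NULL, c("x1", "x2")))

# Hold out the final year as the test period
y_train <- window(y, end = c(2017, 12))
y_test  <- window(y, start = c(2018, 1))

# Split the regressor rows to match the two periods
X_train <- X[seq_along(y_train), , drop = FALSE]
X_test  <- X[length(y_train) + seq_along(y_test), , drop = FALSE]

# A training model could then be fitted along these lines (not run here):
# modArima_train <- forecast::auto.arima(y_train, xreg = X_train)
```

With a split like this, the held-out series plays the role of the extra time line that is drawn next to the forecast.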

Function ADDline

This function extends \code{TSplot}, which cannot draw new time lines by itself. It can also be used to generate extra time lines for other \code{plot_ly} style objects. \code{ADDline} creates a list of 4 elements that can be passed directly to functions \code{add_trace} or \code{add_lines}.

Below are parameters in this function:

Example of ADDline function

\code{ADDline} can be combined with \code{TSplot} to extend its abilities. The example below is based on those two functions and produces the same result as Example two of function \code{TSplot_gen}.

require(forecast)

#Firstly create a base plotly plot
Tempplot<-TSplot(48,modArima_train,as.matrix(X_test),title_size = 8,ts_original = "Original time series",
ts_forecast = "Predicted time series")

# Generate a new line with ADDline function
newline<-ADDline(TS = MCSI_Data_monthAvg_ts_Y_test,linetype = "TS",Name = "Original Result")

## Put the new line into our plot
Tempplot%>%
  add_lines(x=newline$X,text=newline$TEXT,y=newline$Y,name=newline$NAME,line=list(color="grey"))
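
The same pattern applies to any \code{plot_ly} object. The sketch below builds a list by hand with the four element names used above (\code{X}, \code{TEXT}, \code{Y}, \code{NAME}) and adds it to a small plot. The list is a hypothetical stand-in for the output of \code{ADDline}, whose exact contents are not shown in this guide, and all values are toy numbers.

```r
library(plotly)

# Hand-built stand-in for the list returned by ADDline (assumed structure)
y_new <- ts(c(98.2, 97.5, 99.1, 100.4), start = c(2019, 1), frequency = 12)
newline <- list(
  X    = as.numeric(time(y_new)),          # positions on the x axis
  TEXT = paste("Period", seq_along(y_new)), # hover text
  Y    = as.numeric(y_new),                 # values
  NAME = "Extra series"                     # legend entry
)

# A small base plot to attach the new line to
base_plot <- plot_ly(x = c(2018.75, 2018.83, 2018.92), y = c(96, 97, 98),
                     type = "scatter", mode = "lines", name = "Base series")

# Add the extra line exactly as in the example above
p <- base_plot %>%
  add_lines(x = newline$X, text = newline$TEXT, y = newline$Y,
            name = newline$NAME, line = list(color = "grey"))
```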

Function GtoP_trans

A dataset suitable for package \code{ggplot2} is shaped quite differently from one suitable for package \code{plotly}. This function provides a quick way to transform a data frame that works with \code{ggplot2} into a shape that works with \code{plotly}, so that new datasets can be applied quickly to the previous functions.

Below are parameters in this function:

Example of GtoP_trans

First, a ggplot2 example is shown here:

ggplot(MCSI_Data_monthAvg_melt[MCSI_Data_monthAvg_melt$series!="INCOME", ],
       aes(YYYYMM, value)) +
  geom_line(aes(linetype=series, colour = series), size=2) +
  geom_point(aes(shape=series, colour = series), size=0.3) +
  geom_smooth(aes(colour = series), se = TRUE) +
  coord_trans(y="log10") +
  xlab("Time (monthly)") + ylab("Index Values (log-scale)") +
  scale_x_date(date_breaks = "12 month", date_labels =  "%m-%Y")  +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        text = element_text(size=20))+ theme(legend.position="top")

The data transformation by \code{GtoP_trans} is done here:

PYdf<-GtoP_trans(MCSI_Data_monthAvg_melt[MCSI_Data_monthAvg_melt$series!="INCOME", ],NAME="series",X="YYYYMM",Y="value")
PYdf$INCOME<-NULL
#Log10 transformation
PYdf<-log10(PYdf)
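
Judging from how \code{PYdf} is used in this guide (one column per series, time in the row names), the transformation is a long-to-wide reshape. The sketch below reproduces that shape in base R on a toy data frame. It is not the implementation of \code{GtoP_trans}, only an illustration of the target layout, and the values are made up.

```r
# Toy long-format data, as ggplot2 would consume it
long <- data.frame(
  YYYYMM = rep(c("2018-01", "2018-02", "2018-03"), times = 2),
  series = rep(c("ICS", "ICC"), each = 3),
  value  = c(1, 2, 3, 4, 5, 6)
)

# Long to wide: one column per series, one row per time point
wide <- reshape(long, idvar = "YYYYMM", timevar = "series",
                direction = "wide")

# Move time into the row names and drop the "value." prefix
rownames(wide) <- wide$YYYYMM
wide$YYYYMM <- NULL
names(wide) <- sub("^value\\.", "", names(wide))

wide
```

In this layout each series can be passed to \code{add_lines} as a single column, for example \code{wide$ICS}.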

Apply the \code{plotly} package to create an interactive plot:

#Create an interactive list
updatemenus <- list(
  list(    
    xanchor="left",
    yanchor="top",
    active = -1,
    type= 'buttons',
    buttons = list(
      list(
        label = "ALL",
        method = "update",
        args = list(list(visible = c(TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE)),
                    list(title = "All Indexes"))),
      list(
        label = "ICS",
        method = "update",
        args = list(list(visible = c(FALSE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE)),
                    list(title = "ICS"))),
      list(
        label = "ICC",
        method = "update",
        args = list(list(visible = c(FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE)),
                    list(title = "ICC"))),
      list(
        label = "GOVT",
        method = "update",
        args = list(list(visible = c(FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,FALSE)),
                    list(title = "GOVT"))),
      list(
        label = "DUR",
        method = "update",
        args = list(list(visible = c(FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE)),
                    list(title = "DUR"))),
      list(
        label = "HOM",
        method = "update",
        args = list(list(visible = c(FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE)),
                    list(title = "HOM"))),
      list(
        label = "CAR",
        method = "update",
        args = list(list(visible = c(FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE)),
                    list(title = "CAR"))),
      list(
        label = "AGE",
        method = "update",
        args = list(list(visible = c(FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,TRUE)),
                    list(title = "AGE"))),
      list(
        label = "EDUC",
        method = "update",
        args = list(list(visible = c(TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE)),
                    list(title = "EDUC")))
      )
  )
)
# Apply plot_ly to finish generating result
plot_ly(type="scatter",mode="lines")%>%
  add_lines(x=as.yearmon(rownames(PYdf)),text=rownames(PYdf),y=PYdf$ICS,name="ICS",line=list(color="powderblue"))%>%
  add_lines(x=as.yearmon(rownames(PYdf)),text=rownames(PYdf),y=PYdf$ICC,name="ICC",line=list(color="red"))%>%
  add_lines(x=as.yearmon(rownames(PYdf)),text=rownames(PYdf),y=PYdf$GOVT,name="GOVT",line=list(color="green"))%>%
  add_lines(x=as.yearmon(rownames(PYdf)),text=rownames(PYdf),y=PYdf$DUR,name="DUR",line=list(color="orange"))%>%
  add_lines(x=as.yearmon(rownames(PYdf)),text=rownames(PYdf),y=PYdf$HOM,name="HOM",line=list(color="purple"))%>%
  add_lines(x=as.yearmon(rownames(PYdf)),text=rownames(PYdf),y=PYdf$CAR,name="CAR",line=list(color="pink"))%>%
  add_lines(x=as.yearmon(rownames(PYdf)),text=rownames(PYdf),y=PYdf$AGE,name="AGE",line=list(color="brown"))%>%
  add_lines(x=as.yearmon(rownames(PYdf)),text=rownames(PYdf),y=PYdf$EDUC,name="EDUC",line=list(color="black"))%>%
  layout(title= list(text="Time series for 8 indexes",font=list(family = "Times New Roman",size = 16,color = "black" )),
           paper_bgcolor='rgb(255,255,255)', plot_bgcolor='rgb(229,229,229)',
           xaxis = list(title ="Time (monthly)",
                        gridcolor = 'rgb(255,255,255)',
                        showgrid = TRUE,
                        showline = FALSE,
                        showticklabels = TRUE,
                        tickcolor = 'rgb(127,127,127)',
                        ticks = 'outside',
                        zeroline = FALSE),
           yaxis = list(title = "Index Values (log-scale)",
                        gridcolor = 'rgb(255,255,255)',
                        showgrid = TRUE,
                        showline = FALSE,
                        showticklabels = TRUE,
                        tickcolor = 'rgb(127,127,127)',
                        ticks = 'outside',
                        zeroline = FALSE),
         updatemenus=updatemenus)
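
A button list of this kind is repetitive, so it can also be generated programmatically. The sketch below does this with \code{lapply} for three series. It assumes that the figure holds exactly one trace per series, in the same order as the \code{series} vector, so the visibility vectors may need adjusting for a figure built differently. The helper \code{make_button} and the name \code{updatemenus_sketch} are hypothetical.

```r
# Series names, assumed to match the order of the traces in the figure
series <- c("ICS", "ICC", "GOVT")

# Hypothetical helper: one "update" button that sets visibility and title
make_button <- function(label, visible, title) {
  list(label = label,
       method = "update",
       args = list(list(visible = visible),
                   list(title = title)))
}

# One button that shows everything, then one button per series
buttons <- c(
  list(make_button("ALL", rep(TRUE, length(series)), "All Indexes")),
  lapply(series, function(s) make_button(s, series == s, s))
)

# Same outer structure as the hand-written list above
updatemenus_sketch <- list(
  list(xanchor = "left", yanchor = "top", active = -1,
       type = "buttons", buttons = buttons)
)
```

The result can be passed to \code{layout} through its \code{updatemenus} argument in the same way as the hand-written list.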


TSplotly documentation built on Aug. 2, 2019, 5:04 p.m.