knitr::opts_chunk$set( collapse = TRUE, comment = "#>", warning=FALSE, message=FALSE )
library(TSAIB)
\newpage
The software package TSAIB was developed as the main deliverable for a synthesis project regarding the study-line: Earth and Space Physics and Engineering, with a specialization in Earth Observation, at the Technical University of Denmark.
The software is implemented in the open source language "R" which can be found at https://www.r-project.org/. The source code to the "R" software package TSAIB is located at GitHub at https://github.com/MathiasVendt/TSAIB
The easiest way to install the package is directly from Github by using the R devtools package. The R-version should be 2.10 or higher.
install.packages('devtools')
devtools::install_github("MathiasVendt/TSAIB")
Or by using the \verb+remotes+ library like this:
library(remotes)
install_github("MathiasVendt/TSAIB")
The packages depends on the following R libraries which can be installed from R with the function \verb+install.packages+. Hence, to install the package "ncdf4"use the following command for the R window:
install.packages("ncdf4")
A special library is also needed, and will be mentioned later in this tutorial
Together with a compiler tool, \verb+RTools+. This can be installed by going to https://cran.r-project.org/, click on 'Download R for Windows', then 'Rtools', and select the latest version of \verb+RTools+ to download. After downloading has completed run the installer with default settings.
The section gives a short introduction to R, which is useful to new R users.
R tutorials can be found at the r-project web side https://cran.r-project.org/manuals.html
Help pages can be accessed by typing "?" in front of a given function. If we want to access the help for the function \verb+sum+ we write
?sum
To start the web based help interface
help.start()
To exit R write
q()
TSAIB is a R package aimed at getting statistical insight, correct for intermission bias and performing time series analysis of satellite observations consisting of sea-level anomalies measured in the arctic ocean. A variety of functions is provided in order to determine a time series model based on the analyses and insights from the analysis functions. In this introduction, examples are given to illustrate how to use the package.
To load the package simply write:
library(TSAIB)
This section gives a step by step guide on how to utilize the package for analyzing the data, and the order in which the functions are used are recommended.
The test data are included when the package is downloaded. When the TSAIB library is loaded, info about the data set can be found by the command, and in order to get information about the data set \verb+TSdata+, simply write:
?TSdata
In order to load the data into the R environment, the data set is assigned a name, e.g. "TSdata" like this:
TSdata=TSdata
The included data \verb+TSdata+ is a list containing 5 elements: Longitudes, latitudes, dates, measurements and a 3x3x325 construct, containing 325 measurements from a 3x3 data point grid. The data is extracted with the \verb+GridTSExtract+ function from the \verb+TSAIB+ library, and is extracted as:
TSdata=GridTSExtract(nco)
Where \verb+nco+ is the NetCDF data from which \verb+TSdata+ is extracted, and documentation on the data is listed in the Data appendix.
The user can also import their own data from a NetCDF file using the \verb+GridTSExtract+ function, and more info on how to use the function on a specific NetCDF file is shown by:
?GridTSExtract
It is now desired to get a brief overview of the data, including statistics, distributions and plotting. This can be achieved using the \verb+TSdiagnostics+ function. For this example, a meaned 30 point radius grid area (\verb+TS_area+) is analyzed, which is centered around (lon,lat)=(0,73.25). To access sub elements of a construct in R, use \verb+$+ as a seperator between the parent object, and it's sub-directory. The function wont work for NaN elements, and if the extracted data contains a few of those, the \verb+nanrem+ option in the \verb+TSdiagnostics+ function is set to \verb+"TRUE"+, which will omit any NaN elements for the analysis.
TSdiagnostics(TS_area)
If it is decided to remove NaN objects, the function \verb+TSnanrem+ can be used to do this, with a variety of options of how to deal with NaN. For this example, the NaN's of \verb+TSmatrix+ are to be replaced with the mean of the time series:
TSnanrem(TSdata$TSmatrix, method = "mean")
Some times, the data consists of measurements from different satellite missions, and these measurements will have different biases and needs to be aligned. This calls for correction of intermission biases, and if the bias is known, it can be accounted for using the \verb+IB+ function. The input to this function is a time series struct \verb+TSM+, a date vector, a bias vector containing up to two different biases \verb+c(bias1,bias2)+ and lastly, a bias index struct, with the same dimensions as \verb+TSM+, containing zeros for the restrained time series, and integers 1 & 2, for bias1 and bias2 respectively. In the following example, the same 3x3x325 time series as used before, has been added a bias of \verb+c(1,-2)+, and is called \verb+TimeSeriesMatrixBias+, with the corresponding bias index struct \verb+biasid+.
IB(TimeSeriesMatrixBias,TSdata$date,c(1,-2),biasid)
It is possible to get an insight of the time series with the \verb+TSAplots+ function. This function utilizes R's inbuilt differencing function \verb+diff+, auto correlation function \verb+acf+ and partial auto correlation function \verb+pacf+, to display a collection of plots to determine the time series characteristics. It is possible to choose a transformation of the data, being either \verb+sqrt+ or \verb+log+ transformations, and even to save the plots in a variety of formats. The default settings is used, which doesn't transform the data or save the plots:
TSAplots(TS_area,TSdata$date)
A "tail-off" decay is observed in "ACF of time series", indicating an AR(p) part, and a cut-off is observed at lag 3 on "PACF of time series", indicating an AR(p=3) part could be reasonably assumed. The second row of plots show the ACF and PACF of the differenced time series (differenced once!), and the slow decay in the first ACF plot is clearly removed. Differencing once (d=1) is then a reasonable assumption. After differencing, a positive spike is seen at lag 12 on the "ACF of the differenced (once) time series"-plot, which could indicate a seasonality of 12 (S=12). This assumption is likely to be true, as the data used for this example is monthly observations of sea-level anomalies.
To estimate a model which is describing the time series at hand, the \verb+TSAIB+ package containes several functions with different approches:
From the time series analysis functions, some indications of adequate parameter estimates for ARIMA models may be present, and if so, the \verb+ARIMAbuilder+ function could be useful. The inputs are the maximum number of p,q,P,Q parts in the arima model, and a fixed level of differencing: d and D. The output will display the top 5 best models, based on the Bayesian Information Criteria (BIC). BIC is chosen as it, according to [@Madsen2008], yields a consistent estimate of the model order, and is given by: \begin{eqnarray} \text{BIC}=N\text{log}\hat{\sigma}^2_\epsilon+\text{log}(N)(p+q+P+Q+d+D) \end{eqnarray} with $N$ being the number of observations in the data set. By default, all parameters are set to 1, with seasonality set to 12 (i.e.: ARIMA(1,1,1)x(1,1,1)_12). For this example, however, the reasonable parameter guesses from \verb+TSAplots+ are used: ARIMA(p=3,d=1,q=?)x(P=?,D=?,Q=?)_S=12. Estimating ARIMA models is not an exact science, and ACF and PACF plots on real time series data is sometimes ambiguous. Hence the missing parameters are to be set to a reasonable guess of the maximum parameter value. The higher the parameter range, the more processing is needed to run \verb+ARIMAbuilder+. To make the processing relatively fast, the remaining parameter estimates are set to 1: ARIMA(3,1,1,1,1,1)_12. Below is shown the iterations of model variations that has been evaluated, followed by a plot of the estimated best model on top of the measurements, and lastly a list of the top 5 best estimated ARIMA models:
ARIMAbuilder(TS_area,TSdata$date,3,1,1,1,1,1,12)
It is seen that ARIMA(3,1,1)x(1,1,1)_12 is on third place, ARIMA(2,1,1)x(1,1,1)_12 is on second place, and the best model based on BIC is: ARIMA(1,1,1)x(1,1,1)_12, and that is shown in the figure above. \verb+ARIMAbuilder+ also takes the last 10% of the observations, and uses it as test data, on which it forecasts the remaining 10% together with a 95% CI for the forecastet predictions.
The function \verb+TSSmodel+ uses Ordinary Least Squares (OLS) to estimate a simple model of the form: \begin{eqnarray} Y_t=\alpha+\beta_tt+\beta_s\sin(\frac{2\pi}{p}t)+\beta_c\cos(\frac{2\pi}{p}t) \end{eqnarray} where the alpha and the betas are the estimated parameters, and the period (p) and timestep (t) is set to 12 and 1 respectively by default:
TSSmodel(TS_area,TSdata$date)
TMB stands for "Template Model Builder", and is a C++ interface for automatic differentiation that implements Laplace's method to estimate a time series model. For more info on TMB and how to install the package, please visit https://github.com/kaskr/adcomp.
Selected TMB functions have been implemented into TSAIB, in order to efficiently estimate a Random Walk (RW) model, based on likelihood functions. These TMB functions are gathered in one TSAIB function, called \verb+TMBTSA+. This function has three inputs: TSM (a struct with time series data), date (the usual vector containing time stamps) and biasid (a struct with the same dimensions as TSM, containing the ones where the biased data is). \verb+TMBTSA+ can handle one bias at a time, and doesn't need to know the exact bias value, in contrast to the \verb+IB+ function, as it will estimate constant bias in the process. Furthermore, TSM doesn't have to be corrected for NaN, as \verb+TMBTSA+ will handle those as well.
For the next example, a synthetic time series struct \verb+TimeSeriesMatrixBiasTMB+ is used. As with the demonstration for the \verb+IB+ function, this is an altered version of \verb+TSdata$TSmatrix+, with a single bias added this time. An illustration of the synthetic data is shown below:
plot(TSdata$date,rep(0,length(TSdata$date)),xlab="Date",ylab="Measurement",main='TS with bias',col="white",ylim = c(min(TimeSeriesMatrixBiasTMB,na.rm = TRUE),max(TimeSeriesMatrixBiasTMB,na.rm = TRUE))) for (j in 1:dim(TSdata$TSmatrix)[2]) { for (k in 1:dim(TSdata$TSmatrix)[1]) { points(TSdata$date,TimeSeriesMatrixBiasTMB[k,j,],col="black") } }
The bias index that matches \verb+TimeSeriesMatrixBiasTMB+ is \verb+biasidTMB+. Together with the date vector \verb+TSdata$date+, \verb+TimeSeriesMatrixBiasTMB+ and \verb+biasidTMB+ are inserted into \verb+TMBTSA+ as so:
TMBTSA(TimeSeriesMatrixBiasTMB,TSdata$date,biasidTMB)
The iterative messages from the process has been left out, as it's excessive information for this tutorial. The estimated RW model is shown below:
plot(TSdata$date,rep(0,length(TSdata$date)),xlab="Date",ylab="Measurement",main='Estimated RW', col="white",ylim = c(min(DEMOpllam- DEMOtval*DEMOplsdlam),max(DEMOpllam+ DEMOtval*DEMOplsdlam))) lines(TSdata$date,DEMOpllam,col="red") lines(TSdata$date,DEMOpllam+ DEMOtval*DEMOplsdlam,col="green") lines(TSdata$date,DEMOpllam- DEMOtval*DEMOplsdlam,col="green") legend(TSdata$date[round(2.2*length(TSdata$date)/3)],min(TSdata$TSmatrix,na.rm = TRUE)/2, legend=c( "Estimates","95% CI"), col=c( "red","green"), lty=c(1,1), cex=0.8)
To compare the estimates with the data without bias, \verb+TSdata$TSmatrix+ is plottet together with the estimates below.
plot(TSdata$date,rep(0,length(TSdata$date)),xlab="Date",ylab="Measurement",main='TS without bias + Estimated RW',col="white",ylim = c(min(TSdata$TSmatrix,na.rm = TRUE),max(TSdata$TSmatrix,na.rm = TRUE))) for (j in 1:dim(TSdata$TSmatrix)[2]) { for (k in 1:dim(TSdata$TSmatrix)[1]) { points(TSdata$date,TSdata$TSmatrix[k,j,],col="black") } } lines(TSdata$date,DEMOpllam,col="red") lines(TSdata$date,DEMOpllam+ DEMOtval*DEMOplsdlam,col="green") lines(TSdata$date,DEMOpllam- DEMOtval*DEMOplsdlam,col="green") legend(TSdata$date[round(2.2*length(TSdata$date)/3)],min(TSdata$TSmatrix,na.rm = TRUE)/2, legend=c("Measurements", "Estimates","95% CI"),col=c("black", "red","green"), lty=c(1,1,1), cex=0.8)
\verb+TMBTSA+ assumes the data to be independent and uncorrelated, and since the test data used in this example is satellite measurements of a certain area, there might be spacial correlation between the observations per time stamp. This impacts the calculation of the standard deviation, as the function interprets this as if it has more information about the data than it actually has, and it can be seen in the figures above, where the 95% CI lies extremely close to the estimates, making the estimates hard to distinguish from the CI.
This tutorial has presented the R package \verb+TSAIB+, as well as demonstrated how to install, load, and utilize the inbuilt functions on a specific data set of satellite measurements of sea-level anomalies in the arctic ocean. \verb+GridTSExtract was used to extract the \verb+nco+ object from a NetCDF file. Basic statistical insights was achieved through the \verb+TSdiagnostics+ function, on a meaned area of interest called \verb+TS_area+. To manipulate NaN values in the data, a function was introduced to deal with this called \verb+TSnanrem+. If the data contained known intermission bias, the function \verb+IB+ could correct the data for up to two different biases, which was demonstrated on synthetic biased data called \verb+TimeSeriesMatrixBias+, with the matching bias struct \verb+biasid+. The ACF and PACF of \verb+TS_area+ was plottet for the raw data, as well as for the differenced time series, through the \verb+TSAplots+ function, from which it was demonstrated how to estimate a few ARIMA parameters.
The next stage was "Model Estimation", where the \verb+ARIMAbuilder+ function was presented, and it was demonstrated how the parameter estimates from \verb+TSAplots+ could be used to restrain some of the ARIMA(p,d,q)x(P,D,Q)_S parameters, in order for the function to iteratively scan through qualitative parameter ranges, and point out the top 5 best models, and plot the best of all the possible ARIMA models based upon the BIC. Another introduced model estimation function was \verb+TSSmodel+, which estimates a simple time series model using OLS, with an intercept, a trend, a sinus and a cosinus term, and was also demonstrated on the \verb+TS_area+ data. Lastly, the \verb+TMBTSA+ function was demonstrated, which utilizes TMB, a C++ interface, to estimate a RW model, take care of NaN, and also account for one intermission bias all at once. This was done on synthetic data containing NaN and one intermission bias called \verb+TimeSeriesMatrixBiasTMB+ with the corresponding bias struct called \verb+biasidTMB+.
\verb+TSAIB+ successfully implements analysis and model estimation functions on the specific data set provided. It is not certain that the data from a NetCDF file is structured the same way, and in order to match with other data sets than what is provided in the package, the user is required to have some basic knowledge of how to manipulate the data in R in order for it to fit the functions used in \verb+TSAIB+.
The functions, unless described otherwise, are sensitive to NaN values, for which an extra function is provided to take care of this before inserting the data into the functions. This could be implementet in more advanced functions, so that pre-processing of the data isn't necessary.
For the R functions that are demonstrated using the \verb+TS_area+ data set, only a vector of time series data can be used as input, where a more advanced function could take into account a grid of values for each time step, as it is done in the \verb+TMBTSA+ function.
The \verb+ARIMAbuilder+ function can take any range of parameters as input, and if the range is set too high, and there aren't enough range-restrained parameters, the processing of this function can be very time consuming, and if the user isn't careful, this process could uptake all the computer's memory. An improvement on this would be a security measure which halts the process if this limit is approaching.
\verb+TSSmodel+ estimates a very simple and specific model, and it was seen that the simple sinus and cosinus terms wasn't matching the data very well. This function could be further developed featuring a number of different simple linear models, from which the user could choose the most adequate.
The implementation of \verb+TMB+ in the \verb+TMBTSA+ function is very sparse compared to the enormous potential and amount of tools \verb+TMB+ has to offer. The function could have several different tools combined into the function, from which the user would be able to choose one of these. The function doesn't take correlated data into account, resulting in wrong calculations of the standard deviation if the data is correlated. The wrong calculation of the standard deviation when the function is used on a grid of correlated data per time stamp, could also be fixed, so that the function would take correlation of the data points into account. It also complicates the use of \verb+TSAIB+ by having to install \verb+TMB+ prior to \verb+TSAIB+, but compared to the possibilities of using \verb+TMB+, this is a relatively small price to pay.
All in all \verb+TSAIB+ provides a simple overview, and relatively simple-to-use functions for estimating time series models on satellite data of sea-level anomalies provided in this package, but the functions could be developed to be more versatile, robust and user friendly when used on other data.
TSdata
?TSdata Description An extracted list using the "GridTSExtract" function, from the data set: Rose, S.K.; Andersen, O.B.; Passaro, M.; Ludwigsen, C.A.; Schwatke, C. Arctic Ocean Sea Level Record from the Complete Radar Altimetry Era: 1991-2018. Remote Sens. 2019, 11, 1672. https://www.mdpi.com/2072-4292/11/14/1672 Usage TSdata Format A large list containing 5 elements, which are: longitude: containing longitudes from -180:180, by increments of 0.5 degrees (length: 720) latitude: containing latitudes from 65:81.5, by increments of 0.25 degrees (length: 67) date: containing years from 1991:2019, by increments of 1/12 (length:325) measurements: containing sea_level_anomaly measurements for each increments of: longitude, latitude and date. (dimension: [67,720,325]) TSmatrix: containing a 3x3x325 grid of measurements centered around (lon,lat)=(-165,74)
TSdata2
?TSdata2 Description An extracted list using the "GridTSExtract" function, from the data set: Rose, S.K.; Andersen, O.B.; Passaro, M.; Ludwigsen, C.A.; Schwatke, C. Arctic Ocean Sea Level Record from the Complete Radar Altimetry Era: 1991-2018. Remote Sens. 2019, 11, 1672. https://www.mdpi.com/2072-4292/11/14/1672 Usage TSdata2 Format A large list containing 5 elements, which are: longitude: Containing longitudes from -180:180, by increments of 0.5 degrees (length: 720) latitude: Containing latitudes from 65:81.5, by increments of 0.25 degrees (length: 67) date: Containing years from 1991:2019, by increments of 1/12 (length:325) measurements: Containing sea_level_anomaly measurements for each increments of: longitude, latitude and date. (dimension: [67,720,325]) TSmatrix: Containing a 61x61x325 grid of measurements centered around (lon,lat)=(0,73)
TS_area
?TS_area Description A time series containing the meaned and NaN removed time series, from using the "TSnanrem" function on "TSdata$TSmatrix" Usage TS_area Format A numerical vector with length: 325
TimeSeriesMatrixBias
?TimeSeriesMatrixBias Description A time series matrix with added biases: 1 and -2, on the data: "TSdata$TSmatrix" Usage TimeSeriesMatrixBias Format A numerical struct with dimensions 3x3x325
biasid
?biasid Description A bias index containing bias indicies: 1 and 2, matching the synthetic bias data "TimeSeriesMatrixBias" Usage biasid Format A numerical struct with dimensions 3x3x325
TimeSeriesMatrixBiasTMB
?TimeSeriesMatrixBias Description A time series matrix with added biases: 1, on the data: "TSdata$TSmatrix" Usage TimeSeriesMatrixBiasTMB Format A numerical struct with dimensions 3x3x325
biasidTMB
?biasidTMB Description A bias index containing bias index: 1, matching the synthetic bias data "TimeSeriesMatrixBiasTMB" Usage biasidTMB Format A numerical struct with dimensions 3x3x325
DEMOpllam
?DEMOpllam Description RW model estimates, calculated using the TMBTSA function on "TimeSeriesMatricBiasTMB" and "biasidTMB" Usage DEMOpllam Format A numerical vector with length: 325
DEMOplsdlam
?DEMOplsdlam Description RW model standard deviations, calculated using the "TMBTSA" function on "TimeSeriesMatricBiasTMB" and "biasidTMB" Usage DEMOplsdlam Format A numerical vector with length: 325
DEMOtval
?DEMOtval Description T-value for the 95% confidence interval for the TMBTSA function Usage DEMOtval Format A numerical value
GridTSExtract
?GridTSExtract Description Extracts a time series from a NetCDF lon&lat coordinate grid Usage GridTSExtract( nco, lonid = "longitude", latid = "latitude", timeid = "date", measurementsid = "sea_level_anomaly", coord = c(-165, 74), radius = 0, dlon = 2, dlat = 4 ) Arguments nco: The NetCDF object (open with "nc_open" from the "ncdf4" library) lonid: The variable id for longitude vector. e.g.: "longitude" latid: The variable id for latitude vector. e.g.: "latitude" timeid: The variable id for the time vector. e.g.: "date" measurementsid: The variable id for the measurements vector. e.g.: "sea_level_anomaly" coord: The center coordinate for the TS grid area. e.g.: c(lon,lat) i.e. c(-164,74) radius: The square radius of the TS grid. e.g: 0=1x1 grid, 1=3x3 grid, 2=5x5 grid etc. dlon: Number of data points for each degree longitude dlat:Number of data points for each degree latitude Value Time series from grid
TSdiagnostics
?TSdiagnostics Description Shows basic statistics and characteristics of the time series data Usage TSdiagnostics(TS, nanrem = "FALSE") Arguments TS:The time series to be analyzed nanrem: Set NaN to be removed or not. e.g.: nanrem="TRUE" Value Basic statistics and characteristics
TSnanrem
?TSnanrem Description Removes or imputes NaN from the time series data set, with a specified method of choice Usage TSnanrem(TS, method = "mean", value_nr = 0) Arguments TS: The time series to remove NaN from method: The imputation method. e.g.: "omit", "mean", "median" or "value" value_nr: The specific value for the "value" imputing method. e.g.: value_nr=0 Value Time series with NaNs removed or replaced
IB
?IB Description Corrects a struct of timeseries for up to two intermission biases Usage IB(TSM, date, biasvec = c(1, -2), biasid) Arguments TSM: The time series struct with the intermission bias date: The time axis in the time series (dates, seconds, intervals) biasvec: The vector containging up to 2 different bias. If the data only contains one7 bias, insert that value instead of the vector biasid: A struct with the same dimensions as TSM, with integers indicating the bias: 0 for the restrained time series, 1 for the first bias and 2 for the second bias. Value Time series struct with bias correction
TSAplots
?TSAplots Description Plots and saves the ACF and PACF for a pre-defined time series, with and without data transformations (log, sqrt), and differencing. Input type: Time series Usage TSAplots(TS, date, trans = "NONE", saveas = "NONE") Arguments TS: The time series for the TSAplots function date: The time axis in the time series (dates, seconds, intervals) trans: data transform parameter: trans='log' for log transform, trans='sqrt' for sqrt transform, and trans='NONE' for no transform saveas: specifies the file format to save the plots in the current working directory. Possible formats is: PDF, JPEG, TIFF, BMP and PNG. e.g.: "pdf" or "jpeg". If no format is specified, it will not save the plots Value Plots from TSAplots
TSSmodel
?TSSmodel Description Estimates a simple time series model Usage TSSmodel(TS, date, p = 12, t = 1, testsize = round(length(date)/10)) Arguments TS: The time series for the TSSmodel function date: The time axis in the time series (dates, seconds, intervals) p: The period. e.g.: p=12, for 12 months in a year t: The time steps. e.g.: t=1, for monthly time steps, with period 12 testsize: The number of data points to be used for testing the model Value Simple TS model
ARIMAbuilder
?ARIMAbuilder Description Lists the top 5 estimated ARIMA/SARIMA models of the form: ARIMA(p,d,q)x(P,D,Q), based on specified parameter values and ranges Usage ARIMAbuilder(TS, p = 1, d = 1, q = 1, P = 1, D = 1, Q = 1, S = 12) Arguments TS: The time series to build the models upon (remember to transform before using this function!) p: The maximum AR value, estimated from plot analysis d: The fixed differencing, often 0, 1 or 2 q: The maximum MA value, estimated from plot analysis P: The maximum seasonal AR value, estimated from plot analysis D: The fixed seasonal differencing, often 0, 1 or 2 Q: The maximum seasonal MA value, estimated from plot analysis S: The seasonality/period of the seasonal differencing Value List of top 5 ARIMA/SARIMA models
TMBTSA
?TMBTSA Description Utilizes TMB, which is a C++ interface for automatic differentiation that implements the Laplace method, to estimate a time series model Usage TMBTSA(TSM, date, biasid) Arguments TSM: The time series struct, with dimensions (x/lon,y/lat,timesteps) date: The time axis in the time series (i.e., dates, seconds, intervals) biasid: A struct with the same dimensions as TSM, with integers indicating the bias: 0 for the restrained time series and 1 for the bias. If no bias is present, biasid contains only zeros. Value RW model
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.