devtools::load_all(".") library(sits)
Satellite image time series generally is contaminated by atmospheric influence, geolocation error, and directional effects [@Lambin2006]. Atmospheric noise, sun angle, interferences on observations or different equipment specifications, as well as the very nature of the climate-land dynamics can be sources of variability [@Atkinson2012]. Inter-annual climate variability also changes the phenological cycles of the vegetation, resulting in time series whose periods and intensities do not match on an year to year basis. To make the best use of available satellite data archives, methods for satellite image time series analysis need to deal with noisy and non-homogeneous data sets. In this vignette, we discuss filtering techniques to improve time series data that present missing values or noise.
The literature on satellite image time series have several applications of filtering to correct or smooth vegetation index data. The sits have support for Savitzky–Golay (sits_sgolay()), Whitaker (sits_whittaker()), envelope (sits_envelope()) and, the "cloud filter" based on ARIMA methods (sits_ndvi_arima_filter()) filters. The first two filters are commonly used in the literature, while the remaining two have been developed by the authors.
Various somewhat conflicting results have been expressed in relation to the time series filtering techniques for phenology applications. For example, in an investigation of phenological parameter estimation, @Atkinson2012 found that the Whittaker and Fourier transform approaches were preferable to the double logistic and asymmetric Gaussian models. They applied the filters to preprocess MERIS NDVI time series for estimating phenological parameters in India. Comparing the same filters as in the previous work, @Shao2016 found that only Fourier transform and Whittaker techniques improved interclass separability for crop classes and significantly improved overall classification accuracy. The authors used MODIS NDVI time series from the Great Lakes region in North America. @Zhou2016 found that asymmetric Gaussian model outperforms other filters over high latitude boreal biomes, while the Savitzky-Golay model gives the best reconstruction performance in tropical evergreen broadleaf forests. In the remaining biomes, Whittaker gives superior results. The authors compare all previous mentioned filters plus Savitzky-Golay method for noise removal in MODIS NDVI data from sites spread worldwide in different climatological conditions. Many other techniques can be found in applications of satellite image time series such as curve fitting [@Bradley2007], wavelet decomposition [@Sakamoto2005], mean-value iteration, ARMD3-ARMA5, and 4253H [@Hird2009]. Therefore, any comparative analysis of smoothing algorithms depends on the adopted performance measurement.
One of the main uses of time series filtering is to reduce the noise and miss data produced by clouds in tropical areas. The following examples use data produced by the PRODES project [@INPE2017], which detects deforestation in the Brazilian Amazon rain forest through visual interpretation. This data set is called prodes_226_04 and is provided together with the sits package. It has $617$ samples from a region corresponding to the standard Landsat Path/Row 226/064. This is an area in the East of the Brazilian Pará state. It was chosen because of its huge cloud cover from November to March, which is a significant factor in degrading time series quality. Its NDVI and EVI time series were extracted from a combination of MOD13Q1 and Landsat8 images (to best visualize the effects of each filter, we selected only NDVI time series).
The SITS package uses a common interface to all filter functions with the sits_filter. The function has two parameters: data for the dataset to be filtered and filter for the filter to be applied. To aid on data visualisation, all bands which are filtered have a suffix which is appended, as shown in the examples below. The following filters are available and described in what follows:
sits_sgolay)sits_whittaker)sits_envelope)sits_ndvi_arima_filter)sits_cloud_filter)sits_kalman)The Savitzky-Golay filter works by fitting a successive array of $2n+1$ adjacent data points with a $d$-degree polynomial through linear least squares. The central point $i$ of the window array assumes the value of the interpolated polynomial. An equivalent and much faster solution than this convolution procedure is given by the closed expression $$ {\hat{x}{i}=\sum {j=-n}^{n}C_{j}\,x_{i+j}}, $$ where $\hat{x}$ is the the filtered time series, $C_{j}$ are the Savitzky-Golay smoothing coefficients, and $x$ is the original time series.
The coefficients $C_{j}$ depend uniquely on the polynomial degree ($d$) and the length of the window data points (given by parameter $n$). If ${d=0}$, the coefficients are constants ${C_{j}=1/(2n+1)}$ and the Savitzky-Golay filter will be equivalent to moving average filter. When the time series is equally spaced, the coefficients have analytical solution. According to @Madden1978, for ${d\in{}[2,3]}$ each $C_{j}$ smoothing coefficients can be obtained by $$ C_{j}=\frac{3(3n^2+3n-1-5j^2)}{(2n+3)(2n+1)(2n-1)}. $$
In general, the Savitzky-Golay filter produces smoother results for a larger value of $n$ and/or a smaller value of $d$ [@Chen2004]. The optimal value for these two parameters can vary from case to case. In SITS, the user can set the order of the polynomial using the parameter order (default = 3), the size of the temporal window with the parameter length (default = 5), and the temporal expansion with the parameter scaling (default = 1). The following example shows the effect of Savitsky-Golay filter on the original time series.
# Take NDVI band of the first sample data set point.tb <- sits_select(prodes_226_064[1,], "ndvi") # apply Savitzky–Golay filter point_sg.tb <- sits_filter(point.tb, filter = sits_sgolay()) # plot the series sits_merge(point_sg.tb, point.tb) %>% plot()
The Whittaker smoother attempts to fit a curve that represents the raw data, but is penalized if subsequent points vary too much [@Atzberger2011]. The Whittaker filter is a balancing between the residual to the original data and the "smoothness" of the fitted curve. The residual, as measured by the sum of squares of all $n$ time series points deviations, is given by $$ RSS=\sum_{i}(x_{i} - \hat{x_{i}})^2, $$ where $x$ and $\hat{x}$ are the original and the filtered time series vectors, respectively. The smoothness is assumed to be the measure of the the sum of the squares of the third order differences of the time series [@Whittaker1922] which is given by $$ \begin{split} S!S!D = (\hat{x}4 - 3\hat{x}_3 + 3\hat{x}_2 - \hat{x}_1)^2 + (\hat{x}_5 - 3\hat{x}_4 + 3\hat{x}_3 - \hat{x}_2)^2 \ + \ldots + (\hat{x}_n - 3\hat{x}{n-1} + 3\hat{x}{n-2} - \hat{x}{n-3})^2. \end{split} $$
The filter is obtained by finding a new time series $\hat{x}$ whose points minimize the expression $$ RSS+\lambda{}S!S!D, $$ where $\lambda{}$, a scalar, works as an "smoothing weight" parameter. The minimization can be obtained by differentiating the expression with respect to $\hat{x}$ and equating it to zero. The solution of the resulting linear system of equations gives the filtered time series which, in matrix form, can be expressed as $$ \hat{x} = ({\rm I} + \lambda {D}^{\intercal} D)^{-1}x, $$ where ${\rm I}$ is the identity matrix and $$ D = \left[\begin{array}{ccccccc} 1 & -3 & 3 & -1 & 0 & 0 &\cdots \ 0 & 1 & -3 & 3 & -1 & 0 &\cdots \ 0 & 0 & 1 & -3 & 3 & -1 & \cdots \ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots \end{array} \right] $$ is the third order difference matrix. The Whitakker filter can be a large but sparse optimization problem, as we can note from $D$ matrix.
Whittaker smoother has been used only recently in satellite image time series investigations. According to @Atzberger2011, the smoother has an advantage over other filtering techniques such as Fourier and wavelets as it does not assume signal periodicity. Moreover, the authors argue that is enables rapid processing of large amounts of data, and handles incomplete time series with missing values.
In the sits package, the Whittaker smoother has two parameters: lambda controls the degree of smoothing and differences the order of the finite difference penalty. The default values are lambda = 1 and differences = 3. Users should be aware that increasing lambda results in much smoother data. When dealing with land use/land cover classes that include both natural vegetation and agriculture, a strong smoothing can reduce the amount of noise in natural vegetation (e.g., forest) time series; however, higher values of lambda reduce the information present in agricultural time series, since they reduce the peak values of crop plantations.
# filter NDVI band point_ndvi <- sits_select(point_mt_6bands, bands = "NDVI") # apply Whitaker filter to an NDVI time series from 2000 to 2016 # merge with the original data # plot the original and the modified series point_whit <- sits_filter(point_ndvi, sits_whittaker(lambda = 3)) point_whit %>% sits_merge(point_ndvi) %>% plot()
This filter can generate a time series corresponding to the superior (inferior) bounding of the input signal. This is accomplished through a convoluting window (odd length) that attributes to the point $i$, in the resulting time series, the maximum (minimum) value of the points in the window. The $i$ point corresponds to the central point of the window. It can be defined as $$ u_i=\max_{k}{({x_{k}:\left|i-k\right|\le{}1})}, $$ whereas an lower dilation is obtained by $$ l_i=\min_{k}{({x_{k}:\left|i-k\right|\le{}1})}. $$ Here, $x$ is the input time series and, $k$ and $i$ are vector indices.
The sits_envelope() can combine both maximum and minimum window sequentially. The function can receive a string sequence with "U" (for maximization) and "L" (for minimization) characters passed to its parameter. A repeated sequence of the same character is equivalent to one operation with a larger window. The sequential operations on the input time series produces the final filtered result that is returned.
The envelope filter can be viewed through mathematical morphology lenses, a very common field in digital image processing [@Haralick1987]. Here the operations of "U" and "L" corresponds to the dilation and erosion morphological operators applied to univariate arrays [@Vavra2004]. Furthermore, the compounds operation of opening and closing can be obtained by "UL" and "LU", respectively. This technique has been applied on time series analysis in other fields [@Accardo1997] but, for our knowledge, there is no application in satellite image time series literature.
In the following example we can see an application of sits_envelope() function. There, we performs the opening filtration and closing filtration introduced by @Vavra2004. The correspondent operations sequence are "ULLULUUL" and "LUULULLU".
# Take the NDVI band of the first sample data set point.tb <- sits_select(prodes_226_064[1,], "ndvi") # apply envelope filter (remove downward and upward noises) point_env1.tb <- sits_filter(point.tb, sits_envelope(operations = "ULLULUUL", bands_suffix = "OF")) point_env2.tb <- sits_filter(point.tb, sits_envelope(operations = "LUULULLU", bands_suffix = "CF")) # plot the series sits_merge(point_env1.tb, point_env2.tb) %>% sits_merge(point.tb) %>% plot()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.