WTSS - R interface to Web Time Series Service

knitr::opts_chunk$set(echo = TRUE)


Recently, the concept of data cubes has emerged as a relevant paradigm for organising Earth observation data for performing analysis. In an intuitive sense, data cubes are temporal sequences of images from the same geographical area. This sequence is organised consistently to allow algorithms to explore the data in both the spatial and temporal directions[Appel2019]. Data cubes are thus an efficient way to explore satellite image archives spanning years or even decades.

Data cubes rely on the fact that Earth observation satellites revisit the same place at regular intervals. Thus measures can be calibrated so that observations of the same place in different times are comparable. These calibrated observations can be organised in regular intervals, so that each measure from sensor is mapped into a three dimensional multivariate array in space-time. Let $S = {s_{1}, s_{2},\dotsc, s_{n}}$ be a set of remote sensing images which shows the same region at \emph{n} consecutive times $T = {t_{1}, t_{2}, \dotsc, t_{n}}$. Each location \emph{[x, y, t]} of a pixel in an image (latitude, longitude, time) maps to a \emph{[i, j, k]} position in a 3D array. Each array position \emph{[i, j, k]} will be associated to a set of attributes values $A = {a_{1}, a_{2}, \dotsc, a_{m}}$ which are the sensor measurements at each location in space-time (see figure below}). For optical sensors, these observations are proportional to Earth's reflexion of the incoming solar radiation at different wavelengths of the electromagnetic spectrum.

knitr::include_graphics(system.file("extdata/markdown/figures", "arrays.jpg", package = "wtss"))

In what follows, we use the term "coverage" in the same sense as the definition of a "data cube", proposed from Appel and Pebesma [@Appel2019]: A regular, dense Earth observation (EO) data cube is a four-dimensional array with dimensions x (longitude or easting), y (latitude or northing), and time, with the following properties:

  1. Spatial dimensions refer to a single spatial reference system (SRS)};
  2. Cells of a data cube have a constant spatial size (with regard to the cube's SRS)};
  3. The spatial reference is defined by a simple offset and the cell size per axis, i.e., the cube axes are aligned with the SRS axes;}
  4. The temporal reference is a known set of non-overlapping temporal intervals; each temporal interval is defined by a simple start date/time and the temporal duration of cells;}
  5. All cells have the same set of attributes; each attribute can be a scalar or a vector;}
  6. For every combination of dimensions, a cell has a single set of attribute values.}

This definition of EO data cubes reflects their construction. Data cubes are built of 2D images collected on a specific date constrained by satellite orbits. These images are then grouped together in arbitrary intervals. In principle, a data cube can have an image on time 1, another image at time 5, another at time 8, and so on. Since interpolation in space is different from interpolation in time, a data set composed by these images would make a valid data cube. Space needs to be compact (dense) in a data cube, but time does not need to be. Thus, a sequence of 2D + time images need not be dense. 2D images in a data cube can correspond to different instants of acquisition.

Web Services for Earth Observation Data Cubes

Web services interfaces isolate users from concrete implementations, thus increasing portability There are many ways to implement Earth observation data cubes. One of the earlier approaches is to break large images in chunks and store them in an object-relational DBMS, as done in RASDAMAN [@Baumann1998]. Google Earth Engine[@Gorelick2017] uses a massive parallel design; image data is partitioned in hundreds or thousands of CPUs, processed separately and later aggregated using map-reduce techniques. The Open Data Cube software, after making the required geometric and radiometric transformations in the images, stores them in files, which are indexed in a PostgreSQL database [@Lewis2017]. Similar file-based approaches are used in the R package "gdalcubes" [@Appel2019]. Each of these designs is associated to a different API, thus making compatibility and interoperability between them hard to achieve.

Given the different designs of Earth observation data cubes, and their inherent limitations for achieving interoperability, one natural approach is to propose Web services that would present a unified interface for accessing them. Web services interfaces isolate users from concrete implementations, thus increasing portability[@Ferris2003]. In the geospatial domain, web service standards developed by the Open Geospatial Consortium have achieved considerable impact [@Percivall2010]. Since services such as WMS (Web Map Service) and WCS (Web Coverage Service) are used worldwide with success, it is natural to ask: how to design web services for big Earth observation (EO) data? The current work addresses this issue and proposes WTSS (Web Time Series Service), a new services for big EO data.

Satellite Image Time Series

Given a series of remote sensing snapshots, we can reorganise them into a set of time series. A satellite image time series is obtained by taking measurements in the same pixel location $(x,y)$ in consecutive times $t_1,...,t_m$, as shown in the figure below.

knitr::include_graphics(system.file("extdata/markdown/figures", "time_series.png", package = "wtss"))

Recent results in the literature show that analysis of satellite image times series enables extracting long-term trends of land change \citep[@Pasquarella2016]. Such information which would be harder to obtain by processing 2D images separately. These analysis are supported by algorithms such as TWDTW [@Maus2016], TIMESTAT [@Jonsson2004] and BFAST [@Verbesselt2010]. These algorithms process individual time-series and combine the results for selected periods to generate classified maps.

For example, classification methods such as TWDTW [@Maus2019] can then break an image time series into a set of intervals. As an example, the figure below shows four events extracted from a remote sensing time series, expressed in terms of the its intervals. From 2000 to 2001 the area was a forest that was deforested in 2002. From 2003 to 2005 the area it was used for pasture and from 2005 to 2008, it was transformed into a cropland. This kind of classification is done by algorithms such that split a time series into a set of events. Combining snapshots with time series, scientists can explore the full depth of big remote sensing data archives.

Web Time Series Service (WTSS)

Motivated by the need to retrieve satellite image time series from large 3D arrays, we have designed and implemented the Web Time Series Service (WTSS). A WTSS server takes as input a 3D array. Each array has a spatial and a temporal dimension and can be multidimensional in terms of its attributes. The WTSS service is independent of the actual data architecture used for 3D array store. It can work with solutions such as flat files, MapReduce distributed datasets, array databases or object-relational databases.

The WTSS API has three commands: (a) list_coverages that returns a list of coverages available in the server; (b) describe_coverage that returns the metadata for a given coverage; (c) time_series that returns a time series for a spatio-temporal location.

Connecting to a WTSS server

The first step towards using the service is connecting to a server that supports the WTSS protocol. Currenlty, Brazil's National Insitute for Space Research (INPE) runs such a service. In the package, the connection is enabled by using the URL of the service. The package informs if the connection has been correctly made.

# Connect to the WTSS server at INPE Brazil
wtss_inpe <-  wtss::WTSS("http://www.esensing.dpi.inpe.br/wtss/")

Listing coverages available at the WTSS server

This operation allows clients to retrieve the capabilities provided by any server that implements WTSS. It returns a list of coverage names available in a server instance.

# Connect to the WTSS server at INPE Brazil

Describing a coverage from the WTSS server

This operation returns the metadata for a given coverage identified by its name. It includes its range in the spatial and temporal dimensions.

# Connect to the WTSS server at INPE Brazil
wtss::describe_coverage(wtss_inpe, name = "MOD13Q1")

The coverage description is saved as a tibble in the wtss object, to be used whenever required.

# Coverage description available in the wtss object

Obtaining a time series

This operation requests the time series of values of a coverage attribute at a given location. Its parameters are: (a) wtss.obj: either a WTSS object (created by the operation wtss::WTSS as shown above) or a valid WTSS server URL; (b) name: Coverage name; (c) attributes: vector of band names (optional). If omitted, all bands are retrieved; (d) longitude: longitude in WGS84 coordinate system; (e)latitude: Latitude in WGS84 coordinate system; (f)start_date (optional): Start date in the format yyyy-mm-dd or yyyy-mm depending on the coverage. If omitted, the first date on the timeline is used; (g) end_date(optional): End date in the format yyyy-mm-dd or yyyy-mm depending on the coverage. If omitted, the last date of the timeline is used.

# Request a time series from the "MOD13Q1" coverage
ts   <- wtss::time_series(wtss_inpe, name = "MOD13Q1", 
        attributes = c("ndvi","evi"), longitude = -45.00, latitude  = -12.00,
        start_date = "2000-02-18", end_date = "2016-12-18")

The result of the operation is a tibble, which is a generalization of a data.frame, the usual way in R to organise data in tables. Tibbles are part of the tidyverse, a collection of R packages designed to work together in data manipulation [@Wickham2017]. The tibble contains data and metadata. The first six columns contain the metadata: satellite, sensor, spatial and temporal information, and the coverage from where the data has been extracted. The spatial location is given in longitude and latitude coordinates for the "WGS84" ellipsoid. The time_series column contains the time series data for each spatiotemporal location. This data is also organized as a tibble, with a column with the dates and the other columns with the values for each spectral band.

# Showing the contents of a time series

Plotting the time series

For convenience, the WTSS package provides a convenience funtion for plotting the time series.

# Plotting the contents of a time series

Conversion to "zoo" and "ts" formats

Since many time series analysis functions in R require data to be made available in the "zoo" and "ts" formats, the wtss package provides two convenience functions: wtss_to_zoo and "wtss_to_ts. The example below shows the detection of trends in a series converted to the "ts" format using the BFAST package [@Verbesselt2010].

# create a connection using a serverUrl
server <-  wtss::WTSS("http://www.esensing.dpi.inpe.br/wtss/")

# get a time series for the "ndvi" attribute
ndvi_wtss <- wtss::time_series(server, "MOD13Q1", attributes = c("ndvi"), 
                         latitude = -10.408, longitude = -53.495, 
                         start = "2000-02-18", end = "2016-01-01")

# convert to ts
ndvi_ts <- wtss::wtss_to_ts(ndvi_wtss, band = "ndvi")

# detect trends
bf <- bfast::bfast01(ndvi_ts)
# plot the result

Try the wtss package in your browser

Any scripts or data that you put into this service are public.

wtss documentation built on Jan. 11, 2020, 9:27 a.m.