Using 'Copernicus Data Space Ecosystem' API Wrapper

knitr::opts_chunk$set(
    fig.width = 7,
    fig.height = 4,
    out.width = "100%",
    fig.align = "center",
    collapse = TRUE,
    comment = "#>"
)
library(CDSE)
options(warn = -1)

\newpage

Introduction

The CDSE package for R was developed to allow access to the 'Copernicus Data Space Ecosystem' data and services from R. The 'Copernicus Data Space Ecosystem', deployed in 2023, offers access to the EO data collection from the Copernicus missions, with discovery and download capabilities and numerous data processing tools. In particular, the 'Sentinel Hub' API provides access to the multi-spectral and multi-temporal big data satellite imagery service, capable of fully automated, real-time processing and distribution of remote sensing data and related EO products. Users can use APIs to retrieve satellite data over their AOI and specific time range from full archives in a matter of seconds. When working on the application of EO where the area of interest is relatively small compared to the image tiles distributed by Copernicus (100 x 100 km), it allows to retrieve just the portion of the image of interest rather than downloading the huge tile image file and processing it locally. The goal of the CDSE package is to provide easy access to this functionality from R.

The main functions allow to search the catalog of available imagery from the Sentinel-1, Sentinel-2, Sentinel-3, and Sentinel-5 missions, and to process and download the images of an area of interest and a time range in various formats. Other functions might be added in subsequent releases of the package.

API authentication

Most of the API functions require OAuth2 authentication. The recommended procedure is to obtain an authentication client object from the GetOAuthClient function and pass it as the client argument to the functions requiring the authentication. For more detailed information, you are invited to consult the "Before you start" document.

id <- Sys.getenv("CDSE_ID")
secret <- Sys.getenv("CDSE_SECRET")
OAuthClient <- GetOAuthClient(id = id, secret = secret)

Note

In this document, the data.frames are output as tibbles since it renders better in PDF. However, all the functions produce standard data.frames.

Collections

We can get the list of all the imagery collections available in the 'Copernicus Data Space Ecosystem'. By default, the list is formatted as a data.frame listing the main collection features. It is also possible to obtain the raw list with all information by setting the argument as_data_frame to FALSE.

collections <- GetCollections(as_data_frame = TRUE)
collections

\newpage

Catalog search

The imagery catalog can be searched by spatial and temporal extent for every collection present in the 'Copernicus Data Space Ecosystem'. For the spatial filter, you can provide either a sf or sfc object from the sf package, typically a (multi)polygon, describing the Area of Interest, or a numeric vector of four elements describing the bounding box of interest. For the temporal filter, you must specify the time range by either Date or character values that can be converted to date by as.Date function. Open intervals (one side only) can be obtained by providing the NA or NULL value for the corresponding argument.

dsn <- system.file("extdata", "luxembourg.geojson", package = "CDSE")
aoi <- sf::read_sf(dsn, as_tibble = FALSE)
images <- SearchCatalog(aoi = aoi, from = "2023-07-01", to = "2023-07-31", 
    collection = "sentinel-2-l2a", with_geometry = TRUE, client = OAuthClient)
images

We can visualize the coverage of the area of interest by the satellite image tiles by plotting the footprints of the available images and showing the region of interest in red.

knitr::opts_knit$set(global.device = FALSE)
library(maps)
days <- range(as.Date(images$acquisitionDate))
maps::map(database = "world", col = "lightgrey", fill = TRUE, mar = c(0, 0, 4, 0),
    xlim = c(3, 9), ylim = c(47.5, 51.5))
plot(sf::st_geometry(aoi), add = TRUE, col = "red", border = FALSE)
plot(sf::st_geometry(images), add = TRUE)
title(main = sprintf("AOI coverage by image tiles for period %s", 
    paste(days, collapse = " / ")), line = 1L, cex.main = 0.75)

Some tiles cover only a small fraction of the area of interest, while others cover almost the entire area.

summary(images$areaCoverage)

The tile number can be obtained from the image attribute sourceId, as explained here. We can therefore summarize the distribution of area coverage by tile number, and see which tiles provide the best coverage of the AOI.

tileNumber <- substring(images$sourceId, 39, 44)
by(images$areaCoverage, INDICES = tileNumber, FUN = summary)

Catalog by season

Sometimes one can be interested in only a given period of each year, for example, the images taken during the summer months (June to August). We can filter an existing image catalog a posteriori using the SeasonalFilter function.

dsn <- system.file("extdata", "centralpark.geojson", package = "CDSE")
aoi <- sf::read_sf(dsn, as_tibble = FALSE)
images <- SearchCatalog(aoi = aoi, from = "2021-01-01", to = "2023-12-31", 
    collection = "sentinel-2-l2a", with_geometry = FALSE, filter = "eo:cloud_cover < 5", 
    client = OAuthClient)
dim(images)
summer_images <- SeasonalFilter(images, from = "2021-06-01", to = "2023-08-31")
dim(summer_images)

It is also possible to query the API directly on the desired seasonal periods by using a vectorized version of the SearchCatalog function. The vectorized versions allow running a series of queries having the same parameter values except for either time range, AOI, or the bounding box parameters, using lapply or similar function, and thus potentially also using the parallel processing.

dsn <- system.file("extdata", "centralpark.geojson", package = "CDSE")
aoi <- sf::read_sf(dsn, as_tibble = FALSE)
seasons <- SeasonalTimerange(from = "2021-06-01", to = "2023-08-31")
lst_summer_images <- lapply(seasons, SearchCatalogByTimerange, aoi = aoi, 
    collection = "sentinel-2-l2a", filter = "eo:cloud_cover < 5", with_geometry = FALSE, 
    client = OAuthClient)
summer_images <- do.call(rbind, lst_summer_images)
dim(summer_images)
summer_images <- summer_images[rev(order(summer_images$acquisitionDate)), ]
row.names(summer_images) <- NULL
summer_images

\newpage

Scripts

As we shall see in the examples below, we have to provide a script argument to the GetArchiveImage function.

An evalscript (or "custom script") is a piece of JavaScript code that defines how the satellite data shall be processed by the API and what values the service shall return. It is a required part of any request involving data processing, such as retrieving an image of the area of interest.

The evaluation scripts can use any JavaScript function or language structures, along with certain utility functions provided by the API for user convenience. Chrome V8 JavaScript engine is used for running the evalscripts.

The evaluation scripts are passed as the script argument to the GetArchiveImage function. It has to be either a character string containing the evaluation script or the name of the file containing the script. The scripts folder of this package contains a few examples of evaluation scripts.

It is beyond the scope of this document to provide guidance for writing scripts, we encourage users to consult the API Beginners Guide and Evalscript (custom script) documentation. You can find a big collection of custom scripts that you can readily use in this repository.

\newpage

Retrieving images

Retrieving AOI satellite image as a raster object

One of the most important features of the API is its ability to extract only the part of the images covering the area of interest. If the AOI is small as in the example below, this is a significant gain in efficiency (download, local processing) compared to getting the whole tile image and processing it locally.

dsn <- system.file("extdata", "centralpark.geojson", package = "CDSE")
aoi <- sf::read_sf(dsn, as_tibble = FALSE)
images <- SearchCatalog(aoi = aoi, from = "2021-05-01", to = "2021-05-31", 
    collection = "sentinel-2-l2a", with_geometry = TRUE, client = OAuthClient)
images
summary(images$areaCoverage)

As the area is small, it is systematically fully covered by all available images. We shall select the date with the least cloud cover, and retrieve the NDVI values as a SpatRaster from package terra. This allows further processing of the data, as shown below by replacing all negative values with zero. The size of the pixels is specified directly by the resolution argument. We are also adding a 100-meter buffer around the area of interest and masking the pixels outside of the AOI.

day <- images[order(images$tileCloudCover), ]$acquisitionDate[1]
script_file <- system.file("scripts", "NDVI_float32.js", package = "CDSE")
ras <- GetImage(aoi = aoi, time_range = day, script = script_file, 
    collection = "sentinel-2-l2a", format = "image/tiff", mosaicking_order = "leastCC", 
    resolution = 10, mask = TRUE, buffer = 100, client = OAuthClient)
ras
ras[ras < 0] <- 0
terra::plot(ras, main = paste("Central Park NDVI on", day), cex.main = 0.75,
    col = colorRampPalette(c("darkred", "yellow", "darkgreen"))(99))

Retrieving AOI satellite image as an image file

If we don't want to process the satellite image locally but simply use it as an image file (to include in a report or a Web page, for example), we can use the appropriate script that will render a three-band raster for RGB layers (or one for black-and-white image). Here we specify the area of interest by its bounding box instead of the exact geometry. We also demonstrate that the evaluation script can be passed as a single character string, and provide the number of pixels in the output image rather than the size of individual pixels - it makes more sense if the image is intended for display and not processing.

bbox <- as.numeric(sf::st_bbox(aoi))
script_text <- paste(readLines(system.file("scripts", "TrueColorS2L2A.js", 
    package = "CDSE")), collapse = "\n")
cat(c(readLines(system.file("scripts", "TrueColorS2L2A.js", package = "CDSE"), n = 15), 
    "..."), sep = "\n")
png <- tempfile("img", fileext = ".png")
GetImage(bbox = bbox, time_range = day, script = script_text, 
    collection = "sentinel-2-l2a", file = png, format = "image/png", 
    mosaicking_order = "leastCC", pixels = c(600, 950), client = OAuthClient)
terra::plotRGB(terra::rast(png))

Retrieving a series of images in a batch

It often happens that one is interested in acquiring a series of images of a particular zone (AOI or bounding box) for several dates, or the images of different areas of interest for the same date (probably located close to each other so that they are visited on the same day). The GetImageBy* functions (GetImageByTimerange, GetImageByAOI, GetImageByBbox) facilitate this task as they are specifically crafted for being called from a lapply-like function, and thus potentially be executed in parallel. We shall illustrate how to do this with an example.

dsn <- system.file("extdata", "centralpark.geojson", package = "CDSE")
aoi <- sf::read_sf(dsn, as_tibble = FALSE)
images <- SearchCatalog(aoi = aoi, from = "2023-01-01", to = "2023-12-31",
                        collection = "sentinel-2-l2a", with_geometry = TRUE, 
                        filter = "eo:cloud_cover < 5", client = OAuthClient)
# Get the day with the minimal cloud cover for every month -----------------------------
tmp1 <- images[, c("tileCloudCover", "acquisitionDate")]
tmp1$month <- lubridate::month(images$acquisitionDate)
agg1 <- stats::aggregate(tileCloudCover ~ month, data = tmp1, FUN = min)
tmp2 <- merge.data.frame(agg1, tmp1, by = c("month", "tileCloudCover"), sort = FALSE)
# in case of ties, get an arbitrary date (here the smallest acquisitionDate, 
# could also be the biggest)
agg2 <- stats::aggregate(acquisitionDate ~ month, data = tmp2, FUN = min)
monthly <- merge.data.frame(agg2, tmp2, by = c("acquisitionDate", "month"), sort = FALSE)
days <- monthly$acquisitionDate
# Retrieve images in parallel ----------------------------------------------------------
script_file <- system.file("scripts", "NDVI_float32.js", package = "CDSE")
tmp_folder <- tempfile("dir")
if (!dir.exists(tmp_folder)) dir.create(tmp_folder)
cl <- parallel::makeCluster(4)
ans <- parallel::clusterExport(cl, list("tmp_folder"), envir = environment())
ans <- parallel::clusterEvalQ(cl, {library(CDSE)})
lstRast <- parallel::parLapply(cl, days, fun = function(x, ...) {
    GetImageByTimerange(x, file = sprintf("%s/img_%s.tiff", tmp_folder, x), ...)},
    aoi = aoi, collection = "sentinel-2-l2a", script = script_file,
    format = "image/tiff", mosaicking_order = "mostRecent", resolution = 10,
    buffer = 0, mask = TRUE, client = OAuthClient)
parallel::stopCluster(cl)
# Plot the images ----------------------------------------------------------------------
par(mfrow = c(3, 4))
ans <- sapply(seq_along(days), FUN = function(i) {
    ras <- terra::rast(lstRast[[i]])
    day <- days[i]
    ras[ras < 0] <- 0
    terra::plot(ras, main = paste("Central Park NDVI on", day), range = c(0, 1),
        cex.main = 0.7, pax = list(cex.axis = 0.5), plg = list(cex = 0.5),
        col = colorRampPalette(c("darkred", "yellow", "darkgreen"))(99))
    })

In this particular example, parallelization is not necessarily beneficial as we are retrieving only 12 images, but for a large number of images, it can significantly reduce the execution time. Also note that we have used the filter argument of the SearchCatalog function to limit the list of images to the tiles having cloud cover < 5%.

\newpage

Retrieving statistics

If you are only interested in calculating the average value (or some other statistic) of some index or just the raw band values, the Statistical API enables you to get statistics calculated based on satellite imagery without having to download images. You need to specify your area of interest, time period, evalscript, and which statistical measures should be calculated. The requested statistics are returned as a data.frame or as a list.

Statistical evalscripts

All general rules for building evalscripts apply. However, there are some specifics when using evalscripts with the Statistical API:

Retrieving simple statistics

Besides the time range, you have to specify the way you want the values to be aggregated over time. For this you use the aggregation_period and aggregation_unit arguments. The aggregation_unit must be one of day, week, month or year, and the aggregation_period providing the number of aggregation_units (days, weeks, ...) over which the statistics are calculated. The default values are "1" and "day", producing the daily statistics. If the last interval in the given time range isn't divisible by the provided aggregation interval, you can skip the last interval (default behavior), shorten the last interval so that it ends at the end of the provided time range, or extend the last interval over the end of the provided time range so that all intervals are of equal duration. This is controlled by the value of the lastIntervalBehavior argument.

dsn <- system.file("extdata", "centralpark.geojson", package = "CDSE")
aoi <- sf::read_sf(dsn, as_tibble = FALSE)
script_file <- system.file("scripts", "NDVI_CLOUDS_STAT.js", package = "CDSE")
daily_stats <- GetStatistics(aoi = aoi, time_range = c("2023-07-01", "2023-07-31"),
    collection = "sentinel-2-l2a", script = script_file, mosaicking_order = "leastCC",
    resolution = 100, aggregation_period = 1, client = OAuthClient)
weekly_stats <- GetStatistics(aoi = aoi, time_range = c("2023-07-01", "2023-07-31"),
    collection = "sentinel-2-l2a", script = script_file,mosaicking_order = "leastCC",
    resolution = 100, aggregation_period = 7, client = OAuthClient)
weekly_stats_extended <- GetStatistics(aoi = aoi, 
    time_range = c("2023-07-01", "2023-07-31"), collection = "sentinel-2-l2a", 
    script = script_file, mosaicking_order = "leastCC", resolution = 100, 
    aggregation_period = 1, aggregation_unit = "w", lastIntervalBehavior = "EXTEND", 
    client = OAuthClient)
daily_stats
weekly_stats
weekly_stats_extended

In this example we have demonstrated that a week can be specified as either 7 days or 1 week.

Retrieving statistics with percentiles

Besides the basic statistics (min, max, mean, stDev), one can also request to compute the percentiles. If the percentiles requested are 25, 50, and 75, the corresponding output is renamed 'q1', 'median', and 'q3'.

daily_stats <- GetStatistics(aoi = aoi, time_range = c("2023-07-01", "2023-07-31"),
    collection = "sentinel-2-l2a", script = script_file, mosaicking_order = "leastCC",
    resolution = 100, aggregation_period = 1, percentiles = c(25, 50, 75), 
    client = OAuthClient)
head(daily_stats, n = 10)
weekly_stats <- GetStatistics(aoi = aoi, time_range = c("2023-07-01", "2023-07-31"),
    collection = "sentinel-2-l2a", script = script_file,mosaicking_order = "leastCC",
    resolution = 100, aggregation_period = 7, percentiles = seq(10, 90, by = 10), 
    client = OAuthClient)
head(weekly_stats, n = 10)

Retrieving a series of statistics in a batch

Just as when retrieving satellite images, one can be interested in acquiring a series of statistics for a particular zone (AOI or bounding box) for several dates, or the statistics of different zones for the same periods. The GetStatisticsBy* functions (GetStatisticsByTimerange, GetStatisticsByAOI, GetStatisticsByBbox) facilitate this task as they are specifically crafted for being called from a lapply-like function, and thus potentially be executed in parallel. The following example illustrates how to do this.

dsn <- system.file("extdata", "centralpark.geojson", package = "CDSE")
aoi <- sf::read_sf(dsn, as_tibble = FALSE)
script_file <- system.file("scripts", "NDVI_dataMask_float32.js", package = "CDSE")
seasons <- SeasonalTimerange(from = "2020-06-01", to = "2023-08-31")
lst_stats <- lapply(seasons, GetStatisticsByTimerange, aoi = aoi, 
    collection = "sentinel-2-l2a", script = script_file, mosaicking_order = "leastCC", 
    resolution = 100, aggregation_period = 7L, client = OAuthClient)
weekly_stats <- do.call(rbind, lst_stats)
weekly_stats <- weekly_stats[order(weekly_stats$from), ]
row.names(weekly_stats) <- NULL
head(weekly_stats, n = 5)

Copernicus Data Space Ecosystem services status

If you encounter any connection issues while using this package, please check your internet connection first. If your internet connection is working fine, you can also check the status of the Copernicus Data Space Ecosystem services by visiting this webpage. It provides a quasi real-time status of the various services provided. Once you are on the webpage, scroll down to Sentinel Hub, and pay particular attention to the Process API (used for retrieving images), Catalog API (used for catalog searches), and Statistical API (used for retrieving statistics).

Copernicus Data Space Ecosystem Service Health



Try the CDSE package in your browser

Any scripts or data that you put into this service are public.

CDSE documentation built on Sept. 11, 2024, 8:43 p.m.