knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = TRUE,
  fig.width = 7, 
  fig.height = 7
)

Loading libraries

library(AIgeomorphologist)
library(sen2r)

Getting data

The labelled points data are stored in SSCT_data. The function to_spatial() transform the data.frame into sp object.

labelled_points <- SSCT_data %>% to_spatial() %>% sf::st_as_sf()
one_random_point <- labelled_points %>% dplyr::sample_n(1)
one_random_point

Listing S2 tiles

First, we retrieve the list of the tile id for the S2 satellites over the spatial_extent of interest.

id_tiles <- s2_list(
  spatial_extent = labelled_points, 
  time_period = "full",
  level = "L2A",
  availability = "online"
  ) %>% 
  as.data.frame() %>% 
  dplyr::pull(id_tile) %>%
  unique()
saveRDS(id_tiles, file.path(out_dir, "id_tiles.Rds"))
id_tiles

Listing S2 products

Second, we retrieve the entire list of the sensed tiles for the period of interest. This command can take a few hours to run.

search_results <- s2_list(
  tile = id_tiles, 
  time_interval = c(as.Date("2019-06-01"), as.Date("2020-10-01")),
  time_period = "full",
  level = "L2A",
  availability = "online"
  ) 
saveRDS(search_results, file.path(out_dir, "search_results.Rds"))
search_results <- search_results %>% as.data.frame()
dim(search_results)
colnames(search_results)
head(search_results)

Determining optimal cloud cover

We can now analyze these search results so that we only download the tiles we need. One paramount parameter is the amount of clouds identified in the imagery. To define the threshold we defined min_SAFE as the minimum number of sensed tiles across all tiles for a given cloud threshold and sum_SAFE as the total of sensed tiles for a given cloud threshold.

search_stats <- search_results %>% dplyr::group_by(id_tile) %>%  dplyr::summarise(dplyr::across(dplyr::contains("clouds"), list(min = min, mean = mean))) %>% dplyr::summarise(clouds_min = min(clouds_min), clouds_mean = mean(clouds_mean))

clouds_stats <- data.frame(
    clouds_threshold = seq(ceiling(search_stats$clouds_min), ceiling(search_stats$clouds_mean), by = 0.1)
  ) %>% 
  dplyr::mutate(
    min_SAFE = sapply(clouds_threshold, function(clouds_threshold){
      search_results %>% 
        dplyr::group_by(id_tile) %>% 
        dplyr::group_map(~ {dplyr::filter(., clouds <= clouds_threshold) %>% nrow()}) %>%
        unlist() %>%
        min()
      }),
    sum_SAFE = sapply(clouds_threshold, function(clouds_threshold){
      search_results %>% 
        dplyr::group_by(id_tile) %>% 
        dplyr::group_map(~ {dplyr::filter(., clouds <= clouds_threshold) %>% nrow()}) %>%
        unlist() %>%
        sum()
      })
  )

The graph below shows the trade-off between incorporating more cloud: the minimum number of sensed tiles goes up, but so does the total number of sensed tiles. Each tile is roughly 1 GB.

p <- ggplot2::ggplot(clouds_stats %>% reshape2::melt(id.vars = c("clouds_threshold")), ggplot2::aes(x = clouds_threshold, y = value)) +
  ggplot2::geom_point() +
  ggplot2::facet_wrap(. ~ variable, scales = "free") +
  ggpubr::theme_pubclean()
plotly::ggplotly(p)

From these two graphs, we select the following threshold:

clouds_stats %>% dplyr::filter(min_SAFE >= 20) %>% head(1)
clouds_threshold <- clouds_stats %>% dplyr::filter(min_SAFE >= 20) %>% head(1) %>% dplyr::pull(clouds_threshold)

which is appropriate as the cloud percentage remains low, at least 20 tiles are sensed for any tile, and less than 3 TB of SAFE data have to be downloaded.

The selection of the product for downloading is then done by:

selected_search_results <- search_results %>% dplyr::filter(clouds <= clouds_threshold)
dim(selected_search_results) 

Sensing dates

Additionally, we can visualize the sensing dates with the following codes:

p <- ggplot2::ggplot(selected_search_results, ggplot2::aes(x = sensing_datetime, y = clouds, color = id_tile)) +
ggplot2::geom_point() +
ggpubr::theme_pubclean()
plotly::ggplotly(p)

Visual inspection of the above graph tile by tile does not indicate that there is a significant skew towards a given period for any tile. Example of a tile with discontinuous record is 10TGK. The tile with the lowest average cloud cover is r selected_search_results %>% dplyr::group_by(id_tile) %>% dplyr::summarise(mean = mean(clouds)) %>% dplyr::filter(mean == min(mean)) %>% dplyr::pull(id_tile).

Number of sensed tiles

The following graph shows the distribution of the number of sensed tiles:

count_df <- data.frame(count = selected_search_results %>%
  dplyr::group_by(id_tile) %>% 
  dplyr::group_map(~ nrow(.)) %>% 
  unlist()
  )
ggplot2::ggplot(count_df, ggplot2::aes(x = count)) +
ggplot2::geom_density() +
ggplot2::labs(x = "number of sensed tiles") +
ggpubr::theme_pubclean()

Downloading products

Running s2_download() then allows to retrieve all the tile imagery provided that a safelist is passed as s2_prodlist.

class(search_results)
selected_search_results <- selected_search_results %>% as("safelist")
class(selected_search_results)
s2_download(s2_prodlist = selected_search_results, outdir = "path/to/your/outdir")

Appendix

Description of Sentinel-2 MSI products

  1. Level-0 is compressed raw data. The Level-0 product contains all the information required to generate the Level-1 (and upper) product levels.
  2. Level-1
  3. Level-1A is uncompressed raw data with spectral bands coarsely coregistered and ancillary data appended.
  4. Level-1B data is radiometrically corrected radiance data. The physical geometric model is refined using available ground control points and appended to the product, but not applied.
  5. Level-1C product provides orthorectified Top-Of-Atmosphere (TOA) reflectance, with sub-pixel multispectral registration. Cloud and land/water masks are included in the product.
  6. Level-2A product provides orthorectified Bottom-Of-Atmosphere (BOA) reflectance, with sub-pixel multispectral registration. A Scene Classification map (cloud, cloud shadows, vegetation, soils/deserts, water, snow, etc.) is included in the product.

from S2 MSI technical guide

SAFE format

The SAFE format has been designed to act as a common format for archiving and conveying data within ESA Earth Observation archiving facilities. The SAFE format wraps a folder containing image data in a binary data format and product metadata in XML. This flexibility allows the format to be scalable enough to represent all levels of SENTINEL products.

A SENTINEL-2 product refers to a directory folder that contains a collection of information (Figure 1). It includes:

  • a manifest.safe file which holds the general product information in XML
  • a preview image in JPEG2000 format
  • subfolders for measurement datasets including image data (granules/tiles) in GML-JPEG2000 format
  • subfolders for datastrip level information
  • a subfolder with auxiliary data (e.g. International Earth Rotation & Reference Systems (IERS) bulletin)
  • HTML previews

from S2 MSI data formats



hrvg/AIgeomorphologist documentation built on Dec. 20, 2021, 4:49 p.m.