sits_cube: Create data cubes from image collections

sits_cube R Documentation

Create data cubes from image collections

Description

Creates a data cube based on spatial and temporal restrictions in collections available in cloud services or local repositories. The following cloud providers are supported, based on the STAC protocol: Amazon Web Services (AWS), Brazil Data Cube (BDC), Copernicus Data Space Ecosystem (CDSE), Digital Earth Africa (DEAFRICA), Digital Earth Australia (DEAUSTRALIA), Microsoft Planetary Computer (MPC), NASA Harmonized Landsat/Sentinel (HLS), Swiss Data Cube (SDC), TERRASCOPE, and USGS Landsat (USGS). Data cubes can also be created from local files.

Usage

sits_cube(source, collection, ...)

## S3 method for class 'sar_cube'
sits_cube(
  source,
  collection,
  ...,
  orbit = "ascending",
  bands = NULL,
  tiles = NULL,
  roi = NULL,
  crs = NULL,
  start_date = NULL,
  end_date = NULL,
  platform = NULL,
  multicores = 2,
  progress = TRUE
)

## S3 method for class 'stac_cube'
sits_cube(
  source,
  collection,
  ...,
  bands = NULL,
  tiles = NULL,
  roi = NULL,
  crs = NULL,
  start_date = NULL,
  end_date = NULL,
  platform = NULL,
  multicores = 2,
  progress = TRUE
)

## S3 method for class 'local_cube'
sits_cube(
  source,
  collection,
  ...,
  data_dir,
  vector_dir = NULL,
  tiles = NULL,
  bands = NULL,
  vector_band = NULL,
  start_date = NULL,
  end_date = NULL,
  labels = NULL,
  parse_info = NULL,
  version = "v1",
  delim = "_",
  multicores = 2L,
  progress = TRUE
)

Arguments

source

Data source: one of "AWS", "BDC", "CDSE", "DEAFRICA", "DEAUSTRALIA", "HLS", "MPC", "SDC", "TERRASCOPE", or "USGS" (character vector of length 1).

collection

Image collection in data source (character vector of length 1). To find out the supported collections, use sits_list_collections().

...

Other parameters to be passed for specific types.

orbit

Orbit name ("ascending", "descending") for SAR cubes.

bands

Spectral bands and indices to be included in the cube (optional - character vector). Use sits_list_collections() to find out the bands available for each collection.

tiles

Tiles from the collection to be included in the cube (character vector; see the Note section below).

roi

Region of interest: either an sf object, a shapefile, a SpatExtent, or a numeric vector with named XY values ("xmin", "xmax", "ymin", "ymax") or named lat/long values ("lon_min", "lat_min", "lon_max", "lat_max").

crs

The Coordinate Reference System (CRS) of the roi. It must be specified when roi is given as named XY values ("xmin", "xmax", "ymin", "ymax") or as a SpatExtent.

start_date, end_date

Initial and final dates of the images from the collection to be included in the cube (optional; dates in YYYY-MM-DD format).

platform

Optional parameter specifying the platform in case of collections that include more than one satellite (character vector of length 1).

multicores

Number of workers for parallel processing (integer, min = 1, max = 2048).

progress

Logical: show a progress bar?

data_dir

Local directory where images are stored (for local cubes - character vector of length 1).

vector_dir

Local directory where vector files are stored (for local vector cubes - character vector of length 1).

vector_band

Band for vector cube ("segments", "probs", "class").

labels

Labels associated with the classes (named character vector for cubes of classes "probs_cube" or "class_cube").

parse_info

Parsing information for local files (for local cubes - character vector).

version

Version of the classified and/or labelled files (for local cubes - character vector of length 1).

delim

Delimiter for parsing local files (single character).

Value

A tibble describing the contents of a data cube.

Note

To create cubes from cloud providers, users need to provide:

  1. source: One of "AWS", "BDC", "CDSE", "DEAFRICA", "DEAUSTRALIA", "HLS", "MPC", "SDC", "TERRASCOPE", or "USGS";

  2. collection: Collection available in the cloud provider. Use sits_list_collections() to see which collections are supported;

  3. tiles: A set of tiles defined according to the collection tiling grid;

  4. roi: Region of interest. Either a shapefile, a named vector ("lon_min", "lat_min", "lon_max", "lat_max") in WGS84, or an sfc or sf object from the sf package in WGS84 projection. A named vector ("xmin", "xmax", "ymin", "ymax") or a SpatExtent can also be used, provided that the crs parameter is also specified.

The parameters bands, start_date, and end_date are optional for cubes created from cloud providers.

Either tiles or roi must be provided. The roi parameter is used to select images. This parameter does not crop a region; it only selects images that intersect it.
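
The two named-vector forms of roi described above can be told apart by their names. The sketch below uses only base R; roi_needs_crs() is a hypothetical helper written for this illustration (it is not part of sits), and the XY coordinates are made-up values chosen only to show the shape of the vector.

```r
# Lat/long bounding box in WGS84: no crs argument is needed.
roi_lonlat <- c(
    lon_min = -50.410, lat_min = -10.1910,
    lon_max = -50.379, lat_max = -10.1573
)

# XY bounding box in a projected CRS: crs must be supplied as well.
# The coordinates below are illustrative values, not a real study area.
roi_xy <- c(xmin = 500000, xmax = 510000, ymin = 8870000, ymax = 8880000)

# Hypothetical helper (not part of sits): does this named vector
# require the crs parameter?
roi_needs_crs <- function(roi) {
    xy_names <- c("xmin", "xmax", "ymin", "ymax")
    ll_names <- c("lon_min", "lat_min", "lon_max", "lat_max")
    if (setequal(names(roi), xy_names)) {
        return(TRUE)
    }
    if (setequal(names(roi), ll_names)) {
        return(FALSE)
    }
    stop("roi must use either the XY or the lat/long bounding-box names")
}

roi_needs_crs(roi_lonlat)
roi_needs_crs(roi_xy)
```

A check of this kind is only a convenience for catching a missing crs early; sits_cube() performs its own validation of roi.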

If you want to use GeoJSON geometries (RFC 7946) as the value of roi, convert them to an sf object first and then pass that object.
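
As a minimal sketch of that conversion, assuming the sf package is installed: sf::st_read() accepts a character string holding GeoJSON data as its data source. The polygon below is an illustrative bounding box built from the coordinates of the MPC Landsat example in this page; the variable names are arbitrary.

```r
library(sf)

# A GeoJSON Feature held in a character string (illustrative polygon).
geojson_txt <- paste0(
    '{"type": "Feature", "properties": {}, "geometry": ',
    '{"type": "Polygon", "coordinates": [[',
    '[-50.410, -10.1910], [-50.379, -10.1910], ',
    '[-50.379, -10.1573], [-50.410, -10.1573], ',
    '[-50.410, -10.1910]]]}}'
)

# Convert the GeoJSON string to an sf object.
roi_sf <- sf::st_read(geojson_txt, quiet = TRUE)

# Bounding box of the resulting geometry.
sf::st_bbox(roi_sf)
```

The resulting roi_sf object can then be passed as roi = roi_sf in a call to sits_cube().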

sits can access data from multiple providers, including Amazon Web Services (AWS), Microsoft Planetary Computer (MPC), Brazil Data Cube (BDC), Copernicus Data Space Ecosystem (CDSE), Digital Earth Africa, Digital Earth Australia, NASA EarthData, Terrascope and more.

In each provider, sits can access multiple collections. For example, in MPC sits can access multiple open data collections, including "SENTINEL-2-L2A" for Sentinel-2/2A images, and "LANDSAT-C2-L2" for the Landsat-4/5/7/8/9 collection.

In AWS, there are two types of collections: open data and requester-pays. Currently, sits supports collections "SENTINEL-2-L2A", "SENTINEL-S2-L2A-COGS" (open data) and "LANDSAT-C2-L2" (requester-pays). There is no need to provide AWS credentials to access open data collections. For requester-pays data, you need to provide your AWS access codes as environment variables, as follows:

    Sys.setenv(
        AWS_ACCESS_KEY_ID = <your_access_key>,
        AWS_SECRET_ACCESS_KEY = <your_secret_access_key>
    )
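
Before requesting a requester-pays collection, it can help to confirm that both environment variables are actually set in the current R session. The sketch below uses only base R; aws_credentials_set() is a hypothetical helper written for this illustration and is not part of sits.

```r
# Hypothetical helper (not part of sits): TRUE only if both AWS
# variables are defined and non-empty in the current session.
aws_credentials_set <- function() {
    keys <- c("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
    all(nzchar(Sys.getenv(keys)))
}

if (!aws_credentials_set()) {
    message("AWS credentials are not set; requester-pays collections ",
            "such as LANDSAT-C2-L2 will not be accessible.")
}
```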

In BDC, there are many collections, including "LANDSAT-OLI-16D" (Landsat-8 OLI, 30 m resolution, 16-day intervals), "SENTINEL-2-16D" (Sentinel-2A and 2B MSI images at 10 m resolution, 16-day intervals), "CBERS-WFI-16D" (CBERS 4 WFI, 64 m resolution, 16-day intervals), and others. All BDC collections are regularized.

To explore providers and collections sits supports, use the sits_list_collections() function.

If you want to learn more details about each provider and collection available in sits, please read the online sits book (e-sensing.github.io/sitsbook). The chapter Earth Observation data cubes provides a detailed description of all collections you can use with sits (e-sensing.github.io/sitsbook/earth-observation-data-cubes.html).

To create a cube from local files, you need to provide:

  1. source: The data provider from which the data was downloaded (e.g., "BDC", "MPC");

  2. collection: The collection from which the data comes (e.g., "SENTINEL-2-L2A" for the Sentinel-2 MPC collection level 2A);

  3. data_dir: The local directory where the image files are stored.

  4. parse_info: Defines how to extract metadata from file names by specifying the order and meaning of each part, separated by the "delim" character. Default value is c("X1", "X2", "tile", "band", "date").

  5. delim: The delimiter character used to separate components in the file names. Default is "_".

Note that if you are working with local data cubes created by sits, you do not need to specify parse_info and delim. These elements are automatically identified. This is particularly useful when you have downloaded or created data cubes using sits.

For example, if you downloaded a data cube from the Microsoft Planetary Computer (MPC) using the function sits_cube_copy(), you do not need to provide parse_info and delim.

If you are using a data cube from a source supported by sits (e.g., AWS, MPC) but downloaded / managed with an external tool, you will need to specify the parse_info and delim parameters manually. For this case, you first need to ensure that the local files meet some critical requirements:

  • All image files must have the same spatial resolution and projection;

  • Each file should represent a single image band for a single date;

  • File names must include information about the "tile", "date", and "band" in the file.

For example, if you are creating a Sentinel-2 data cube on your local machine, and the files have the same spatial resolution and projection, with each file containing a single band and date, an acceptable file name could be:

  • "SENTINEL-2_MSI_20LKP_B02_2018-07-18.jp2"

This file name works because it encodes the three key pieces of information used by sits:

  • Tile: "20LKP";

  • Band: "B02";

  • Date: "2018-07-18"

Other examples of supported file names are:

  • "CBERS-4_WFI_022024_B13_2021-05-15.tif";

  • "SENTINEL-1_GRD_30TXL_VV_2023-03-10.tif";

  • "LANDSAT-8_OLI_198030_B04_2020-09-12.tif".
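
A quick way to screen a set of file names against these requirements is to split each name on the delimiter and check that it yields the expected number of components and a valid date. The sketch below uses only base R and the four file names listed above; it assumes the five-part layout with the date in the last position, and it is not how sits itself validates files.

```r
files <- c(
    "SENTINEL-2_MSI_20LKP_B02_2018-07-18.jp2",
    "CBERS-4_WFI_022024_B13_2021-05-15.tif",
    "SENTINEL-1_GRD_30TXL_VV_2023-03-10.tif",
    "LANDSAT-8_OLI_198030_B04_2020-09-12.tif"
)

# Drop the file extension, then split each name on the delimiter.
stems <- tools::file_path_sans_ext(files)
parts <- strsplit(stems, "_", fixed = TRUE)

# Each name should have five components ...
n_parts <- lengths(parts)

# ... and the fifth component should be a date in YYYY-MM-DD format.
dates <- as.Date(
    vapply(parts, function(p) p[[5]], character(1)),
    format = "%Y-%m-%d"
)

ok <- n_parts == 5 & !is.na(dates)
data.frame(file = files, n_parts = n_parts, date = dates, ok = ok)
```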

The parse_info parameter tells sits how to extract essential metadata from file names. It defines the sequence of components in the file name, assigning each part a label such as "tile", "band", and "date". For parts of the file name that are irrelevant to sits, you can use dummy labels like "X1", "X2", and so on.

For example, consider the file name:

  • "SENTINEL-2_MSI_20LKP_B02_2018-07-18.jp2"

With parse_info = c("X1", "X2", "tile", "band", "date") and delim = "_", the extracted metadata would be:

  • X1: "SENTINEL-2" (ignored)

  • X2: "MSI" (ignored)

  • tile: "20LKP" (used)

  • band: "B02" (used)

  • date: "2018-07-18" (used)
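
The extraction above can be reproduced with a few lines of base R. This is only an illustration of the idea; parse_file_name() is a hypothetical helper and does not reflect the internal implementation used by sits.

```r
# Hypothetical helper (not part of sits): label each component of a
# file name according to parse_info and keep the fields sits uses.
parse_file_name <- function(file, parse_info, delim = "_") {
    stem <- tools::file_path_sans_ext(basename(file))
    parts <- strsplit(stem, delim, fixed = TRUE)[[1]]
    # The file name must have one component per parse_info label.
    stopifnot(length(parts) == length(parse_info))
    names(parts) <- parse_info
    parts[c("tile", "band", "date")]
}

meta <- parse_file_name(
    file = "SENTINEL-2_MSI_20LKP_B02_2018-07-18.jp2",
    parse_info = c("X1", "X2", "tile", "band", "date"),
    delim = "_"
)
meta
```

Components labelled "X1" and "X2" are split out like the others but are simply not returned, which mirrors how dummy labels are ignored.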

The delim parameter specifies the character that separates components in the file name. The default delimiter is "_".

Note that when you load a local data cube specifying the source (e.g., AWS, MPC) and collection, sits assumes that the data properties (e.g., scale factor, minimum, and maximum values) match those defined for the selected provider. However, if you are working with custom data from an unsupported source or data that does not follow the standard definitions of providers in sits, refer to the Technical Annex of the sits online book for guidance on handling such cases (e-sensing.github.io/sitsbook/technical-annex.html).

It is also possible to create result cubes from local files produced by classification or post-classification algorithms. In this case, the parse_info is specified differently, and other additional parameters are required:

  • band: Band name associated with the type of result. Use "probs" for probability cubes produced by sits_classify(); "bayes" for smoothed cubes produced by sits_smooth(); "segments" for vector cubes produced by sits_segment(); "entropy" when using sits_uncertainty(); and "class" for cubes produced by sits_label_classification();

  • labels: Labels associated with the classification results;

  • parse_info: File name parsing information to deduce the values of "tile", "start_date", "end_date" from the file name. Unlike non-classified image files, cubes with results have both "start_date" and "end_date". Default is c("X1", "X2", "tile", "start_date", "end_date", "band").
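
To make the difference from image files concrete, the base R sketch below applies the default parse_info for result cubes to a file name. The file name is hypothetical, constructed only to match the six labels above; actual result files may carry additional components, in which case parse_info must list them too.

```r
# Hypothetical file name for a probability cube (illustration only).
file <- "SENTINEL-2_MSI_20LKP_2018-07-18_2019-07-23_probs.tif"
parse_info <- c("X1", "X2", "tile", "start_date", "end_date", "band")

# Split the name on the delimiter and label each component.
parts <- strsplit(tools::file_path_sans_ext(file), "_", fixed = TRUE)[[1]]
stopifnot(length(parts) == length(parse_info))
names(parts) <- parse_info

# Result cubes carry a date interval rather than a single date.
result_meta <- list(
    tile = parts[["tile"]],
    start_date = as.Date(parts[["start_date"]]),
    end_date = as.Date(parts[["end_date"]]),
    band = parts[["band"]]
)
str(result_meta)
```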

Examples

if (sits_run_examples()) {
    # --- Access to the Brazil Data Cube
    # create a raster cube file based on the information in the BDC
    cbers_tile <- sits_cube(
        source = "BDC",
        collection = "CBERS-WFI-16D",
        bands = c("NDVI", "EVI"),
        tiles = "007004",
        start_date = "2018-09-01",
        end_date = "2019-08-28"
    )
    # --- Access to Digital Earth Africa
    # create a raster cube file based on the information about the files
    # DEAFRICA does not support definition of tiles
    cube_deafrica <- sits_cube(
        source = "DEAFRICA",
        collection = "SENTINEL-2-L2A",
        bands = c("B04", "B08"),
        roi = c(
            "lat_min" = 17.379,
            "lon_min" = 1.1573,
            "lat_max" = 17.410,
            "lon_max" = 1.1910
        ),
        start_date = "2019-01-01",
        end_date = "2019-10-28"
    )
    # --- Access to Digital Earth Australia
    cube_deaustralia <- sits_cube(
        source = "DEAUSTRALIA",
        collection = "GA_LS8CLS9C_GM_CYEAR_3",
        bands = c("RED", "GREEN", "BLUE"),
        roi = c(
            lon_min = 137.15991,
            lon_max = 138.18467,
            lat_min = -33.85777,
            lat_max = -32.56690
        ),
        start_date = "2018-01-01",
        end_date = "2018-12-31"
    )
    # --- Access to CDSE open data Sentinel 2/2A level 2 collection
    # --- remember to set the appropriate environment variables
    # It is recommended that `multicores` be used to accelerate the process.
    s2_cube <- sits_cube(
        source = "CDSE",
        collection = "SENTINEL-2-L2A",
        tiles = c("20LKP"),
        bands = c("B04", "B08", "B11"),
        start_date = "2018-07-18",
        end_date = "2019-01-23"
    )

    ## --- Sentinel-1 SAR from CDSE
    # --- remember to set the appropriate environment variables
    roi_sar <- c("lon_min" = 33.546, "lon_max" = 34.999,
                 "lat_min" = 1.427, "lat_max" = 3.726)
    s1_cube_open <- sits_cube(
       source = "CDSE",
       collection = "SENTINEL-1-RTC",
       bands = c("VV", "VH"),
       orbit = "descending",
       roi = roi_sar,
       start_date = "2020-01-01",
       end_date = "2020-06-10"
    )

    # --- Access to AWS open data Sentinel 2/2A level 2 collection
    s2_cube <- sits_cube(
        source = "AWS",
        collection = "SENTINEL-S2-L2A-COGS",
        tiles = c("20LKP", "20LLP"),
        bands = c("B04", "B08", "B11"),
        start_date = "2018-07-18",
        end_date = "2019-07-23"
    )

    # --- Creating Sentinel cube from MPC
    s2_cube <- sits_cube(
        source = "MPC",
        collection = "SENTINEL-2-L2A",
        tiles = "20LKP",
        bands = c("B05", "CLOUD"),
        start_date = "2018-07-18",
        end_date = "2018-08-23"
    )

    # --- Creating Landsat cube from MPC
    roi <- c("lon_min" = -50.410, "lon_max" = -50.379,
             "lat_min" = -10.1910 , "lat_max" = -10.1573)
    mpc_cube <- sits_cube(
        source = "MPC",
        collection = "LANDSAT-C2-L2",
        bands = c("BLUE", "RED", "CLOUD"),
        roi = roi,
        start_date = "2005-01-01",
        end_date = "2006-10-28"
    )

    ## Sentinel-1 SAR from MPC
    roi_sar <- c("lon_min" = -50.410, "lon_max" = -50.379,
                 "lat_min" = -10.1910, "lat_max" = -10.1573)

    s1_cube_open <- sits_cube(
       source = "MPC",
       collection = "SENTINEL-1-GRD",
       bands = c("VV", "VH"),
       orbit = "descending",
       roi = roi_sar,
       start_date = "2020-06-01",
       end_date = "2020-09-28"
    )
    # --- Access to World Cover data (2021) via Terrascope
    cube_terrascope <- sits_cube(
        source = "TERRASCOPE",
        collection = "WORLD-COVER-2021",
        roi = c(
            lon_min = -62.7,
            lon_max = -62.5,
            lat_min = -8.83,
            lat_max = -8.70
        )
    )
    # --- Create a cube based on local MODIS data
    # MODIS local files have names such as
    # "TERRA_MODIS_012010_NDVI_2013-09-14.jp2"
    # see the parse info parameter as an example on how to
    # decode local files
    data_dir <- system.file("extdata/raster/mod13q1", package = "sits")
    modis_cube <- sits_cube(
        source = "BDC",
        collection = "MOD13Q1-6.1",
        data_dir = data_dir,
        parse_info = c("satellite", "sensor", "tile", "band", "date")
    )

}

e-sensing/sits documentation built on Feb. 13, 2025, 2:22 a.m.