```r
library(sits)
```
Earth observation satellites provide a regular and consistent set of information about the land and oceans of the planet. Recently, most space agencies have adopted open data policies, making unprecedented amounts of satellite data available for research and operational use. This data deluge has brought about a major challenge: how to design and build technologies that allow the Earth observation community to analyse big data sets?
The approach taken in the current work is to develop data analysis methods that work with satellite image time series, obtained by taking calibrated and comparable measures of the same location on Earth at different times. These measures can be obtained by a single sensor (e.g., MODIS) or by combining different sensors (e.g., Landsat 8 and Sentinel-2). With frequent revisits, these data sets can capture important land use changes.
Time series of remote sensing data show that land cover change can occur not only in a progressive and gradual way, but may also show discontinuities with abrupt changes [@Lambin2003]. Analyses of multiyear time series of land surface attributes, their fine-scale spatial pattern, and their seasonal evolution lead to a broader view of land-cover change. Satellite image time series have already been used in applications such as detecting forest disturbance [@Kennedy2010], ecology dynamics [@Pasquarella2016], agricultural intensification [@Galford2008], and its impacts on deforestation [@Arvor2012]. Algorithms for processing image time series include BFAST for detecting breaks [@Verbesselt2010], TIMESAT for modelling and measuring phenological attributes [@Jonsson2004], and methods based on Dynamic Time Warping (DTW) for land use and land cover classification [@Petitjean2012; @Maus2016].
In this work, we present sits, an open source R package for satellite image time series analysis. It provides support for applying machine learning techniques to image time series. These methods include linear and quadratic discriminant analysis, support vector machines, random forests, and neural networks. One important contribution of the package is to support the complete cycle of data analysis for time series classification, including data acquisition, visualisation, filtering, clustering, classification, validation, and post-classification adjustments.
Most studies using satellite image time series for land cover classification use a \emph{space-first, time-later} approach. For multiyear studies, researchers first derive best-fit yearly composites and then classify each composite image. For a review of these methods for land use and land cover classification using time series, see [@Gomez2016]. As an alternative to \emph{space-first, time-later} methods, the sits package provides support for classification of time series, preserving the full temporal resolution of the input data, using a \emph{time-first, space-later} approach. sits uses all data in the image time series to create larger dimensional spaces for machine learning. The idea is to have as many temporal attributes as possible, increasing the dimension of the classification space. Each temporal instance of a time series is taken as an independent dimension in the feature space of the classifier. To the authors' best knowledge, the classification techniques for image time series included in the package are not available in other R or Python packages. Furthermore, the package includes methods for filtering, clustering, and post-processing that have not been published in the literature.
The main aim of sits is to support land cover and land change classification of image data cubes using machine learning methods. The basic workflow is: (a) create a data cube; (b) obtain training samples as time series; (c) train a machine learning model; (d) classify the data cube using the model; (e) post-process and assess the results.
In broad terms, the cloud computing model is one where large satellite-generated data sets are archived on cloud services, which also provide computing facilities to process them. By using cloud services, users can share big Earth observation databases and minimize data downloads. Investment in infrastructure is minimised, and sharing of data and software increases. However, data available in the cloud is best organised for analysis by creating data cubes.
Generalising @Appel2019, we consider that a data cube is a four-dimensional structure with dimensions x (longitude or easting), y (latitude or northing), time, and bands. Its spatial dimensions refer to a single spatial reference system (SRS). Cells of a data cube have a constant spatial size (with regard to the cube's SRS). The temporal dimension is specified by a set of intervals. For every combination of dimensions, a cell has a single value. Given the widespread availability of large Earth observation data sets, there is a growing interest in organising them into "data cubes". Data cubes are particularly amenable to machine learning techniques: their data can be transformed into arrays in memory, which can be fed to training and classification algorithms.
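To make the array view concrete, the following sketch (illustrative values only, not part of the sits API) shows how a small cube maps to a four-dimensional R array and how one pixel becomes a feature vector:

```r
# a minimal sketch of the data cube model: a 4D array indexed by
# (x, y, time, band); the values here are random and purely illustrative
x_size <- 10; y_size <- 10; n_times <- 23; n_bands <- 2
cube_array <- array(
    runif(x_size * y_size * n_times * n_bands),
    dim      = c(x_size, y_size, n_times, n_bands),
    dimnames = list(NULL, NULL, NULL, c("NDVI", "EVI"))
)
# for every combination of dimensions, a cell has a single value
cube_array[1, 1, 1, "NDVI"]
# a classifier sees each pixel as a feature vector formed by all
# (time, band) combinations
pixel_features <- as.vector(cube_array[1, 1, , ])
length(pixel_features)  # 23 time instances x 2 bands = 46 dimensions
```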
One of the distinguishing features of sits is that it has been designed to work with big satellite image data sets and data cubes which reside on the cloud. Many R packages that work with remote sensing images require data to be accessible on a local computer. However, with the coming of age of big Earth observation data, it is not always practical to transfer large data sets. Users have to rely on web services to provide access to them. In this context, sits accesses data cubes using information provided by the SpatioTemporal Asset Catalogue (STAC).
Currently, SITS supports data cubes available in several cloud services.
The user can define a data cube by selecting a collection in a cloud service and then defining a space-time extent. For example, the following code will define a data cube of Sentinel-2/2A images using AWS.
```r
s2_cube <- sits_cube(
    type              = "S2_L2A_AWS",
    name              = "T20LKP_2018_2019",
    satellite         = "SENTINEL-2",
    sensor            = "MSI",
    tiles             = "20LKP",
    s2_aws_resolution = "20m",
    start_date        = as.Date("2018-08-01"),
    end_date          = as.Date("2019-07-31")
)
```
In the above example, the user has selected the "Sentinel-2 Level 2" collection in the AWS cloud services. The geographical area of the data cube is defined by the tile "20LKP", and the temporal extent by a start and end date. Access to other cloud services works in similar ways.
Users can derive data cubes from ARD data with pre-defined temporal resolutions. For example, a user may want to define the best Sentinel-2 pixel in a one-month period, as shown below. This can be done in SITS by the `sits_regularize` function, which calls the gdalcubes package. For details on gdalcubes, please see https://github.com/appelmar/gdalcubes.
```r
gc_cube <- sits_regularize(
    cube        = s2_cube,
    name        = "T20LKP_2018_2019_1M",
    path_db     = "/my/path/cube.db",
    path_images = "/my/path/images/",
    period      = "P1M",
    agg_method  = "median",
    resampling  = "bilinear"
)
```
To define a data cube using plain files (without STAC information), all image files should have the same spatial resolution and the same projection, and each file should contain a single image band for a single date. Since raster files in popular formats (e.g., GeoTIFF and JPEG2000) do not include time information, the name of each file needs to include date and band information; the timeline and bands are deduced from the filenames. For example, "CBERS-4_AWFI_B13_2018-02-02.tif" is a valid name. The user has to provide parsing information to allow sits to extract the band and the date. In this example, the parsing info is `c("X1", "X2", "band", "date")` and the delimiter is `"_"`.
```r
# Create a cube based on a stack of CBERS data
data_dir <- system.file("extdata/raster/cbers", package = "sits")
# files are named using the convention "CBERS-4_AWFI_B13_2018-02-02.tif"
cbers_cube <- sits_cube(
    source     = "LOCAL",
    name       = "022024",
    satellite  = "CBERS-4",
    sensor     = "AWFI",
    data_dir   = data_dir,
    delim      = "_",
    parse_info = c("X1", "X2", "band", "date")
)
# print the timeline of the cube
sits_timeline(cbers_cube)
# print the bounding box of the cube
sits_bbox(cbers_cube)
```
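To make the naming convention concrete, the sketch below shows how the parsing information maps filename parts to metadata; sits performs this step internally, so the code is illustrative only:

```r
# illustrative sketch of how parse_info and the "_" delimiter work:
# each filename part is matched positionally to a parse_info label
fname <- "CBERS-4_AWFI_B13_2018-02-02.tif"
parts <- strsplit(tools::file_path_sans_ext(fname), split = "_")[[1]]
names(parts) <- c("X1", "X2", "band", "date")
parts["band"]           # "B13"
as.Date(parts["date"])  # "2018-02-02"
```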
Training a machine learning model in sits requires a set of time series describing properties of spatio-temporal locations of interest. For land use classification, this set consists of samples provided by experts, who take in-situ field observations or recognize land classes using high-resolution images. The package can also be used for any type of classification, provided that the timeline and bands of the time series used for training match those of the data cubes.
For handling time series, the package uses a sits tibble to organize time series data with associated spatial information. A tibble is a generalization of a `data.frame`, the usual way in R to organise data in tables. Tibbles are part of the tidyverse, a collection of R packages designed to work together in data manipulation [@Wickham2017]. As an example of how the sits tibble works, the following code shows the first three lines of a tibble containing $1,218$ labelled samples of land cover in the Mato Grosso state of Brazil, with four classes: "Forest", "Cerrado", "Pasture", and "Soybean-Corn".
```r
# data set of samples
data(samples_modis_4bands)
samples_modis_4bands[1:3, ]
```
A sits tibble contains data and metadata. The first six columns contain the metadata: spatial and temporal information, the label assigned to the sample, and the data cube from which the data has been extracted. The spatial location is given in longitude and latitude coordinates for the "WGS84" ellipsoid. For example, the first sample has been labelled "Pasture", at location ($-55.1852$, $-10.8378$), and is valid for the period (2013-09-14, 2014-08-29).
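As a sketch of this structure (column values are illustrative; the real tibble is built by the package), a one-row sits tibble could be assembled as follows:

```r
library(tibble)
# illustrative one-row sits tibble: six metadata columns plus a
# list column holding the time series itself
sample_tb <- tibble(
    longitude   = -55.1852,
    latitude    = -10.8378,
    start_date  = as.Date("2013-09-14"),
    end_date    = as.Date("2014-08-29"),
    label       = "Pasture",
    cube        = "MOD13Q1",        # name of the source cube (illustrative)
    time_series = list(tibble(
        Index = as.Date(c("2013-09-14", "2013-09-30", "2013-10-16")),
        NDVI  = c(0.50, 0.56, 0.63)  # invented values for illustration
    ))
)
```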
```r
# print the first time series records of the first sample
sits_time_series(samples_modis_4bands[1, ])[1:3, ]
```
To get a time series in sits, one has first to create a data cube. Users can request one or more time series points from a data cube by using `sits_get_data()`. This function provides a general means of access to image time series. Given a data cube, the user provides the latitude and longitude of the desired location, the bands, and the start and end dates of the time series. If the start and end dates are not provided, all the available data is retrieved. The result is a tibble that can be visualized using `plot()`.
```r
library(sits)
data_dir <- system.file("extdata/raster/mod13q1", package = "sits")
modis_cube <- sits_cube(
    source     = "LOCAL",
    name       = "sinop-2014",
    satellite  = "TERRA",
    sensor     = "MODIS",
    data_dir   = data_dir,
    delim      = "_",
    parse_info = c("X1", "X2", "band", "date")
)
# obtain a set of locations defined by a CSV file
csv_raster_file <- system.file("extdata/samples/samples_sinop_crop.csv",
                               package = "sits")
# retrieve the points from the data cube
points <- sits_get_data(modis_cube, file = csv_raster_file)
# plot the first point
plot(points[1, ])
```
The literature on satellite image time series describes several applications of filtering to correct or smooth vegetation index data. The following filters are available in SITS and are described in more detail in the vignette "Satellite Image Time Series Filtering with SITS":
* Savitzky-Golay filter (`sits_sgolay`)
* Whittaker smoother (`sits_whittaker`)
* Envelope filter (`sits_envelope`)
The SITS package uses a common interface to all filter functions through `sits_filter()`. The function has two parameters: the dataset to be filtered and `filter` for the filter to be applied. To aid data visualisation, a suffix is appended to the name of each filtered band, as shown in the examples below. Here we show an example using the Whittaker smoother, which has been proposed in the literature as arguably the most appropriate filter for satellite image time series [@Atzberger2011]. The Whittaker smoother fits a curve that represents the raw data, but is penalized if subsequent points vary too much [@Atzberger2011]. As such, it balances fidelity to the original data against the "smoothness" of the fitted curve. The parameter `lambda` controls the degree of smoothing.
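Formally, the Whittaker smoother chooses the fitted series $z$ that minimizes a combination of fidelity to the observed series $y$ and a roughness penalty on second-order differences:

$$ S(z) = \sum_{i}(y_i - z_i)^2 + \lambda \sum_{i}(z_{i+1} - 2z_i + z_{i-1})^2, $$

so that larger values of $\lambda$ produce smoother fitted curves.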
```r
# Take an NDVI time series, apply the Whittaker filter and plot the series
point_ndvi <- sits_select(point_mt_6bands, bands = "NDVI")
point_whit <- sits_whittaker(point_ndvi, lambda = 5.0)
# merge with original data and plot the original and the filtered data
point_whit %>%
    sits_merge(point_ndvi) %>%
    plot()
```
One of the key challenges of machine learning classification models is assessing the quality of the training data sets. It is useful to apply pre-processing methods to improve the quality of the samples and to remove those that might have been wrongly labeled or that have low discriminatory power. Good samples lead to good classification maps. sits provides support for two clustering methods to test sample quality: (a) Agglomerative Hierarchical Clustering (AHC); (b) Self-organizing Maps (SOM). Full details of the cluster methods used in SITS are available in the vignette "Clustering of Satellite Image Time Series with SITS".
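As a hedged sketch of this workflow (assuming the `sits_cluster_dendro()` and `sits_cluster_frequency()` functions; see the vignette for authoritative usage), sample quality can be inspected by clustering the samples and cross-tabulating clusters against labels:

```r
# cluster the "cerrado_2classes" samples using agglomerative
# hierarchical clustering over time series distances
clusters <- sits_cluster_dendro(cerrado_2classes, bands = c("NDVI", "EVI"))
# cross-tabulate cluster membership against the original labels;
# clusters mixing several labels point to low-quality samples
sits_cluster_frequency(clusters)
```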
There has been much recent interest in using classifiers such as support vector machines \citep{Mountrakis2011} and random forests \citep{Belgiu2016} for remote sensing images. Most often, researchers use a \emph{space-first, time-later} approach, in which the dimension of the decision space is limited to the number of spectral bands or their transformations. Sometimes, the decision space is extended with temporal attributes. To do this, researchers filter the raw data to get smoother time series \citep{Brown2013, Kastens2017}. Then, using software such as TIMESAT \citep{Jonsson2004}, they derive a small set of phenological parameters from vegetation indexes, like the beginning, peak, and length of the growing season \citep{Estel2015, Pelletier2016}.
In a recent review of machine learning methods to classify remote sensing data [@Maxwell2018], the authors note that many factors influence the performance of these classifiers, including the size and quality of the training dataset, the dimension of the feature space, and the choice of the parameters. We support both \emph{space-first, time-later} and \emph{time-first, space-later} approaches. Therefore, the sits package provides functionality to explore the full depth of satellite image time series data.
When used in the \emph{time-first, space-later} approach, sits treats time series as feature vectors. To be consistent, the procedure aligns all time series from different years by their temporal proximity, considering a given cropping schedule. Once aligned, the feature vector is formed by all pixel "bands". The idea is to have as many temporal attributes as possible, increasing the dimension of the classification space. In this scenario, statistical learning models are the natural candidates to deal with high-dimensional data: learning to distinguish all land cover and land use classes from trusted sample exemplars (the training data) in order to infer the classes of a larger data set.
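The sketch below illustrates how a single sample is flattened into the feature vector seen by the classifier (the flattening shown is conceptual; the package handles it internally):

```r
library(sits)
data(samples_modis_4bands)
# take the time series of the first sample and flatten its four bands
# into one long feature vector
ts <- sits_time_series(samples_modis_4bands[1, ])
features <- unlist(ts[, c("NDVI", "EVI", "NIR", "MIR")])
length(features)  # (number of time instances) x (number of bands)
```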
The sits package provides a common interface to all machine learning models through the `sits_train` function. This function takes two parameters: the input data samples and the ML method (`ml_method`), as shown below. After the model is estimated, it can be used to classify individual time series or full data cubes using the `sits_classify` function. In the examples that follow, we show how to apply each method to the classification of a single time series. Then, we discuss how to classify full data cubes.
When a dataset of time series organised as a tibble is taken as input to the classifier, the result is the same tibble with one additional column ("predicted"), which contains the labels assigned to each interval. The following example illustrates how to train a model and classify an individual time series. First we use the `sits_train` function with two parameters: the training dataset (described above) and the chosen machine learning method (in this case, a random forest classifier). The trained model is then used to classify a time series from the Brazilian state of Mato Grosso, using `sits_classify`. The results can be shown in text format using the function `sits_show_prediction` or graphically using `plot`.
```r
rfor_model <- sits_train(
    data      = samples_modis_4bands,
    ml_method = sits_rfor(num_trees = 1000)
)
point_4bands <- sits_select(point_mt_6bands,
                            bands = c("NDVI", "EVI", "NIR", "MIR"))
class.tb <- sits_classify(point_4bands, rfor_model)
sits_show_prediction(class.tb)
plot(class.tb)
```
The following methods are available in SITS for training machine learning models:

* Linear discriminant analysis (`sits_lda`)
* Quadratic discriminant analysis (`sits_qda`)
* Multinomial logit and its variants 'lasso' and 'ridge' (`sits_mlr`)
* Support vector machines (`sits_svm`)
* Random forests (`sits_rfor`)
* Extreme gradient boosting (`sits_xgboost`)
* Deep learning (DL) using multi-layer perceptrons (`sits_deeplearning`)
* DL with 1D convolutional neural networks (`sits_CNN`)
* DL combining 1D CNN and multi-layer perceptron networks (`sits_tempCNN`)
* DL using a 1D version of ResNet (`sits_ResNet`)
* DL using a combination of long short-term memory (LSTM) and 1D CNN (`sits_LSTM_FCN`)

For more details on each method, please see the vignette "Machine Learning for Data Cubes using the SITS package".

# Validation techniques

Validation is a process undertaken on models to estimate some error associated with them, and hence has been used widely in different scientific disciplines. Here, we are interested in estimating the prediction error associated with some model. For this purpose, we concentrate on *cross-validation*, probably the most widely used validation technique [@Hastie2009]. Cross-validation estimates the expected prediction error. It uses part of the available samples to fit the classification model and a different part to test it. In the so-called *k-fold* validation, we split the data into $k$ partitions of approximately the same size and proceed by fitting and testing the model $k$ times. At each step, we take one distinct partition for testing and the remaining ${k-1}$ for training the model, and calculate its prediction error for classifying the test partition. A simple average gives us an estimate of the expected prediction error.

A natural question that arises is: *how good is this estimate?* According to @Hastie2009, there is a bias-variance trade-off in the choice of $k$. If $k$ is set to the number of samples, we obtain the so-called *leave-one-out* validation; the estimator has low bias for the true expected error, but high variance. It is also computationally expensive, as it requires as many fitting processes as there are samples. On the other hand, if we choose ${k=2}$, we get a highly biased estimate that overestimates the true prediction error, but has low variance. The recommended choices of $k$ are $5$ or $10$ [@Hastie2009], which somewhat overestimate the true prediction error.

`sits_kfold_validate()` provides k-fold validation in `sits`. The following code shows how to perform a k-fold cross-validation in the package. It performs a five-fold validation using a random forest classification model. The output text shows the corresponding confusion matrix and the accuracy statistics (overall and by class).

```r
# perform a five-fold validation for the "cerrado_2classes" data set
# using the random forest machine learning method with default parameters
acc <- sits_kfold_validate(cerrado_2classes,
                           folds     = 5,
                           ml_method = sits_rfor(num_trees = 1000))
```
The continuous observation of the Earth's surface provided by orbital sensors is unprecedented in history. For the sake of illustration, a single tile of the MOD13Q1 product, a square of $4800 \times 4800$ pixels provided every 16 days since February 2000, takes around $18$GB of uncompressed storage for just one band or vegetation index. This data deluge puts the field into a big data era and imposes challenges to design and build technologies that allow the Earth observation community to analyse those data sets [@Camara2017].
To classify a data cube, use the function `sits_classify()` as described below. This function works with data cubes built with `sits_cube()`. The classification algorithm allows users to choose how many processes will run the task in parallel, and also the size of each data chunk to be consumed at each iteration. This strategy enables sits to work on average desktop computers without depleting all computational resources. The code below illustrates how to classify a small raster brick image that accompanies the package.
Once a data cube with associated files is defined, the steps for classification are: (a) select a training data set; (b) train a machine learning model with these samples; (c) classify the data cube with the model, producing a cube of class probabilities; (d) post-process the probability cube to produce labelled maps.
To reduce processing time, it is necessary to adjust `sits_classify()` according to the capabilities of the server. The package tries to keep memory use to a minimum, performing garbage collection to free memory as often as possible. Nevertheless, there is an inevitable trade-off between computing time, memory use, and I/O operations. The best trade-off has to be determined by the user, considering issues such as disk read speed, number of cores in the server, and CPU performance.
The first parameter is `memsize`. It controls the size of the main memory (in GBytes) to be used for classification; the user must specify how much free memory will be available. The second factor controlling the performance of raster classification is `multicores`. Once a block of data is read from disk into main memory, it is split among the cores specified by the user. In general, the more cores assigned to the classification, the faster the result. However, there are overheads in switching time, especially when the server has other processes running.
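As a practical aid (a suggestion, not a package requirement), users can query the machine before choosing these values:

```r
# inspect the available resources before setting memsize and multicores
parallel::detectCores()  # number of logical cores available
# leave some cores free for the operating system and other processes
n_cores <- max(1, parallel::detectCores() - 2)
```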
Based on current experience, the classification of a Sentinel-2 tile at 20-meter resolution ($5490 \times 5490$ pixels), with six bands and 66 time instances covering one year of data, using SVM with a training data set of about 10,000 samples, takes about 3 hours using 20 cores and a memory size of 60 GB, on a server with a 2.4GHz Xeon CPU and 96 GB of memory, to produce the yearly classification maps.
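A back-of-envelope calculation shows why chunking is necessary; the sketch assumes values are held as 4-byte floats, which is an assumption for illustration:

```r
# memory needed to hold one Sentinel-2 tile as a dense array
n_pixels <- 5490 * 5490          # pixels per band per date
n_values <- n_pixels * 6 * 66    # 6 bands x 66 time instances
n_values * 4 / 1e9               # approx. 48 GB of 4-byte floats
```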
```r
# Retrieve the set of samples for the Mato Grosso region
# and select the data for classification
samples_2bands <- sits_select(samples_modis_4bands, bands = c("NDVI", "EVI"))
# build a machine learning model for this area
svm_model <- sits_train(samples_2bands, sits_svm())
# create a data cube to be classified; the cube is composed of
# MOD13Q1 images from the Sinop region in Mato Grosso (Brazil)
data_dir <- system.file("extdata/raster/mod13q1", package = "sits")
sinop <- sits_cube(
    source     = "LOCAL",
    name       = "sinop-2014",
    satellite  = "TERRA",
    sensor     = "MODIS",
    data_dir   = data_dir,
    delim      = "_",
    parse_info = c("X1", "X2", "band", "date")
)
# Classify the raster cube, generating a probability file
probs_cube <- sits_classify(sinop,
                            ml_model   = svm_model,
                            output_dir = tempdir(),
                            memsize    = 16,
                            multicores = 4,
                            verbose    = FALSE)
# plot the probabilities cube
plot(probs_cube)
```
Post-processing is a desirable step in any classification process. Most statistical classifiers use training samples derived from "pure" pixels, that have been selected by users as representative of the desired output classes. However, images contain many mixed pixels irrespective of the resolution. Also, there is a considerable degree of data variability in each class. These effects lead to outliers whose chance of misclassification is significant. To offset these problems, most post-processing methods use the "smoothness assumption" [@Schindler2012]: nearby pixels tend to have the same label. To put this assumption in practice, smoothing methods use the neighbourhood information to remove outliers and enhance consistency in the resulting product.
Smoothing methods are an important complement to machine learning algorithms for image classification. Since these methods are mostly pixel-based, it is useful to complement them with post-processing smoothing to include spatial information in the result. For each pixel, machine learning and other statistical algorithms provide the probabilities of that pixel belonging to each of the classes. As a first step in obtaining a result, each pixel is assigned to the class whose probability is higher. After this step, smoothing methods use class probabilities to detect and correct outliers or misclassified pixels. SITS uses a Bayesian smoothing method, which provides the means to incorporate prior knowledge in data analysis. For more details on the smoothing procedure, please see the vignette "Post classification smoothing using Bayesian techniques in SITS".
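As a rough conceptual sketch, and not the package's exact formulation (see the vignette for details), Bayesian smoothing can be thought of as shrinking each pixel's class logit towards the mean logit of its neighbourhood:

```r
# illustrative only: shrink a pixel's class logit towards the
# neighbourhood mean; sigma2 and tau2 are assumed variance parameters
smooth_logit <- function(pixel_logit, neighbour_logits,
                         sigma2 = 1, tau2 = 1) {
    prior_mean <- mean(neighbour_logits)  # neighbourhood prior mean
    weight     <- tau2 / (tau2 + sigma2)  # weight given to the pixel itself
    weight * pixel_logit + (1 - weight) * prior_mean
}
```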
Doing post-processing using Bayesian smoothing in SITS is straightforward. The result of the `sits_classify` function applied to a data cube is a set of probability images, one per requested classification interval. The next step is to apply the `sits_smooth` function. By default, this function selects the most likely class for each pixel, considering a Bayesian estimator that takes the neighbourhood into account. The following example takes the previously produced classification output and applies Bayesian smoothing.
```r
# smooth the result with a bayesian filter
sinop_bayes <- sits_smooth(probs_cube, output_dir = tempdir())
# label the resulting image
label_bayes <- sits_label_classification(sinop_bayes, output_dir = tempdir())
# plot the image
plot(label_bayes)
```
(Figure: final labelled classification map produced by the example above.)
Current approaches to image time series analysis still use a limited number of attributes. A common approach is to derive a small set of phenological parameters from vegetation indices, such as the beginning, peak, and length of the growing season [@Brown2013; @Kastens2017; @Estel2015; @Pelletier2016]. These parameters are usually extracted with specialized software such as TIMESAT [@Jonsson2004] and then fed into classifiers. Such approaches do not use the power of advanced statistical learning techniques to work in high-dimensional spaces with big training data sets [@James2013].
Package sits can use the full depth of satellite image time series to create larger dimensional spaces. We tested different methods of extracting attributes from time series data, including those reported by @Pelletier2016 and @Kastens2017. Our conclusion is that part of the information in the raw time series is lost after filtering. Thus, the method we developed uses all the data available in the time series samples. The idea is to have as many temporal attributes as possible, increasing the dimension of the classification space. Our experiments showed that modern statistical models such as support vector machines and random forests perform better in high-dimensional spaces than in lower dimensional ones.
For more information, please see the vignettes:

* "SITS: Data analysis and machine learning for data cubes using satellite image time series"
* "Post classification smoothing using Bayesian techniques in SITS"
The authors would like to thank all the researchers that provided data samples used in the examples: Alexandre Coutinho, Julio Esquerdo and Joao Antunes (Brazilian Agricultural Research Agency, Brazil) who provided ground samples for "soybean-fallow", "fallow-cotton", "soybean-cotton", "soybean-corn", "soybean-millet", "soybean-sunflower", and "pasture" classes; Rodrigo Bergotti (National Institute for Space Research, Brazil) who provided samples for "cerrado" and "forest" classes; and Damien Arvor (Rennes University, France) who provided ground samples for "soybean-fallow" class.
This work was partially funded by the São Paulo Research Foundation (FAPESP) through eScience Program grant 2014/08398-6. We thank the Coordination for the Improvement of Higher Education Personnel (CAPES) and National Council for Scientific and Technological Development (CNPq) grants 312151/2014-4 (GC) and 140684/2016-6 (RS). We thank Ricardo Cartaxo and Lúbia Vinhas who provided insight and expertise to support this paper.
This work has also been supported by the International Climate Initiative of the German Federal Ministry for the Environment, Nature Conservation, Building and Nuclear Safety under Grant Agreement 17-III-084-Global-A-RESTORE+ (``RESTORE+: Addressing Landscape Restoration on Degraded Land in Indonesia and Brazil'').