We welcome all contributors to the sits package! Please submit questions, bug reports, and requests in the issues tracker. If you plan to contribute code, go ahead! Fork the repository and submit a pull request. A few notes:
New functions that build on the sits API should follow the general principles below.
The target audience for sits is the community of remote sensing experts with an Earth Sciences background who want to use state-of-the-art data analysis methods with minimal investment in programming skills. The design of the sits API follows the typical workflow for land classification using satellite image time series and thus provides a clear and direct set of functions that are easy to learn and master.
For this reason, we welcome contributions that provide useful additions to the existing API, such as new ML/DL classification algorithms. If you plan to add a new API function, please open an issue stating your rationale before making a pull request.
Most functions in sits use the S3 programming model, with a strong emphasis on generic methods that are specialized depending on the input data type. See, for example, the implementation of the sits_bands() function.
Please do not contribute code that uses the S4 programming model. Doing so would break the structure and the logic of the existing code. Convert your code from S4 to S3 before submitting it.
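The dispatch pattern described above can be sketched in base R. The generic my_bands(), its methods, and the toy objects are hypothetical illustrations, not the actual sits_bands() implementation; the date column name "Index" and the band column of file_info are assumptions made for this sketch.

```r
# Hypothetical generic illustrating the idea behind sits_bands():
# one entry point whose behaviour depends on the class of the input.
my_bands <- function(data) {
    UseMethod("my_bands")
}

# Time series tibble (class "sits"): bands are the columns of the nested
# time series, minus the date column (assumed here to be named "Index").
my_bands.sits <- function(data) {
    setdiff(names(data$time_series[[1]]), "Index")
}

# Raster cube: bands are collected from the file_info table of every tile.
my_bands.raster_cube <- function(data) {
    bands <- unlist(lapply(data$file_info, function(fi) fi$band))
    sort(unique(bands))
}

# Any other input is rejected with an explicit error.
my_bands.default <- function(data) {
    stop("my_bands(): unsupported input of class ", class(data)[[1]])
}

# Toy inputs for the two supported classes.
samples <- data.frame(label = "Forest", stringsAsFactors = FALSE)
samples$time_series <- list(data.frame(
    Index = as.Date(c("2020-01-01", "2020-01-17")),
    NDVI = c(0.81, 0.78),
    EVI = c(0.52, 0.49)
))
class(samples) <- c("sits", class(samples))

cube <- data.frame(tile = "20LKP", stringsAsFactors = FALSE)
cube$file_info <- list(data.frame(band = c("B08", "B04", "B08"),
                                  stringsAsFactors = FALSE))
class(cube) <- c("raster_cube", class(cube))

my_bands(samples)
my_bands(cube)
```

Each method only knows about its own data type, so supporting a new input type means adding a method rather than editing existing ones.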
Use generic functions as much as possible, as they improve modularity and maintainability. If your code has decision points using if-else clauses, such as "if A, do X; else do Y", consider replacing them with generic functions.
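As a sketch of this refactoring, the hypothetical functions below compute the same result twice: once with an if-else chain and once with an S3 generic. The function names and class vectors are illustrative assumptions, not sits code.

```r
# Before: the decision point is an if-else chain.
# Every new input type requires editing this function.
summarise_input_ifelse <- function(x) {
    if (inherits(x, "sits")) {
        paste("time series tibble with", nrow(x), "samples")
    } else if (inherits(x, "raster_cube")) {
        paste("data cube with", nrow(x), "tiles")
    } else {
        stop("unsupported input")
    }
}

# After: the same decision expressed as an S3 generic.
# A new input type only needs a new method; nothing here changes.
summarise_input <- function(x) {
    UseMethod("summarise_input")
}
summarise_input.sits <- function(x) {
    paste("time series tibble with", nrow(x), "samples")
}
summarise_input.raster_cube <- function(x) {
    paste("data cube with", nrow(x), "tiles")
}
summarise_input.default <- function(x) {
    stop("unsupported input")
}

# Toy inputs tagged with the relevant classes.
samples <- data.frame(label = c("Forest", "Pasture"), stringsAsFactors = FALSE)
class(samples) <- c("sits", class(samples))
cube <- data.frame(tile = c("20LKP", "20LLP", "20LMP"), stringsAsFactors = FALSE)
class(cube) <- c("raster_cube", class(cube))

summarise_input(samples)
summarise_input(cube)
```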
Functions that use the torch package follow the R6 model to be compatible with that package. See, for example, the code in sits_tempcnn.R and api_torch.R. Converting PyTorch code to R and including it in sits is straightforward; please see the Technical Annex of the sits online book.
tidyverse, sf and terra
The sits code relies on the packages of the tidyverse to work with tables and lists. We use dplyr and tidyr for data selection and wrangling, purrr and slider for loops on lists and tables, and lubridate to handle dates and times.
sits data types
The sits package is built on top of three data types: time series tibbles, data cubes, and models. Most sits functions take one or more of these types as inputs and return one of them as output.
The time series tibble contains data and metadata. The first six columns contain the metadata: spatial and temporal information, the label assigned to the sample, and the data cube from which the data has been extracted. The time_series column contains the time series data for each spatiotemporal location. All time series tibbles are objects of class sits.
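The structure described above can be mimicked with a base R data frame holding a list column, which avoids a dependency on the tibble package. The six metadata column names (longitude, latitude, start_date, end_date, label, cube), the "Index" date column, and all values are assumptions chosen for illustration; check a real sits tibble for the exact layout.

```r
# Nested time series for one sample: one row per date, one column per band.
# The date column name "Index" is an assumption made for this sketch.
ts1 <- data.frame(
    Index = as.Date(c("2020-01-01", "2020-01-17", "2020-02-02")),
    NDVI = c(0.81, 0.78, 0.74)
)

# Six metadata columns: location, time span, label, and source cube.
samples <- data.frame(
    longitude = -55.2,
    latitude = -11.6,
    start_date = as.Date("2020-01-01"),
    end_date = as.Date("2020-02-02"),
    label = "Forest",
    cube = "example_cube",  # hypothetical cube name
    stringsAsFactors = FALSE
)

# List column: one nested data frame per spatiotemporal location.
samples$time_series <- list(ts1)
class(samples) <- c("sits", class(samples))

names(samples)
mean(samples$time_series[[1]]$NDVI)
```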
The cube data type is designed to store metadata about image files. In principle, images that are part of a data cube share the same geographical region, have the same bands, and have been regularized to fit into a pre-defined temporal interval. Data cubes in sits are organized by tiles. A tile is an element of a satellite mission's reference system, for example, MGRS for Sentinel-2 and WRS-2 for Landsat. A cube is a tibble where each row contains information about the data covering one tile. Each row of the cube tibble contains a column named file_info; this column contains a list that stores a tibble describing the image files of that tile.
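A minimal sketch of this tile-per-row layout follows, using base R data frames. The helper make_file_info(), the cube columns (source, collection, tile), the file_info columns (band, date, path), and the tile identifiers are all hypothetical; a real sits cube carries more columns.

```r
# Hypothetical helper: one row per (band, date) image file of a tile.
make_file_info <- function(tile, bands, dates) {
    fi <- data.frame(
        band = rep(bands, times = length(dates)),
        date = rep(dates, each = length(bands)),
        stringsAsFactors = FALSE
    )
    fi$path <- sprintf("%s_%s_%s.tif", tile, fi$band, format(fi$date, "%Y%m%d"))
    fi
}

dates <- as.Date(c("2020-01-01", "2020-01-17"))
bands <- c("B04", "B08")
tiles <- c("20LKP", "20LLP")  # hypothetical MGRS tile ids

# One row per tile; file_info is a list column of per-tile tables.
cube <- data.frame(source = "MPC", collection = "SENTINEL-2-L2A",
                   tile = tiles, stringsAsFactors = FALSE)
cube$file_info <- lapply(tiles, make_file_info, bands = bands, dates = dates)
class(cube) <- c("raster_cube", class(cube))

nrow(cube)
cube$file_info[[2]]
```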
The cube data type is specialised into raster_cube (ARD images), vector_cube (ARD cube with segmentation vectors), probs_cube (probabilities produced by classification algorithms on raster data), probs_vector_cube (probabilities generated by vector classification of segments), uncertainty_cube (cubes with uncertainty information), and class_cube (labelled maps). See the code in sits_plot.R as an example of how plot is specialised to handle different classes of raster data.
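The sketch below shows how such specialisation can work with S3 inheritance. The generic describe_cube() stands in for a specialised plot(), and the class hierarchy (subclass first, then raster_cube) is an assumption for illustration, not the actual sits class vector. A subclass without its own method falls back to the raster_cube method.

```r
# Hypothetical generic standing in for a specialised plot().
describe_cube <- function(cube, ...) {
    UseMethod("describe_cube")
}
describe_cube.raster_cube <- function(cube, ...) "band composite"
describe_cube.probs_cube <- function(cube, ...) "one probability layer per label"
describe_cube.class_cube <- function(cube, ...) "labelled map with legend"

# Hypothetical helper: tag a one-tile cube with a class vector.
# Putting the subclass before "raster_cube" is an assumption.
as_cube <- function(subclass) {
    cube <- data.frame(tile = "20LKP", stringsAsFactors = FALSE)
    class(cube) <- unique(c(subclass, "raster_cube", "data.frame"))
    cube
}

describe_cube(as_cube("raster_cube"))
describe_cube(as_cube("probs_cube"))
describe_cube(as_cube("class_cube"))
# No describe_cube.uncertainty_cube method: dispatch falls back to raster_cube.
describe_cube(as_cube("uncertainty_cube"))
```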
All ML/DL models in sits produced by sits_train() belong to the ml_model class. In addition, models are assigned a second class, which is unique to each ML model (e.g., rfor_model, svm_model) and shared by all torch-based DL models (torch_model). The class information is used for plotting models and for establishing whether a model can run on GPUs.
The internal sits code contains no literal values; all such values are stored in the YAML configuration files ./inst/extdata/config.yml and ./inst/extdata/config_internals.yml. The first file contains configuration parameters that are relevant to users, related to visualisation and plotting; the second contains parameters that are relevant only to developers. These values are accessible using the .conf function. For example, the default size for leaflet objects (64 MB) is accessed using the command .conf("view", "leaflet_megabytes").
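The lookup idea behind .conf can be sketched as a walk through nested named lists, which is the shape a YAML parser such as yaml::read_yaml() returns. To stay self-contained, the sketch types the values in directly; my_conf() and conf_values are hypothetical names, not the sits implementation.

```r
# Stand-in for a parsed YAML configuration file: nested named lists.
conf_values <- list(
    view = list(leaflet_megabytes = 64)
)

# Hypothetical lookup in the spirit of .conf(): follow the keys in order
# and fail loudly if any key is missing.
my_conf <- function(...) {
    keys <- c(...)
    value <- conf_values
    for (key in keys) {
        if (!is.list(value) || is.null(value[[key]])) {
            stop("configuration key not found: ", paste(keys, collapse = "$"))
        }
        value <- value[[key]]
    }
    value
}

my_conf("view", "leaflet_megabytes")
```

Keeping values in one place means changing a default (here, the leaflet size) never requires touching function bodies.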
Error messages are also stored outside the code, in the YAML configuration file ./inst/extdata/config_messages.yml. These values are also accessible using the .conf function. For example, the error message associated with an invalid NA value for an input parameter is accessible using the command .conf("messages", ".check_na_parameter").
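A minimal sketch of a check that takes its message from configuration instead of a literal string follows. The list msg_config stands in for the parsed messages file, its message text is invented for illustration, and check_na_parameter() is a hypothetical helper; only the key name .check_na_parameter comes from the text above.

```r
# Stand-in for the parsed messages file: message keys map to text.
# The key mirrors the example above; the wording is invented.
msg_config <- list(
    .check_na_parameter = "NA value is not allowed for this parameter"
)

# Hypothetical check: the message is looked up by key, not hard-coded.
check_na_parameter <- function(x) {
    if (anyNA(x)) {
        stop(msg_config[[".check_na_parameter"]], call. = FALSE)
    }
    invisible(x)
}

check_na_parameter(c(1, 2, 3))  # passes silently
res <- tryCatch(check_na_parameter(c(1, NA)), error = conditionMessage)
res
```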
Color handling in sits is described in the Technical Annex section "How colors work in sits". The legends and colors available by default are described in the YAML file ./inst/extdata/config_colors.yml.
If you want to include a STAC-based catalogue not yet supported by sits, we encourage you to look at existing implementations of catalogues such as Microsoft Planetary Computer (MPC), Digital Earth Africa (DEA), and AWS.
STAC-based catalogues in sits are associated with YAML description files, which are available in the directory ./inst/extdata/sources. For example, the YAML file config_source_mpc.yml describes the contents of the MPC collections supported by sits. Please first provide a YAML file that lists the detailed contents of the new catalogue you wish to include, following the examples provided.
After writing the YAML file, you need to consider how to access and query the new catalogue. The entry point for access to all catalogues is the sits_cube.stac_cube() function, which in turn calls a sequence of functions described in the generic interface api_source.R. Most calls of this API are handled by the functions in api_source_stac.R, which provide an interface to the rstac package and handle STAC queries.
Each STAC catalogue is different. The STAC specification allows providers to include provider-specific information in their data descriptions. For this reason, the generic API described in api_source.R needs to be specialized for each provider. Whenever a provider needs specific implementations of parts of the STAC protocol, we include them in separate files. For example, api_source_mpc.R implements specific quirks of the MPC platform. Similarly, specific support for CDSE (Copernicus Data Space Ecosystem) is available in api_source_cdse.R.
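The "generic API plus provider quirks" pattern can be sketched with S3 dispatch and NextMethod(). Everything here is hypothetical: the generic source_item_urls(), the class newprovider_cube, and the token suffix stand in for whatever a real provider requires. Only the idea of a shared stac_cube behaviour comes from the text; the provider method reuses it and then adjusts the result.

```r
# Hypothetical step of a source API: turn STAC items into asset URLs.
source_item_urls <- function(provider, items, ...) {
    UseMethod("source_item_urls")
}

# Generic STAC behaviour shared by all providers.
source_item_urls.stac_cube <- function(provider, items, ...) {
    vapply(items, function(item) item$href, character(1))
}

# Provider-specific quirk (hypothetical): URLs need a token appended.
# NextMethod() reuses the generic step, then the result is post-processed.
source_item_urls.newprovider_cube <- function(provider, items, ...) {
    urls <- NextMethod()
    paste0(urls, "?token=", provider$token)
}

items <- list(
    list(href = "https://example.com/a.tif"),
    list(href = "https://example.com/b.tif")
)
plain <- structure(list(), class = "stac_cube")
quirky <- structure(list(token = "abc"), class = c("newprovider_cube", "stac_cube"))

source_item_urls(plain, items)
source_item_urls(quirky, items)
```

Only the behaviour that differs is overridden, so the shared STAC logic stays in one place.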
In general terms, ML/DL algorithms in sits are encapsulated as closures, which are the output of the sits_train() function. In line with established practices in R, each closure contains a function that classifies input values, as well as information on the samples used to train the model.
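The closure idea can be sketched with a deliberately simple classifier. The trainer train_centroids() is hypothetical and uses a nearest-centroid rule only because it fits in a few lines; it is not one of the sits algorithms. The class vector and the toy training data are also illustrative assumptions. The returned function classifies new values, and its enclosing environment still holds the training samples.

```r
# Hypothetical trainer in the spirit of sits_train(): it returns a closure.
# samples: data frame with a "label" column and numeric feature columns.
train_centroids <- function(samples) {
    features <- setdiff(names(samples), "label")
    centroids <- aggregate(samples[features],
                           by = list(label = samples$label), FUN = mean)
    labels <- as.character(centroids$label)
    cent <- as.matrix(centroids[features])

    # The returned function closes over `samples`, `features` and `cent`.
    predict_fun <- function(values) {
        x <- as.matrix(values[, features, drop = FALSE])
        # Squared Euclidean distance from every row to every centroid.
        d <- vapply(seq_len(nrow(cent)),
                    function(k) rowSums(sweep(x, 2, cent[k, ])^2),
                    numeric(nrow(x)))
        d <- matrix(d, nrow = nrow(x))  # keep a matrix even for one row
        labels[max.col(-d, ties.method = "first")]
    }
    class(predict_fun) <- c("centroid_model", "ml_model", class(predict_fun))
    predict_fun
}

# Toy training data, for illustration only.
train <- data.frame(
    label = c("Forest", "Forest", "Pasture", "Pasture"),
    NDVI = c(0.80, 0.84, 0.40, 0.44),
    EVI = c(0.50, 0.52, 0.25, 0.27),
    stringsAsFactors = FALSE
)
model <- train_centroids(train)

model(data.frame(NDVI = c(0.82, 0.41), EVI = c(0.49, 0.26)))
nrow(environment(model)$samples)  # training samples travel with the model
```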
Please read the Technical Annex of the sits book. It describes how to include a new ML method, using the LightGBM algorithm as an example. Follow those guidelines to include a new ML algorithm.
If you aim to include a torch-based deep learning method, in addition to understanding the concepts presented in the Technical Annex, please carefully study the implementations of sits_tempcnn() and sits_lighttae().
Bear in mind that your only task is to provide a new function that is compatible with the requirements of ML/DL methods in sits. Once the function has been correctly implemented, you will be able to use it in connection with the rest of sits.
The development roadmap for sits is included as part of the issues tracker. Issues created by the developers are assigned to milestones. Each milestone corresponds to an expected new version of sits to be released on CRAN.