1. Context & objectives

1.1. Context

European directive 2009/128/EC: establishing a framework for Community action to achieve the sustainable use of pesticides

The European directive 2009/128/EC requires member states to set up tools that allow a more rational use of crop protection products. Among these tools, agricultural warning systems based on crop-monitoring models for the control of pests and diseases are widely adopted and have proved their efficiency. However, because meteorological data are difficult to obtain at high spatial resolution (the parcel scale), these systems are still underused. Geostatistical and statistical tools (kriging, multiple regression, artificial neural networks, etc.) make it possible to interpolate the data provided by physical weather stations so that a high-resolution network (1 km² mesh) of virtual weather stations can be generated. That is the objective of the AGROMET project.
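To make the interpolation idea concrete, here is a minimal sketch of the simplest deterministic interpolator, inverse-distance weighting, estimating one hypothetical virtual station from three hypothetical physical ones. It is written in Python purely for illustration (the project itself is developed in R), and is far cruder than the methods the project actually compares:

```python
import math

def idw(stations, target, power=2.0):
    """Inverse-distance-weighted estimate at `target` from (x, y, value) stations.

    A deliberately simple deterministic interpolator, shown only to
    illustrate the principle of deriving virtual-station values.
    """
    num, den = 0.0, 0.0
    for x, y, value in stations:
        d = math.hypot(target[0] - x, target[1] - y)
        if d == 0.0:                 # target coincides with a station
            return value
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den

# Three hypothetical stations (km coordinates, °C) and one virtual grid cell.
stations = [(0.0, 0.0, 10.0), (10.0, 0.0, 12.0), (0.0, 10.0, 14.0)]
print(round(idw(stations, (5.0, 5.0)), 2))   # → 12.0 (equidistant, so a plain mean)
```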

1.2. Objective

Provide hourly, 1 km² gridded datasets of weather parameters with the best possible accuracy (i.e. spatialize the hourly records of the stations over the whole area of Wallonia) = SPATIALIZATION

The project aims to set up an operational web platform designed for real-time agro-meteorological data dissemination at high spatial (1 km²) and temporal (hourly) resolution. To make data available at such a high spatial resolution, we plan to “spatialize” the real-time data sent by more than 30 connected physical weather stations belonging to the PAMESEB and RMI networks. This spatialization results in a gridded dataset corresponding to a network of 16,000 virtual stations uniformly spread over the whole territory of Wallonia. These “spatialized” data will be made available through a web platform providing interactive visualization widgets (maps, charts, tables and various indicators) and an API allowing their use on the fly, notably by providers of agricultural warning systems. Extensive and precise documentation about the data origin, the geostatistical algorithms used and their uncertainty will also be available.

Best suited tools:

  1. ~~physical atmospheric models~~ (it is not straightforward to develop an explicit physical model describing how the output data can be derived from the input data)
  2. supervised machine learning regression algorithms that, given a set of continuous data, find the best relationship representing it (a common approach, widely discussed in the academic literature)

Our main goal will therefore be to choose, for each weather parameter, the best suited supervised machine learning regression method.

2. Key definitions

2.1. Spatialization

Spatialization, or spatial interpolation, creates a continuous surface from values measured at discrete locations, so as to predict values at any location in the zone of interest with the best possible accuracy.

In the chapter “The principles of geostatistical analysis” of Using ArcGIS Geostatistical Analyst, K. Johnston gives an efficient overview of what spatialization is and of its two main groups of techniques (deterministic and stochastic).
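The practical difference between the two groups is that stochastic methods also quantify their own uncertainty. The sketch below (in Python for illustration only; the project's code is R, and all parameter values here are invented) shows 1-D simple kriging with an exponential covariance, returning both a prediction and its kriging variance:

```python
import numpy as np

def simple_kriging(xs, zs, x0, sill=1.0, rng=3.0, mean=0.0):
    """1-D simple kriging with exponential covariance C(h) = sill * exp(-|h|/rng).

    Returns (prediction, kriging variance). The variance is what makes the
    stochastic family attractive: it quantifies interpolation uncertainty.
    The sill, range and mean used here are illustrative, not AGROMET's.
    """
    xs, zs = np.asarray(xs, float), np.asarray(zs, float)
    cov = lambda h: sill * np.exp(-np.abs(h) / rng)
    K = cov(xs[:, None] - xs[None, :])      # station-to-station covariances
    k = cov(xs - x0)                        # station-to-target covariances
    lam = np.linalg.solve(K, k)             # kriging weights
    pred = mean + lam @ (zs - mean)
    var = sill - lam @ k                    # zero at observed locations
    return pred, var

# Predict at x0 = 2.0 from three fictitious observations.
pred, var = simple_kriging([0.0, 1.0, 4.0], [10.0, 11.0, 14.0], 2.0, mean=12.0)
```

At an observed location the variance collapses to zero; between stations it grows, which is exactly the uncertainty information the platform intends to publish alongside predictions.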

2.2. Supervised machine learning

From machinelearningmastery.com:

Supervised learning is where you have input variables (x) and an output variable (Y) and you use an algorithm to learn the mapping function from the input to the output: Y = f(X).
The goal is to approximate the mapping function so well that when you have new input data (x), you can predict the output variables (Y) for that data.
It is called supervised learning because the process of an algorithm learning from the training dataset can be thought of as a teacher supervising the learning process.
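The definition above can be condensed into a few lines. This toy example (Python for illustration; the project works in R) "learns" the mapping f from four training pairs by ordinary least squares and then predicts an unseen input, which is the whole point of supervision. The data are fabricated for the example:

```python
import numpy as np

# Training pairs (x, Y) generated by an unknown mapping Y = f(x);
# here f is secretly Y = 2x + 1, which the learner must approximate.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

# "Learning" = estimating f from examples: ordinary least squares on [x, 1].
A = np.hstack([X, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# Prediction for a new, unseen input x = 10.
y_hat = coef[0] * 10.0 + coef[1]
print(round(y_hat, 2))   # → 21.0
```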

Also check this worth-reading post.

3. Defining the best supervised machine learning regression method

3.1. Our general approach

3.2. Step-by-step workflow

  1. From our historical dataset of hourly weather records (Pameseb db),
  2. filter a representative subset of records (e.g. 5 years of continuous hourly records) and select the “good” stations.
  3. For each hourly set of records (30 stations, or more if the IRM network is integrated),
  4. run a benchmark experiment in which the desired regression learning algorithms are applied to various regression tasks (i.e. datasets combining different explanatory variables with the target weather parameter), in order to compare and rank the combinations of algorithm and explanatory variables, using a cross-validation resampling strategy (LOOCV) that provides the desired performance metrics (RMSE or MAE?).
  5. Then aggregate, by calculating their mean, all the hourly performance measures over the whole representative subset, to choose the method (= regression learning algorithm + regression task) that globally performs best.
  6. For each desired hourly dataset, apply the chosen method to build a model and make spatial predictions.
  7. Use maps to visualize the predictions and their uncertainty.
  8. Make the predictions available on the platform together with their uncertainty indicators.
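As a toy illustration of the benchmarking step (again in Python for illustration only; the actual benchmarks are run in R), here is a leave-one-out cross-validation loop that scores two hypothetical interpolators on five fictitious stations, computing both RMSE and MAE and ranking by RMSE:

```python
import math

def loocv_scores(points, predict):
    """Leave-one-out CV: hold each station out, predict it from the others."""
    errors = []
    for i, (x, y, z) in enumerate(points):
        rest = points[:i] + points[i + 1:]
        errors.append(predict(rest, (x, y)) - z)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    mae = sum(abs(e) for e in errors) / len(errors)
    return rmse, mae

def nearest(neighbors, target):
    """Predict with the value of the closest remaining station."""
    return min(neighbors,
               key=lambda p: math.hypot(p[0] - target[0], p[1] - target[1]))[2]

def idw(neighbors, target, power=2.0):
    """Inverse-distance-weighted prediction from the remaining stations."""
    num = den = 0.0
    for x, y, z in neighbors:
        w = 1.0 / math.hypot(x - target[0], y - target[1]) ** power
        num, den = num + w * z, den + w
    return num / den

# Five fictitious stations: (x km, y km, temperature °C).
stations = [(0, 0, 10.0), (1, 0, 10.5), (0, 1, 10.4), (1, 1, 11.0), (2, 2, 12.1)]
bench = {name: loocv_scores(stations, fn)
         for name, fn in [("nearest", nearest), ("idw", idw)]}
best = min(bench, key=lambda name: bench[name][0])   # rank by RMSE
```

The real benchmark does the same thing at scale: one such comparison per hourly dataset, then the per-hour scores are averaged over the representative subset before the winning method is chosen.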

3.3. Workflow activity diagrams

spatialization methodology viewer

3.4. Which target dependent variables ?

... or variables to be spatialized

3.5. Which independent variables ?

... or explanatory variables

3.6. Which R config and packages ?

In order to ensure scientific reproducibility (why it is important), the R code is developed in a self-maintained and publicly available Docker image.

In addition to the well-known tidyverse suite of packages, we use bleeding-edge R packages:

4. Conclusion



pokyah/agrometeoR documentation built on May 26, 2019, 7 p.m.