The R package DataQualityCheckEuracAlpEnv provides functions and examples to manage data quality for a network of automatic microclimatic stations. The stations collect data from many sensors and send data tables at regular intervals via GSM using LoggerNet, a software package developed by Campbell Scientific to manage loggers. Raw data are affected by many problems caused by missed connections, manual preprocessing, or software updates. We need to check that the downloaded data are well formatted and to detect failures of the installed sensors as soon as possible. For these reasons we developed the DataQualityCheckEuracAlpEnv package, which contains useful functions and scripts for different purposes.
The network of stations managed by the Institute for Alpine Environment consists of 28 microclimatic stations used mainly for research purposes; ecology, hydrology, and climate change impact are the main study fields.
The stations belong to two projects: LTER and MONALISA. The package serves three main purposes:
Manage and check real-time data: collect the data, detect possible bugs and outliers, and save them, where possible, as a regular time series usable by researchers. This is done by the script DQC_Hourly_Linux_v6.R, which runs as a cronjob every hour. It is used for urgent problems.
Analysis of problems that occurred in the recent period. This is done by the script DQC_Reports.R. It is a maintenance tool that gives an overview of the health status of the stations and sensors, detecting anomalies and exceptional events. This script runs automatically (cronjob) every week.
Analysis and fixing of historical data. This is used to check old data and old files, to detect structure changes, and to highlight the typical problems of manual preprocessing. The script that does this is DQC.R; it is used to prepare historical data.
For stability reasons the scripts run on a Linux virtual machine called HPCgeo01 prepared by the ICT. For the historical analysis, the script was structured for use on a Windows machine. We are developing a user-friendly interface to help the user configure paths and the folder structure.
This package is a collection of scripts used to manage the flow of data and pictures from the stations to a storage. Here we describe how the scripts work and what their roles are.
Every hour, four scripts run on HPCgeo01:
at 01' the script check_DQC_locked.R checks whether the script DQC_Hourly_Linux_v6.R has been locked for more than an hour due to a bug, and whether the script DQC_Reports.R has been locked for more than a week (both LTER and MONALISA)
at 15' every picture downloaded from the stations is read from the LoggerNet folder by DQC_Pics.R, checked, and stored inside the corresponding station subfolder, so that raw data, total data, processed data, pictures, and reports are all kept under the same folder
at 25' the script DQC_Move_Wrong_Files.R moves all files with wrong names out of the LoggerNet folder. What does that mean? The name of a file downloaded with LoggerNet is composed of two parts: the station name defined in LoggerNet and the table name defined by the station software. In turn, the LoggerNet station name is defined as the project name plus the station name. Here is an example for the station B1: the LoggerNet station name is LTER_B1 and the table name is B1, therefore the file name is LTER_B1_B1.dat. This script checks whether the station name and the table name are the same; otherwise there is an error in the IP assignment, i.e. the dynamic DNS did not assign the proper DNS entry to the station. The wrong files are archived in a storage folder for backup and future analysis.
at 30' the script DQC_Hourly_Linux_v6.R checks the data downloaded from the stations, highlighting different problems. The first check detects the status of each station, based on the "Date Modified" timestamp compared with a download table (a table with the last date downloaded and the last date modified). If the station is online, the file structure is checked, together with overlaps and recoverable date gaps (due to new software or record gaps). In these cases the DQC stops and requires an action to unlock the situation. Minor problems are subsequently searched for and flagged; these do not require any action because they are fixed automatically. All the problems detected are collected in an HTML report sent by email to the maintenance staff immediately, with a reminder every 24 h if the problem requires an action. This is done using an rmarkdown script linked with the hourly script.
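The file-name convention used by DQC_Move_Wrong_Files.R can be sketched with a small check. This is an illustrative sketch, not the package's actual code: the function name is invented, and it assumes station names contain no underscores; only the LTER_B1_B1.dat convention comes from the text above.

```r
# Hypothetical sketch of the file-name consistency check: a LoggerNet file
# name is <project>_<station>_<table>.dat, and the table name must equal the
# station name (e.g. LTER_B1_B1.dat). Assumes no underscores inside names.
is_valid_loggernet_name <- function(filename) {
  parts <- strsplit(tools::file_path_sans_ext(filename), "_")[[1]]
  if (length(parts) != 3) return(FALSE)
  station <- parts[2]
  table   <- parts[3]
  identical(station, table)
}

is_valid_loggernet_name("LTER_B1_B1.dat")  # TRUE
is_valid_loggernet_name("LTER_B1_B2.dat")  # FALSE: wrong IP/DNS assignment
```

Files failing such a check would then be moved to the archive folder rather than processed.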
Every day, one script runs on HPCgeo01:
Every week, two scripts run on HPCgeo01 (the same script with two different parametrizations):
at 00:30 the script DQC_Reports.R, with the parametrization **--prj "LTER"**, checks the LTER stations for the last week, or for the last downloaded data. In this way we get an overview of the problems that occurred at the stations and of the parameters out of range. For a deeper analysis we check the range of the standard deviation to monitor constant values and noisy signals. This script uses rmarkdown scripts to produce HTML reports, with a traffic-light system to represent the problems that occurred
at 01:30 the script DQC_Reports.R, with the parametrization **--prj "MONALISA"**, checks the MONALISA stations for the last week, or for the last downloaded data
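The standard-deviation screening mentioned above can be sketched as follows. This is a hedged illustration: the function name, thresholds, and window length are assumptions, not the values used by DQC_Reports.R.

```r
# Hypothetical sketch of standard-deviation screening: a near-zero SD over
# the reporting window suggests a stuck sensor (constant value), a very
# large SD suggests a noisy signal. Thresholds here are illustrative only.
flag_sensor <- function(x, sd_min = 0.01, sd_max = 5) {
  s <- sd(x, na.rm = TRUE)
  if (is.na(s))        "no data"
  else if (s < sd_min) "suspect: constant value"
  else if (s > sd_max) "suspect: noisy signal"
  else                 "ok"
}

flag_sensor(rep(12.3, 168))     # one week of hourly constant values
flag_sensor(rnorm(168, 10, 1))  # normally varying weekly series
```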
To use the package and the features of the scripts and rmarkdown reports, follow the steps below.
The three scripts named above share the same structure: a file management layer that prepares files and folders and summarizes the results, and, inside it, a core script that applies the check functions in the proper order. Every function is independent, but some actions need to be executed consecutively.
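The core-script pattern described above can be sketched as a list of checks applied in a fixed order. The check names and toy data below are invented for illustration; the real functions live in the package.

```r
# Minimal sketch (invented names, not the package's actual functions) of a
# core routine applying independent checks in a fixed order, where later
# checks rely on earlier ones having run.
core_checks <- list(
  check_structure = function(d) { stopifnot(is.data.frame(d)); d },
  sort_by_time    = function(d) d[order(d$timestamp), ],
  drop_overlaps   = function(d) d[!duplicated(d$timestamp), ]
)

run_core <- function(d) Reduce(function(acc, f) f(acc), core_checks, init = d)

d <- data.frame(timestamp = c(3, 1, 2, 2), value = c(30, 10, 20, 21))
run_core(d)  # sorted by time, overlapping record removed
```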
Detailed function documentation is available here
2.1 Clone the repository
Clone the entire repository from https://gitlab.inf.unibz.it/Christian.Brida/dataqualitycheckeuracalpenv.git into a local folder
2.2 Download package
Download the package from GitLab
For credentials, ask Christian.Brida@eurac.edu (Institute for Alpine Environment) or Luca.Cattani@eurac.edu (ICT) directly
2.3 Download libraries
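The text above mentions rmarkdown; the complete dependency list is defined by the package itself, so the snippet below is only a starting point.

```r
# Install dependencies before running the scripts. rmarkdown is named in the
# text above; check the package's DESCRIPTION file for the complete list.
install.packages("rmarkdown")
```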