DRAFT: TO BE COMPLETED

Requirements

The requirements depend on your purpose: whether you want to use only the functions in the R package to build your own data quality check system, or the algorithms implemented in the R and R Markdown scripts. In the first case, follow the usual procedure and install the package directly from git into R using the devtools package. In the second case, you can use the scripts already prepared and configured for standard usage at the Institute for Alpine Environment, Eurac. To do that, clone the repository to your local PC and run the scripts from there.

This document explains how to install the package, how to clone the repository to your local PC, how to check for available updates, and which libraries are required to run the scripts.

Install the package

What is a package? How is it installed? (Answer to be written. For any question about credentials, ask Eurac ICT or the author.)
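As a minimal sketch of the first case, the package could be installed from a terminal by calling devtools through Rscript. The repository URL is the one given under "How to install". The helper name `install_expr`, the `RUN_INSTALL` switch, and the CRAN mirror are assumptions made for this illustration, and the repository may ask for credentials.

```shell
#!/bin/sh
# Repository URL as given in the "How to install" section.
REPO_URL="https://gitlab.inf.unibz.it/Christian.Brida/dataqualitycheckeuracalpenv.git"
# Assumption: any CRAN mirror works here; this one is only an example.
CRAN_MIRROR="https://cloud.r-project.org"

# Build the R expression: install devtools if missing, then install from git.
# (install_expr is a hypothetical helper name, not part of the package.)
install_expr() {
  printf 'if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools", repos = "%s"); devtools::install_git("%s")' "$CRAN_MIRROR" "$1"
}

# Only touch the R library when explicitly asked to (RUN_INSTALL=1);
# otherwise just show what would be evaluated.
if [ "${RUN_INSTALL:-0}" = "1" ]; then
  Rscript -e "$(install_expr "$REPO_URL")"
else
  echo "Dry run. R expression that would be evaluated:"
  install_expr "$REPO_URL"
  echo
fi
```

Running the script with `RUN_INSTALL=1` performs the installation; without it, the R expression is only printed so it can be reviewed or pasted into an R session.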

This file explains what the scripts do and what they need in order to work. Every script has a folder structure and support files used to manage regular data processing and manual data fixing. Here we describe the details of this structure and the features of the scripts.

1. Real time data

This section explains how real-time data are managed: how they are checked and collected, and how alerts are raised when data errors occur. For this we developed the script DQC_Hourly_Linux_v6.R, which runs as a cronjob every hour and is used for urgent problems. To decide when a notice is needed, this script is paired with DQC_Reset_Mail_Status.R.

TO COMPLETE

Real time pics

  1. Manage and check pictures. The script DQC_Pics.R collects the pictures coming from the stations and organizes them in the storage, flagging possible corruption to prevent bad-quality pictures from being published on the website.
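How DQC_Pics.R detects corruption is not described here. Purely as an illustration of one possible check, the sketch below assumes the pictures are JPEG files and tests whether a file starts with the JPEG start-of-image marker (FF D8) and ends with the end-of-image marker (FF D9), which a truncated transfer typically lacks. The function names and the path in the usage comment are hypothetical.

```shell
#!/bin/sh
# Print the bytes read from stdin as lowercase hex, without separators.
hex() { od -An -tx1 | tr -d ' \n'; }

# Succeed only if the file begins with FF D8 and ends with FF D9.
# Assumption: pictures are JPEG; this is a heuristic, not the check
# actually implemented in DQC_Pics.R.
is_complete_jpeg() {
  [ -f "$1" ] || return 1
  start=$(head -c 2 "$1" | hex)
  end=$(tail -c 2 "$1" | hex)
  [ "$start" = "ffd8" ] && [ "$end" = "ffd9" ]
}

# Usage (hypothetical folder):
#   for pic in /path/to/pics/*.jpg; do
#     is_complete_jpeg "$pic" || echo "possibly corrupted: $pic"
#   done
```

Some valid JPEG files carry extra bytes after the end marker, so a check like this can raise false alarms and is better suited to flagging files for review than to deleting them.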

Weekly summary

  1. Analysis of the problems that occurred in the last period. This is done by the script DQC_Reports.R. It is a maintenance tool that gives an overview of the health status of the stations and sensors, detecting anomalies and exceptional events. This script runs automatically (cronjob) every week.

Historical data

  1. Analysis and fixing of historical data. It is used to check old data and old files, to detect structure changes, and to highlight the typical problems of manual preprocessing. The script that does this is DQC.R; it is used to prepare historical data.

Wrong files

  1. Clean the download data folder by moving files with wrong names into a subfolder. The script DQC_Move_Wrong_Files.R detects possible IP errors by analyzing the file name.
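The naming rule applied by DQC_Move_Wrong_Files.R is not given in this document. The sketch below only illustrates the general idea under a loud assumption: a file name is treated as wrong when it starts with something that looks like an IPv4 address. The function name, the subfolder name `wrong_files`, and the example file names are hypothetical.

```shell
#!/bin/sh
# Move files whose name starts with an IPv4-like pattern into a subfolder.
# Assumption: "wrong" means the name begins with an IP address; the real
# rule used by DQC_Move_Wrong_Files.R may differ.
move_wrong_files() {
  dir=$1
  mkdir -p "$dir/wrong_files"
  for f in "$dir"/*; do
    [ -f "$f" ] || continue            # skip subfolders, including wrong_files
    name=$(basename "$f")
    if printf '%s\n' "$name" | grep -Eq '^[0-9]{1,3}(\.[0-9]{1,3}){3}'; then
      mv "$f" "$dir/wrong_files/"
    fi
  done
}

# Usage (hypothetical folder):
#   move_wrong_files /path/to/download
```

With this rule a file such as `192.168.1.10_Table.dat` would be moved, while `Station1_Table.dat` would stay in place.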

For stability reasons the scripts run on a Linux virtual machine called HPCgeo01, prepared by the ICT. For the historical analysis, the script is structured for use on a Windows machine. We are developing a user-friendly interface to help the user configure the path structure and settings.
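The hourly and weekly schedules mentioned above could be expressed as crontab entries on the Linux machine. The sketch below only prints candidate entries so they can be reviewed before being added with `crontab -e`. The script folder, the helper name `cron_entry`, the use of `Rscript` to launch both scripts, and the day and time of the weekly run are assumptions.

```shell
#!/bin/sh
# Hypothetical location of the cloned scripts on the virtual machine.
SCRIPT_DIR="/path/to/scripts"

# Print one crontab line: $1 = five-field cron schedule, $2 = script file name.
cron_entry() {
  printf '%s Rscript %s/%s\n' "$1" "$SCRIPT_DIR" "$2"
}

# Minute 0 of every hour: real-time check.
cron_entry '0 * * * *' DQC_Hourly_Linux_v6.R
# Once a week; Monday at 06:00 is an assumed choice.
cron_entry '0 6 * * 1' DQC_Reports.R
```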

How to install

git clone https://gitlab.inf.unibz.it/Christian.Brida/dataqualitycheckeuracalpenv.git
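Once the repository is cloned, available updates can be checked with standard git commands. The sketch below is one possible way to do it, assuming the clone tracks an upstream branch (the default after `git clone`). The function name `count_updates` is hypothetical.

```shell
#!/bin/sh
# Print how many commits the upstream branch has that the local clone lacks.
# $1 = path of the local clone.
count_updates() {
  git -C "$1" fetch --quiet || return 1
  git -C "$1" rev-list --count 'HEAD..@{u}'
}

# Usage (the folder name is the one git clone creates by default):
#   n=$(count_updates dataqualitycheckeuracalpenv)
#   if [ "$n" -gt 0 ]; then
#     git -C dataqualitycheckeuracalpenv pull --ff-only
#   fi
```

Using `pull --ff-only` in the usage example is a cautious choice: it refuses to merge if the local clone has diverged, so locally edited configuration is not silently overwritten.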

Repository structure

Contributors & Contacts:



bridachristian/DataQualityCheckEuracAlpEnv documentation built on Oct. 27, 2019, 5:55 p.m.