Test projects

The following three case studies are tested in detail within FAKIN (i.e. the proposed best practices are applied to these case studies and it is cross-checked whether their application is useful).

Geogenic Salination

Two types of datasets are handled in the Geosalz project: spreadsheets of mostly hydrochemical laboratory analyses, and archived paper files. Accordingly, the raw data and processing folders are divided into two parts, labor and archive.

Data processing

Folder structure

//server/rawdata
  geosalz
    BWB_archiv
    BWB_labor 
    README.yml


//server/processing
  geosalz
    archiv
    labor 
      README.yml 
      documents/
      precleaned-data/
      cleaned-data/
      figures/
      <rawdata.lnk> (link to "//server/rawdata/geosalz/BWB_labor")
      <results.lnk> (link to "//server/results/geosalz/report/")

//server/results
  geosalz
    admin
    reports
      final_report.docx
      <processing.lnk> (link to "//server/processing/geosalz/labor/cleaned-data/v1.0")
      README.yml 

Folder names indicate the owner of the data, here BWB. The README.yml gives information on the licensing of the data; in this case, the data is for restricted use only. The BWB_labor folder contains:

The METADATA.yml comprises information on the origin of each file. In this case the data was received by email, so each email is exported as a txt file (select - export as…) and its content is copied into the METADATA.yml. The METADATA.yml also contains the email text itself, which may provide additional meta information. The METADATA.yml makes clear when and by whom the data was sent and who received it. A hyperlink can be inserted that links directly to the corresponding processing folder. No further subfolders are required.
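A METADATA.yml along these lines could capture the origin information described above. This is a sketch only; all field names, file names and addresses are invented for illustration and are not part of an official FAKIN template:

```yaml
# Hypothetical METADATA.yml sketch -- all names and values are invented
files:
  - name: hydrochemistry_2017.xlsx
    received: 2018-03-12
    sent_by: sender@example.com          # who sent the data
    received_by: recipient@example.com   # who received it
    email_export: mail_2018-03-12.txt    # email exported as txt ("select - export as...")
    email_text: >
      Dear colleague, attached are the laboratory results ...
    processing: //server/processing/geosalz/labor/   # hyperlink to processing folder
```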

Workflow for creating above folder structure

  1. Define project acronym geosalz (add in PROJECTS.yml)

  2. Create initial folder structure on //server/rawdata/geosalz

  3. Create initial folder structure on //server/processing/geosalz with one subfolder for each task/work package, i.e.

  4. Create a README.yml for each task describing the folder contents

  5. Link relevant results to //server/projekte/AUFTRAEGE/Rahmenvertrag GRW-WV/Data and documents/Versalzung
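Steps 2-4 above could also be scripted; a minimal sketch in Python (the server root is a placeholder path, the folder names are taken from the tree shown above, and the README placement is simplified; the script itself is an illustration, not part of the FAKIN workflow):

```python
from pathlib import Path

# Hypothetical sketch: subfolders per server root, taken from the structure above.
ROOTS = {
    "rawdata": ["BWB_archiv", "BWB_labor"],
    "processing": ["archiv", "labor"],
    "results": ["admin", "reports"],
}

def create_skeleton(server: Path, project: str = "geosalz") -> None:
    """Create the initial folder tree and a README.yml stub per root."""
    for root, subfolders in ROOTS.items():
        for sub in subfolders:
            (server / root / project / sub).mkdir(parents=True, exist_ok=True)
        # README.yml documents folder contents and data licensing
        readme = server / root / project / "README.yml"
        readme.write_text("description: TODO\nlicense: restricted\n")

create_skeleton(Path("/tmp/demo-server"))
```

On a real file server the root would be a UNC path (e.g. //server/rawdata); links to result folders would still be created manually or with platform-specific shortcut tools.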

Lessons learnt:

LCA Modelling

Challenge:

The LCA modelling software Umberto can produce large raw-data output files (> 300 MB CSV files) that are sometimes even too big for Excel 2010 (> 1 million rows) but need to be aggregated (e.g. grouped by specific criteria). This aggregation was usually performed manually in Excel whenever the model output stayed below Excel's limit of about 1 million rows.

Workflow improvement developed within FAKIN:

An open source R package kwb.umberto was programmed for automating:

This results.xlsx Excel spreadsheet is referenced by another Excel spreadsheet, figures.xlsx, which contains the figure templates and merely links to results.xlsx in order to update the predefined figures.

This workflow replaces the formerly manual, time-consuming and error-prone data aggregation in Excel, while still enabling users to adapt the figures to their needs without coding knowledge.
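The aggregation idea behind this workflow can be illustrated with a short sketch. The actual package, kwb.umberto, is written in R; the Python function below, with invented column names and grouping criteria, only shows why automated group-wise aggregation sidesteps the spreadsheet row limit:

```python
import csv
from collections import defaultdict

def aggregate_csv(path, group_cols, value_col):
    """Stream a large CSV row by row and sum value_col per group.

    Streaming keeps memory proportional to the number of groups,
    not the number of rows, so files beyond Excel's ~1 million row
    limit pose no problem.
    """
    totals = defaultdict(float)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            key = tuple(row[c] for c in group_cols)
            totals[key] += float(row[value_col])
    return dict(totals)
```

For example, grouping a model-output CSV with columns site, param and value by (site, param) returns one summed value per group, however many raw rows the file contains.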

Pilot Plants {#aquanes}

Challenges:

The output of (on-line) monitoring technologies is often difficult to interpret and inconvenient to handle, as the output formats of different devices (within one water treatment scheme) can vary strongly. Furthermore, frequent reporting and documentation of the treatment performance via (on-line) monitoring can be time-consuming for the personnel and requires advanced software solutions. An alternative to commercial (and often expensive) software solutions are tools based on the open-source software R r citep(citation()). The free-software approach allows any R programmer to produce customized tools for each individual end-user.

Thus an automated reporting tool was developed within the AQUANES project to enable an integrative assessment of the different monitoring devices and their combination with water quality data obtained from laboratory analyses, for four different pilot plant sites, in order to:

Therefore the open-source R package aquanes.report r citep(manual["Rustler_2018"]) was programmed, which is able to:

For the four pilot plant sites, the data (operational and laboratory data) to be imported into the R tool came from various sources at different temporal resolutions, as detailed below:

{block2 type="rmdwarning"} The high temporal resolution (~ seconds) of the operational data for both Berlin pilot plants resulted in large data volumes (~ 10 million data points per month), which required a large effort to optimise the performance of the R reporting tool in order to enable the visualisation of the pilot plants' raw data for the test operation period (~ 18 months) on computers with limited RAM resources (~ 8-12 GB).
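One common way to tackle such RAM limits is to aggregate the raw readings to a coarser temporal resolution before visualisation. The actual tool is the R package aquanes.report; the sketch below is Python, and the averaging interval and data layout are invented for illustration:

```python
from datetime import datetime

def downsample(readings, interval_min=10):
    """Average (timestamp, value) readings into fixed time intervals.

    Reducing ~1 s raw resolution to interval means shrinks months of
    pilot-plant data to a size that fits into limited RAM for plotting.
    """
    bins = {}
    for ts, value in readings:
        # Floor the timestamp to the start of its interval
        minute = (ts.minute // interval_min) * interval_min
        key = ts.replace(minute=minute, second=0, microsecond=0)
        total, count = bins.get(key, (0.0, 0))
        bins[key] = (total + value, count + 1)
    return {k: total / count for k, (total, count) in sorted(bins.items())}
```

A month of one-second readings (~2.6 million points per sensor) collapses to ~4,300 ten-minute means, which plotting tools handle comfortably.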

The R tool is used regularly by KWB (for the sites Berlin-Schönerlinde and Berlin-Tiefwerder) for interactively assessing the pilot plants' operational performance. In addition, for advanced assessments, only the data importing and aggregation routines are used, combined with R scripts developed by KWB students.

For the other two pilot plant sites, Haridwar and Basel Lange-Erlen, the AQUANES project partners use the automated R reporting tool in a similar way.



KWB-R/fakin.doc documentation built on Sept. 27, 2019, 9:53 p.m.