knitr::opts_chunk$set(echo = TRUE)

HydroCODA

The R package aims to analyse underlying hydrogeological processes governing the hydrochemistry using the Compositional Data approach proposed by @Aitchison1982 prior a Multistatistical Analysis. It is a tool to perform water samples clasification by a Hierarchical Cluster Analysis - HCA and Principal Component Analysis - PCA. Finally, it creates the classical Stiff Diagram @Stiff1951 for each cluster and draws an interactive map with the location, description and cluster clasification. CoDA techniques have been well developed in the R-package "compositions" @VanDenBoogaart2015, then HydroCODA R-package is an implementation to enhance the hydrogeochemical analysis.

Installation

Currently, you can install the version under development from Github, using these commands:

install.packages("devtools")
devtools::install_github("AdrianaPina/HydroCODA")

Compositional Data Analysis CoDA for the treatment of hydrogeochemical water samples

Water sampling provides information about the state of a system. In hydrogeology, water composition allows to study the provenance of grundwater according to the mineral interaction with the host rock, to define recharge, flowpaths and discharge zones, to identify aquifer pollution due to natural or antrophic conditions, among others. As result, large databases are built based on the number of samples and, the measured chemical compounds, wich are at least the major ions (Ca^{+2}, Mg^{+2}, K^{+}, Na^{+}, Cl^{-}, HCO_3^{-}, NO_3^{-}, NO_2^{-} and SO_4^{2-}). To overcome the integration, interpretation and representation of the results, a very common practice is the use of multivariate statistical techniques @Cloutier2008 that require data standarization and transformation.

However, hydrogeochemical data from a single sample consist of a series of measurements of analytes, commonly expressed in proportions such as mg/L, ppm or ppb @Blake2016, which charactertize the compositional data (the fact that the determinations on each specimen sum to a constant) (@Aitchison1982). As consequence, there is a misinterpretation of closed data when treated with “normal” statistical methods and the usual multivariate statistical techniques are not applicable to constrained data (@Aitchison2003).

Then, the incorporation of CoDA techniques greatly improved traditional statistical techniques in particular the PCA and the HCA, allows for the identification of underlying hydrogeological processes governing the hydrochemistry (@Blake2016, @Pina2018). With this approach data are fully taken into account, enhancing their relative multivariate behaviour in the correct sample space (@Buccianti2015).

Based on the R-package "compositions" @VanDenBoogaart2015, raw hydrochemical data is treated as compositional data, then a CLR transformation (@Aitchison2003) is implemented to perform the HCA; it allows the classification of water samples (cluster). Then, water samples and the assigned cluster are located on a map to visualize the results acording to the geology and faults distribution. There is an option to built a representative Stiff diagram (@Stiff1951) for each cluster, providing information about the hydrogeological units. Finally, a compositional PCA could be performed, obtaining the loads for each component and building the compositional covariance biplot. This kind of plots are useful as exploratory and expository tool, nevertheless, the fundamental elements of a compositional biplot are the links (the join of two or more rays), not the rays as in the case of variation (@Aitchison2003). There are some properties for the interpretation of the compositional variability for the analysis (@Blake2016):

1) If two vertices are coincident or situated close to each other, they are proportional; 2) The length of a link between two vertices is proportional to the log ratio of those two variables; 3) If three or more vertices lie on the same link, they may represent a sub-composition with one single degree of freedom; 4) If two links between four separate clr-variables are orthogonal, then the corresponding pairs of variables may vary independently of each other (this also applies for two orthogonal links describing sub-compositions).

Disclaimer

HydroCODA is a public R library that is made freely available by the voluntary work of the researchers/authors at the Universidad Nacional de Colombia (UNAL), hereafter call as creators, so as to promote the study of hydrogeological units by using statistics and hydrogeochemistry tools.

The representations of the physical world within the software are widely known. The codification and use of them are offered through this R library as a public service and are no cause of action against the creators. The user of this software/information is responsible for verifying the suitability, accuracy, completeness and quality for the particular use of it and hence the user asumes all liability and waives any claims or actions against the creators. Creators do not make any claim, guarantee or warranty the, expressed or implied, suitability, accuracy, completeness and quality for the particular use of the library. The creators disclaim any and all liability for any claims or damages that may result from the application of the information/software contained in the library. The information/software is provided as a guide.

Regarding other information contained in the library. The links or information that are accessed through external sites, which are not maintained by the creators, do not make the creators responsible for that content or the any claims or damages that may result from the use of these external sites. Information within this library is considered to be accurate with the considerations of uncertainties associated with hydrological modelling.

References



AdrianaPina/HydroCODA documentation built on Feb. 1, 2023, 5:43 a.m.