transformGcData: Transform the geochemical data

View source: R/GcClusterFunctions.R

transformGcData R Documentation

Description

Transform the geochemical data first with the isometric log-ratio transform and then with the robust principal component transform.

Usage

transformGcData(gcData, alpha = 0.98)

Arguments

gcData

List containing the geochemical and related data. This container is described in the package documentation.

alpha

Fraction of the data used for the robust principal component transform (see Details).

Details

The elements for which chemical concentrations are listed in gcData are a subset of all elements. For example, the elements might include aluminum and arsenic but might not include silver and plutonium. If the concentrations of these missing elements are accounted for, then some results of the clustering are easier to interpret. Consequently, the concentrations of the missing elements are collectively represented by an amalgamated concentration. For example, if the sum of the reported concentrations for one field sample is 97,634 mg/kg, then the amalgamated concentration is 1,000,000 - 97,634 = 902,366 mg/kg. Amalgamated concentrations are calculated for all field samples and are appended to the chemical concentrations from gcData under the column heading "EE", which means "everything else."
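The amalgamation step can be illustrated with a minimal sketch in base R. The matrix conc and its values are hypothetical and are not taken from the package; the only assumption carried over from the text is that concentrations are in mg/kg, so each field sample totals 1,000,000 mg/kg.

```r
# Hypothetical concentrations (mg/kg): three field samples, three elements.
conc <- matrix(c(60000, 12, 37622,
                 81000, 25, 40000,
                 55000,  8, 30000),
               nrow = 3, byrow = TRUE,
               dimnames = list(NULL, c("Al", "As", "Fe")))

# Amalgamated concentration: the remainder of 1,000,000 mg/kg
# after the reported concentrations are summed for each sample.
EE <- 1e6 - rowSums(conc)

# Append the amalgamated concentration under the column heading "EE".
concWithEE <- cbind(conc, EE = EE)
```

The first row sums to 97,634 mg/kg, so it reproduces the worked example in the text.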

The chemical concentrations are transformed twice. First, the concentrations are transformed to isometric log-ratio (ilr) coordinates. This transformation is described in Pawlowsky-Glahn et al. (2015, p. 36-38). Second, the ilr coordinates are transformed to robust principal coordinates. This transformation is described in Filzmoser et al. (2009). The second transformation requires a mean vector and a covariance matrix; robust values for these two statistics are calculated using function covMcd from package robustbase. One argument to function covMcd is alpha, which is the fraction of the ilr-transformed coordinates that are used to calculate the two statistics.
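The two transformations can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: the simulated concentrations are hypothetical, and the Helmert-type contrast matrix is only one valid choice of ilr basis, so the basis used by transformGcData may differ. Package robustbase must be installed.

```r
library(robustbase)  # provides covMcd

set.seed(1)  # covMcd draws random subsets, so fix the seed

# Hypothetical concentrations (mg/kg) for N field samples, plus "EE".
N <- 200
conc <- cbind(Al = rlnorm(N, log(60000), 0.3),
              As = rlnorm(N, log(10), 0.5),
              Fe = rlnorm(N, log(30000), 0.4))
conc <- cbind(conc, EE = 1e6 - rowSums(conc))
D <- ncol(conc)

# One possible (D-1) x D contrast matrix with orthonormal rows that each
# sum to zero. This Helmert-type choice is an assumption.
Psi <- matrix(0, D - 1, D)
for (i in seq_len(D - 1)) {
  Psi[i, seq_len(i)] <- sqrt(i / (i + 1)) / i
  Psi[i, i + 1] <- -sqrt(i / (i + 1))
}

# Step 1: ilr coordinates, N x (D-1). Because each row of Psi sums to
# zero, the result does not depend on how the concentrations are scaled.
ilrCoefs <- log(conc) %*% t(Psi)

# Step 2: robust mean and covariance from covMcd, then an eigen
# decomposition of the robust covariance matrix.
alpha <- 0.98
mcd <- covMcd(ilrCoefs, alpha = alpha)
eig <- eigen(mcd$cov, symmetric = TRUE)

# Robust principal components: center with the robust mean, then rotate
# onto the eigenvectors of the robust covariance matrix.
robustPCs <- sweep(ilrCoefs, 2, mcd$center) %*% eig$vectors
```

In this sketch, mcd$center, eig$vectors, and eig$values play the roles of the robust mean, eigenvectors, and eigenvalues described under Value.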

Value

A list with seven elements is returned. The elements are vectors and matrices whose dimensions depend on two quantities: the number of field samples, N, and the number of geochemical concentrations reported for each field sample, D. (D includes the amalgamated concentration for "EE".)

Psi

Contrast matrix that is used for the ilr transformation. The matrix dimensions are (D-1) x D.

ilrCoefs

Matrix of ilr coefficients (coordinates) resulting from ilr transformation of the geochemical concentrations. The matrix dimensions are N x (D-1).

robustIlrCenter

Vector containing the robust mean of the ilr coordinates. The vector dimension is D-1.

robustEigenvectors

Matrix containing the eigenvectors of the robust covariance matrix for the ilr coordinates. The matrix dimension is (D-1) x (D-1).

robustEigenvalues

Vector containing the eigenvalues of the robust covariance matrix for the ilr coordinates. The vector dimension is D-1.

robustPCs

Matrix containing the robust principal components. The matrix dimension is N x (D-1).

alpha

Scalar containing the input argument alpha.
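The documented dimensions can be verified with a small helper. The function checkTransDims is hypothetical and is not part of GcClust; it only encodes the dimensions listed above. The list mock is a placeholder with the documented element names, standing in for a real return value.

```r
# Hypothetical helper (not part of GcClust): check that a list has the
# dimensions documented for the return value of transformGcData.
checkTransDims <- function(td) {
  N <- nrow(td$ilrCoefs)  # number of field samples
  D <- ncol(td$Psi)       # number of concentrations, including "EE"
  stopifnot(
    identical(dim(td$Psi), c(D - 1L, D)),
    identical(dim(td$ilrCoefs), c(N, D - 1L)),
    length(td$robustIlrCenter) == D - 1L,
    identical(dim(td$robustEigenvectors), c(D - 1L, D - 1L)),
    length(td$robustEigenvalues) == D - 1L,
    identical(dim(td$robustPCs), c(N, D - 1L)),
    length(td$alpha) == 1L
  )
  invisible(list(N = N, D = D))
}

# Placeholder list with the documented element names (values are dummies).
N <- 5L
D <- 4L
mock <- list(Psi = matrix(0, D - 1L, D),
             ilrCoefs = matrix(0, N, D - 1L),
             robustIlrCenter = numeric(D - 1L),
             robustEigenvectors = diag(D - 1L),
             robustEigenvalues = rep(1, D - 1L),
             robustPCs = matrix(0, N, D - 1L),
             alpha = 0.98)

res <- checkTransDims(mock)
```

With a real result, the call would be checkTransDims(transData); the helper stops with an error if any dimension disagrees with the documentation.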

References

Aitchison, J., 1986, Statistical analysis of compositional data: Chapman and Hall, Inc., Boca Raton, Florida, U.S.A.; re-issued in 2003 by The Blackburn Press, Caldwell, New Jersey, U.S.A.

Filzmoser, P., Hron, K., and Reimann, C., 2009, Principal component analysis for compositional data with outliers: Environmetrics, v. 20, p. 621-632.

Pawlowsky-Glahn, V., Egozcue, J.J., and Tolosana-Delgado, R., 2015, Modeling and analysis of compositional data: John Wiley and Sons, Ltd.

Examples

## Not run: 
transData <- transformGcData(gcData)

## End(Not run)


USGS-R/GcClust documentation built on April 17, 2023, 8:08 p.m.