fences.summary: Generate and Save Fence Values for Data Subsets


Description

Function to generate fences and save the displayed table of values, by default in the R Working Directory, for subsets of the data for a variable when the data can be subdivided by some criterion (factor) such as EcoRegion, Province, physical sample parent material, etc. The function supports the selection of the upper and lower bounds of background variability, and threshold(s) or action levels, when obvious graphical solutions are not visually recognizable.

Usage

1

Arguments

group

the name of the factor variable by which the data are to be subset.

x

name of the variable to be processed.

file

the name of the folder where the file is to be saved. A ‘/’ is appended before the synthesized file name, details of which are displayed on the current device. If no file is specified, the file is saved to the Working Directory, see Details below.

units

the units of measurement, options are: “pct”, “ppm”, “ppb”, “ppt”. The default is “ppm”.

Details

The fence values are computed by several procedures, both with and without a logarithmic data transformation and with a logistic transformation, together with the 98th percentile of the data for display. These computations are based on results returned from function gx.stats. Fences are computed following Tukey's boxplot procedure, as median +/- 2 * MAD (Median Absolute Deviation), and as mean +/- 2 * SD (Standard Deviation), see Reimann et al. (2005).

It is essential that these estimates are viewed in the context of the graphical distributional displays, e.g., shape and its graphical components, gx.hist, gx.ecdf, cnpplt and bxplot, and, if spatial coordinates for the sample sites are available, map.eda7, map.eda8 and caplot. The final selection of a range for background or the selection of a threshold level needs to take the statistical and spatial distributions of the data into account. It is also necessary to be aware that it might be appropriate to have more than one background range/threshold in an area (Reimann and Garrett, 2005). The presence of relevant information in the data frame may permit the data to be subset on the basis of that information for display with the tbplots, bwplots and gx.cnpplts functions. If these indicate that the medians and middle 50% of the data are visibly different between subsets, multiple background ranges may be advisable.
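The three kinds of fence estimate named in Details can be sketched in base R as follows. This is a minimal illustration and not the rgr implementation: the helper name fence_sketch is hypothetical, the quartiles come from quantile() rather than from gx.stats, mad() applies R's default scaling constant of 1.4826, and a base-10 logarithm is assumed for the transformed case.

```r
# Illustrative only: base-R approximations, not the code used by rgr.
fence_sketch <- function(x, log_transform = FALSE) {
  x <- x[!is.na(x)]                          # drop NAs before any computation
  z <- if (log_transform) log10(x) else x    # assumed base-10 logarithm

  q <- quantile(z, c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]

  out <- rbind(
    tukey       = c(q[1] - 1.5 * iqr, q[2] + 1.5 * iqr),  # boxplot whisker limits
    median_2mad = median(z) + c(-2, 2) * mad(z),          # mad() is scaled by 1.4826
    mean_2sd    = mean(z) + c(-2, 2) * sd(z)
  )
  colnames(out) <- c("lower", "upper")

  # Back-transform so the fences are reported in the original units
  if (log_transform) out <- 10^out
  out
}

set.seed(1)
cu <- rlnorm(200, meanlog = 3, sdlog = 0.8)  # synthetic, positive, right-skewed data
fence_sketch(cu)
fence_sketch(cu, log_transform = TRUE)
quantile(cu, 0.98, names = FALSE)            # 98th percentile, for comparison
```

On right-skewed data such as this synthetic sample, the untransformed lower fences can fall below zero, which is one reason the estimates are also computed after transformation.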

A default file name is generated by concatenating the data frame name (see Note below), the group name and the variable, x, name, separated by underscores and terminating in _fences.txt. If file contains text, it is used as the first part of the file name, identifying the folder in which the file is to be saved, for example, file = "D://R_work//Project3", to which the synthesized file name is concatenated. Otherwise the file is saved in the Working Directory.
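The naming rule can be mimicked with paste(). The helper fence_file_name below is hypothetical and only mirrors the description above. How rgr derives the data frame name from search() is not reproduced here, so the name is passed in already in the form shown in the Examples (kola_c).

```r
# Hypothetical helper mirroring the documented naming rule; not rgr's own code.
fence_file_name <- function(dfname, group, x, file = NULL) {
  # Join the three names with "_" and add the fixed "_fences.txt" ending
  base <- paste0(paste(dfname, group, x, sep = "_"), "_fences.txt")
  # With no folder given, the bare name resolves against the Working Directory
  if (is.null(file)) base else paste0(file, "/", base)
}

fence_file_name("kola_c", "COUNTRY", "Cu")
fence_file_name("kola_c", "COUNTRY", "Cu", file = "D:/R_work/Project3")
```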

Output to the current device is suppressed. The output file is formatted as a tab delimited file to be read with a spread sheet program. It can be inspected with a text viewer, and column spacings edited for cosmetic purposes with an ASCII editor of the user's choice.

Note

The synthesis of the file name uses the data frame name, which is assumed to be located in search() position [[2]].

The logit transformation requires that the input value be in the range zero to one. This transformation takes into consideration the closed, constant sum, nature of geochemical analytical data (Filzmoser et al., 2009). Therefore the measurement units must be defined so that the value can be divided by the appropriate constant. The default is “ppm”, and other acceptable units are “pct”, “ppb” and “ppt”. However, it should be noted that at trace element levels the differences between fences computed with logarithmic and logit transformations are small, and in most applied geochemical applications the logarithmic transformation will suffice. This is not the case for concentrations at major element levels, where the data are more ‘normally’ distributed and fences will be markedly different between untransformed and logit based estimates.
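The role of the units argument in the logit transformation can be illustrated with a short sketch. The divisors below are assumptions rather than values taken from the rgr source; in particular “ppt” is read here as parts per trillion. The names unit_divisor and logit_conc are hypothetical, and natural logarithms are used for simplicity.

```r
# Assumed constants that convert each unit to a proportion in (0, 1).
# "ppt" is taken to mean parts per trillion; this is an assumption.
unit_divisor <- c(pct = 1e2, ppm = 1e6, ppb = 1e9, ppt = 1e12)

logit_conc <- function(x, units = "ppm") {
  p <- x / unit_divisor[[units]]   # concentration as a proportion
  log(p / (1 - p))                 # logit, natural logarithm
}

conc <- c(5, 20, 80)

# Trace levels (ppm): the logit and the plain log of the proportion nearly coincide
logit_conc(conc, "ppm") - log(conc / 1e6)

# Major levels (pct): the two transformations diverge as values approach 100 pct
logit_conc(conc, "pct") - log(conc / 1e2)
```

The difference between the two transformations equals -log(1 - p), which is negligible when p is a few parts per million and substantial when p is a large fraction of the whole.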

Any less than detection limit values represented by negative values, or zeros or other numeric codes representing blanks in the data, must be removed prior to executing this function, see ltdl.fix.df.

Any NAs in the data vector are removed prior to computing the fences.
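Before calling the function it may help to count the values that the notes above warn about. The helper count_problem_values is hypothetical and only reports the counts; ltdl.fix.df remains the package's own tool for replacing less than detection limit codes.

```r
# Hypothetical pre-flight check; it reports problem values but does not fix them.
count_problem_values <- function(x) {
  c(n_na          = sum(is.na(x)),              # removed automatically by the function
    n_nonpositive = sum(x <= 0, na.rm = TRUE))  # must be dealt with by the user first
}

x <- c(12, NA, -5, 0, 30)   # synthetic vector with one NA, one negative code, one zero
count_problem_values(x)
```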

The function fences is employed to compute the statistical fence estimates.

Author(s)

Robert G. Garrett

References

Filzmoser, P., Hron, K. and Reimann, C., 2009. Univariate statistical analysis of environmental (compositional) data: Problems and possibilities. Science of the Total Environment, 407(23):6100-6108.

Reimann, C. and Garrett, R.G., 2005. Geochemical background - Concept and reality. Science of the Total Environment, 350(1-3):12-27.

Reimann, C., Filzmoser, P. and Garrett, R.G., 2005. Background and threshold: critical comparison of methods of determination. Science of the Total Environment, 346(1-3):1-16.

Reimann, C., Filzmoser, P., Garrett, R. and Dutter, R., 2008. Statistical Data Analysis Explained: Applied Environmental Statistics with R. John Wiley & Sons, Ltd., 362 p.

See Also

fences, ltdl.fix.df, remove.na

Examples

## Make test data available
data(kola.c)
attach(kola.c)

## Saves the file kola_c_COUNTRY_Cu_fences.txt for later use
## in the R Working Directory.
fences.summary(COUNTRY, Cu)

## Detach test data 
detach(kola.c)

rgr documentation built on May 2, 2019, 6:09 a.m.
