gx.md.gait: Function for Multivariate Graphical Adaptive Interactive...


Description

Function to undertake the GAIT (Graphical Adaptive Interactive Trimming) procedure for multivariate distributions through Chi-square plots of Mahalanobis distances (MDs) as described in Garrett (1988, 1989). For closed (compositional) geochemical data sets use gx.md.gait.closed. To carry out GAIT the function is called repeatedly, with the weights from the previous iteration used as the starting point. Either a percentage-based MVT or an MCD robust start may be used as the first iteration.
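As a rough illustration of what a Chi-square plot of Mahalanobis distances looks like, the following base R sketch may help. It is not the rgr implementation: the synthetic data, the variable names, the plotting positions and the axis orientation are all assumptions made for this example.

```r
# Sketch only: base R, not the code used inside gx.md.gait.
set.seed(1)                                  # synthetic data, assumed for illustration
xx <- matrix(rnorm(200 * 3), ncol = 3)       # n = 200 individuals, p = 3 variables
n <- nrow(xx)
p <- ncol(xx)

# Squared Mahalanobis distance of every row from the sample mean
md <- mahalanobis(xx, center = colMeans(xx), cov = cov(xx))

# Chi-square quantiles (p degrees of freedom) at the plotting positions
# returned by ppoints(); the choice of plotting positions is an assumption
chi2_q <- qchisq(ppoints(n), df = p)

# Ordered distances against the quantiles; multivariate normal data should
# fall close to the 1:1 line, outliers plot away from it at the upper end
plot(sort(md), chi2_q,
     xlab = "Ordered squared Mahalanobis distance",
     ylab = "Chi-square quantile")
abline(0, 1, lty = 2)
```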

Usage

gx.md.gait(xx, wts = NULL, trim = -1, mvtstart = FALSE,
	mcdstart = FALSE, main = deparse(substitute(xx)),
	ifadd = c(0.98, 0.95, 0.9), cexf = 0.6, cex = 0.8, ...)

Arguments

xx

the n by p matrix for which the Mahalanobis distances are required.

wts

the vector of weights for the n individuals, either zero or 1.

trim

the desired trim: trim < 0, no trim (the default); 0 < trim < 1, the fraction (proportion) of individuals to be trimmed; trim >= 1, the number of individuals with the highest MDs from the previous iteration to trim.

mvtstart

set mvtstart = TRUE for a percentage based MVT (multivariate trim) start.

mcdstart

set mcdstart = TRUE for a minimum covariance determinant (mcd) robust start.

main

an alternative plot title to the default input data matrix name, see Details below.

ifadd

if probability based fences are to be displayed on the Chi-square plots enter the probabilities here, see Details below. For no fences set ifadd = NULL.

cexf

the scale expansion factor for the Chi-square fence annotation, by default cexf = 0.6.

cex

the scale expansion factor for the symbols and text annotation within the ‘frame’ of the Chi-square plot, by default cex = 0.8.

...

further arguments to be passed to methods concerning the generated plots. For example, if some colour other than black is required for the plotting characters, specify col = 2 to obtain red (see display.lty for the default colour palette). If it is required to make the plot title or axis labelling smaller, add cex.main = 0.9 or cex.lab = 0.9, respectively, to reduce the font size by 10%.
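The three regimes of the trim argument described above can be sketched as a small helper. The function trim_count is hypothetical and is not part of rgr; in particular, rounding the fractional case down and treating trim = 0 as no trim are assumptions, since the documentation does not say how these cases are handled.

```r
# Hypothetical helper, not part of rgr: translate `trim` into the number
# of individuals to be trimmed from a data set of n individuals.
trim_count <- function(trim, n) {
  if (trim < 0) {
    return(0L)                            # trim < 0: no trim (the default, -1)
  }
  if (trim > 0 && trim < 1) {
    # 0 < trim < 1: a proportion of the n individuals.
    # ASSUMPTION: rounded down; rgr may round differently.
    return(as.integer(floor(trim * n)))
  }
  if (trim >= 1) {
    return(as.integer(trim))              # trim >= 1: a count of individuals
  }
  0L                                      # trim == 0 is undocumented; treated as no trim here
}

trim_count(-1, 20)    # no trim requested
trim_count(0.5, 20)   # half of the individuals
trim_count(4, 20)     # the four individuals with the highest MDs
```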

Details

If main is undefined the name of the matrix object passed to the function is used as the plot title. This is the recommended procedure as it helps to track the progression of the GAIT. Alternate plot titles can be defined if the final saved object is passed to gx.md.plot. If no plot title is required set main = " ", or if a user defined plot title is required it may be defined, e.g., main = "Plot Title Text".

By default three fences are placed on the Chi-square plots at probabilities of membership of the current ‘core’ data subset, or total data if appropriate, with ifadd = c(0.98, 0.95, 0.9). Alternate probabilities may be defined as best for the display. If no fences are required set ifadd = NULL.
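The source does not state how the fence positions are derived from the probabilities in ifadd. Purely for illustration, the sketch below assumes they correspond to Chi-square quantiles with p degrees of freedom; rgr may compute them differently, and the value of p is made up for the example.

```r
# Sketch under a stated assumption: fences taken as Chi-square quantiles.
p <- 5                                   # number of variables, assumed for illustration
ifadd <- c(0.98, 0.95, 0.9)              # default fence probabilities from Usage

# ASSUMPTION: each fence sits at the Chi-square quantile (df = p) of its probability
fences <- qchisq(ifadd, df = p)
names(fences) <- paste0("p = ", ifadd)

print(round(fences, 2))
```

Under this assumption a higher probability gives a fence further from the origin, so individuals beyond the 0.98 fence are the most extreme.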

The x-axis label of the Mahalanobis distance (Chi-square) plot is set to indicate the type of robust start or trim, using the value of proc.

Value

The following are returned as an object to be saved for the next iteration or final use:

main

by default (recommended) the input data matrix name.

input

the data matrix name, input = deparse(substitute(xx)), retained to be used by post-processing display functions.

matnames

the row numbers and column headings of the input matrix.

proc

the procedure followed for this iteration, used for subsequent Chi-square plot x-axis labelling.

wts

the vector of weights for the n individuals, either zero or 1.

n

the total number of individuals (observations, cases or samples) in the input data matrix.

ptrim

the proportion (fraction) of samples requested to be trimmed in this iteration, otherwise ptrim = -1.

mean

the length p vector of means for the ‘core’ data following the current GAIT step.

cov

the p x p covariance matrix for the ‘core’ data following the current GAIT step.

sd

the length p vector of standard deviations for the ‘core’ data following the current GAIT step.

md

the vector of Mahalanobis distances for all the n individuals following the current GAIT step.

ppm

the vector of predicted probabilities of membership for all the n individuals following the current GAIT step.
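The relationship between wts, mean, cov, sd and md in the returned object can be sketched in base R as follows. This is not the rgr code: core_stats is a hypothetical helper, the data are synthetic, and ppm is deliberately left out because the source does not give its formula.

```r
# Sketch only: core statistics from 0/1 weights, then one trimming step.
set.seed(42)                                 # synthetic data, assumed for illustration
xx <- matrix(rnorm(60 * 3), ncol = 3)        # n = 60 individuals, p = 3 variables
xx[1:3, ] <- xx[1:3, ] + 8                   # shift three rows away from the rest
n <- nrow(xx)
wts <- rep(1, n)                             # start with every individual in the core

# Hypothetical helper: mean, cov and sd come from the core (wts == 1) only,
# while squared Mahalanobis distances are computed for all n individuals
core_stats <- function(xx, wts) {
  core <- xx[wts == 1, , drop = FALSE]
  ctr <- colMeans(core)
  S <- cov(core)
  list(mean = ctr, cov = S, sd = sqrt(diag(S)),
       md = mahalanobis(xx, center = ctr, cov = S))
}

step0 <- core_stats(xx, wts)

# Trim the k core individuals with the highest distances by zeroing their weights
k <- 3
core_idx <- which(wts == 1)
drop_idx <- core_idx[order(step0$md[core_idx], decreasing = TRUE)[seq_len(k)]]
wts[drop_idx] <- 0

# Recompute on the reduced core; md still has one entry per individual
step1 <- core_stats(xx, wts)
```

Passing the updated weights into the next call mirrors how the saved wts component is fed back into gx.md.gait in the Examples.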

Note

Any less than detection limit values represented by negative values, or zeros or other numeric codes representing blanks in the data, must be removed prior to executing this function, see ltdl.fix.df.

Any rows in the data matrix with NAs are removed prior to computations. In the instance of a log-ratio, e.g., ilr, transformation NAs are removed.
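The effect of dropping rows that contain NAs can be previewed with base R before calling the function. The sketch uses complete.cases rather than the rgr helper remove.na listed under See Also, and the small matrix is invented for the example.

```r
# Sketch only: preview which rows would survive row-wise NA removal.
xx <- matrix(c(1.2, 3.4,  NA, 0.8,           # column 1
               2.1, 4.0, 5.5, 1.1,           # column 2
               0.9, 2.2, 3.1,  NA),          # column 3
             ncol = 3)                       # filled column by column: 4 rows, 3 columns

keep <- complete.cases(xx)                   # TRUE for rows without any NA
xx_clean <- xx[keep, , drop = FALSE]

nrow(xx) - nrow(xx_clean)                    # number of rows dropped
```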

Warnings are generated when the number of individuals (observations, cases or samples) falls below 5p, and additional warnings when the number of individuals falls below 3p. At these low ratios of individuals to variables the shape of the p-space hyperellipsoid is difficult to reliably define, and therefore the results may lack stability. These limits 5p and 3p are generous, the latter especially so; many statisticians would argue that the number of individuals should not fall below 9p, see Garrett (1993).
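The 5p and 3p thresholds can be checked ahead of a GAIT run with a few lines of R. The helper size_flags and its flag strings are hypothetical; they do not reproduce the wording of the warnings printed by rgr.

```r
# Hypothetical helper, not part of rgr: flag low ratios of individuals to variables.
size_flags <- function(n, p) {
  flags <- character(0)
  if (n < 5 * p) flags <- c(flags, "n < 5p")   # first threshold from the Note
  if (n < 3 * p) flags <- c(flags, "n < 3p")   # additional, stricter threshold
  flags
}

size_flags(100, 6)   # comfortably above 5p: no flags
size_flags(25, 6)    # below 5p = 30 but not below 3p = 18
size_flags(15, 6)    # below both thresholds
```

The same check can be applied to the core size, sum(wts), after each trimming step.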

Author(s)

Robert G. Garrett

References

Garrett, R.G., 1988. IDEAS - An interactive computer graphics tool to assist the exploration geochemist. In Current Research Part F, Geological Survey of Canada Paper 88-1F, pp. 1-13.

Garrett, R.G., 1989. The Chi-square plot - a tool for multivariate outlier recognition. In Proc. 12th International Geochemical Exploration Symposium, Geochemical Exploration 1987 (Ed. S. Jenness). Journal of Geochemical Exploration, 32(1/3):319-341.

Garrett, R.G., 1993. Another cry from the heart. Explore - Assoc. Exploration Geochemists Newsletter, 81:9-14.

See Also

ltdl.fix.df, remove.na, gx.md.plot, gx.md.print

Examples

## Note, the example below is presented for historical continuity.  It is 
## not recommended that this procedure be used for geochemical data.  For
## such data function gx.md.gait.closed should be employed.  However, the
## example below is consistent with Garrett (1989).
## Make test data available
data(sind)
attach(sind)
sind.mat <- as.matrix(sind[, -c(1:3)])

## Undertake original published GAIT procedure
gx.md.gait(sind.mat)
sind.gait.1 <- gx.md.gait(sind.mat, trim = 0.24, ifadd = 0.98)
sind.gait.2 <- gx.md.gait(sind.mat, wts = sind.gait.1$wts, mvtstart = TRUE,
	trim = 4, ifadd = 0.98)
sind.gait.3 <- gx.md.gait(sind.mat, wts = sind.gait.2$wts, trim = 1,
	ifadd = 0.9)
sind.gait.4 <- gx.md.gait(sind.mat, wts = sind.gait.3$wts, trim = 2,
	ifadd = 0.9)

## Display saved object with alternate main titles and list outliers
## IDEAS procedure
gx.md.plot(sind.gait.4,
	main = "Howarth & Sinding-Larsen\nStream Sediments, IDEAS procedure",
	cex.main = 0.8, ifadd = 0.9)
gx.md.print(sind.gait.4, pcut = 0.2)

## Clean-up and detach test data
rm(sind.mat)
rm(sind.gait.1)
rm(sind.gait.2)
rm(sind.gait.3)
rm(sind.gait.4)
detach(sind)

Example output

Loading required package: MASS
Loading required package: fastICA
  *** Proceed with Care, n < 5p ***
  *** Proceed with Care, ncore < 5p ***
Warning messages:
1: In par(old.par) : graphical parameter "cin" cannot be set
2: In par(old.par) : graphical parameter "cra" cannot be set
3: In par(old.par) : graphical parameter "csi" cannot be set
4: In par(old.par) : graphical parameter "cxy" cannot be set
5: In par(old.par) : graphical parameter "din" cannot be set
6: In par(old.par) : graphical parameter "page" cannot be set
  *** Proceed with Care, ncore < 5p ***
Warning messages:
1: In par(old.par) : graphical parameter "cin" cannot be set
2: In par(old.par) : graphical parameter "cra" cannot be set
3: In par(old.par) : graphical parameter "csi" cannot be set
4: In par(old.par) : graphical parameter "cxy" cannot be set
5: In par(old.par) : graphical parameter "din" cannot be set
6: In par(old.par) : graphical parameter "page" cannot be set
  *** Proceed with Care, ncore < 5p ***
Warning messages:
1: In par(old.par) : graphical parameter "cin" cannot be set
2: In par(old.par) : graphical parameter "cra" cannot be set
3: In par(old.par) : graphical parameter "csi" cannot be set
4: In par(old.par) : graphical parameter "cxy" cannot be set
5: In par(old.par) : graphical parameter "din" cannot be set
6: In par(old.par) : graphical parameter "page" cannot be set
  *** Proceed with Care, ncore < 5p ***
Warning messages:
1: In par(old.par) : graphical parameter "cin" cannot be set
2: In par(old.par) : graphical parameter "cra" cannot be set
3: In par(old.par) : graphical parameter "csi" cannot be set
4: In par(old.par) : graphical parameter "cxy" cannot be set
5: In par(old.par) : graphical parameter "din" cannot be set
6: In par(old.par) : graphical parameter "page" cannot be set
  *** Proceed with Care, Core Size < 5p ***
  *** Proceed with Great Care, Core Size < 3p ***
Warning messages:
1: In par(old.par) : graphical parameter "cin" cannot be set
2: In par(old.par) : graphical parameter "cra" cannot be set
3: In par(old.par) : graphical parameter "csi" cannot be set
4: In par(old.par) : graphical parameter "cxy" cannot be set
5: In par(old.par) : graphical parameter "din" cannot be set
6: In par(old.par) : graphical parameter "page" cannot be set
  Mahalanobis Distances for sind.gait.4 
  Source data matrix: sind.mat 

  Table of Mahalanobis Distances where probabilities of group membership (p_gm) are <0.2 

   ID       MD       p_gm

   15    12500   3.33e-16
   16    11800   4.44e-16
   13     4480   1.57e-13
   23     1060   7.96e-10
   22     97.3   0.000394
   12       45     0.0107
   10     41.6     0.0142

rgr documentation built on May 2, 2019, 6:09 a.m.
