gx.2dproj: Function to Compute and Display 2-d Projections for Data...


View source: R/gx.2dproj.R

Description

Function computes and displays 2-d projections of data matrices using Sammon Non-linear Mapping (default), classical (metric) Multidimensional Scaling, or Kruskal's non-metric Multidimensional Scaling (see Venables and Ripley (2001) and Cox and Cox (2001)). The original S-Plus implementation also computed the Minimum Spanning Tree plane projection (Friedman and Rafsky, 1981), as it was available in the Venables and Ripley MASS library for S-Plus. However, the R implementation of the MASS library does not include Minimum Spanning Trees. In the R implementation, Projection Pursuit has been added using the fastICA procedure of Hyvarinen and Oja (2000). Provision is made to optionally trim individuals (rows) from the input data matrix.
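The projection options can be related to standard R functions roughly as follows. This is a minimal sketch, not the internal code of gx.2dproj: project_2d() is a hypothetical helper, and it assumes MASS (sammon, isoMDS) and fastICA are installed and that the projections are computed on Euclidean distances between rows.

```r
# Sketch only: maps the proc options onto underlying R functions.
# project_2d() is hypothetical; gx.2dproj's own internals may differ.
project_2d <- function(x, proc = c("sam", "mds", "iso", "ica")) {
  proc <- match.arg(proc)                 # default "sam", as in gx.2dproj
  x <- as.matrix(x)
  d <- dist(x)                            # Euclidean inter-point distances in p-space
  xy <- switch(proc,
    sam = MASS::sammon(d, k = 2, trace = FALSE)$points,  # Sammon non-linear map
    mds = cmdscale(d, k = 2),                            # classical (metric) MDS
    iso = MASS::isoMDS(d, k = 2, trace = FALSE)$points,  # Kruskal non-metric MDS
    ica = fastICA::fastICA(x, n.comp = 2)$S              # projection pursuit via ICA
  )
  colnames(xy) <- c("x", "y")
  xy
}

# Usage on simulated data, normalized first so the columns carry equal weight
set.seed(1)
x <- scale(matrix(rnorm(60), nrow = 20, ncol = 3))
head(project_2d(x, "mds"))
```

Classical MDS is included as the deterministic baseline: for data that are already 2-d it reproduces the inter-point distances exactly, which makes it a convenient check on the other, iterative, methods.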

Usage

gx.2dproj(xx, proc = "sam", ifilr = FALSE, log = FALSE, rsnd = FALSE, snd = FALSE,
	range = FALSE, main = "", setseed = FALSE, row.omits = NULL, ...)

Arguments

xx

the n by p matrix for which the 2-d projection is required.

proc

the 2-d projection procedure required; the default is proc = "sam" for Sammon Non-linear Mapping. For classical (metric) Multidimensional Scaling use proc = "mds", for Kruskal's non-metric Multidimensional Scaling use proc = "iso", and for Projection Pursuit use proc = "ica".

ifilr

optional isometric log-ratio transformation; the default is no transformation. Recommended for closed compositional, e.g. geochemical, data. When ifilr = TRUE all other transformations are ignored.

log

optional (natural) log transformation of the data, the default is no log transformation. For a log transformation set log = TRUE.

rsnd

optional robust normalization of the data with matrix column medians and MADs, the default is no transformation. For a robust normalization set rsnd = TRUE.

snd

optional normalization of the data with matrix column means and standard deviations, the default is no transformation. For a normalization set snd = TRUE. If rsnd = TRUE, then snd will be set to FALSE.

range

optional range transformation for the matrix columns, each column being scaled so that its minimum value becomes zero and its maximum value becomes one. If the data are range transformed, other normalization transformation requests will be ignored.

main

an alternative plot title, see Details below.

row.omits

permits rows (individuals) to be trimmed from the input matrix; the default row.omits = NULL is for no trimming. To trim individuals, enter their row numbers as a vector, e.g. row.omits = c(13,15,16). The list may be extended by adding further row numbers so as to display the 2-d structure of the remaining core data and reveal whether further multivariate outliers are present.

setseed

if TRUE, sets the random number seed for fastICA so that all runs result in the same projection; that projection is generally similar to the Sammon projection of the ilr-transformed Howarth - Sinding-Larsen data set.

...

further arguments to be passed to methods concerning the generated plots. For example, if smaller plotting characters are required, specify cex = 0.8; or if some colour other than black is required for the plotting characters, specify col = 2 to obtain red (see display.lty for the default colour palette). If it is required to make the plot title smaller, add cex.main = 0.9 to reduce the font size by 10%.
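The column transformations offered by log, rsnd, snd and range, with the precedence stated above (range overrides rsnd and snd; rsnd overrides snd), can be sketched as below. transform_cols() is a hypothetical helper; applying log before any scaling is an assumption, and ifilr, which overrides all of these, is not handled here.

```r
# Sketch of the log / rsnd / snd / range options and their stated precedence.
# transform_cols() is hypothetical; log-before-scaling is an assumption.
transform_cols <- function(x, log = FALSE, rsnd = FALSE, snd = FALSE,
                           range = FALSE) {
  x <- as.matrix(x)
  if (log) x <- log(x)                         # natural log transformation
  if (range) {
    # map each column's minimum to 0 and its maximum to 1
    mins <- apply(x, 2, min)
    maxs <- apply(x, 2, max)
    x <- sweep(sweep(x, 2, mins, "-"), 2, maxs - mins, "/")
  } else if (rsnd) {
    # robust normalization: centre on column medians, scale by column MADs
    x <- sweep(sweep(x, 2, apply(x, 2, median), "-"), 2, apply(x, 2, mad), "/")
  } else if (snd) {
    x <- scale(x)                              # column means and standard deviations
  }
  x
}

# Usage: a robust normalization of log-transformed data
set.seed(2)
m <- matrix(rexp(30) + 1, nrow = 10, ncol = 3)
head(transform_cols(m, log = TRUE, rsnd = TRUE))
```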

Details

If main is undefined a default plot title is generated by appending the input matrix name to the text string "2-d Projection for: ". If no plot title is required set main = " ", or if a user defined plot title is required it should be defined in main, e.g., main = "Plot Title Text".
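A default title built from the name of the matrix passed in is commonly produced in R with deparse(substitute()). The sketch below assumes that idiom; title_for() is a hypothetical helper, not part of rgr.

```r
# Sketch: default plot title from the caller's matrix name.
# title_for() is hypothetical; deparse(substitute()) is an assumed idiom.
title_for <- function(xx, main = "") {
  if (main == "") main <- paste0("2-d Projection for: ", deparse(substitute(xx)))
  main  # " " suppresses the title; any other string is used as given
}

my.mat <- matrix(1:4, 2)
title_for(my.mat)
```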

Firstly, if the input matrix holds closed compositional data, such as geochemical analyses, it is strongly recommended that an ilr transform be applied to the data, ifilr = TRUE. This reduces the dimension of the data matrix from p to (p-1). Otherwise, it is desirable to normalize (centre and scale) or range transform the data to ensure the variables have equal ‘weight’ in the projections. If no transformation is requested, a warning message is displayed.
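The reduction from p to (p-1) dimensions can be seen in the sketch below, which computes ilr coordinates using pivot (balance) coordinates, one common orthonormal basis. The basis rgr itself uses is not stated here and may differ; any orthonormal ilr basis, however, gives the same inter-point distances, so the projections do not depend on that choice. ilr_pivot() is a hypothetical helper.

```r
# Sketch: ilr transform via pivot (balance) coordinates.
# ilr_pivot() is hypothetical; rgr's own ilr basis may differ.
ilr_pivot <- function(x) {
  lx <- log(as.matrix(x))            # parts must be strictly positive
  p <- ncol(lx)
  z <- matrix(0, nrow(lx), p - 1)    # p parts -> (p-1) ilr coordinates
  for (i in seq_len(p - 1)) {
    # balance of part i against the geometric mean of parts i+1, ..., p
    z[, i] <- sqrt((p - i) / (p - i + 1)) *
      (lx[, i] - rowMeans(lx[, (i + 1):p, drop = FALSE]))
  }
  z
}

x <- matrix(c(10, 20, 30, 40,
               5,  5,  5,  5,
               1,  2,  8, 89), nrow = 3, byrow = TRUE)
ilr_pivot(x)   # 3 rows, 3 columns; the row of equal parts maps to the origin
```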

The x- and y-axis labels are set appropriately to indicate the type of 2-d projection in the display.

A measure of the ‘stress’ in generating the 2-d projection is estimated and displayed. Low stress indicates that the projection faithfully represents the relative ‘positions’ of the data in the original p-space.
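As an illustration of what a stress measure compares, the sketch below computes Sammon's stress between the p-space distances and the 2-d distances. This page does not state the formula gx.2dproj uses, so treat this as one standard measure rather than the function's own; sammon_stress() is a hypothetical helper.

```r
# Sketch: Sammon's stress of a 2-d configuration y for p-space data x.
# sammon_stress() is hypothetical; gx.2dproj's estimate may be computed differently.
sammon_stress <- function(x, y) {
  d  <- as.vector(dist(x))   # inter-point distances in p-space
  dy <- as.vector(dist(y))   # inter-point distances in the 2-d projection
  keep <- d > 0              # Sammon's 1/d weights are undefined for d = 0
  sum((d[keep] - dy[keep])^2 / d[keep]) / sum(d[keep])
}

set.seed(42)
x2 <- matrix(rnorm(20), 10, 2)   # data that are already 2-d
x3 <- cbind(x2, rnorm(10))       # add a third dimension
sammon_stress(x3, x2)            # > 0: dropping a dimension distorts distances
```

Because only distances enter the formula, the stress is unchanged by rotating or reflecting the 2-d configuration.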

Value

The following are returned as an object to be saved for further use:

main

the plot title.

input

a text string containing the name of the n by p matrix containing the data, and a list of the row numbers of any individuals trimmed; if none are trimmed the entry is NULL.

usage

the projection option selected, and the values, TRUE or FALSE, of the ilr, log, robust normalization, normalization, and range transformation options.

xlab

the 2-d projection x-axis label.

ylab

the 2-d projection y-axis label.

matnames

the individual (sample, row) identifiers and the names of the input variables. If there are no individual identifiers, row numbers are used. If an ilr transform has been used, the variable names will be the (p-1) synthetic ilr variable names. If a trim has been executed, only the row identifiers for the remaining data are stored.

row.numbers

the row numbers of the individuals (samples). If a trim has been executed, only the row numbers of the remaining individuals are stored.

x

the n x-axis values for the 2-d projection.

y

the n y-axis values for the 2-d projection.

stress

the estimated stress of fitting the 2-d projection to the p-space data.
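How a trim relates to the stored row numbers and identifiers can be sketched as follows: the trimmed rows are dropped, while the surviving rows keep their original numbers (or names, if present). trim_rows() is a hypothetical helper written in the spirit of the row.omits argument and the row.numbers and matnames values; it is not rgr code.

```r
# Sketch: trim rows by number, keeping the original row numbers/identifiers.
# trim_rows() is hypothetical, not part of rgr.
trim_rows <- function(x, row.omits = NULL) {
  x <- as.matrix(x)
  keep <- setdiff(seq_len(nrow(x)), row.omits)   # surviving original row numbers
  ids <- if (is.null(rownames(x))) as.character(keep) else rownames(x)[keep]
  list(x = x[keep, , drop = FALSE], row.numbers = keep, rowids = ids)
}

m <- matrix(1:40, nrow = 20, ncol = 2)
t3 <- trim_rows(m, row.omits = c(13, 15, 16))
t3$row.numbers   # 1-12, 14, 17-20
```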

Note

Any less-than-detection-limit values represented by negative values, zeros, or other numeric codes representing blanks in the data must be removed prior to executing this function; see ltdl.fix.df.
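Such values break the log and ilr transformations, since the logarithm of zero or a negative number is not finite. A quick way to locate them before calling the function is sketched below; find_suspect() and its codes argument are hypothetical, and ltdl.fix.df remains the documented route for replacing the values.

```r
# Sketch: locate zero, negative, or coded "blank" values in a data matrix.
# find_suspect() is hypothetical; NAs are skipped by which().
find_suspect <- function(x, codes = NULL) {
  x <- as.matrix(x)
  bad <- (x <= 0) | array(x %in% codes, dim(x))   # flag non-positive or coded cells
  which(bad, arr.ind = TRUE)                       # row/col index of each flagged cell
}

m <- matrix(c(1, -0.5, 3, 0, 2, NA), nrow = 3)
find_suspect(m)
```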

Any rows in the data matrix with NAs are removed prior to computing the 2-d projection. If an ilr transformation is requested, NAs must be removed before the transformation is undertaken; see remove.na.
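Row-wise NA removal amounts to keeping only complete cases. It must precede an ilr transform because a single NA part propagates into every ilr coordinate that involves that part. drop_na_rows() below is a hypothetical helper; remove.na is the documented rgr function.

```r
# Sketch: drop rows containing any NA, reporting which were removed.
# drop_na_rows() is hypothetical, not part of rgr.
drop_na_rows <- function(x) {
  x <- as.matrix(x)
  ok <- complete.cases(x)                  # TRUE for rows with no NA
  if (any(!ok)) message("Removed rows: ", paste(which(!ok), collapse = " "))
  x[ok, , drop = FALSE]
}

m <- matrix(c(1, NA, 3, 4, 5, 6), nrow = 3)
drop_na_rows(m)   # row 2 removed
```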

Repeated executions of the ‘fastICA’ implementation of Projection Pursuit can yield projections that are mirror images of one another unless set.seed is used (see the setseed argument) to ensure each execution commences with the same seed.
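Mirror images are equally valid projections: reflecting a 2-d configuration leaves every inter-point distance unchanged. fastICA draws its random starting matrix with rnorm() when no w.init is supplied, so fixing the seed fixes the result. The sketch below illustrates both points; mirror() is a hypothetical helper, and the fastICA part runs only if that package is installed.

```r
# Sketch: reflection preserves inter-point distances; seeding makes ICA repeatable.
# mirror() is hypothetical.
mirror <- function(xy, axis = 1) {
  xy[, axis] <- -xy[, axis]   # negate one coordinate (a reflection)
  xy
}

set.seed(1)
xy <- matrix(rnorm(20), 10, 2)
all.equal(as.vector(dist(xy)), as.vector(dist(mirror(xy))))   # TRUE

if (requireNamespace("fastICA", quietly = TRUE)) {
  x <- matrix(rnorm(300), 100, 3)
  set.seed(123); s1 <- fastICA::fastICA(x, n.comp = 2)$S
  set.seed(123); s2 <- fastICA::fastICA(x, n.comp = 2)$S
  print(identical(s1, s2))   # same seed, same starting matrix
}
```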

This function requires that packages MASS (Venables and Ripley) and fastICA (Marchini, Heaton and Ripley) both be available.

Author(s)

Robert G. Garrett

References

Cox, T.F. and Cox, M.A.A., 2001. Multidimensional Scaling. Chapman and Hall, 308 p.

Friedman, J.H. and Rafsky, L.C., 1981. Graphics for the multivariate two-sample problem. Journal of the American Statistical Association, 76(374):277-291.

Hyvarinen, A. and Oja, E., 2000. Independent Component Analysis: Algorithms and Applications. Neural Networks, 13(4-5):411-430.

Reimann, C., Filzmoser, P., Garrett, R. and Dutter, R., 2008. Statistical Data Analysis Explained: Applied Environmental Statistics with R. John Wiley & Sons, Ltd., 362 p.

Venables, W.N. and Ripley, B.D., 2001. Modern Applied Statistics with S-Plus, 3rd Edition. Springer, 501 p.

See Also

ltdl.fix.df, remove.na, gx.2dproj.plot, sammon, cmdscale, isoMDS, fastICA, set.seed

Examples

## Make test data available
data(sind.mat2open)

## Display default, Sammon non-linear map, 2-d projection
sind.2dproj <- gx.2dproj(sind.mat2open, ifilr = TRUE)

## Display saved object identifying input matrix row numbers (cex = 0.7),
## and with an alternate main title (cex.main = 0.8) 
gx.2dproj.plot(sind.2dproj, rowids = TRUE, cex = 0.7, cex.main = 0.8,
	main = "Howarth & Sinding-Larsen\nStream Sediment ilr Transformed Data")

## Display Kruskal's non-metric multidimensional scaling 2-d projection
sind.2dproj <- gx.2dproj(sind.mat2open, proc = "iso", ifilr = TRUE)

## Display saved object without row number labels (rowids = FALSE, cex = 0.7),
## and with an alternate main title (cex.main = 0.8)
gx.2dproj.plot(sind.2dproj, rowids = FALSE, cex = 0.7, cex.main = 0.8, 
	main = "Howarth & Sinding-Larsen\nStream Sediment ilr Transformed Data")

## Display default, Sammon non-linear map, 2-d projection, removing the three
## most extreme individuals
sind.2dproj.trim3 <- gx.2dproj(sind.mat2open, ifilr = TRUE, row.omits = c(13,15,16))

## Clean-up
rm(sind.2dproj)
rm(sind.2dproj.trim3)

Example output

Loading required package: MASS
Loading required package: fastICA
  ** Are the data/parts all in the same measurement units? **
  Data have been isometrically log-ratiod
Initial stress        : 0.06921
stress after   0 iters: 0.06921
  'sam' stress = 0.069213 
  ** Are the data/parts all in the same measurement units? **
  Data have been isometrically log-ratiod
initial  value 9.641394 
iter   5 value 6.265570
final  value 6.169898 
converged
  'iso' stress = 0.11201 
  The following rows have been removed from the input matrix:
   13 15 16 
  ** Are the data/parts all in the same measurement units? **
  Data have been isometrically log-ratiod
Initial stress        : 0.10144
stress after  10 iters: 0.03227, magic = 0.500
stress after  20 iters: 0.03198, magic = 0.500
stress after  30 iters: 0.03195, magic = 0.500
  'sam' stress = 0.030292 

rgr documentation built on May 2, 2019, 6:09 a.m.
