shape.alt: An Alternate EDA Graphical Summary


Description

Plots a simple four panel graphical distributional summary for a data set, comprising a histogram, a cumulative normal percentage probability (CPP) plot, an empirical cumulative distribution function (ECDF), and a log-log concentration-number (C-N) plot for multifractality. Optionally the EDA graphics may be plotted with logarithmic (base 10) scaling, in which case all four plots have identical x-axis scaling.

Usage

shape.alt(xx, xlab = deparse(substitute(xx)), log = FALSE, xlim = NULL,
        nclass = NULL, ifnright = TRUE, ifrev = FALSE, colr = 8, ...)

Arguments

xx

name of the variable to be plotted.

xlab

by default the character string for xx is used for the x-axis plot titles. An alternate title can be displayed with xlab = "text string", see Examples.

log

to display the data with logarithmic (x-axis) scaling, set log = TRUE.

xlim

by default xlim is determined by gx.hist and used to ensure all four panels in this function have the same x-axis scaling. xlim may also be set by the user, see Note below.

nclass

the default procedure for preparing the histogram depends on sample size. Where N <= 500 the Scott (1979) rule is used, and when N > 500 the Freedman-Diaconis (1981) rule; both these rules are resistant to the presence of outliers, and usually provide informative histograms. Alternately, the user may define the histogram binning by setting nclass, i.e. nclass = "scott", nclass = "fd" or nclass = "sturges"; the latter being designed for normal distributions (Scott, 1992). See Venables and Ripley (2001) for details.

ifnright

controls where the sample size is plotted in the histogram display; by default this is in the upper right corner of the plot. If the data distribution is such that the upper left corner would be preferable, set ifnright = FALSE. If ifnright = NULL there will be no display of the sample size.

ifrev

by default the empirical C-N function is plotted from highest value to lowest, ifrev = FALSE. As the C-N plot is a log-log display this provides greater detail for the highest values. Because the direction of accumulation can be key in detecting multifractal patterns, it is usually informative to also prepare a plot with ifrev = TRUE, i.e. accumulation from lowest to highest values. For a dramatic example of this, run the Examples below.

colr

by default the histogram and Tukey boxplot, or box-and-whisker plot, are infilled in grey, colr = 8. If no infill is required, set colr = 0. See function display.lty for the range of available colours.

...

further arguments to be passed to methods. For example, the size of the axis scale annotation can be changed by setting cex.axis, the size of the axis titles by setting cex.lab, and the size of the plot title by setting cex.main; to make the plot title smaller, add cex.main = 0.9 to reduce the font size by 10%. By default individual data points in the ECDF and CPP plots are marked by a plus sign, pch = 3; if a cross or open circle is desired, set pch = 4 or pch = 1, respectively. See display.marks for all available symbols. Adding ifqs = TRUE results in horizontal and vertical dotted lines being plotted at the three central quartiles and their values, respectively, in the ECDF and CPP plots.
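The binning rules named under nclass above have counterparts in base R (package grDevices). The following is a minimal sketch, using simulated data, of how the three rules can give different bin counts for the same skewed sample. It assumes that the rule names accepted by nclass correspond to these base R functions; the source does not show how gx.hist applies them internally.

```r
# Simulated right-skewed data standing in for a real variable (assumption)
set.seed(1)
x <- rlnorm(300, meanlog = 2, sdlog = 0.8)

# Number of histogram classes suggested by each rule
bins <- c(
  scott   = nclass.scott(x),    # bin width from the standard deviation
  fd      = nclass.FD(x),       # bin width from the interquartile range
  sturges = nclass.Sturges(x)   # ceiling(log2(n) + 1), depends on n only
)
print(bins)

# Base hist() accepts the same rule names; it then rounds to "pretty"
# break points, so the final bin count may differ from the suggestion
h <- hist(x, breaks = "FD", plot = FALSE)
length(h$counts)
```

Comparing the three counts before plotting can help decide whether overriding the default is worthwhile for a particular data set.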

Details

A histogram is displayed upper left, and an ECDF is displayed below it (lower left). To the right of the histogram a cumulative normal percentage probability (CPP) plot is displayed. Below it (lower right) a log-log C-N plot is displayed to highlight any multifractality in the data, which will be revealed as 'lines' of data points with different slopes. When log scaling is selected the x-axis scaling is identical in all four plots.
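To make the idea behind the C-N panel concrete, the sketch below accumulates a count of observations along the sorted data, from highest to lowest by default or from lowest to highest when reversed. The helper cn_counts is hypothetical and is not the code used by gx.mf, which may, for instance, scale the counts differently; the data are simulated.

```r
# Hypothetical helper: running count of observations along the sorted data
cn_counts <- function(x, ifrev = FALSE) {
  x <- x[!is.na(x) & x > 0]                  # log axes need positive values
  xs <- sort(x, decreasing = !ifrev)         # default: highest value first
  data.frame(value = xs, n = seq_along(xs))  # n = number accumulated so far
}

set.seed(42)
x <- rlnorm(200, meanlog = 3, sdlog = 1)     # simulated, not survey data

cn     <- cn_counts(x)                       # accumulate from the top
cn_rev <- cn_counts(x, ifrev = TRUE)         # accumulate from the bottom
head(cn, 3)

# On log-log axes, straight-line segments with different slopes are the
# pattern the C-N plot is meant to reveal
if (interactive()) {
  plot(cn$value, cn$n, log = "xy", pch = 3,
       xlab = "Value", ylab = "Cumulative number of observations")
}
```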

Note

Any less than detection limit values represented by negative values, or zeros or other numeric codes representing blanks in the data, must be removed prior to executing this function, see ltdl.fix.df.

Any NAs in the data vector are removed prior to displaying the plots.
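As an illustration of the cleaning step described above, the following base R sketch drops coded values from a vector before plotting. It is a simple stand-in, not the behaviour of ltdl.fix.df, which may substitute values rather than discard them. The vector and the meaning of its codes are assumptions made for the example.

```r
# Assumed coding: -5 marks a value below a detection limit of 5,
# 0 marks a blank, NA is a genuine missing value
cu <- c(12.4, -5, 30.1, 0, NA, 8.7, 55.2)

cu_clean <- cu
cu_clean[!is.na(cu_clean) & cu_clean <= 0] <- NA  # flag coded values as missing
cu_clean <- cu_clean[!is.na(cu_clean)]            # drop all missing values

cu_clean
length(cu_clean)
```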

If the default selection for xlim is inappropriate it can be set, e.g., xlim = c(0, 200) or c(2, 200), the latter being appropriate for a logarithmically scaled plot, i.e. log = TRUE. If the defined limits lie within the observed data range, truncated plots will be displayed. If this occurs, the number of data points omitted is displayed below the total number of observations in the various panels.
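Before setting xlim it can be useful to check how many observations a pair of limits would exclude. The helper below, count_outside, is hypothetical and only mirrors the description above; how shape.alt counts omitted points internally is not shown in the source.

```r
# Hypothetical helper: number of non-missing values outside the limits
count_outside <- function(x, xlim) {
  x <- x[!is.na(x)]
  sum(x < xlim[1] | x > xlim[2])
}

x <- c(1.5, 3, 8, 20, 75, 150, 240, NA)   # made-up values
n_omitted <- count_outside(x, c(2, 200))  # 1.5 and 240 fall outside
n_omitted
```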

If it is desired to prepare a display of data falling within a defined part of the actual data range, then either a data subset can be prepared externally using the appropriate R syntax, or xx may be defined in the function call as, for example, Cu[Cu < some.value] which would remove the influence of one or more outliers having values greater than some.value. In this case the number of data values displayed will be the number that are <some.value.
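A short sketch of the subsetting approach follows, using made-up values. It also shows a side effect worth knowing: because the default xlab is deparse(substitute(xx)), passing an expression yields that expression as the axis title, so supplying xlab explicitly may be preferable. The helper label_of is hypothetical and exists only to demonstrate that default.

```r
cu <- c(4, 9, 15, 22, 31, 48, 950)   # made-up values with one outlier
some.value <- 100

cu_sub <- cu[cu < some.value]        # keep values below the threshold
length(cu_sub)                       # number of values that would be plotted

# Hypothetical helper reproducing the default used for xlab
label_of <- function(xx) deparse(substitute(xx))
label_of(cu[cu < some.value])        # the subsetting expression, as text
```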

In some R installations the generation of multi-panel displays and the use of function eqscplot from package MASS causes warning messages related to graphics parameters to be displayed on the current device. These may be suppressed by entering options(warn = -1) on the R command line, or that line may be included in a ‘first’ function prepared by the user that loads the ‘rgr’ package, etc.
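Setting options(warn = -1) affects the whole session, so one possible refinement is to restore the previous setting afterwards, or to wrap only the plotting call in suppressWarnings(). The sketch below uses a stand-in function, noisy_plot, in place of a real plotting call; that function is hypothetical.

```r
# Stand-in for a plotting call that emits graphics-parameter warnings
noisy_plot <- function() {
  warning("graphical parameter cannot be set")
  invisible("done")
}

# Option 1: switch warnings off, then restore the previous setting
old <- options(warn = -1)   # options() returns the old values
res1 <- noisy_plot()
options(old)

# Option 2: silence warnings for a single call only
res2 <- suppressWarnings(noisy_plot())
```

The second form leaves warnings from unrelated code visible, which may make other problems easier to spot.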

For summary statistics displays to complement the graphics, see gx.summary1, gx.summary2 and inset.

Author(s)

Robert G. Garrett

References

Venables, W.N. and Ripley, B.D., 2001. Modern Applied Statistics with S-Plus, 3rd Edition, Springer, 501 p. See p. 119 for a description of histogram bin selection computations.

See Also

gx.hist, cnpplt, gx.ecdf, gx.mf, remove.na, display.lty, display.marks, ltdl.fix.df, inset

Examples

## Make test data available
data(kola.o)
attach(kola.o)

## Generates an initial display to have a first look at the data and 
## decide how best to proceed
shape.alt(Cu)

## Provides a more appropriate initial display and indicates the 
## quartiles
shape.alt(Cu, xlab = "Cu (mg/kg) in <2 mm O-horizon soil", log = TRUE,
	ifqs = TRUE)

## Causes the C-N plot to be cumulated in reverse order.  This will reveal
## any multifractal properties of the data at lower concentrations
shape.alt(Cu, xlab = "Cu (mg/kg) in <2 mm O-horizon soil", log = TRUE, 
	ifrev = TRUE)

## Detach test data
detach(kola.o)

Example output

Loading required package: MASS
Loading required package: fastICA
Warning messages:
1: In par(old.par) : graphical parameter "cin" cannot be set
2: In par(old.par) : graphical parameter "cra" cannot be set
3: In par(old.par) : graphical parameter "csi" cannot be set
4: In par(old.par) : graphical parameter "cxy" cannot be set
5: In par(old.par) : graphical parameter "din" cannot be set
6: In par(old.par) : graphical parameter "page" cannot be set
There were 20 warnings (use warnings() to see them)

rgr documentation built on May 2, 2019, 6:09 a.m.
