1D Scatter Plots with Confidence Intervals
Description
stripChart is a modification of the R function stripchart.
It is a generic function used to produce one dimensional scatter
plots (or dot plots) of the given data, along with text indicating sample size and
estimates of location (mean or median) and scale (standard deviation
or interquartile range), as well as confidence intervals for the population
location parameter.
One dimensional scatterplots are a good alternative to boxplots
when sample sizes are small or moderate. The function invokes particular
methods which depend on the class of the first argument.
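The dispatch on the class of the first argument can be sketched with a toy generic. This is only an illustration of the S3 mechanism, not the stripChart source: the names toyChart, toyChart.default and toyChart.formula are hypothetical, and the formula method here simply splits the response by group before handing a list to the default method.

```r
# Hypothetical generic; named toyChart so it cannot clash with stripChart.
toyChart <- function(x, ...) UseMethod("toyChart")

# Default method: reached for numeric vectors, lists, matrices, and so on.
toyChart.default <- function(x, ...) {
  "default method"
}

# Formula method: build the model frame, split the response by the
# grouping variable, then call the generic again on the resulting list.
toyChart.formula <- function(x, data = NULL, ...) {
  mf <- model.frame(x, data = data)
  groups <- split(mf[[1]], mf[[2]])
  paste("formula method ->", toyChart(groups, ...))
}

# Small made-up data set, for illustration only.
df <- data.frame(y = c(1.2, 3.4, 2.2, 5.1), g = c("a", "a", "b", "b"))

res.default <- toyChart(df$y)              # numeric vector
res.formula <- toyChart(y ~ g, data = df)  # formula

print(res.default)
print(res.formula)
```

A call with a formula reaches the formula method first and ends up in the default method, which is the pattern the two usage entries for stripChart suggest.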
Usage
stripChart(x, ...)
## S3 method for class 'formula'
stripChart(x, data = NULL, dlab = NULL,
subset, na.action = NULL, ...)
## Default S3 method:
stripChart(x,
method = ifelse(paired && paired.lines, "overplot", "stack"),
seed = 47, jitter = 0.1 * cex, offset = 1/2, vertical = TRUE,
group.names, group.names.cex = cex, drop.unused.levels = TRUE,
add = FALSE, at = NULL, xlim = NULL, ylim = NULL, ylab = NULL,
xlab = NULL, dlab = "", glab = "", log = "", pch = 1, col = par("fg"),
cex = par("cex"), points.cex = cex, axes = TRUE, frame.plot = axes,
show.ci = TRUE, location.pch = 16, location.cex = cex,
conf.level = 0.95, min.n.for.ci = 2,
ci.offset = 3/ifelse(n > 2, (n-1)^(1/3), 1), ci.bar.lwd = cex,
ci.bar.ends = TRUE, ci.bar.ends.size = 0.5 * cex, ci.bar.gap = FALSE,
n.text = "bottom", n.text.line = ifelse(n.text == "bottom", 2, 0),
n.text.cex = cex, location.scale.text = "top",
location.scale.digits = 1, nsmall = location.scale.digits,
location.scale.text.line = ifelse(location.scale.text == "top", 0, 3.5),
location.scale.text.cex =
cex * 0.8 * ifelse(n > 6, max(0.4, 1 - (n-6) * 0.06), 1),
p.value = FALSE, p.value.digits = 3, p.value.line = 2, p.value.cex = cex,
group.difference.ci = p.value, group.difference.conf.level = 0.95,
group.difference.digits = location.scale.digits,
ci.and.test = "parametric", ci.arg.list = NULL, test.arg.list = NULL,
alternative = "two.sided", plot.diff = FALSE, diff.col = col[1],
diff.method = "stack", diff.pch = pch[1], paired = FALSE, paired.lines = paired,
paired.lty = 1:6, paired.lwd = 1, paired.pch = 1:14, paired.col = NULL,
diff.name = NULL, diff.name.cex = group.names.cex, sep.line = TRUE,
sep.lty = 2, sep.lwd = cex, sep.col = "gray", diff.lim = NULL,
diff.at = NULL, diff.axis.label = NULL,
plot.diff.mar = c(5, 4, 4, 4) + 0.1, ...)
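
Two of the defaults in the usage depend on n. Reading them as ci.offset = 3/ifelse(n > 2, (n-1)^(1/3), 1) and location.scale.text.cex = cex * 0.8 * ifelse(n > 6, max(0.4, 1 - (n-6) * 0.06), 1), the short sketch below evaluates both for a few values of n. It assumes that n is the number of groups being plotted, and it uses pmax() in place of max() only so that several values of n can be evaluated at once; the variable names are illustrative.

```r
# Assumption: n is the number of groups being plotted.
n <- c(1, 2, 3, 9, 28)

# Default ci.offset: 3 for one or two groups, then shrinking with the
# cube root of (n - 1) as more groups share the plot.
ci.offset <- 3 / ifelse(n > 2, (n - 1)^(1/3), 1)

# Default location.scale.text.cex: 0.8 * cex for up to six groups, then
# reduced by 0.06 per extra group, with a floor of 0.4.
cex <- 1
n.groups <- c(2, 6, 10, 20)
text.cex <- cex * 0.8 *
  ifelse(n.groups > 6, pmax(0.4, 1 - (n.groups - 6) * 0.06), 1)

print(data.frame(n = n, ci.offset = ci.offset))
print(data.frame(n = n.groups, location.scale.text.cex = text.cex))
```

Both defaults shrink as groups are added, so the confidence interval bars sit closer to their points and the summary text gets smaller when the plot is crowded.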

Arguments
x 
the data from which the plots are to be produced. In the default method the data can be
specified as a list or data frame where each component is numeric, a numeric matrix,
or a numeric vector. In the formula method, a symbolic specification of the form

data 
for the formula method, a data.frame (or list) from which the variables in 
subset 
for the formula method, an optional vector specifying a subset of observations to be used for plotting. 
na.action 
for the formula method, a function which indicates what should happen when the data
contain 
... 
additional parameters passed to the default method, or by it to 
method 
the method to be used to separate coincident points. When 
seed 
when 
jitter 
when 
offset 
when stacking is used, points are stacked this many lineheights (symbol widths) apart. 
vertical 
when 
group.names 
group labels which will be printed alongside (or underneath) each plot. 
group.names.cex 
numeric scalar indicating the amount by which the group labels should be scaled
relative to the default (see the help file for 
drop.unused.levels 
when 
add 
logical, if true add the chart to the current plot. 
at 
numeric vector giving the locations where the charts should be drawn,
particularly when 
xlim, ylim 
plot limits: see 
ylab, xlab 
labels: see 
dlab, glab 
alternate way to specify axis labels. The 
log 
on which axes to use a log scale: see 
pch, col, cex 
Graphical parameters: see 
points.cex 
Sets the 
axes, frame.plot 
Axis control: see 
show.ci 
logical scalar indicating whether to plot the confidence interval. The default is
show.ci=TRUE.
location.pch 
integer indicating which plotting character to use to indicate the estimate of location
(mean or median) for each group (see the help file for 
location.cex 
numeric scalar giving the amount by which the plotting characters indicating the
estimate of location for each group should be scaled relative to the default
(see the help file for 
conf.level 
numeric scalar between 0 and 1 indicating the confidence level associated with the
confidence interval for the group location (population mean or median).
The default value is conf.level=0.95.
min.n.for.ci 
integer indicating the minimum sample size required in order to plot a confidence interval
for the group location. The default value is min.n.for.ci=2.
ci.offset 
numeric scalar or vector of length equal to the number of groups ( 
ci.bar.lwd 
numeric scalar indicating the line width for the confidence interval bars.
The default is the current value of the graphics parameter cex.
ci.bar.ends 
logical scalar indicating whether to add flat ends to the confidence interval bars.
The default value is ci.bar.ends=TRUE.
ci.bar.ends.size 
numeric scalar in units of 
ci.bar.gap 
logical scalar indicating whether to add a gap between the estimate of group location and the
confidence interval bar. The default value is ci.bar.gap=FALSE.
n.text 
character string indicating whether and where to indicate the sample size for each group.
Possible values are 
n.text.line 
integer indicating on which plot margin line to show the sample sizes for each group. The
default value is 
n.text.cex 
numeric scalar giving the amount by which the text indicating the sample size for
each group should be scaled relative to the default (see the help file for 
location.scale.text 
character string indicating whether and where to indicate the estimates of location
(mean or median) and scale (standard deviation or interquartile range) for each group.
Possible values are 
location.scale.digits 
integer indicating the number of digits to round the estimates of location and scale. The
default value is location.scale.digits=1.
nsmall 
integer passed to the function 
location.scale.text.line 
integer indicating on which plot margin line to show the estimates of location and scale
for each group. The default value is 
location.scale.text.cex 
numeric scalar giving the amount by which the text indicating the estimates of
location and scale for each group should be scaled relative to the default
(see the help file for 
p.value 
logical scalar indicating whether to show the p-value associated with testing whether all groups
have the same population location. The default value is p.value=FALSE.
p.value.digits 
integer indicating the number of digits to round to when displaying the p-value associated with
the test of equal group locations. The default value is p.value.digits=3.
p.value.line 
integer indicating on which plot margin line to show the p-value associated with the test of
equal group locations. The default value is p.value.line=2.
p.value.cex 
numeric scalar giving the amount by which the text indicating the p-value associated
with the test of equal group locations should be scaled relative to the default
(see the help file for 
group.difference.ci 
for the case when there are just 2 groups, a logical scalar indicating whether to display
the confidence interval for the difference between group locations. The default is
the value of the p.value argument.
group.difference.conf.level 
for the case when there are just 2 groups, a numeric scalar between 0 and 1
indicating the confidence level associated with the confidence interval for the
difference between group locations. The default is group.difference.conf.level=0.95.
group.difference.digits 
for the case when there are just 2 groups, an integer indicating the number of digits to
round to when displaying the confidence interval for the difference between group locations.
The default value is the value of location.scale.digits.
ci.and.test 
character string indicating whether confidence intervals and tests should be based on parametric
or nonparametric ( 
ci.arg.list 
an optional list of arguments to pass to the function used to compute confidence intervals.
The default value is ci.arg.list=NULL.
test.arg.list 
an optional list of arguments to pass to the function used to test for group differences in location.
The default value is test.arg.list=NULL.
alternative 
character string describing the alternative hypothesis for the test of group differences in the
case when there are two groups. Possible values are 
plot.diff 
applicable only to the case when there are two groups: When When 
diff.col 
applicable only to the case when there are two groups and 
diff.method 
applicable only to the case when there are two groups, 
diff.pch 
applicable only to the case when there are two groups, 
paired 
applicable only to the case when there are two groups: 
paired.lines 
applicable only to the case when there are two groups and 
paired.lty 
applicable only to the case when there are two groups, 
paired.lwd 
applicable only to the case when there are two groups, 
paired.pch 
applicable only to the case when there are two groups, 
paired.col 
applicable only to the case when there are two groups, 
diff.name 
applicable only to the case when there are two groups and 
diff.name.cex 
applicable only to the case when there are two groups and 
sep.line 
applicable only to the case when there are two groups and 
sep.lty 
applicable only to the case when there are two groups, 
sep.lwd 
applicable only to the case when there are two groups, 
sep.col 
applicable only to the case when there are two groups, 
diff.lim 
applicable only to the case when there are two groups and 
diff.at 
applicable only to the case when there are two groups and 
diff.axis.label 
applicable only to the case when there are two groups and 
plot.diff.mar 
applicable only to the case when there are two groups, 
Value
stripChart
invisibly returns a list with the following components:
group.centers 
numeric vector of values on the group axis (the x-axis unless

group.stats 
a matrix with the number of rows equal to the number of groups and six columns indicating the sample size of the group (N), the estimate of the group location parameter (Mean or Median), the estimate of the group scale (SD or IQR), the lower confidence limit for the group location parameter (LCL), the upper confidence limit for the group location parameter (UCL), and the confidence level associated with the confidence interval (Conf.Level) 
In addition, if the argument p.value=TRUE and/or 1) there are two groups and 2) plot.diff=TRUE,
the list also includes these components:
group.difference.p.value 
numeric scalar indicating the p-value associated with the test of equal group locations. 
group.difference.conf.int 
numeric vector of two elements indicating the confidence interval for the difference between the group locations. Only present when there are two groups. 
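
To make the layout of group.stats concrete, the sketch below builds a matrix with the same six columns from base R alone. It is an illustration, not the code stripChart runs: it assumes the parametric case, in which the interval for each group mean is the usual t-based interval from t.test, and the data and the helper one.group are made up.

```r
# Made-up data for two groups, for illustration only.
x <- list(A = c(4.1, 5.3, 6.0, 5.5, 4.8),
          B = c(7.2, 6.9, 8.1, 7.7))
conf.level <- 0.95

# Hypothetical helper: one row of summary statistics per group.
# Assumes the parametric interval is the t-based interval from t.test().
one.group <- function(v) {
  ci <- t.test(v, conf.level = conf.level)$conf.int
  c(N = length(v), Mean = mean(v), SD = sd(v),
    LCL = ci[1], UCL = ci[2], Conf.Level = conf.level)
}

# sapply() returns one column per group, so transpose to get
# one row per group and six named columns.
group.stats <- t(sapply(x, one.group))
print(round(group.stats, 2))
```

With ci.and.test = "nonparametric" the Mean and SD columns would be replaced by Median and IQR, as described above.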
Author(s)
Steven P. Millard (EnvStats@ProbStatInfo.com)
References
Hollander, M., and D.A. Wolfe. (1999). Nonparametric Statistical Methods. Second Edition. John Wiley and Sons, New York.
Millard, S.P., and N.K. Neerchal. (2001). Environmental Statistics with S-PLUS. CRC Press, Boca Raton, FL.
Zar, J.H. (2010). Biostatistical Analysis. Fifth Edition. Prentice-Hall, Upper Saddle River, NJ.
See Also
stripchart, t.test, wilcox.test, aov, kruskal.test.
Examples
#
# Two Independent Samples
#
# The guidance document USEPA (1994b, pp. 6.22-6.25)
# contains measures of 1,2,3,4-Tetrachlorobenzene (TcCB)
# concentrations (in parts per billion) from soil samples
# at a Reference area and a Cleanup area. These data are stored
# in the data frame EPA.94b.tccb.df.
#
# First create one-dimensional scatterplots to compare the
# TcCB concentrations between the areas and use a nonparametric
# test to test for a difference between areas.
dev.new()
stripChart(TcCB ~ Area, data = EPA.94b.tccb.df, col = c("red", "blue"),
p.value = TRUE, ci.and.test = "nonparametric",
ylab = "TcCB (ppb)")
#
# Now log-transform the TcCB data and use a parametric test
# to compare the areas.
dev.new()
stripChart(log10(TcCB) ~ Area, data = EPA.94b.tccb.df, col = c("red", "blue"),
p.value = TRUE, ylab = "log10 [ TcCB (ppb) ]")
#
# Repeat the above procedure, but also plot the confidence interval
# for the difference between the means.
dev.new()
stripChart(log10(TcCB) ~ Area, data = EPA.94b.tccb.df, col = c("red", "blue"),
p.value = TRUE, plot.diff = TRUE, diff.col = "black",
ylab = "log10 [ TcCB (ppb) ]")
#
# Repeat the above procedure, but allow the variances to differ.
dev.new()
stripChart(log10(TcCB) ~ Area, data = EPA.94b.tccb.df, col = c("red", "blue"),
p.value = TRUE, plot.diff = TRUE, diff.col = "black",
ylab = "log10 [ TcCB (ppb) ]", test.arg.list = list(var.equal = FALSE))
#
# Repeat the above procedure, but jitter the points instead of
# stacking them.
dev.new()
stripChart(log10(TcCB) ~ Area, data = EPA.94b.tccb.df, col = c("red", "blue"),
p.value = TRUE, plot.diff = TRUE, diff.col = "black",
ylab = "log10 [ TcCB (ppb) ]", test.arg.list = list(var.equal = FALSE),
method = "jitter", ci.offset = 4)
#
# Clean up
#
graphics.off()
#====================
#
# Paired Observations
#
# The data frame ACE.13.TCE.df contains paired observations of
# trichloroethylene (TCE; mg/L) at 10 groundwater monitoring wells
# before and after remediation.
#
# Create one-dimensional scatterplots to compare TCE concentrations
# before and after remediation and use a paired t-test to
# test for a difference between periods.
dev.new()
stripChart(TCE.mg.per.L ~ Period, data = ACE.13.TCE.df,
col = c("brown", "green"), p.value = TRUE, paired = TRUE,
ylab = "TCE (mg/L)")
#
# Repeat the above procedure, but also plot the confidence interval
# for the mean of the paired differences.
dev.new()
stripChart(TCE.mg.per.L ~ Period, data = ACE.13.TCE.df,
col = c("brown", "green"), p.value = TRUE, paired = TRUE,
ylab = "TCE (mg/L)", plot.diff = TRUE, diff.col = "blue")
#==========
# Repeat the last two examples, but use a one-sided alternative since
# remediation should decrease TCE concentration.
dev.new()
stripChart(TCE.mg.per.L ~ Period, data = ACE.13.TCE.df,
col = c("brown", "green"), p.value = TRUE, paired = TRUE,
ylab = "TCE (mg/L)", alternative = "less",
group.difference.digits = 2)
#
# Repeat the above procedure, but also plot the confidence interval
# for the mean of the paired differences.
#
# NOTE: Although stripChart can *report* one-sided confidence intervals
# for the difference between two groups (see above example),
# when *plotting* the confidence interval for the difference,
# only two-sided CIs are allowed.
# Here, we will set the confidence level of the confidence
# interval for the mean of the paired differences to 90%,
# so that the upper bound of the CI corresponds to the upper
# bound of a 95% one-sided CI.
dev.new()
stripChart(TCE.mg.per.L ~ Period, data = ACE.13.TCE.df,
col = c("brown", "green"), p.value = TRUE, paired = TRUE,
ylab = "TCE (mg/L)", group.difference.digits = 2,
plot.diff = TRUE, diff.col = "blue", group.difference.conf.level = 0.9)
#
# Clean up
#
graphics.off()
#==========
# The data frame Helsel.Hirsch.02.Mayfly.df contains paired counts
# of mayfly nymphs above and below industrial outfalls in 12 streams.
#
# Create one-dimensional scatterplots to compare the
# counts between locations and use a nonparametric test
# to compare counts above and below the outfalls.
dev.new()
stripChart(Mayfly.Count ~ Location, data = Helsel.Hirsch.02.Mayfly.df,
col = c("green", "brown"), p.value = TRUE, paired = TRUE,
ci.and.test = "nonparametric", ylab = "Number of Mayfly Nymphs")
#
# Repeat the above procedure, but also plot the confidence interval
# for the pseudomedian of the paired differences.
dev.new()
stripChart(Mayfly.Count ~ Location, data = Helsel.Hirsch.02.Mayfly.df,
col = c("green", "brown"), p.value = TRUE, paired = TRUE,
ci.and.test = "nonparametric", ylab = "Number of Mayfly Nymphs",
plot.diff = TRUE, diff.col = "blue")
#
# Clean up
#
graphics.off()
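
The note in the paired TCE example relies on the upper bound of a 90% two-sided confidence interval matching the upper bound of a 95% one-sided interval. A minimal check of that relationship with base R's t.test is sketched below. It uses synthetic paired data generated for illustration (not ACE.13.TCE.df), and the variable names are arbitrary.

```r
# Synthetic paired data, for illustration only (not ACE.13.TCE.df).
set.seed(47)
before <- rnorm(10, mean = 20, sd = 4)
after  <- before - rnorm(10, mean = 3, sd = 2)

# 90% two-sided interval for the mean of the paired differences.
two.sided.90 <- t.test(after, before, paired = TRUE,
                       conf.level = 0.90)$conf.int

# 95% one-sided interval: the lower limit is -Inf for alternative = "less".
one.sided.95 <- t.test(after, before, paired = TRUE,
                       alternative = "less", conf.level = 0.95)$conf.int

upper.two.sided <- two.sided.90[2]
upper.one.sided <- one.sided.95[2]

# Both upper limits use the 0.95 quantile of the t distribution,
# so they should agree up to floating point error.
print(all.equal(upper.two.sided, upper.one.sided))
```

Both intervals place 5% of the probability above the upper limit, which is why plotting the 90% two-sided interval is a reasonable stand-in for the 95% one-sided bound.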
