Use color to show the density of points in a scatterplot
Description
The plotting region of the scatterplot is divided into bins. The number of data points falling within each bin is summed and then plotted using the image function. This is particularly useful when there are so many points that each point cannot be distinctly identified.
Usage
scatterplot.density(x, y, zlim, xylim, num.bins=64,
    col=kristen.colors(32), xlab, ylab, main, density.in.percent=TRUE,
    col.regression.line=1, col.one.to.one.line=grey(0.4),
    col.bar.legend=TRUE, plt.beyond.zlim=FALSE, ...)

Arguments
x 
Vector or matrix of x-coordinates of points to be plotted. Missing values are not permitted. 
y 
Vector or matrix of y-coordinates of points to be plotted. Missing values are not permitted. 
zlim 
Vector defining the minimum and maximum of the data
density values, to which to assign the two most extreme colors in the
col argument.
xylim 
Specification of extreme values that the first and last
bins are expected to contain in the x- and y-directions. May be a
single vector of the limits for the x- and y-axes; e.g., using
xylim=c(0,120) specifies that, in both the x- and
y-directions, the first bin should contain 0 and the last contain
120. May also be a list in the form
xylim=list(xlim=c(x1, x2), ylim=c(y1, y2)), allowing for different ranges on the
axes. If not specified, xlim is the range of Note that 
num.bins 
Number of bins to be used when calculating the data density in both the x- and y-directions. May be a single number, e.g. num.bins=50, which produces 50 bins in each direction. May also be a list in the form num.bins=list(num.bins.x=n1, num.bins.y=n2) to specify differing numbers of bins for the x- and y-directions. The default is to use 64 bins for both axes (num.bins=64). Note that xylim and num.bins together determine how the bins are defined (see Details). 
col 
Color range to use when drawing bins, with the first color assigned to zlim[1] and last color assigned to zlim[2]. Default is kristen.colors(32). 
xlab 
The label for the x-axis. If not specified by the user,
defaults to the expression the user named as parameter x. 
ylab 
The label for the y-axis. If not specified by the user,
defaults to the expression the user named as parameter y. 
main 
The main title for the density scatterplot. If not specified, the default is “Data Density Plot (%)” when density.in.percent=TRUE, and “Data Frequency Plot (counts)” otherwise. 
density.in.percent 
A logical indicating whether the density values should represent a percentage of the total number of data points, rather than a count value. Default is density.in.percent=TRUE. 
col.regression.line 
A color number or color name for the
regression line and estimated regression equation ( 
col.one.to.one.line 
A color number or color name for the one-to-one line to be overlaid on the density scatterplot. If NULL, the one-to-one line is not displayed. Defaults to a dark grey line. If the one-to-one line is displayed, it is drawn as a dashed line (lty=3). 
col.bar.legend 
A logical indicating whether a
“color legend” of the form given by

plt.beyond.zlim 
If TRUE, and if 
... 
Any additional parameters to be passed to the

Details
The plotting region of the scatterplot is divided into bins.
The number of data points falling within each bin is summed and then
plotted using the image function. The default is to
plot the percent of the data falling within each bin, rather than a
raw count value. The arguments xylim and num.bins can include
different settings for the x- and y-axis. This makes it easier to
plot different variables on each axis, e.g. temperature
vs. ozone. Note that xylim and num.bins together determine how the
bins are defined. This is done using the cut function.
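The binning step described above can be sketched in base R. This is a minimal illustration, not the aqfig source: the variable names are hypothetical, the break points simply span the range of the data, and include.lowest=TRUE is used as a shortcut so that the minimum value is not dropped.

```r
## Hypothetical sketch of 2-D binning with cut(), table() and image().
## Not the aqfig implementation; names and break choices are assumptions.
set.seed(1)
x <- rnorm(10000, 50, 5)
y <- 3 + 0.89 * x + rnorm(10000, 0, 5)

n.bins <- 64  # mirrors the documented default of num.bins

## Equally spaced break points spanning the range of each variable.
x.breaks <- seq(min(x), max(x), length.out = n.bins + 1)
y.breaks <- seq(min(y), max(y), length.out = n.bins + 1)

## Assign every point to an x-bin and a y-bin, then cross-tabulate.
x.bin <- cut(x, breaks = x.breaks, include.lowest = TRUE)
y.bin <- cut(y, breaks = y.breaks, include.lowest = TRUE)
counts <- table(x.bin, y.bin)

## Convert counts to percent of all points and store as a plain matrix.
pct.mat <- matrix(100 * as.numeric(counts) / length(x),
                  nrow = n.bins, ncol = n.bins)

## Bin midpoints give the plotting positions for image().
x.mid <- (x.breaks[-1] + x.breaks[-length(x.breaks)]) / 2
y.mid <- (y.breaks[-1] + y.breaks[-length(y.breaks)]) / 2
image(x.mid, y.mid, pct.mat, col = heat.colors(32), xlab = "x", ylab = "y")
```

Setting density.in.percent=FALSE corresponds to plotting the raw counts instead of the percentage matrix.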
Assigning values to bins is more complicated than might be expected.
For example, values that fall at cutoff points between bins are
difficult to deal with. This function accepts the default setting for
cut, which assigns values that fall on a cutoff point
to the bin on the left; that is, the intervals are open on the left
and closed on the right. This means that a point with x-value equal
to xlim[1] and/or y-value equal to ylim[1] would not be
assigned to any interval, which is probably not what the user intends
in this circumstance. Therefore, this code determines the
bins in the x-direction so that xlim[1] and xlim[2] are
at the center of the first and last bin in the x-direction (and
similarly for the y-direction). This means that the first and last
bins actually extend a bit past the limits specified. For most
applications, which use large numbers of data points and bins, this
shouldn't be noticeable, but it may be in smaller examples like the
first one given below.
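The boundary behavior of cut described above can be checked directly. The sketch below first shows the value at the lower limit being dropped, then builds breaks so that the limits sit at the centers of the first and last bins. The helper centered.breaks is hypothetical and only illustrates one way to get that property; it is not necessarily the computation used inside scatterplot.density.

```r
## Default cut(): intervals are open on the left and closed on the right,
## so a value equal to the lowest break falls in no interval.
v <- c(0, 60, 120)
naive <- cut(v, breaks = seq(0, 120, length.out = 4))
is.na(naive)  # TRUE FALSE FALSE: the value 0 is not assigned to a bin

## Hypothetical helper (an assumption, not an aqfig function): place
## lim[1] and lim[2] at the centers of the first and last of n.bins bins.
centered.breaks <- function(lim, n.bins) {
  width <- (lim[2] - lim[1]) / (n.bins - 1)  # distance between bin centers
  seq(lim[1] - width / 2, lim[2] + width / 2, length.out = n.bins + 1)
}

brk <- centered.breaks(c(0, 120), n.bins = 3)
brk  # -30 30 90 150: the outer bins extend past the limits

## With centered breaks every value, including both limits, gets a bin.
centered <- cut(v, breaks = brk)
anyNA(centered)  # FALSE
```

With only three bins the overshoot is half a bin width (30 units here), which is why the effect is visible in small examples and negligible with many bins.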
Value
A density scatterplot; that is, a pattern of shaded squares representing the counts/percentages of the points falling in each square.
Author(s)
Original version (plot.density.scatter.plot) by Kristen Foley, adapted for aqfig by Jenise Swall.
See Also
vertical.image.legend, kristen.colors, image, cut
Examples
library(aqfig)
## As a simple test case, build x and y vectors consisting only of the
## integers 1-3.
x <- c( rep(1, 7), rep(2, 12), rep(3, 6) )
y <- c( rep(1, 5), rep(2, 2), rep(1, 2), rep(2, 8), rep(3, 2),
        rep(2, 2), rep(3, 4) )
## For this test case, I've totaled the counts below.
count.df <- data.frame(x=rep(1:3, each=3), y=rep(1:3, times=3),
                       ct=c(5, 2, 0, 2, 8, 2, 0, 2, 4) )
## Make a density scatterplot with counts and percentages.
par(mfrow=c(1,2))
scatterplot.density(x, y, num.bins=3, col=heat.colors(7),
                    density.in.percent=FALSE,
                    col.one.to.one.line="green")
text(count.df$x, count.df$y, count.df$ct, col="purple")
scatterplot.density(x, y, num.bins=3, col=heat.colors(7), col.one.to.one.line=1)
text(count.df$x, count.df$y, count.df$ct/sum(count.df$ct))
## An example closer to actual usage.
x <- rnorm(100000,50,5)
y <- 3 + (.89*x) + rnorm(100000,0,5)
par(mfrow=c(1,1))
scatterplot.density(x, y)
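As a further illustration, the list forms of xylim and num.bins described under Arguments can be combined when the two axes hold different variables. The sketch below uses made-up temperature and ozone values (the data and the variable names are assumptions for illustration only) and skips the plot if aqfig is not installed.

```r
## Synthetic, made-up data purely for illustration.
set.seed(42)
temp  <- runif(5000, 10, 35)
ozone <- 20 + 1.5 * temp + rnorm(5000, 0, 8)

## Separate limits and bin counts for each axis, in the documented list forms.
xylim.arg    <- list(xlim = range(temp), ylim = range(ozone))
num.bins.arg <- list(num.bins.x = 25, num.bins.y = 40)

if (requireNamespace("aqfig", quietly = TRUE)) {
  ## The one-to-one line is suppressed because the axes have different units.
  aqfig::scatterplot.density(temp, ozone,
                             xylim = xylim.arg,
                             num.bins = num.bins.arg,
                             col.one.to.one.line = NULL,
                             xlab = "Temperature", ylab = "Ozone")
}
```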
