Computes and plots conditional densities describing how the conditional distribution of a categorical variable y changes over a numerical variable x.
cdplot(x, ...)
## Default S3 method:
cdplot(x, y,
plot = TRUE, tol.ylab = 0.05, ylevels = NULL,
bw = "nrd0", n = 512, from = NULL, to = NULL,
col = NULL, border = 1, main = "", xlab = NULL, ylab = NULL,
yaxlabels = NULL, xlim = NULL, ylim = c(0, 1), ...)
## S3 method for class 'formula'
cdplot(formula, data = list(),
plot = TRUE, tol.ylab = 0.05, ylevels = NULL,
bw = "nrd0", n = 512, from = NULL, to = NULL,
col = NULL, border = 1, main = "", xlab = NULL, ylab = NULL,
yaxlabels = NULL, xlim = NULL, ylim = c(0, 1), ...,
subset = NULL)

x 
an object; the default method expects a single numerical variable (or an object coercible to this).
y 
a "factor" (or an object coercible to this).
formula 
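a "formula" of type y ~ x with a single dependent "factor" and a single numerical explanatory variable.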
data 
an optional data frame. 
plot 
logical. Should the computed conditional densities be plotted? 
tol.ylab 
convenience tolerance parameter for y-axis annotation. If the distance between two labels drops under this threshold, they are plotted equidistantly.
ylevels 
a character or numeric vector specifying in which order the levels of the dependent variable should be plotted. 
bw, n, from, to, ... 
arguments passed to density.
col 
a vector of fill colors of the same length as levels(y). The default is to call gray.colors.
border 
border color of shaded polygons. 
main, xlab, ylab 
character strings for annotation 
yaxlabels 
character vector for annotation of y axis, defaults to levels(y).

xlim, ylim 
the range of x and y values with sensible defaults. 
subset 
an optional vector specifying a subset of observations to be used for plotting. 
cdplot computes the conditional densities of x given the levels of y weighted by the marginal distribution of y. The densities are derived cumulatively over the levels of y.
This visualization technique is similar to spinograms (see spineplot) and plots P(y | x) against x. The conditional probabilities are not derived by discretization (as in the spinogram), but using a smoothing approach via density.
Note that the estimates of the conditional densities are more reliable for high-density regions of x. Conversely, they are less reliable in regions with only a few x observations.
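The smoothing approach described above can be sketched by hand with density and Bayes' rule. This is a minimal illustration and is not taken from the source of cdplot: evaluating both densities on the same grid and reusing the marginal bandwidth for the within-level density are assumptions of this sketch, and the names dx, d_no, p_no and p_yes are hypothetical. The data are copied from the examples on this page.

```r
## Data copied from the examples on this page
fail <- factor(c(2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1, 1, 1,
                 1, 2, 1, 1, 1, 1, 1),
               levels = 1:2, labels = c("no", "yes"))
temperature <- c(53, 57, 58, 63, 66, 67, 67, 67, 68, 69, 70, 70,
                 70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 81)

## Marginal kernel density estimate of x
dx <- density(temperature, bw = "nrd0", n = 512)

## Density of x within the first level of y, evaluated on the same grid
## and with the same bandwidth (both are assumptions of this sketch)
d_no <- density(temperature[fail == "no"], bw = dx$bw, n = 512,
                from = min(dx$x), to = max(dx$x))

## Bayes' rule: P(y = "no" | x) = P(y = "no") * f(x | y = "no") / f(x)
p_no <- mean(fail == "no") * d_no$y / dx$y

## With only two levels, the failure probability is the complement
p_yes <- 1 - p_no
```

With more than two levels, the same idea would be applied to the observations pooled over the first i levels, which gives the cumulative curves mentioned above. Where dx$y is close to zero the ratio becomes unstable, which is one way to see why the estimates are less reliable in sparse regions of x.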
The conditional density functions (cumulative over the levels of y) are returned invisibly.
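Because the result is returned invisibly, it has to be assigned before it can be used. The following sketch evaluates the first returned function at a few temperatures. The data are copied from the examples on this page, and the names temps and p_no are hypothetical.

```r
## Data copied from the examples on this page
fail <- factor(c(2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1, 1, 1,
                 1, 2, 1, 1, 1, 1, 1),
               levels = 1:2, labels = c("no", "yes"))
temperature <- c(53, 57, 58, 63, 66, 67, 67, 67, 68, 69, 70, 70,
                 70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 81)

## plot = FALSE skips the drawing; the functions are still returned
cdens <- cdplot(fail ~ temperature, plot = FALSE)

## Estimated P(fail = "no" | temperature) at a few temperatures
temps <- c(60, 70, 80)
p_no <- cdens[[1]](temps)

## Two levels only, so the failure probability is the complement
data.frame(temperature = temps, p_no = p_no, p_yes = 1 - p_no)
```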
Achim Zeileis Achim.Zeileis@R-project.org
Hofmann, H., Theus, M. (2005), Interactive graphics for visualizing conditional distributions, Unpublished Manuscript.
spineplot, density
## NASA space shuttle o-ring failures
fail <- factor(c(2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1, 1, 1,
                 1, 2, 1, 1, 1, 1, 1),
               levels = 1:2, labels = c("no", "yes"))
temperature <- c(53, 57, 58, 63, 66, 67, 67, 67, 68, 69, 70, 70,
                 70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 81)
## CD plot
cdplot(fail ~ temperature)
cdplot(fail ~ temperature, bw = 2)
cdplot(fail ~ temperature, bw = "SJ")
## compare with spinogram
(spineplot(fail ~ temperature, breaks = 3))
## highlighting for failures
cdplot(fail ~ temperature, ylevels = 2:1)
## scatter plot with conditional density
cdens <- cdplot(fail ~ temperature, plot = FALSE)
plot(I(as.numeric(fail) - 1) ~ jitter(temperature, factor = 2),
     xlab = "Temperature", ylab = "Conditional failure probability")
lines(53:81, 1 - cdens[[1]](53:81), col = 2)
