Computes coordinates of the cumulative distribution function of x, and by
default plots it as a step function. A grouping variable may be specified so
that stratified estimates are computed and (by default) plotted. If there is
more than one group, the labcurve function is used (by default) to label
the multiple step functions or to draw a legend defining line types, colors,
or symbols by linking them with group labels. A weights vector may be
specified to get weighted estimates. Specify normwt to make weights sum to
the length of x (after removing NAs). Otherwise the total sample size is
taken to be the sum of the weights.
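The effect of weights and normwt can be illustrated with a minimal base-R sketch. The helper wtd_ecdf_sketch below is hypothetical and is not Hmisc's implementation; it only assumes that a weighted ECDF is the cumulative weight at each distinct value divided by the total weight, and that normwt rescales the weights to sum to the number of non-missing observations.

```r
# Hypothetical helper (not part of Hmisc): weighted ECDF coordinates.
wtd_ecdf_sketch <- function(x, weights = rep(1, length(x)), normwt = FALSE) {
  keep <- !is.na(x) & !is.na(weights)        # drop missing values first
  x <- x[keep]
  w <- weights[keep]
  if (normwt) w <- w * length(x) / sum(w)    # rescale so weights sum to n
  ux   <- sort(unique(x))                    # distinct values, ascending
  wsum <- as.numeric(tapply(w, match(x, ux), sum))  # total weight per value
  list(x = ux,
       y = cumsum(wsum) / sum(w),            # cumulative proportion
       n = sum(w))                           # total sample size implied
}

x <- c(1, 2, 2, 3, NA)
w <- c(1, 2, 1, 4, 5)
raw    <- wtd_ecdf_sketch(x, w)                 # sample size = sum of weights
normed <- wtd_ecdf_sketch(x, w, normwt = TRUE)  # sample size = length of x
```

Under these assumptions the step heights are identical in both calls, because rescaling the weights does not change the proportions. Only the implied total sample size differs.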
Ecdf is actually a generic function, and Ecdf.default is the method called
for a vector argument. Ecdf.data.frame is called when the first argument is
a data frame. This function can automatically set up a matrix of ECDFs and
wait for a mouse click if the matrix requires more than one page.
Categorical variables, character variables, and variables having fewer than
a set number of unique values are ignored. If par(mfrow=..) is not set up
before Ecdf.data.frame is called, the function will try to figure the best
layout depending on the number of variables in the data frame. Upon return
the original mfrow is left intact.
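The variable screening described above can be sketched in base R. The helper ecdf_candidates is hypothetical and is not taken from Hmisc; it assumes a column qualifies when it is numeric and has at least n.unique distinct non-missing values.

```r
# Hypothetical helper (not Hmisc code): which columns would get an ECDF?
ecdf_candidates <- function(df, n.unique = 10) {
  ok <- vapply(df, function(v) {
    is.numeric(v) && length(unique(v[!is.na(v)])) >= n.unique
  }, logical(1))
  names(df)[ok]
}

set.seed(1)
d <- data.frame(
  score = rnorm(50),                             # continuous: kept
  grade = sample(1:3, 50, TRUE),                 # only 3 distinct values: skipped
  sex   = factor(sample(c("f", "m"), 50, TRUE)), # categorical: skipped
  id    = as.character(1:50),                    # character: skipped
  stringsAsFactors = FALSE
)
kept <- ecdf_candidates(d)
```

In this toy data frame only score passes the screen, so a call on d would be expected to draw a single panel.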
When the first argument to Ecdf is a formula, a Trellis/Lattice function
Ecdf.formula is called. This allows for multipanel conditioning,
superposition using a groups variable, and other Trellis features, along
with the ability to easily plot transformed ECDFs using the fun argument.
For example, if fun=qnorm, the inverse normal transformation will be used
for the y-axis. If the transformed curves are linear, this indicates
normality. Like the xYplot function, Ecdf will create a function Key if
the groups variable is used. This function can be invoked by the user to
define the keys for the groups.
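The normality check behind fun=qnorm can be illustrated without any plotting. The sketch below uses only base R; the helper qnorm_linearity is hypothetical, and using the correlation between the sorted data and the transformed cumulative proportions as a linearity measure is an assumption made for illustration, not something Ecdf computes.

```r
# Hypothetical helper: how linear is the qnorm-transformed ECDF?
qnorm_linearity <- function(x) {
  x  <- sort(x[!is.na(x)])
  n  <- length(x)
  p  <- seq_len(n) / n     # cumulative proportion at each sorted value
  ok <- p < 1              # qnorm(1) is Inf, so drop the final step
  cor(x[ok], qnorm(p[ok])) # near 1 when the transformed curve is a line
}

set.seed(1)
r_normal <- qnorm_linearity(rnorm(1000, 200, 40))  # normal data
r_skewed <- qnorm_linearity(rexp(1000))            # right-skewed data
```

A correlation close to 1 corresponds to a nearly straight transformed curve. The skewed sample gives a visibly lower value than the normal one.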
Ecdf(x, ...)
## Default S3 method:
Ecdf(x, what=c('F','1-F','f','1-f'),
weights=rep(1, length(x)), normwt=FALSE,
xlab, ylab, q, pl=TRUE, add=FALSE, lty=1,
col=1, group=rep(1,length(x)), label.curves=TRUE, xlim,
subtitles=TRUE, datadensity=c('none','rug','hist','density'),
side=1,
frac=switch(datadensity,none=NA,rug=.03,hist=.1,density=.1),
dens.opts=NULL, lwd=1, log='', ...)
## S3 method for class 'data.frame'
Ecdf(x, group=rep(1,nrows),
weights=rep(1, nrows), normwt=FALSE,
label.curves=TRUE, n.unique=10, na.big=FALSE, subtitles=TRUE,
vnames=c('labels','names'),...)
## S3 method for class 'formula'
Ecdf(x, data=sys.frame(sys.parent()), groups=NULL,
prepanel=prepanel.Ecdf, panel=panel.Ecdf, ..., xlab,
ylab, fun=function(x)x, what=c('F','1-F','f','1-f'), subset=TRUE)

x 
a numeric vector, data frame, or Trellis/Lattice formula 
what 
The default is 
weights 
numeric vector of weights. Omit or specify a zero-length vector or NULL to get unweighted estimates. 
normwt 
see above 
xlab 
x-axis label. Default is label(x) or name of calling argument. For

ylab 
y-axis label. Default is 
q 
a vector for quantiles for which to draw reference lines on the plot. Default is not to draw any. 
pl 
set to FALSE to omit the plot and just return the estimates 
add 
set to TRUE to add the cdf to an existing plot. Does not apply if using lattice graphics (i.e., if a formula is given as the first argument). 
lty 
integer line type for plot. If 
lwd 
line width for plot. Can be a vector corresponding to 
log 
see 
col 
color for step function. Can be a vector. 
group 
a numeric, character, or 
label.curves 
applies if more than one 
xlim 
x-axis limits. Default is entire range of 
subtitles 
set to 
datadensity 
If 
side 
If 
frac 
passed to 
dens.opts 
a list of optional arguments for 
... 
other parameters passed to plot if add=FALSE. For data frames, other
parameters to pass to 
n.unique 
minimum number of unique values before an ECDF is drawn for a variable in a data frame. Default is 10. 
na.big 
set to 
vnames 
By default, variable labels are used to label x-axes. Set 
method 
method for computing the empirical cumulative distribution. See

fun 
a function to transform the cumulative proportions, for the
Trellis-type usage of 
data, groups, subset, prepanel, panel 
the usual Trellis/Lattice parameters, with 
for Ecdf.default an invisible list with elements x and y giving the
coordinates of the cdf. If there is more than one group, a list of
such lists is returned. An attribute, N, is in the returned
object. It contains the elements n and m, the number of
non-missing and missing observations, respectively.
plots
Frank Harrell
Department of Biostatistics, Vanderbilt University
f.harrell@vanderbilt.edu
wtd.Ecdf, label, table, cumsum, labcurve, xYplot, histSpike
set.seed(1)
ch <- rnorm(1000, 200, 40)
Ecdf(ch, xlab="Serum Cholesterol")
scat1d(ch) # add rug plot
histSpike(ch, add=TRUE, frac=.15) # add spike histogram
# Better: add a data density display automatically:
Ecdf(ch, datadensity='density')
label(ch) <- "Serum Cholesterol"
Ecdf(ch)
other.ch <- rnorm(500, 220, 20)
Ecdf(other.ch,add=TRUE,lty=2)
sex <- factor(sample(c('female','male'), 1000, TRUE))
Ecdf(ch, q=c(.25,.5,.75)) # show quartiles
Ecdf(ch, group=sex,
label.curves=list(method='arrow'))
# Example showing how to draw multiple ECDFs from paired data
pre.test <- rnorm(100,50,10)
post.test <- rnorm(100,55,10)
x <- c(pre.test, post.test)
g <- c(rep('Pre',length(pre.test)),rep('Post',length(post.test)))
Ecdf(x, group=g, xlab='Test Results', label.curves=list(keys=1:2))
# keys=1:2 causes symbols to be drawn periodically on top of curves
# Draw a matrix of ECDFs for a data frame
m <- data.frame(pre.test, post.test,
                sex=sample(c('male','female'),100,TRUE))
Ecdf(m, group=m$sex, datadensity='rug')
freqs <- sample(1:10, 1000, TRUE)
Ecdf(ch, weights=freqs) # weighted estimates
# Trellis/Lattice examples:
region <- factor(sample(c('Europe','USA','Australia'),1000,TRUE))
year <- factor(sample(2001:2002,1000,TRUE))
Ecdf(~ch | region*year, groups=sex)
Key() # draw a key for sex at the default location
# Key(locator(1)) # user-specified positioning of key
age <- rnorm(1000, 50, 10)
Ecdf(~ch | equal.count(age), groups=sex) # use overlapping shingles
Ecdf(~ch | sex, datadensity='hist', side=3) # add spike histogram at top
