# Ecdf: Empirical Cumulative Distribution Plot In Hmisc: Harrell Miscellaneous

 Ecdf R Documentation

## Empirical Cumulative Distribution Plot

### Description

Computes the coordinates of the empirical cumulative distribution function of `x`, and by default plots it as a step function. A grouping variable may be specified so that stratified estimates are computed and (by default) plotted. If there is more than one group, the `labcurve` function is used (by default) to label the multiple step functions or to draw a legend defining line types, colors, or symbols by linking them with group labels. A `weights` vector may be specified to get weighted estimates. Specify `normwt` to make `weights` sum to the length of `x` (after removing NAs). Otherwise the total sample size is taken to be the sum of the weights.
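The effect of `weights` and `normwt` can be illustrated with a small base-R sketch. The helper `weighted_ecdf` is hypothetical and is not part of Hmisc; it only mirrors the description above (NAs removed first, weights optionally rescaled to sum to the number of remaining observations) and may differ in detail from what `Ecdf` does internally.

```r
# Hypothetical helper (not an Hmisc function): weighted ECDF at the
# distinct values of x, following the description of weights/normwt.
weighted_ecdf <- function(x, weights = rep(1, length(x)), normwt = FALSE) {
  keep <- !is.na(x) & !is.na(weights)        # remove NAs first
  x <- x[keep]
  w <- weights[keep]
  if (normwt) w <- w * length(x) / sum(w)    # rescale so weights sum to n
  total <- sum(w)                            # total sample size used
  ux <- sort(unique(x))
  wsum <- vapply(ux, function(v) sum(w[x == v]), numeric(1))
  list(x = ux, ecdf = cumsum(wsum) / total, total = total)
}

x <- c(1, 2, 2, 5, NA)
w <- c(1, 2, 2, 5, 1)
raw  <- weighted_ecdf(x, w)                  # total is the sum of the weights
norm <- weighted_ecdf(x, w, normwt = TRUE)   # total is the number of non-NA x
raw$ecdf     # 0.1 0.5 1.0
raw$total    # 10
norm$total   # 4
```

In this sketch the cumulative proportions are the same with and without normalization; only the total sample size changes, which matters when frequencies rather than proportions are plotted.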

`Ecdf` is a generic function, and `Ecdf.default` is the method called for a vector argument. `Ecdf.data.frame` is called when the first argument is a data frame. This method can automatically set up a matrix of ECDFs and wait for a mouse click if the matrix requires more than one page. Categorical variables, character variables, and variables having fewer than a set number of unique values are ignored. If `par(mfrow=..)` is not set up before `Ecdf.data.frame` is called, the function will try to work out the best layout from the number of variables in the data frame. Upon return the original `mfrow` is left intact.
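As a rough illustration of which columns of a data frame would get an ECDF, the following base-R sketch applies the rule described above. The function `ecdf_candidates` is a hypothetical name, and the code is an approximation of the selection rule, not the code used by `Ecdf.data.frame`.

```r
# Hypothetical helper (not an Hmisc function): names of the columns that
# are numeric and have at least n.unique distinct non-missing values.
ecdf_candidates <- function(df, n.unique = 10) {
  ok <- vapply(df, function(v) {
    is.numeric(v) && length(unique(v[!is.na(v)])) >= n.unique
  }, logical(1))
  names(df)[ok]
}

set.seed(2)
d <- data.frame(
  age   = rnorm(50, 50, 10),                               # continuous: kept
  score = sample(1:3, 50, TRUE),                           # too few unique values
  sex   = factor(sample(c("female", "male"), 50, TRUE)),   # categorical
  site  = sample(c("A", "B"), 50, TRUE),                   # character
  stringsAsFactors = FALSE
)
kept <- ecdf_candidates(d)
kept   # "age"
```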

When the first argument to `Ecdf` is a formula, a Trellis/Lattice function `Ecdf.formula` is called. This allows for multi-panel conditioning, superposition using a `groups` variable, and other Trellis features, along with the ability to easily plot transformed ECDFs using the `fun` argument. For example, if `fun=qnorm`, the inverse normal transformation will be used for the y-axis. If the transformed curves are linear this indicates normality. Like the `xYplot` function, `Ecdf` will create a function `Key` if the `groups` variable is used. This function can be invoked by the user to define the keys for the groups.
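The normality check mentioned above can be sketched in base R without lattice. The helper `probit_linearity` is a hypothetical name introduced here; it transforms the cumulative proportions with `qnorm` and measures how close the transformed curve is to a straight line. The final point is dropped because `qnorm(1)` is infinite, which is a choice made for this sketch and not necessarily how `Ecdf` handles it.

```r
# Hypothetical helper: correlation between the sorted data and the
# qnorm-transformed cumulative proportions. Values near 1 mean the
# transformed ECDF is close to linear.
probit_linearity <- function(x) {
  xs <- sort(x)
  p  <- ecdf(x)(xs)        # cumulative proportions at each data point
  ok <- p < 1              # qnorm(1) is Inf, so drop the last point(s)
  cor(xs[ok], qnorm(p[ok]))
}

set.seed(3)
r_normal <- probit_linearity(rnorm(500, mean = 200, sd = 40))
r_skewed <- probit_linearity(rexp(500))
c(normal = r_normal, skewed = r_skewed)
```

For normally distributed data the correlation should be very close to 1, while skewed data such as the exponential sample bends the transformed curve and lowers it.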

### Usage

```
Ecdf(x, ...)

## Default S3 method:
Ecdf(x, what=c('F','1-F','f','1-f'),
weights=rep(1, length(x)), normwt=FALSE,
xlab, ylab, q, pl=TRUE, add=FALSE, lty=1,
col=1, group=rep(1,length(x)), label.curves=TRUE, xlim,
side=1,
dens.opts=NULL, lwd=1, log='', ...)

## S3 method for class 'data.frame'
Ecdf(x, group=rep(1,nrows),
weights=rep(1, nrows), normwt=FALSE,
label.curves=TRUE, n.unique=10, na.big=FALSE, subtitles=TRUE,
vnames=c('labels','names'),...)

## S3 method for class 'formula'
Ecdf(x, data=sys.frame(sys.parent()), groups=NULL,
prepanel=prepanel.Ecdf, panel=panel.Ecdf, ..., xlab,
ylab, fun=function(x)x, what=c('F','1-F','f','1-f'), subset=TRUE)
```

### Arguments

| Argument | Description |
|---|---|
| `x` | a numeric vector, data frame, or Trellis/Lattice formula |
| `what` | The default is `"F"`, which plots the fraction of values <= x. Set to `"1-F"` to plot the fraction > x or `"f"` to plot the cumulative frequency of values <= x. Use `"1-f"` to plot the cumulative frequency of values >= x. |
| `weights` | numeric vector of weights. Omit or specify a zero-length vector or `NULL` to get unweighted estimates. |
| `normwt` | see Description |
| `xlab` | x-axis label. Default is `label(x)` or the name of the calling argument. For `Ecdf.formula`, `xlab` defaults to the `label` attribute of the x-axis variable. |
| `ylab` | y-axis label. Default is `"Proportion <= x"`, `"Proportion > x"`, or `"Frequency <= x"` depending on the value of `what`. |
| `q` | a vector of quantiles for which to draw reference lines on the plot. Default is not to draw any. |
| `pl` | set to `FALSE` to omit the plot and just return the estimates |
| `add` | set to `TRUE` to add the cdf to an existing plot. Does not apply if using lattice graphics (i.e., if a formula is given as the first argument). |
| `lty` | integer line type for plot. If `group` is specified, this can be a vector. |
| `lwd` | line width for plot. Can be a vector corresponding to `group`s. |
| `log` | see `plot`. Set `log='x'` to use a log scale for the `x`-axis. |
| `col` | color for step function. Can be a vector. |
| `group` | a numeric, character, or `factor` categorical variable used for stratifying estimates. If `group` is present, as many ECDFs are drawn as there are non-missing group levels. |
| `label.curves` | applies if more than one `group` exists. Default is `TRUE` to use `labcurve` to label curves where they are farthest apart. Set `label.curves` to a `list` to specify options to `labcurve`, e.g., `label.curves=list(method="arrow", cex=.8)`. These option names may be abbreviated in the usual way arguments are abbreviated. Use for example `label.curves=list(keys=1:5)` to draw symbols periodically (as in `pch=1:5`, see `points`) on the curves and automatically position a legend in the most empty part of the plot. Set `label.curves=FALSE` to suppress drawing curve labels. The `col`, `lty`, and `type` parameters are automatically passed to `labcurve`, although you can override them here. You can set `label.curves=list(keys="lines")` to have different line types defined in an automatically positioned key. |
| `xlim` | x-axis limits. Default is the entire range of `x`. |
| `subtitles` | set to `FALSE` to suppress putting a subtitle at the bottom left of each plot. The subtitle indicates the numbers of non-missing and missing observations, which are labeled `n`, `m`. |
| `datadensity` | If `datadensity` is not `"none"`, either `scat1d` or `histSpike` is called to add a rug plot (`datadensity="rug"`), spike histogram (`datadensity="hist"`), or smooth density estimate (`"density"`) to the bottom or top of the ECDF. |
| `side` | If `datadensity` is not `"none"`, the default is to place the additional information on top of the x-axis (`side=1`). Use `side=3` to place it at the top of the graph. |
| `frac` | passed to `histSpike` |
| `dens.opts` | a list of optional arguments for `histSpike` |
| `...` | other parameters passed to `plot` if `add=FALSE`. For data frames, other parameters to pass to `Ecdf.default`. For `Ecdf.formula`, if `groups` is not used, you can also add data density information to each panel's ECDF by specifying the `datadensity` and optional `frac`, `side`, `dens.opts` arguments. |
| `n.unique` | minimum number of unique values before an ECDF is drawn for a variable in a data frame. Default is 10. |
| `na.big` | set to `TRUE` to draw the number of NAs in larger letters in the middle of the plot for `Ecdf.data.frame` |
| `vnames` | By default, variable labels are used to label x-axes. Set `vnames="names"` to instead use variable names. |
| `method` | method for computing the empirical cumulative distribution. See `wtd.Ecdf`. The default is to use the standard `"i/n"` method as is used by the non-Trellis versions of `Ecdf`. |
| `fun` | a function to transform the cumulative proportions, for the Trellis-type usage of `Ecdf` |
| `data`, `groups`, `subset`, `prepanel`, `panel` | the usual Trellis/Lattice parameters, with `groups` causing `Ecdf.formula` to overlay multiple ECDFs on one panel. |
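To make the four `what` options concrete, here is a small base-R sketch that computes each quantity by hand for a toy vector. It does not call `Ecdf`. The `"1-f"` column is an assumption: it is taken to be the sample size times the `"1-F"` column, which is one reading of the description and may not match the function's exact convention at tied values.

```r
x  <- c(1, 2, 2, 5)
n  <- length(x)
ux <- sort(unique(x))      # distinct values at which the step function jumps
Fx <- ecdf(x)(ux)          # fraction of values <= x

variants <- data.frame(
  x     = ux,
  F     = Fx,              # what = "F":   0.25 0.75 1.00
  `1-F` = 1 - Fx,          # what = "1-F": 0.75 0.25 0.00
  f     = n * Fx,          # what = "f":   1 3 4
  `1-f` = n * (1 - Fx),    # what = "1-f" (assumed to be n * (1 - F)): 3 1 0
  check.names = FALSE
)
variants
```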

### Value

For `Ecdf.default`, an invisible list with elements `x` and `y` giving the coordinates of the cdf. If there is more than one `group`, a list of such lists is returned. The returned object has an attribute `N` containing the elements `n` and `m`, the numbers of non-missing and missing observations, respectively.
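A minimal sketch of working with the returned object, assuming the Hmisc package is installed and that the return value has the structure described above. Because the list is returned invisibly, it has to be assigned to be inspected; `pl=FALSE` is used so that no plot is drawn.

```r
library(Hmisc)   # assumes Hmisc is installed

x <- c(3, 1, 4, 1, 5, 9, 2, 6, NA, NA)
e <- Ecdf(x, pl = FALSE)   # return the estimates without plotting

names(e)                   # should include "x" and "y" per the description
N <- attr(e, "N")
N[["n"]]                   # number of non-missing observations
N[["m"]]                   # number of missing observations
```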

### Side Effects

plots

### Author(s)

Frank Harrell
Department of Biostatistics, Vanderbilt University
fh@fharrell.com

### See Also

`wtd.Ecdf`, `label`, `table`, `cumsum`, `labcurve`, `xYplot`, `histSpike`

### Examples

```
library(Hmisc)  # provides Ecdf, label, and Key
set.seed(1)
ch <- rnorm(1000, 200, 40)
Ecdf(ch, xlab="Serum Cholesterol")
# Better: add a data density display automatically:
Ecdf(ch, datadensity='density')

label(ch) <- "Serum Cholesterol"
Ecdf(ch)
other.ch <- rnorm(500, 220, 20)
Ecdf(other.ch, add=TRUE, lty=2)  # overlay a second ECDF on the existing plot

sex <- factor(sample(c('female','male'), 1000, TRUE))
Ecdf(ch, q=c(.25,.5,.75))  # show quartiles
Ecdf(ch, group=sex,
label.curves=list(method='arrow'))

# Example showing how to draw multiple ECDFs from paired data
pre.test <- rnorm(100,50,10)
post.test <- rnorm(100,55,10)
x <- c(pre.test, post.test)
g <- c(rep('Pre',length(pre.test)),rep('Post',length(post.test)))
Ecdf(x, group=g, xlab='Test Results', label.curves=list(keys=1:2))
# keys=1:2 causes symbols to be drawn periodically on top of curves

# Draw a matrix of ECDFs for a data frame
m <- data.frame(pre.test, post.test,
                sex=sample(c('male','female'),100,TRUE))
Ecdf(m, group=m$sex, datadensity='rug')

freqs <- sample(1:10, 1000, TRUE)
Ecdf(ch, weights=freqs)  # weighted estimates

# Trellis/Lattice examples:

region <- factor(sample(c('Europe','USA','Australia'),1000,TRUE))  # same length as ch
year <- factor(sample(2001:2002,1000,TRUE))
Ecdf(~ch | region*year, groups=sex)
Key()           # draw a key for sex at the default location
# Key(locator(1)) # user-specified positioning of key
age <- rnorm(1000, 50, 10)
Ecdf(~ch | equal.count(age), groups=sex)  # use overlapping shingles