plsmo: Plot smoothed estimates
In harrelfe/Hmisc: Harrell Miscellaneous

plsmo

R Documentation

Description

Plot smoothed estimates of x vs. y, handling missing data for lowess or supsmu, and adding axis labels. Optionally suppresses plotting extrapolated estimates. An optional group variable can be specified to compute and plot the smooth curves by levels of group. When group is present, the datadensity option will draw tick marks showing the location of the raw x-values, separately for each curve. plsmo has an option to plot connected points for raw data, with no smoothing. The non-panel version of plsmo allows y to be a matrix, for which smoothing is done separately over its columns. If both group and multi-column y are used, the number of curves plotted is the product of the number of groups and the number of y columns.

method='intervals' is often used when y is binary, as it may be tricky to specify a reasonable smoothing parameter to lowess or supsmu in this case. The 'intervals' method uses the cutGn function to form intervals of x containing a minimum of mobs observations. For each interval the ifun function summarizes y, with the default being the mean (proportions for binary y). The results are plotted as step functions, with vertical discontinuities drawn with a saturation of 0.15 of the original color. A plus sign is drawn at the mean x within each interval. For this approach, the default x-range is the entire raw data range, and trim and evaluate are ignored. For panel.plsmo it is best to specify type='l' when using 'intervals'.
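
The interval summary described above can be approximated in base R. The sketch below is not how cutGn forms its intervals and is not plsmo's internal code. The helper interval_summary and its column names are hypothetical. It sorts x, chops it into consecutive runs that each hold at least mobs observations (tied x values may be split across runs), and reports the mean x and ifun(y) for each run.

```r
# Hypothetical helper: NOT Hmisc's cutGn, only an illustration of the idea.
interval_summary <- function(x, y, mobs = 30, ifun = mean) {
  ok <- !is.na(x) & !is.na(y)        # drop incomplete pairs
  x <- x[ok]; y <- y[ok]
  o <- order(x)
  x <- x[o]; y <- y[o]
  n <- length(x)
  k <- max(1, n %/% mobs)            # number of runs; each has >= mobs obs when n >= mobs
  g <- ceiling(seq_len(n) * k / n)   # run index 1..k for each sorted observation
  data.frame(xmean = as.vector(tapply(x, g, mean)),   # where the plus sign would go
             ysum  = as.vector(tapply(y, g, ifun)),   # height of the step
             n     = as.vector(table(g)))
}

set.seed(1)
age     <- rnorm(500, 50, 15)
ybinary <- rbinom(500, 1, 0.5)
s <- interval_summary(age, ybinary, mobs = 75)
s
```

With binary y and ifun=mean, the ysum column holds the proportion of ones in each interval, which is the quantity drawn as a step function.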

panel.plsmo is a panel function for trellis for the xyplot function that uses plsmo and its options to draw one or more nonparametric function estimates on each panel. This has advantages over using xyplot with panel.xyplot and panel.loess: (1) by default it will invoke labcurve to label the curves where they are most separated, (2) the datadensity option will put rug plots on each curve (instead of a single rug plot at the bottom of the graph), and (3) when panel.plsmo invokes plsmo it can use the "super smoother" (supsmu function) instead of lowess, or pass method='intervals'. panel.plsmo senses when a group variable is specified to xyplot so that it can invoke panel.superpose instead of panel.xyplot. Using panel.plsmo through trellis has some advantages over calling plsmo directly in that conditioning variables are allowed and trellis uses nicer fonts etc.

When a group variable is used, panel.plsmo creates a function Key in the session frame that the user can invoke to draw a key for the individual data point symbols used for the groups. By default, the key is positioned at the upper right corner of the graph. If Key(locator(1)) is specified, the key will appear so that its upper left corner is at the coordinates of the mouse click.

For ggplot2 graphics the counterparts are stat_plsmo and histSpikeg.
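
A minimal sketch of the ggplot2 route, assuming Hmisc and ggplot2 are both installed and that stat_plsmo is used with its default arguments. The data frame d and its columns are simulated for illustration only.

```r
library(Hmisc)     # provides stat_plsmo
library(ggplot2)

set.seed(1)
d <- data.frame(age = rnorm(500, 50, 15),
                sex = sample(c("female", "male"), 500, TRUE))
d$sbp <- 120 + (d$age - 50) / 10 + rnorm(500, 0, 8) + 5 * (d$sex == "male")

# Raw points plus one smooth per sex, computed by stat_plsmo
p <- ggplot(d, aes(x = age, y = sbp, color = sex)) +
  geom_point(alpha = 0.3) +
  stat_plsmo()
```

Call print(p) to draw the plot.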

Usage

plsmo(x, y, method=c("lowess","supsmu","raw","intervals"), xlab, ylab, 
      add=FALSE, lty=1 : lc, col=par("col"), lwd=par("lwd"),
      iter=if(length(unique(y))>2) 3 else 0, bass=0, f=2/3, mobs=30, trim, 
      fun, ifun=mean, group, prefix, xlim, ylim, 
      label.curves=TRUE, datadensity=FALSE, scat1d.opts=NULL,
      lines.=TRUE, subset=TRUE,
      grid=FALSE, evaluate=NULL, ...)


#To use panel function:
#xyplot(formula=y ~ x | conditioningvars, groups,
#       panel=panel.plsmo, type='b', 
#       label.curves=TRUE,
#       lwd = superpose.line$lwd, 
#       lty = superpose.line$lty, 
#       pch = superpose.symbol$pch, 
#       cex = superpose.symbol$cex, 
#       font = superpose.symbol$font, 
#       col = NULL, scat1d.opts=NULL, ...)

Arguments

`x`	vector of x-values, NAs allowed
`y`	vector or matrix of y-values, NAs allowed
`method`	`"lowess"` (the default), `"supsmu"`, `"raw"` to not smooth at all, or `"intervals"` to use intervals (see above)
`xlab`	x-axis label, used only if `add=FALSE`. Defaults to `label(x)` or the argument name.
`ylab`	y-axis label, like `xlab`.
`add`	Set to `TRUE` to call `lines` instead of `plot`. Assumes the axes are already labeled.
`lty`	line type, default=1,2,3,..., corresponding to columns of `y` and `group` combinations
`col`	color for each curve, corresponding to `group`. Default is current `par("col")`.
`lwd`	vector of line widths for the curves, corresponding to `group`. Default is current `par("lwd")`. `lwd` can also be specified as an element of `label.curves` if `label.curves` is a list.
`iter`	iter parameter if `method="lowess"`, default=0 if `y` is binary, and 3 otherwise.
`bass`	bass parameter if `method="supsmu"`, default=0.
`f`	passed to the `lowess` function, for `method="lowess"`
`mobs`	for `method='intervals'`, the minimum number of observations per interval
`trim`	only plots smoothed estimates between trim and 1-trim quantiles of x. Default is to use 10th smallest to 10th largest x in the group if the number of observations in the group exceeds 200 (0 otherwise). Specify trim=0 to plot over entire range.
`fun`	after computing the smoothed estimates, if `fun` is given the y-values are transformed by `fun()`
`ifun`	a summary statistic function to apply to the `y`-variable for `method='intervals'`. Default is `mean`.
`group`	a variable, either a `factor` vector or one that will be converted to `factor` by `plsmo`, that is used to stratify the data so that separate smooths may be computed
`prefix`	a character string to appear in front of group labels. The presence of `prefix` ensures that `labcurve` will be called even when `add=TRUE`.
`xlim`	a vector of 2 x-axis limits. Default is observed range.
`ylim`	a vector of 2 y-axis limits. Default is observed range.
`label.curves`	set to `FALSE` to prevent `labcurve` from being called to label multiple curves corresponding to `group`s. Set to a list to pass options to `labcurve`. `lty` and `col` are passed to `labcurve` automatically.
`datadensity`	set to `TRUE` to draw tick marks on each curve, using x-coordinates of the raw data `x` values. This is done using `scat1d`.
`scat1d.opts`	a list of options to hand to `scat1d`
`lines.`	set to `FALSE` to suppress smoothed curves from being drawn. This can make sense if `datadensity=TRUE`.
`subset`	a logical or integer vector specifying a subset to use for processing, with respect to all variables being analyzed
`grid`	set to `TRUE` if the R `grid` package drew the current plot
`evaluate`	number of points to keep from smoother. If specified, an equally-spaced grid of `evaluate` `x` values will be obtained from the smoother using linear interpolation. This will keep from plotting an enormous number of points if the dataset contains a very large number of unique `x` values.
`...`	optional arguments that are passed to `scat1d`, or optional parameters to pass to `plsmo` from `panel.plsmo`. See optional arguments for `plsmo` above.
`type`	set to `p` to have `panel.plsmo` plot points (and not call `plsmo`), `l` to call `plsmo` and not plot points, or use the default `b` to plot both.
`pch`, `cex`, `font`	vectors of graphical parameters corresponding to the `group`s (scalars if `group` is absent). By default, the parameters set up by `trellis` will be used.
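
As a rough illustration of what `trim` and `evaluate` describe (this is not plsmo's internal code), the base-R sketch below smooths with `lowess`, keeps only the estimates between the `trim` and `1-trim` quantiles of `x`, and then uses `approx` to interpolate linearly onto an equally spaced grid. The names `sm`, `lim`, `keep`, and `grid_est`, and the value `trim = 0.05`, are made up for the example.

```r
set.seed(1)
x <- runif(1000, 0, 100)
y <- sin(x / 15) + rnorm(1000, sd = 0.3)

sm <- lowess(x, y, f = 2/3, iter = 3)        # smoothed estimates at the sorted x

trim <- 0.05                                 # illustrative value
lim  <- quantile(x, c(trim, 1 - trim))
keep <- sm$x >= lim[1] & sm$x <= lim[2]      # drop estimates in the two tails

evaluate <- 50                               # number of points to keep
grid_est <- approx(sm$x[keep], sm$y[keep], n = evaluate)  # linear interpolation

plot(x, y, col = "grey")
lines(grid_est, lwd = 2)                     # 50 points instead of about 900
```

Interpolating onto a fixed grid keeps the plotted curve small however many unique `x` values the data contain.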

Value

plsmo returns the list of curves (x and y coordinates) that was passed to labcurve.

Side Effects

Draws plots. In addition, panel.plsmo creates the Key function in the session frame.

Examples

library(Hmisc)   # provides plsmo, panel.plsmo, wtd.loess.noiter
set.seed(1)
x <- 1:100
y <- x + runif(100, -10, 10)
plsmo(x, y, "supsmu", xlab="Time of Entry") 
#Use label(y) or "y" for ylab


plsmo(x, y, add=TRUE, lty=2)
#Add lowess smooth to existing plot, with different line type


age <- rnorm(500, 50, 15)
survival.time <- rexp(500)
sex <- sample(c('female','male'), 500, TRUE)
race <- sample(c('black','non-black'), 500, TRUE)
plsmo(age, survival.time < 1, fun=qlogis, group=sex) # plot logit by sex

#Bivariate Y
sbp <- 120 + (age - 50)/10 + rnorm(500, 0, 8) + 5 * (sex == 'male')
dbp <-  80 + (age - 50)/10 + rnorm(500, 0, 8) - 5 * (sex == 'male')
Y <- cbind(sbp, dbp)
plsmo(age, Y)
plsmo(age, Y, group=sex)


#Plot points and smooth trend line using trellis 
# (add type='l' to suppress points or type='p' to suppress trend lines)
require(lattice)
xyplot(survival.time ~ age, panel=panel.plsmo)


#Do this for multiple panels
xyplot(survival.time ~ age | sex, panel=panel.plsmo)

#Repeat this using equal sample size intervals (n=25 each) summarized by
#the median, then a proportion (mean of binary y)
xyplot(survival.time ~ age | sex, panel=panel.plsmo, type='l',
       method='intervals', mobs=25, ifun=median)
ybinary <- ifelse(runif(length(sex)) < 0.5, 1, 0)
xyplot(ybinary ~ age, groups=sex, panel=panel.plsmo, type='l',
       method='intervals', mobs=75, ifun=mean, xlim=c(0, 120))


#Do this for subgroups of points on each panel, show the data
#density on each curve, and draw a key at the default location
xyplot(survival.time ~ age | sex, groups=race, panel=panel.plsmo,
       datadensity=TRUE)
Key()


#Use wtd.loess.noiter to do a fast weighted smooth
plot(x, y)
lines(wtd.loess.noiter(x, y))
lines(wtd.loess.noiter(x, y, weights=c(rep(1,50), 100, rep(1,49))), col=2)
points(51, y[51], pch=18)   # show overly weighted point
#Try to duplicate this smooth by replicating 51st observation 100 times
lines(wtd.loess.noiter(c(x,rep(x[51],99)),c(y,rep(y[51],99)),
      type='ordered all'), col=3)
#Note: These two don't agree exactly

harrelfe/Hmisc documentation built on June 13, 2025, 7:22 a.m.