splot | R Documentation |
A plotting function aimed at automating some common visualization tasks in order to ease data exploration.
splot(y, data = NULL, su = NULL, type = "", split = "median",
  levels = list(), sort = NULL, error = "standard",
  error.color = "#585858", error.lwd = 2, lim = 9, lines = TRUE, ...,
  colors = "pastel", colorby = NULL, colorby.leg = TRUE,
  color.lock = FALSE, color.offset = 1.1, color.summary = "mean",
  opacity = 1, dark = getOption("splot.dark", FALSE), x = NULL,
  by = NULL, between = NULL, cov = NULL, line.type = "l",
  mv.scale = "none", mv.as.x = FALSE, save = FALSE, format = cairo_pdf,
  dims = dev.size(), file.name = "splot", myl = NULL, mxl = NULL,
  autori = TRUE, xlas = 0, ylas = 1, xaxis = TRUE, yaxis = TRUE,
  breaks = "sturges", density.fill = TRUE, density.opacity = 0.4,
  density.args = list(), leg = "outside", lpos = "auto", lvn = TRUE,
  leg.title = TRUE, leg.args = list(), title = TRUE, labx = TRUE,
  laby = TRUE, lty = TRUE, lwd = 2, sub = TRUE, ndisp = TRUE,
  note = TRUE,
  font = c(title = 2, sud = 1, leg = 1, leg.title = 2, note = 3),
  cex = c(title = 1.5, sud = 0.9, leg = 0.9, note = 0.7, points = 1),
  sud = TRUE, labels = TRUE, labels.filter = "_", labels.trim = 20,
  points = TRUE, points.first = TRUE, byx = TRUE,
  drop = c(x = TRUE, by = TRUE, bet = TRUE), prat = c(1, 1),
  check.height = TRUE, model = FALSE, options = NULL, add = NULL)
y |
a formula (see note), or the primary variable(s) to be shown on the y axis (unless |
data |
a |
su |
a subset to all variables, applied after they are all retrieved from |
type |
determines the type of plot to make, between |
split |
how to split any continuous variables (those with more than |
levels |
a list with entries corresponding to variable names, used to rename and/or reorder factor levels. To
reorder a factor, enter a vector of either numbers or existing level names in the new order (e.g.,
|
sort |
specifies the order of character or factor |
error |
string; sets the type of error bars to show in bar or line plots, or turns them off. If |
error.color |
color of the error bars. Default is |
error.lwd |
line weight of error bars. Default is 2. |
lim |
numeric; checked against the number of factor levels of each variable. Used to decide which variables should
be split, which colors to use, and when to turn off the legend. Default is |
lines |
logical or a string specifying the type of lines to be drawn in scatter plots. By default (and whenever
|
... |
passes additional arguments to |
colors |
sets a color theme or manually specifies colors. Default theme is |
colorby |
a variable or list of arguments used to set colors and the legend, alternatively to |
colorby.leg |
logical; if |
color.lock |
logical; if |
color.offset |
how much points or histogram bars should be offset from the initial color used for lines. Default is 1.1; values greater than 1 lighten, and less than 1 darken. |
color.summary |
specifies the function used to collapse multiple colors for a single display. Either a string
matching one of |
opacity |
a number between 0 and 1; sets the opacity of points, lines, and bars. Semi-opaque lines will sometimes not be displayed in the plot window, but will show up when the plot is written to a file. |
dark |
logical; if |
x |
secondary variable, to be shown on the x axis. If not specified, |
by |
the 'splitting' variable within each plot, by which the plotted values of |
between |
a single object or name, or two in a vector (e.g., |
cov |
additional variables used for adjustment. Bar and line plots include all |
line.type |
a character setting the style of line (e.g., with points at joints) to be drawn in line plots. Default
is |
mv.scale |
determines whether to center and scale multiple |
mv.as.x |
logical; if |
save |
logical; if |
format |
the type of file to save plots as. Default is |
dims |
a vector of 2 values ( |
file.name |
a string with the name of the file to be saved (excluding the extension, as this is added depending on
|
myl |
sets the range of the y axis ( |
mxl |
sets the range of the x axis ( |
autori |
logical; if |
xlas, ylas |
numeric; sets the orientation of the x- and y-axis labels. See |
xaxis, yaxis |
logical; if |
breaks |
determines the width of histogram bars. See |
density.fill |
logical; |
density.opacity |
opacity of the density polygons, between 0 and 1. |
density.args |
list of arguments to be passed to |
leg |
sets the legend inside or outside the plot frames (when a character matching |
lpos |
sets the position of the legend within its frame (whether inside or outside of the plot frames) based on
keywords (see |
lvn |
level variable name. Logical: if |
leg.title |
sets the title of the legend (which is the by variable name by default), or turns it off with
|
leg.args |
a list passing arguments to the |
title |
logical or a character: if |
labx, laby |
logical or a character: if |
lty |
logical or a vector: if |
lwd |
numeric; sets the weight of lines in line, density, and scatter plots. Default is 2. See
|
sub |
affects the small title above each plot showing |
ndisp |
logical; if |
note |
logical; if |
font |
named numeric vector: |
cex |
named numeric vector: |
sud |
affects the heading for subset and covariates/line adjustments (su display); text replaces it, and
|
labels |
logical; if |
labels.filter |
a regular expression string to be replaced in label texts with a blank space. Default is
|
labels.trim |
numeric or logical; the maximum length of label texts (in number of characters). Default is 20, with
any longer labels being trimmed. Set to |
points |
logical; if |
points.first |
logical; if |
byx |
logical; if |
drop |
named logical vector: |
prat |
panel ratio, referring to the ratio between plot frames and the legend frame when the legend is out. A
single number will make all panels of equal width. A vector of two numbers will adjust the ratio between plot panels
and the legend panel. For example, |
check.height |
logical; if |
model |
logical; if |
options |
a list with named arguments, useful for setting temporary defaults if you plan on using some of the same
options for multiple plots (e.g., |
add |
evaluated within the function, so you can refer to the objects that are returned, to variable names (those
from an entered data frame or entered as arguments), or entered data by their position, preceded by '.' (e.g.,
|
A list containing data and settings is invisibly returned, which might be useful to check for errors.
Each of these objects can also be pulled from within add:
dat | a data.frame of processed, unsegmented data. |
cdat | a list of lists of data.frames of processed, segmented data. |
txt | a list of variable names. Used mostly to pull variables from data or the environment. |
ptxt | a list of processed variable and level names. Used mostly for labeling. |
seg | a list containing segmentation information (such as levels) for each variable. |
ck | a list of settings. |
lega | a list of arguments that were or would have been passed to legend. |
fmod | an lm object if model is TRUE, and the model succeeded. |
formulas
When y is a formula (has a ~), other variables will be pulled from it:
y ~ x * by * between[1] * between[2] + cov[1] + cov[2] + cov[n]
If y has multiple variables, by is used to identify the variable (it becomes a factor with variable names
as levels), so anything entered as by is treated as between[1], between[1] is moved to between[2], and
between[2] is discarded with a message.
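The role assignment described above can be illustrated with a small base-R sketch. This is not splot's own parsing code: parse_roles is a hypothetical helper, the variable names (bet1, bet2, cov1, cov2) are placeholders, and it assumes a formula written exactly in the y ~ x * by * between + cov pattern.

```r
# Hypothetical helper (not part of splot): split a formula into the roles
# described above, assuming the pattern y ~ x * by * between... + cov...
parse_roles <- function(f) {
  # Deparse the right-hand side into a single string.
  rhs <- paste(deparse(f[[3]], width.cutoff = 500L), collapse = " ")
  # Terms joined by "+": the first is the crossed block, the rest are covariates.
  parts <- strsplit(rhs, " + ", fixed = TRUE)[[1]]
  # Variables joined by "*" fill x, by, and between in order.
  crossed <- strsplit(parts[1], " * ", fixed = TRUE)[[1]]
  list(
    y = paste(deparse(f[[2]]), collapse = " "),
    x = crossed[1],
    by = if (length(crossed) > 1) crossed[2] else NULL,
    between = crossed[-(1:2)],
    cov = parts[-1]
  )
}

parsed <- parse_roles(y ~ x * by * bet1 * bet2 + cov1 + cov2)
str(parsed)
```

String splitting keeps the sketch short, but it only handles formulas that follow the pattern literally; anything more general would need to walk the formula's terms.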
named vectors
Named vector arguments like font, cex, and drop can be set with a single value, positionally, or
with names. If a single value is entered (e.g., drop = FALSE), this will be applied to each level (i.e.,
c(x = FALSE, by = FALSE, bet = FALSE)). If more than one value is entered, these will be treated positionally
(e.g., cex = c(2, 1.2) would be read as c(title = 2, sud = 1.2, leg = .9, note = .7, points = 1)).
If values are named, only named values will be set, with other defaults retained (e.g., cex = c(note = 1.2)
would be read as c(title = 1.5, sud = .9, leg = .9, note = 1.2, points = 1)).
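The three cases above (single value, positional, named) can be mimicked in base R. The sketch below only illustrates the documented behavior and is not splot's internal code; resolve_named is a hypothetical helper, and the defaults are copied from the usage section.

```r
# Hypothetical helper (not part of splot): fill a named default vector
# from a single value, positional values, or named values.
resolve_named <- function(value, defaults) {
  out <- defaults
  if (!is.null(names(value))) {
    # Named: set only the entries whose names match, keep other defaults.
    known <- intersect(names(value), names(defaults))
    out[known] <- value[known]
  } else if (length(value) == 1) {
    # Single value: apply to every entry.
    out[] <- value
  } else {
    # Several unnamed values: fill from the first position onward.
    n <- min(length(value), length(defaults))
    out[seq_len(n)] <- value[seq_len(n)]
  }
  out
}

cex_default <- c(title = 1.5, sud = 0.9, leg = 0.9, note = 0.7, points = 1)
drop_default <- c(x = TRUE, by = TRUE, bet = TRUE)

resolve_named(FALSE, drop_default)          # every entry becomes FALSE
resolve_named(c(2, 1.2), cex_default)       # title and sud are replaced
resolve_named(c(note = 1.2), cex_default)   # only note is replaced
```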
x-axis levels text
If the text of x-axis levels (those corresponding to the levels of x) is too long, some level texts are hidden
before they overlap. To try to avoid this, by default longer texts are trimmed (dictated by labels.trim), and at
some point the orientation of level text is changed (settable with xlas), but you may still see level text missing.
To make these visible, you can reduce labels.trim from the default of 20 (or rename the levels of that variable),
make the level text vertical (xlas = 3), or expand your plot window if possible.
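To preview what shorter level text might look like before re-plotting, a rough base-R approximation of the labels.filter and labels.trim steps is sketched below. It assumes trimming is a plain cut at the given number of characters, which may differ from what splot actually does; clean_label is a hypothetical helper.

```r
# Hypothetical helper (not part of splot): replace matches of `filter`
# with a space, then cut the text to at most `trim` characters.
clean_label <- function(x, filter = "_", trim = 20) {
  x <- gsub(filter, " ", x)
  if (is.numeric(trim)) x <- substr(x, 1, trim)
  x
}

level_names <- c("a_very_long_variable_level_name", "short_one")
clean_label(level_names)            # default trim of 20 characters
clean_label(level_names, trim = 10) # a tighter trim for crowded axes
```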
missing levels, lines, and/or error bars
By default (if drop = TRUE), levels of x with no data are dropped, so you may not see every level of your
variable, at all or at a level of by or between. Sometimes error bars cannot be estimated (if, say, there
is only one observation at the given level), but lines are still drawn in these cases, so you may sometimes see levels
without error bars even when error bars are turned on. Sometimes (particularly when drop['x'] is FALSE),
you might see floating error bars with no lines drawn to them, or what appear to be completely empty levels. This
happens when there is a missing level of x between two non-missing levels, potentially making an orphaned level
(if a non-missing level is surrounded by missing levels). If there are no error bars for this orphaned level, by default
nothing will be drawn to indicate it. If you set line.type to 'b' (or any other type with points), a point
will be drawn at such error-bar-less, orphaned levels.
unexpected failures
splot tries to clean up after itself in the case of an error, but you may still run into errors that break things before
this can happen. If after a failed plot you find that you're unable to make any new plots, or new plots are drawn over
old ones, you might try entering dev.off() into the console. If new plots look off (splot's par settings didn't get
reset), you may have to close the plot window to reset par (if you're using RStudio, Plots > "Remove Plot..." or
"Clear All..."), or restart R.
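One way to automate the recovery step above is to wrap plotting calls so that all graphics devices are closed when an error occurs. This is a sketch using base R only (tryCatch and graphics.off); safe_plot is a hypothetical wrapper, not something splot provides, and closing every device is more aggressive than a single dev.off().

```r
# Hypothetical wrapper (not part of splot): evaluate a plotting call and,
# if it fails, close all open graphics devices so later plots start clean.
safe_plot <- function(plot_call) {
  tryCatch(plot_call, error = function(e) {
    graphics.off()
    message("plot failed and devices were closed: ", conditionMessage(e))
    invisible(NULL)
  })
}

# Usage with any plotting expression, for example:
# safe_plot(splot(y ~ x * by, dat))
```

Closing a device also discards its par settings, so the next plot opens a fresh device with defaults.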
# simulating data
n <- 2000
dat <- data.frame(sapply(c("by", "bet1", "bet2"), function(c) sample(0:1, n, TRUE)))
dat$x <- with(
  dat,
  rnorm(n) + by * -.4 + by * bet1 * -.3 + by * bet2 *
    .3 + bet1 * bet2 * .9 - .8 + rnorm(n, 0, by)
)
dat$y <- with(
  dat,
  x * .2 + by * .3 + bet2 * -.6 + bet1 * bet2 * .8 + x *
    by * bet1 * -.5 + x * by * bet1 * bet2 * -.5
    + rnorm(n, 5) + rnorm(n, -1, .1 * x^2)
)
# looking at the distribution of y between bets split by by
splot(y, by = by, between = c(bet1, bet2), data = dat)
# looking at quantile splits of y in y by x
splot(y ~ x * y, dat, split = "quantile")
# looking at y by x between bets
splot(y ~ x, dat, between = c(bet1, bet2))
# sequentially adding levels of split
splot(y ~ x * by, dat)
splot(y ~ x * by * bet1, dat)
splot(y ~ x * by * bet1 * bet2, dat)
# same as the last but entered by name
splot(y, x = x, by = by, between = c(bet1, bet2), data = dat)
# zooming in on one of the windows
splot(y ~ x * by, dat, bet1 == 1 & bet2 == 0)
# comparing an adjusted lm prediction line with a loess line
# this could also be entered as y ~ poly(x,3)
splot(y ~ x + x^2 + x^3, dat, bet1 == 1 & bet2 == 0 & by == 1, add = {
  lines(x[order(x)], loess(y ~ x)$fitted[order(x)], lty = 2)
  legend("topright", c("lm", "loess"), lty = c(1, 2), lwd = c(2, 1), bty = "n")
})
# looking at different versions of x added to y
splot(cbind(
  Raw = y + x,
  Sine = y + sin(x),
  Cosine = y + cos(x),
  Tangent = y + tan(x)
) ~ x, dat, myl = c(-10, 15), lines = "loess", laby = "y + versions of x")