Description Usage Arguments Value Determining the type of plot Conditional variables Reordering of factor levels Instance weights Axis scaling Missing values Sampling Factor preprocessing Coloring Generating multiple plots at once Debugging Column name matching Remarks on supported plot types Remarks on the use of options Limitations See Also Examples

The purpose of `plotluck` is to let the user focus on *what* to plot,
and automate the *how*. Given a dependency formula with up to three
variables, it tries to choose the most suitable type of plot. It also automates
sampling large datasets, correct handling of observation weights, logarithmic
axis scaling, ordering and pruning of factor levels, and overlaying smoothing
curves or median lines.


| Argument | Description |
|---|---|
| `data` | a data frame. |
| `formula` | an object of class `formula`. In addition to the base plot types, the dot symbol (`.`) can be used to request several plots at once; see also section "Generating multiple plots at once" below. |
| `weights` | observation weights or frequencies (optional). |
| `opts` | a named list of options (optional); see also `plotluck.options`. |
| `...` | additional parameters to be passed to the respective ggplot2 geom objects. |

a ggplot object, or a plotluck.multi object if the dot symbol was used.

Besides the shape of the formula, the algorithm takes into account the type of each variable: numeric, ordered factor, or unordered factor. Often, it makes sense to treat ordered factors similarly to numeric types.

One-variable numeric (resp. factor) distributions are usually represented by
density (resp. Cleveland dot) charts, but can be overridden to histograms or
bar plots using the `geom` option. Density plots come with an overlaid
vertical median line.

For two numerical variables, by default a scatter plot is produced, but for
high numbers of points a hexbin is preferred (option `min.points.hex`).
These plots come with a smoothing line and standard deviation.

The relation between two factor variables can be depicted best by spine
(a.k.a. mosaic) plots, unless they have too many levels (options
`max.factor.levels.spine.x`, `max.factor.levels.spine.y`,
`max.factor.levels.spine.z`). Otherwise, a heat map is produced.

For a mixed-type (factor/numeric) pair of variables, violin (overridable
to box) plots are generated. However, if the resulting graph would contain
too many (more than `max.factor.levels.violin`) violin plots in a row,
the algorithm switches automatically. The number of bins of a histogram can
be customized with `n.breaks.histogram`. The default setting, `NA`,
applies a heuristic estimate.
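As an illustration of overriding the default plot type, the following sketch requests box plots instead of violins through the `geom` option. It assumes the `plotluck` package is installed and that `"box"` is an accepted value for `geom`; treat the exact option value as an assumption to check against `plotluck.options`.

```r
# Sketch: override the default violin plot with a box plot.
# Assumes plotluck is installed; the guard skips the example otherwise.
if (requireNamespace("plotluck", quietly = TRUE)) {
  library(plotluck)
  data(iris)

  # Default choice for a numeric response and a factor predictor
  p_violin <- plotluck(iris, Petal.Length ~ Species)

  # Same formula, with the geom overridden via the options list
  # (the value "box" is an assumption, see plotluck.options)
  p_box <- plotluck(iris, Petal.Length ~ Species,
                    opts = plotluck.options(geom = "box"))
  print(p_box)
}
```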

The case of a response with two independent variables (`y~x+z`) is covered by either a spine plot (if all are factors) or a heat map.

When there are too few points for one of the aggregate plots, a scatter plot
often looks better (options `min.points.density`, `min.points.violin`,
`min.points.hex`).

If each factor combination occurs only once in the data set, we resort to bar plots.

Conditional variables are represented either by
trying to fit them into the same graph using coloring (`max.factor.levels.color`),
or by facetting (preferred dimensions `facet.num.wrap` (resp.
`facet.num.grid`) for one (resp. two) variables). Numeric vectors are
discretized accordingly. Facets are laid out horizontally or vertically
according to the plot type, up to maximum dimensions of `facet.max.rows`
and `facet.max.cols`.
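To make the discretization step concrete, here is a minimal base R sketch that cuts a numeric conditioning variable into a fixed number of ordered bins. The quantile-based breaks and the helper name `discretize` are assumptions for illustration; the text above does not say how `plotluck` chooses its breaks.

```r
# Sketch: turn a numeric vector into an ordered factor with n_bins levels.
# Quantile-based breaks are an assumption, not plotluck's documented method.
discretize <- function(x, n_bins) {
  probs <- seq(0, 1, length.out = n_bins + 1)
  breaks <- unique(quantile(x, probs = probs, na.rm = TRUE, names = FALSE))
  cut(x, breaks = breaks, include.lowest = TRUE, ordered_result = TRUE)
}

data(mtcars)
hp_bins <- discretize(mtcars$hp, n_bins = 3)
table(hp_bins)  # one count per bin, usable as a facetting variable
```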

To better illustrate the relation between an independent factor variable and a dependent numerical variable (or an ordered factor), levels are reordered according to the value of the dependent variable. If no other numeric or ordered variable exists, we sort by frequency.
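The reordering idea can be sketched in base R with `reorder()`. The choice of the median as the summary and the increasing sort direction are assumptions; the text above only says that levels are ordered by the value of the dependent variable, or by frequency as a fallback.

```r
# Sketch: reorder factor levels by a summary of the dependent variable.
data(iris)
species_by_width <- reorder(iris$Species, iris$Sepal.Width, FUN = median)
levels(species_by_width)  # levels now sorted by median Sepal.Width

# Fallback when no numeric or ordered variable is available: sort by frequency
f <- factor(c("a", "b", "b", "c", "c", "c"))
f_by_freq <- factor(f, levels = names(sort(table(f), decreasing = TRUE)))
levels(f_by_freq)
```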

The argument `weights` lets you specify weights
or frequency counts for each row of data. All plots and summary statistics
take weights into account when supplied. In scatter plots and heat maps, weights
are indicated either by a shaded disk with proportional area (default) or by
jittering (option `dedupe.scatter`), if the number of duplicated points
exceeds `min.points.jitter`. The amount of jittering can be controlled
with `jitter.x` and `jitter.y`.

`plotluck` supports logarithmic and log-modulus
axis scaling. Log-modulus is considered if values are both positive and
negative; in this case, the transform function is
`f(x) = sign(x) * log(1 + abs(x))`.
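The log-modulus transform is easy to write down in base R. The sketch below implements the formula quoted above together with its inverse; the function names are hypothetical and not part of the `plotluck` API.

```r
# Sketch: log-modulus transform and its inverse (names are illustrative).
log_modulus <- function(x) sign(x) * log1p(abs(x))      # sign(x) * log(1 + |x|)
inv_log_modulus <- function(y) sign(y) * expm1(abs(y))  # sign(y) * (exp(|y|) - 1)

x <- c(-1000, -10, -1, 0, 1, 10, 1000)
y <- log_modulus(x)

# The transform is odd-symmetric, maps 0 to 0, and is invertible
stopifnot(isTRUE(all.equal(y, -rev(y))))
stopifnot(isTRUE(all.equal(inv_log_modulus(y), x)))
```

Unlike a plain logarithm, this transform is defined for zero and negative values, which is why it suits data that spans both signs.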

The heuristic to apply scaling is based on the proportion of total display
range that is occupied by the 'core' region of the distribution between the
lower and upper quartiles; namely, whether the transform would
magnify this region by a factor of at least `trans.log.thresh`.
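One way to read this heuristic is sketched below in base R: compute the share of the display range taken up by the interquartile range, before and after the transform, and compare the ratio to a threshold. The helper `core_fraction` and the threshold value are assumptions made for illustration, not the package's actual implementation.

```r
# Sketch: how much does a log transform magnify the interquartile 'core'?
core_fraction <- function(x) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  r <- range(x, na.rm = TRUE)
  (q[2] - q[1]) / (r[2] - r[1])  # share of the display range used by the IQR
}

set.seed(1)
x <- rlnorm(1000, meanlog = 0, sdlog = 2)  # heavily right-skewed sample

magnification <- core_fraction(log(x)) / core_fraction(x)
thresh <- 2  # illustrative stand-in for trans.log.thresh (assumed value)
use_log <- magnification >= thresh
use_log
```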

By default, missing (`NA` or `NaN`) values
in factors are shown as a special factor level `"?"`. They can be
removed by setting `na.rm=TRUE`. Conventionally, missing numeric values
are not shown.
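The two treatments of missing factor values can be mimicked in base R as follows. This is a sketch of the behavior described above, not the code `plotluck` uses internally.

```r
# Sketch: show missing factor values as a "?" level, or drop them.
f <- factor(c("a", NA, "b", "a"))

# Default-like behavior: make NA an explicit level and label it "?"
f_shown <- addNA(f)
levels(f_shown)[is.na(levels(f_shown))] <- "?"
table(f_shown)

# na.rm-like behavior: remove the missing observations altogether
f_dropped <- droplevels(f[!is.na(f)])
table(f_dropped)
```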

For very large data sets, plots can take a very long time
(or even crash R). `plotluck` has a built-in stop-gap: if the data
comprises more than `sample.max.rows` rows, it will be sampled down to that
size (taking into account `weights`, if supplied).
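A down-sampling step of this kind might look like the base R sketch below. Drawing rows with probability proportional to their weight is an assumption about what "taking into account `weights`" means; the helper `sample_down` is hypothetical.

```r
# Sketch: cap a data frame at max_rows rows, optionally weighting the draw.
sample_down <- function(data, max_rows, weights = NULL) {
  n <- nrow(data)
  if (n <= max_rows) return(data)  # small enough, nothing to do
  idx <- sample.int(n, size = max_rows, replace = FALSE, prob = weights)
  data[sort(idx), , drop = FALSE]  # keep the original row order
}

set.seed(42)
big <- data.frame(x = rnorm(10000), w = runif(10000))
small <- sample_down(big, max_rows = 1000, weights = big$w)
nrow(small)
```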

Character (resp. logical) vectors are converted to unordered (resp. ordered) factors.

Frequently, when numeric variables have very few values despite sufficient
data size, it helps to treat these values as the levels of a factor; this is
governed by option `few.unique.as.factor`.

If an unordered factor has too many levels, plots can get messy. In this
case, only the `max.factor.levels` most frequent ones are retained,
while the rest are merged into a default level `".other."`.
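The pruning of rare levels can be sketched in base R as below. The helper name `lump_levels` is hypothetical, and the way ties between equally frequent levels are broken is an assumption.

```r
# Sketch: keep the most frequent levels, merge the rest into ".other."
lump_levels <- function(f, max_levels, other = ".other.") {
  f <- as.factor(f)
  counts <- sort(table(f), decreasing = TRUE)
  if (length(counts) <= max_levels) return(f)  # few enough levels already
  keep <- names(counts)[seq_len(max_levels)]
  out <- as.character(f)
  out[!is.na(out) & !(out %in% keep)] <- other  # leave missing values alone
  factor(out, levels = c(keep, other))
}

x <- c("a", "a", "a", "b", "b", "c", "d", "e")
lumped <- lump_levels(x, max_levels = 2)
table(lumped)
```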

If `color` or `fill` aesthetics are used to
distinguish different levels or ranges of a variable, the color scheme adjusts
to the type. Preferably, a sequential (resp. qualitative) palette is chosen
for a numeric or ordered (resp. unordered) variable (`palette.brewer.seq`,
`palette.brewer.qual`); see also RColorBrewer.

If `formula` contains a dot
(`"."`) symbol, the function creates a number of 1D or 2D plots by calling
`plotluck` repeatedly. As described above, this allows single
distribution, one-vs-all, and all-vs-all variable plots. To save space,
rendering is minimal, without axis labels.

In the all-vs-all case, the diagonal contains 1D distribution plots, analogous
to the behavior of the default plot method for data frames; see
`plot.data.frame`.

With the setting `in.grid=FALSE`, plots are produced in a sequence; otherwise
they are shown together on one or, if necessary, multiple pages (default). Page size is
controlled by `multi.max.rows` and `multi.max.cols`.

With `multi.entropy.order=TRUE`, plots are sorted by an estimate of
empirical conditional entropy, with the goal of prioritizing the more
predictive variables. Set `verbose=TRUE` if you want to see the actual
values. For large data sets the calculation can be time consuming; entropy
calculation can be suppressed by setting `multi.entropy.order=FALSE`.

Note: The return value is an object of class `plotluck_multi`. This
class does not have any functionality; its sole purpose is to make this
function work in the same way as `ggplot` and `plotluck`, namely,
do the actual drawing if and only if the return value is not assigned.
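This draw-unless-assigned behavior follows from R's auto-printing of visible top-level values. The sketch below reproduces the mechanism with a toy S3 class; `demo_multi` and `make_multi` are hypothetical names and the print method only writes a message instead of drawing.

```r
# Sketch: an S3 class whose print method does the 'drawing'.
make_multi <- function(plots) {
  structure(list(plots = plots), class = "demo_multi")
}

print.demo_multi <- function(x, ...) {
  cat("drawing", length(x$plots), "plots\n")  # stand-in for real rendering
  invisible(x)
}

make_multi(list("p1", "p2"))       # not assigned: R auto-prints, so it 'draws'
m <- make_multi(list("p1", "p2"))  # assigned: nothing is drawn
print(m)                           # drawing can still be triggered explicitly
```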

With the option `verbose=TRUE` turned on, the function
will print out information about the chosen and applicable plot types, ordering,
log scaling, etc.

Variable names can be abbreviated if they match a column name uniquely by prefix.
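Unique prefix matching of this kind is available in base R through `pmatch()`. The following sketch shows the idea with a hypothetical helper `match_column`; it is not necessarily how `plotluck` resolves names.

```r
# Sketch: resolve an abbreviated variable name to a unique column name.
match_column <- function(name, columns) {
  i <- pmatch(name, columns)  # NA if there is no match or the match is ambiguous
  if (is.na(i)) stop("'", name, "' does not match a column name uniquely")
  columns[i]
}

data(iris)
match_column("Petal.L", names(iris))  # resolves to "Petal.Length"
match_column("Spec", names(iris))     # resolves to "Species"
# match_column("Petal", names(iris)) would fail: two columns share that prefix
```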

By default, `plotluck` uses violin and density plots in place of the more
traditional box-and-whisker plots and histograms; these modern graph types
convey the shape of a distribution better. For box-and-whisker plots, summary
statistics like mean and quantiles are less useful if the distribution is not
unimodal; for histograms, a wrong choice of the number of bins can create
misleading artifacts.

Following Cleveland's advice, factors are plotted on the y-axis to make labels
most readable and compact at the same time. This direction can be controlled
using option `prefer.factors.vert`.

Due to their well-documented problematic aspects, pie charts and stacked bar graphs are not supported.

With real-world data (as opposed to smooth mathematical functions), three-dimensional scatter, surface, or contour plots can often be hard to read if the shape of the distribution is not suitable, data coverage is uneven, or if the perspective is not carefully chosen depending on the data. Since they usually require manual tweaking, we have refrained from incorporating them.

For completeness, we have included the description of option parameters in the current help page. However, the tenet of this function is to be usable "out-of-the-box", with no or very little manual tweaking required. If you find yourself needing to change option values repeatedly or find the presets to be suboptimal, please contact the author.

`plotluck` is designed for generic out-of-the-box
plotting, and is not suitable for producing more specialized types of plots that
arise in specific application domains (e.g., association, stem-and-leaf,
star plots, geographic maps, etc.). It is restricted to at most three variables.
Parallel plots with variables on different scales (such as time
series of multiple related signals) are not supported.

`plotluck.options`, `sample.plotluck`, `ggplot`

```
library(plotluck)

# Single-variable density
data(diamonds, package='ggplot2')
plotluck(diamonds, price~1)
invisible(readline(prompt="Press [enter] to continue"))
# Violin plot
data(iris)
plotluck(iris, Species~Petal.Length)
invisible(readline(prompt="Press [enter] to continue"))
# Scatter plot
data(mpg, package='ggplot2')
plotluck(mpg, cty~model)
invisible(readline(prompt="Press [enter] to continue"))
# Spine plot
data(Titanic)
plotluck(as.data.frame(Titanic), Survived~Class+Sex, weights=Freq)
invisible(readline(prompt="Press [enter] to continue"))
# Facetting
data(msleep, package='ggplot2')
plotluck(msleep, sleep_total~bodywt|vore)
invisible(readline(prompt="Press [enter] to continue"))
# Heat map
plotluck(diamonds, price~cut+color)
# Multi plots
# All 1D distributions
plotluck(iris, .~1)
# 2D dependencies with one fixed variable on vertical axis
plotluck(iris, Species~.)
# See also tests/testthat/test_plotluck.R for more examples!
```

plotluck documentation built on June 27, 2019, 5:07 p.m.
