MoE_gpairs: Generalised Pairs Plots for MoEClust Mixture Models
In MoEClust: Gaussian Parsimonious Clustering Models with Covariates and a Noise Component

MoE_gpairs

R Documentation

Generalised Pairs Plots for MoEClust Mixture Models

Description

Produces a matrix of plots showing pairwise relationships between continuous response variables and continuous/categorical/logical/ordinal associated covariates, as well as the clustering achieved, according to fitted MoEClust mixture models.

Usage

MoE_gpairs(res,
           response.type = c("points", "uncertainty", "density"),
           subset = list(...),
           scatter.type = c("lm", "points"),
           conditional = c("stripplot", "boxplot"),
           addEllipses = c("outer", "yes", "no", "inner", "both"),
           expert.covar = TRUE,
           border.col = c("purple", "black", "brown", "brown", "navy"),
           bg.col = c("cornsilk", "white", "palegoldenrod", "palegoldenrod", "cornsilk"),
           outer.margins = list(bottom = grid::unit(2, "lines"),
                                left = grid::unit(2, "lines"),
                                top = grid::unit(2, "lines"),
                                right = grid::unit(2, "lines")),
           outer.labels = NULL,
           outer.rot = c(0, 90),
           gap = 0.05,
           buffer = 0.025,
           uncert.cov = FALSE,
           scatter.pars = list(...),
           density.pars = list(...),
           diag.pars = list(...),
           stripplot.pars = list(...),
           boxplot.pars = list(...),
           barcode.pars = list(...),
           mosaic.pars = list(...),
           axis.pars = list(...),
           ...)

Arguments

`res`	An object of class `"MoEClust"` generated by `MoE_clust`, or an object of class `"MoECompare"` generated by `MoE_compare`. Models with a noise component are also supported.
`response.type`	The type of plot desired for the scatterplots comparing continuous response variables. Defaults to `"points"`. Points can also be sized according to their associated clustering uncertainty with the option `"uncertainty"`; in doing so, the transparency of the points will also be proportional to their clustering uncertainty, provided the device supports transparency. Note that models fitted via the `"CEM"` algorithm have no associated clustering uncertainty. See also `MoE_Uncertainty` for an alternative means of visualising observation-specific cluster uncertainties (especially for univariate data), and `scatter.pars` below for customising both options. Alternatively, the bivariate parametric `"density"` contours can be displayed (see `density.pars`), provided there is at least one Gaussian component in the model. Caution is advised when producing density plots for models with covariates in the expert network: the required number of evaluations of the (multivariate) Gaussian density for each panel (`res$G * prod(density.pars$grid.size)`) increases by a factor of `res$n`, so plotting may be slow, particularly for large data sets. This is offset somewhat by reusing the densities pre-calculated for the upper-triangular panels when producing the lower-triangular panels. See `density.pars` below.
`subset`	A list giving named arguments for producing only a subset of panels: `show.map` Logical indicating whether to show panels involving the MAP classification (defaults to `TRUE`, unless there is only one component, in which case the MAP classification is never plotted). `data.ind` For subsetting response variables: a vector of column indices corresponding to the variables in the columns of `res$data` which should be shown. Defaults to all. Can be `0`, in order to suppress plotting the response variables. Alternatively, character strings matching the column names of `res$data` can be supplied here. `cov.ind` For subsetting covariates: a vector of column indices corresponding to the covariates in the columns of `res$net.covs` which should be shown. Defaults to all. Can be `0`, in order to suppress plotting the covariates. Alternatively, character strings matching the column names of `res$net.covs` can be supplied here. `submat` Can take the values `"all"` (default), `"upper"`, `"lower"`, or `"diagonal"`, for displaying all panels, only the upper/lower triangular panels, or only the diagonal (marginal) panels of the plot matrix. The subsetting must leave at least one panel of some sort to be plotted. The arguments `data.ind` and `cov.ind` can also be used simply to reorder the panels, without actually subsetting. Diagonal panels are always drawn, regardless of the value of `submat` (but can be somewhat suppressed using `diag.pars$show.hist=FALSE` and `diag.pars$show.dens=FALSE`; see `diag.pars` below). When `diag.pars$diagonal=TRUE` (the default), the triangular portions are the `"upper"`-right and `"lower"`-left, whereas they are the `"upper"`-left and `"lower"`-right when `diag.pars$diagonal=FALSE`. Generally, `submat="upper"` is preferable to `submat="lower"`, as it ensures that response variables and covariates are displayed on the y-axes and x-axes, respectively.
`scatter.type`	A vector of length 2 (or 1) giving the plot type for the upper and lower triangular portions of the plot, respectively, pertaining to the associated covariates. Defaults to `"lm"` for covariate vs. response panels and `"points"` otherwise. Only relevant for models with continuous covariates in the gating &/or expert network. `"ci"` and `"lm"` type plots are only produced for panels pairing covariates with responses, never for response vs. response or covariate vs. covariate panels. Note that lines &/or confidence intervals will only be drawn for continuous covariates included in the expert network; to also draw them for covariates that enter only the gating network, the options `"lm2"` or `"ci2"` can be used, but this is not generally advisable. See `scatter.pars` below.
`conditional`	A vector of length 2 (or 1) giving the plot type for the upper and lower triangular portions of the plot, respectively, for plots involving a mix of categorical and continuous variables. Defaults to `"stripplot"` in the upper triangle and `"boxplot"` in the lower triangle (see `panel.stripplot` and `panel.bwplot`). `"violin"` and `"barcode"` plots can also be produced. Only relevant for models with categorical covariates in the gating &/or expert network, unless `subset$show.map` is `TRUE`. Comparisons of two categorical variables (which can only ever be covariates or the MAP classification) are always displayed via mosaic plots (see `strucplot`). All `conditional` panel types can be customised further; see `stripplot.pars`, `boxplot.pars` (for both `"boxplot"` and `"violin"` plots), `barcode.pars`, and `mosaic.pars` below. Note that when `conditional` is of length 1, that plot type will be used in both the upper and lower triangular portions of the plot, where relevant.
`addEllipses`	Controls whether to add MVN ellipses with axes corresponding to the within-cluster covariances for the response data. The options `"inner"` and `"outer"` (the default) will colour the axes or the perimeter of those ellipses, respectively, according to the cluster they represent (according to `scatter.pars$eci.col`). The option `"both"` colours both the axes and the perimeter. The `"yes"` and `"no"` options merely govern whether the ellipses are drawn, i.e. `"yes"` draws ellipses without any colouring. Ellipses are only ever drawn for multivariate data, and only when `response.type` is `"points"` or `"uncertainty"`. Ellipses are centred on the posterior mean of the fitted values when there are expert network covariates, and otherwise on the posterior mean of the response variables. In the presence of expert network covariates, the component-specific covariance matrices are also (by default, via the argument `expert.covar` below) modified for plotting purposes via the function `expert_covar`, in order to account for the extra variability of the means, usually resulting in larger MVN ellipses.
`expert.covar`	Logical (defaults to `TRUE`) governing whether the extra variability in the component means is added to the MVN ellipses corresponding to the component covariance matrices in the presence of expert network covariates. See the function `expert_covar`. Only relevant when `response.type` is `"points"` or `"uncertainty"` and ellipses are drawn (i.e. `addEllipses` is not `"no"`), and only for models with expert network covariates and multivariate responses.
`border.col`	A vector of length 5 (or 1) containing border colours for plots against the MAP classification, response vs. response, covariate vs. response, response vs. covariate, and covariate vs. covariate panels, respectively. Defaults to `c("purple", "black", "brown", "brown", "navy")`.
`bg.col`	A vector of length 5 (or 1) containing background colours for plots against the MAP classification, response vs. response, covariate vs. response, response vs. covariate, and covariate vs. covariate panels, respectively. Defaults to `c("cornsilk", "white", "palegoldenrod", "palegoldenrod", "cornsilk")`.
`outer.margins`	A list of length 4 with units as components named `bottom`, `left`, `top`, and `right`, giving the outer margins; the default uses two lines of text. A vector of length 4 with units (in the order bottom, left, top, right) will also work, as will a numeric vector of length 4 (interpreted as lines). May need to be increased to accommodate outer labels in some cases.
`outer.labels`	The default is typically `NULL`, for alternating labels around the perimeter. If `"all"`, all labels are printed, and if `"none"`, no labels are printed. If `subset$submat="upper"` or `subset$submat="lower"`, `outer.labels` instead defaults to `"all"`. Note that axis labels always correspond to the range of the depicted variable, and thus should not be interpreted as indicating counts or densities for the diagonal panels when `diag.pars$show.hist=TRUE` &/or `diag.pars$show.dens=TRUE`.
`outer.rot`	A 2-vector (`x`, `y`) rotating the top/bottom outer labels `x` degrees and the left/right outer labels `y` degrees. Only works for categorical labels of boxplot, mosaic, strip plot, and violin plot panels. Defaults to `c(0, 90)`. Reordering via `data.ind` or `cov.ind` may improve the appearance of outer labels in some cases.
`gap`	The gap between the tiles, as a fraction of the width of a tile. Defaults to `0.05`.
`buffer`	The fraction by which to expand the range of quantitative variables to provide plots that will not truncate plotting symbols. Defaults to `0.025`, i.e. 2.5 percent of the range. Particularly useful when ellipses are drawn (see `addEllipses`) to ensure ellipses are visible in full.
`uncert.cov`	A logical indicating whether the expansion factor for points on plots involving covariates should also be modified when `response.type="uncertainty"`. Defaults to `FALSE`, and only relevant for scatterplot and strip plot panels. When `TRUE`, `scatter.pars$uncert.pch` is invoked as the plotting symbols for covariate-related scatterplot and strip plot panels, otherwise `scatter.pars$scat.pch` and `stripplot.pars$strip.pch` are invoked for such panels.
`scatter.pars`	A list supplying select parameters for the continuous vs. continuous scatterplots. `NULL` is equivalent to: list(scat.pch=res$classification, uncert.pch=19, scat.col=res$classification, scat.size=unit(0.25, "char"), eci.col=1:res$G, noise.size=unit(0.2, "char")), where `scat.pch`, `scat.col`, and `scat.size` give the plotting symbols, colours, and sizes of the points in scatterplot panels, respectively. `eci.col` gives both a) the colour of the fitted lines &/or confidence intervals for expert-related panels when `scatter.type` is one of `"ci"` or `"lm"` and b) the colour of the ellipses (if any) when `addEllipses` is one of `"outer"`, `"inner"`, or `"both"` and the response data are multivariate. If `scat.col` is supplied but `eci.col` is not, `eci.col` inherits a suitable default from `scat.col`. Note also that `scat.size` is modified on an observation-by-observation level when `response.type` is `"uncertainty"`. The choice of plotting symbols when `response.type="uncertainty"` also differs from `response.type="points"`, depending on the value of the `uncert.cov` argument above: `uncert.pch` gives the plotting symbol used for all scatterplot (and strip plot) panels when `response.type="uncertainty"` and `uncert.cov` is `TRUE`, whereas when `uncert.cov` is `FALSE`, `scat.pch` is used for scatterplots involving covariates and `uncert.pch` is used for panels involving only response variables. Finally, `noise.size` can be used to modify `scat.size` for observations assigned to the noise component (if any), but only when `response.type="points"`.
`density.pars`	A list supplying select parameters for visualising the bivariate parametric density contours, only when `response.type` is `"density"`. `NULL` is equivalent to: list(grid.size=c(100, 100), dcol="grey50", dens.points=FALSE, nlevels=11, show.labels=!dens.points, label.style="mixed"), where `grid.size` is a vector of length two giving the number of points in the x & y directions of the grid over which the density is evaluated, respectively (though `density.pars$grid.size` can also be supplied as a scalar, which will be automatically recycled to a vector of length 2), and `dcol` is either a single colour or a vector of `nlevels` colours. `dens.points` governs whether points should be overlaid when `response.type="density"` (in other words, `dens.points=TRUE` is akin to specifying `response.type="points"` and `response.type="density"` simultaneously) and `show.labels` governs whether the density contours should be labelled. Note that, by default, contours are not labelled when `dens.points=TRUE`. Finally, `label.style` can take the values `"mixed"`, `"flat"`, or `"align"`.
`diag.pars`	A list supplying select parameters for panels along the diagonal. `NULL` is equivalent to: list(diag.fontsize=9, diagonal=TRUE, hist.color=hist.color, show.hist=TRUE, show.counts=TRUE, show.dens=FALSE, diag.grid=100), where `hist.color` is a vector of length 4, giving the colours for the response variables, gating covariates, expert covariates, and covariates entering both networks, respectively. By default, the colour for response variables is `ifelse(diag.pars$show.dens, "white", "black")` and covariates of any kind are `"dimgrey"`. `hist.color` also governs the outer colour for mosaic panels and the fill colour for boxplot and violin panels (except for those involving the MAP classification; see `boxplot.pars` below). However, in the case of response vs. (categorical) covariate boxplots and violin plots, the fill colour is always `"white"`. The MAP classification is always coloured by cluster membership, by default. The argument `show.counts` is only relevant for categorical variables. The argument `show.dens` toggles whether parametric density estimates are drawn over the diagonal panels for each response variable. When `show.dens=TRUE`, the component densities are shown via thin lines, with colours given by `scatter.pars$scat.col`, while a thick `"black"` line is used for the overall mixture density. This argument can be used with or without `show.hist` also being `TRUE`. The grid size when `show.dens=TRUE` is given by `diag.grid` (`100` by default). As with `response.type="density"`, plotting is liable to be slower when `show.dens=TRUE` for models with expert network covariates; this is why `show.dens=FALSE` by default, although setting it to `TRUE` is otherwise recommended. Finally, when `diagonal=TRUE` (the default), the diagonal from the top left to the bottom right is used for displaying the marginal distributions of variables (via histograms, with or without overlaid density estimates, or barplots, as appropriate).
Specifying `diagonal=FALSE` will place the diagonal running from the top right down to the bottom left (with `subset$submat` accounted for accordingly).
`stripplot.pars`	A list supplying select parameters for continuous vs. categorical panels when one or both of the entries of `conditional` is `"stripplot"`. `NULL` is equivalent to: list(strip.pch=res$classification, strip.size=unit(0.5, "char"), strip.col=res$classification, jitter=TRUE, size.noise=unit(0.4, "char")), where `strip.size` and `size.noise` retain the definitions for the similar arguments under `scatter.pars` above. However, `stripplot.pars$size.noise` is invoked regardless of the `response.type` (in contrast to `scatter.pars$noise.size`). Notably, `strip.col` will inherit a suitable default from `scatter.pars$scat.col` if the latter is supplied but the former is not. Note also that the `strip.pch` default is modified to `scatter.pars$uncert.pch` if `uncert.cov` is `TRUE`.
`boxplot.pars`	A list supplying select parameters for continuous vs. categorical panels when one or both of the entries of `conditional` is `"boxplot"` or `"violin"`. `NULL` is equivalent to: list(box.pch="|", box.col="black", varwidth=FALSE, notch=FALSE, notch.frac=0.5, box.fill=1:res$G). All of the above are relevant for `"boxplot"` panels, are passed to `panel.bwplot` when producing boxplots, and retain the same definitions as the similarly named arguments therein. However, only `box.col`, `varwidth`, and `box.fill` are relevant for `"violin"` panels, and in both cases `box.fill` is only invoked for panels where the categorical variable is the MAP classification (i.e. when `subset$show.map=TRUE`). See `diag.pars$hist.color` for controlling the colours of non-MAP-related boxplot/violin panels. Notably, `box.fill` will inherit a suitable default from `scatter.pars$scat.col` if the latter is supplied but the former is not.
`barcode.pars`	A list supplying select parameters for continuous vs. categorical panels when one or both of the entries of `conditional` is `"barcode"`. `NULL` is equivalent to: list(bar.col=res$G:1, nint=0, ptsize=scatter.pars$scat.size, ptpch=scatter.pars$scat.pch, bcspace=NULL, use.points=FALSE), where `bar.col` will inherit a suitable default from `scatter.pars$scat.col` if the latter is supplied but the former is not. See the help file for `barcode::barcode` for details on the remaining arguments. Note that the arguments `ptsize` and `ptpch`, which are only relevant when `use.points=TRUE`, are given by the corresponding `scatter.pars$scat.size`/`scatter.pars$noise.size` and `scatter.pars$scat.pch` arguments, by default.
`mosaic.pars`	A list supplying select parameters for categorical vs. categorical panels (if any). `NULL` is equivalent to: list(shade=NULL, gp_labels=grid::gpar(fontsize=9), gp_args=list(), gp=list(), mfill=TRUE, mcol=1:res$G). The current default arguments and values thereof are passed through to `strucplot` for producing mosaic tiles. When `shade` is not `FALSE`, `mfill` is a logical which governs the colouring scheme for panels (if any) involving the MAP classification. When `mfill` is `TRUE` (the default), `gp` is invoked here in such a way that tiles will inherit appropriate interior colours via `gp$fill` from `mcol` and a `"black"` outer colour via `gp$col`. When `mfill` is `FALSE`, or the panel involves two categorical covariates, the outer colours are inherited from `mcol` and the interior fill colour is inherited from `bg.col`. See `diag.pars$hist.color` for controlling the interior fill colour of non-MAP-related mosaic panels. Notably, `mcol` will inherit a suitable default from `scatter.pars$scat.col` if the latter is supplied but the former is not.
`axis.pars`	A list supplying select parameters for controlling the axes. `NULL` is equivalent to: list(n.ticks=5, axis.fontsize=9). The argument `n.ticks` will be overwritten for categorical variables with fewer than 5 levels.
`...`	Catches unused arguments. Alternatively, named arguments can be passed directly here to any/all of `scatter.pars`, `density.pars`, `diag.pars`, `stripplot.pars`, `boxplot.pars`, `barcode.pars`, `mosaic.pars`, and `axis.pars`.
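As a sketch of the `...` shortcut described above, the two calls below should draw the same panels: the first supplies `subset` and `diag.pars` as explicit lists, while the second passes the same named arguments directly (the Examples below rely on the same shortcut for `data.ind`, `cov.ind`, `show.map`, and `submat`). The code only runs if MoEClust is installed, refits the first model from the Examples, and draws to an off-screen `pdf(NULL)` device so that it also works non-interactively.

```r
has_moeclust <- requireNamespace("MoEClust", quietly = TRUE)

if (has_moeclust) {
  library(MoEClust)
  data(ais)
  res <- MoE_clust(ais[, 3:7], G = 2, gating = ~ BMI, expert = ~ sex,
                   network.data = ais, modelNames = "EVE")

  pdf(NULL)  # off-screen device; open an interactive device instead to view the plots

  # Panel-specific parameters supplied as explicit lists
  MoE_gpairs(res,
             subset    = list(data.ind = c(1, 2), cov.ind = 0, submat = "upper"),
             diag.pars = list(show.dens = TRUE))

  # The same named arguments passed directly through `...`
  MoE_gpairs(res, data.ind = c(1, 2), cov.ind = 0, submat = "upper", show.dens = TRUE)

  dev.off()
}
```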
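To put the cost warning under `response.type` (and `diag.pars$show.dens`) into numbers, the arithmetic below applies the documented count `res$G * prod(density.pars$grid.size)` per bivariate panel, multiplied by `res$n` when expert covariates are present. The helper `n_density_evals` is hypothetical (not part of MoEClust), and the values G = 2 and n = 202 are assumptions chosen for illustration.

```r
# Hypothetical helper (not part of MoEClust): Gaussian density evaluations needed
# for one bivariate panel when response.type = "density".
n_density_evals <- function(G, grid.size, n = 1, expert = FALSE) {
  grid.size <- rep_len(grid.size, 2)  # a scalar grid size is recycled to length 2
  evals <- G * prod(grid.size)        # one evaluation per component per grid point
  if (expert) evals <- evals * n      # with expert covariates, each observation has its own means
  evals
}

n_density_evals(G = 2, grid.size = 100)                                 # 20,000
n_density_evals(G = 2, grid.size = c(100, 100), n = 202, expert = TRUE) # 4,040,000
n_density_evals(G = 2, grid.size = 50, n = 202, expert = TRUE)          # 1,010,000
```

Halving `grid.size` in each direction cuts the work by a factor of four, which is one way to speed up density panels for models with expert network covariates.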
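The `"uncertainty"` option of `response.type` sizes points by clustering uncertainty. A common definition (the one used by mclust, and assumed here for MoEClust) is one minus the largest posterior membership probability, with the MAP classification given by the column of that maximum. The sketch below computes both from a hypothetical posterior matrix `z`; the mapping to point sizes is an illustrative assumption, not the scaling MoE_gpairs actually uses.

```r
# Hypothetical posterior membership probabilities (rows: observations, columns: clusters)
z <- rbind(c(0.98, 0.02),
           c(0.30, 0.70),
           c(0.50, 0.50))

map   <- max.col(z, ties.method = "first")  # MAP classification: 1, 2, 1
uncer <- 1 - apply(z, 1, max)               # uncertainty: 0.02, 0.30, 0.50

# Illustrative scaling only (an assumption): point size grows with uncertainty
size <- grid::unit(0.25 * (1 + uncer / max(uncer)), "char")
```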
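The effect of `expert.covar` (see also `addEllipses`) can be understood through the law of total variance: when a component's mean varies with the expert covariates, the spread of that component's observations is its covariance matrix plus the covariance of its fitted means. The base-R sketch below illustrates that idea with simulated, hypothetical values. It is an assumption about what `expert_covar` accounts for conceptually, not a reproduction of its implementation (its exact weighting, for instance, may differ).

```r
set.seed(1)
n     <- 50
Sigma <- matrix(c(1, 0.3, 0.3, 1), 2, 2)  # hypothetical within-component covariance
x     <- runif(n)                         # hypothetical expert network covariate
mu    <- cbind(2 + 3 * x, 1 - x)          # fitted component means, varying with x
z     <- rep(1, n)                        # posterior weights for this component (all equal here)

# Weighted covariance of the fitted means (cov.wt rescales the weights to sum to 1)
extra      <- cov.wt(mu, wt = z, method = "ML")$cov
Sigma_plot <- Sigma + extra               # inflated covariance, giving a larger ellipse

diag(Sigma_plot) - diag(Sigma)            # extra variance along each response axis
```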

Value

A generalised pairs plot showing all pairwise relationships between clustered response variables and associated gating &/or expert network continuous &/or categorical variables, coloured according to the MAP classification, with the marginal distributions of each variable along the diagonal.

Note

plot.MoEClust is a wrapper to MoE_gpairs which accepts the default arguments, and also produces other types of plots. Caution is advised in producing generalised pairs plots when the dimension of the data is large.

Note that all colour-related defaults in scatter.pars, stripplot.pars, barcode.pars, and mosaic.pars above assume a specific colour palette (see mclust.options("classPlotColors")). Thus, for instance, specifying scatter.pars$scat.col=res$classification will produce different results compared to leaving this argument unspecified. This is especially true for models with a noise component, for which the default is handled quite differently (for one thing, res$G is the number of non-noise components). Similarly, all pch-related defaults in scatter.pars and stripplot.pars above assume a specific set of plotting symbols also (see mclust.options("classPlotSymbols")). Generally, all colour- and symbol-related arguments are strongly recommended to be left at their default values, unless being supplied as a single character string, e.g. "black" for colours. To help in this regard, colour-related arguments sensibly inherit their defaults from scatter.pars$scat.col if that is supplied and the argument in question is not.
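The difference described above presumably arises because integer codes such as res$classification are matched against the current base R palette(), while the defaults are drawn from mclust's class colours. The sketch below simply compares the first two entries of each; the second lookup requires the mclust package and is skipped otherwise. Supplying a single colour string such as "black" sidesteps the issue entirely.

```r
# Colours that integer codes 1, 2, ... refer to in base R graphics
base_cols <- grDevices::palette()[1:2]

# Colours assumed by the MoEClust defaults (see mclust.options("classPlotColors"))
mclust_cols <- if (requireNamespace("mclust", quietly = TRUE)) {
  mclust::mclust.options("classPlotColors")[1:2]
} else {
  c(NA_character_, NA_character_)
}

rbind(base = base_cols, mclust = mclust_cols)
```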

Warning

For MoEClust models with more than one expert network covariate, fitted lines produced in continuous covariate vs. continuous response scatterplots via scatter.type="lm" or scatter.type="ci" will NOT correspond to the coefficients in the expert network (res$expert).
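A likely reason for this mismatch (an interpretation, not stated above) is that each panel can only show a fit of the response against the single covariate on its x-axis, whereas the expert network coefficients come from a regression on all expert covariates jointly. When covariates are correlated, the two slopes differ, as this base-R sketch with made-up, noise-free data shows.

```r
x1 <- 1:10
x2 <- x1^2                    # a second covariate, correlated with x1
y  <- 1 + 2 * x1 + 3 * x2     # exact linear relationship, no noise

joint    <- lm(y ~ x1 + x2)   # analogous to an expert network with two covariates
marginal <- lm(y ~ x1)        # analogous to a single covariate vs. response panel

coef(joint)[["x1"]]     # 2 (up to rounding): the partial effect of x1
coef(marginal)[["x1"]]  # 35 (up to rounding): absorbs the effect of x2 via its correlation with x1
```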

Caution is advised when producing "barcode" plots for the conditional panels. In some cases, resizing the graphics device after the production of the plot will result in distortion because of the way the rotation of non-horizontal barcodes is performed. Thus, when any(conditional == "barcode"), it is advisable to ensure the dimensions of the overall plot are square. Furthermore, such plots may not display correctly anyway in RStudio's “Plots” pane and so a different graphics device may need to be used (but not subsequently resized).
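One way to follow this advice, sketched here as an assumption rather than a recommendation from the package itself, is to draw to a square, fixed-size file device such as pdf(), which cannot be resized after the plot is produced. The code refits the noise model from the Examples below and only runs if MoEClust is installed; the check for the barcode package is included purely as a precaution, and the file name is arbitrary.

```r
out <- file.path(tempdir(), "gpairs_barcode.pdf")
can_run <- requireNamespace("MoEClust", quietly = TRUE) &&
  requireNamespace("barcode", quietly = TRUE)  # precautionary check only

if (can_run) {
  library(MoEClust)
  data(ais)
  resN <- MoE_clust(ais[, 3:7], G = 2, gating = ~ SSF + Ht, expert = ~ sex,
                    network.data = ais, modelNames = "EEE", tau0 = 0.1, noise.gate = FALSE)

  pdf(out, width = 8, height = 8)  # square device with fixed dimensions
  MoE_gpairs(resN, conditional = "barcode")
  dev.off()
}
```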

Author(s)

Keefe Murphy - <keefe.murphy@mu.ie>

References

Murphy, K. and Murphy, T. B. (2020). Gaussian parsimonious clustering models with covariates and a noise component. Advances in Data Analysis and Classification, 14(2): 293-325. doi:10.1007/s11634-019-00373-8.

Emerson, J. W., Green, W. A., Schloerke, B., Crowley, J., Cook, D., Hofmann, H. and Wickham, H. (2013). The generalized pairs plot. Journal of Computational and Graphical Statistics, 22(1): 79-91.

Examples

data(ais)
res   <- MoE_clust(ais[,3:7], G=2, gating= ~ BMI, expert= ~ sex,
                   network.data=ais, modelNames="EVE")
MoE_gpairs(res)

# Produce the same plot, but with a violin plot in the lower triangle.
# Colour the outline of the mosaic tiles rather than the interior using mfill.
# Size points in the response vs. response panels by their clustering uncertainty.

MoE_gpairs(res, conditional=c("stripplot", "violin"),
           mfill=FALSE, response.type="uncertainty")

# Instead show the bivariate density contours of the response variables (without labels).
# (Plotting may be slow when response.type="density" for models with expert covariates.)
# Use different colours for histograms of covariates in the gating/expert/both networks.
# Also use different colours for response vs. covariate & covariate vs. response panels.

MoE_gpairs(res, response.type="density", show.labels=FALSE, dens.points=TRUE,
           hist.color=c("black", "cyan", "hotpink", "chartreuse"),
           bg.col=c("whitesmoke", "white", "mintcream", "mintcream", "floralwhite"))
           
# Examine effect of expert.covar & diag.grid in conjunction with show.dens & show.hist
MoE_gpairs(res, show.dens=TRUE, expert.covar=FALSE, show.hist=FALSE, diag.grid=20)
MoE_gpairs(res, show.dens=TRUE, expert.covar=TRUE, show.hist=TRUE, diag.grid=200)
           
# Explore various options to subset and rearrange the panels
MoE_gpairs(res, data.ind=5:1, cov.ind=0, 
           show.map=FALSE, show.hist=FALSE, 
           submat="upper", diagonal=FALSE)          
           
# Produce a generalised pairs plot for a model with a noise component.
# Reorder the covariates and omit the variables "Hc" and "Hg".
# Use barcode plots for the categorical/continuous pairs.
# Magnify the size of scatter points assigned to the noise component.

resN  <- MoE_clust(ais[,3:7], G=2, gating= ~ SSF + Ht, expert= ~ sex,
                   network.data=ais, modelNames="EEE", tau0=0.1, noise.gate=FALSE)
                   
# Note that non-horizontal barcode panels may not display correctly in RStudio's "Plots" pane;
# it may be necessary to first open a new device:
# dev.new()
MoE_gpairs(resN, data.ind=c(1,2,5), cov.ind=c(3,1,2), use.points=TRUE,
           conditional="barcode", noise.size=grid::unit(0.5, "char"))
           
# Plots can be modified to show only a single (diagonal) panel of interest
MoE_gpairs(resN, data.ind=0, cov.ind=0)
MoE_gpairs(resN, data.ind=0, cov.ind="sex", show.map=FALSE)
MoE_gpairs(resN, data.ind="RCC", cov.ind=0, show.map=FALSE, show.dens=TRUE)

MoEClust documentation built on April 3, 2025, 11:07 p.m.