MoE_gpairs: Generalised Pairs Plots for MoEClust Mixture Models

Description

Produces a matrix of plots showing pairwise relationships between continuous response variables and continuous/categorical/logical/ordinal associated covariates, as well as the clustering achieved, according to fitted MoEClust mixture models.

Usage

MoE_gpairs(res,
           response.type = c("points", "uncertainty", "density"),
           subset = list(...),
           scatter.type = c("lm", "points"),
           conditional = c("stripplot", "boxplot"),
           addEllipses = c("outer", "yes", "no", "inner", "both"),
           expert.covar = TRUE,
           border.col = c("purple", "black", "brown", "brown", "navy"),
           bg.col = c("cornsilk", "white", "palegoldenrod", "palegoldenrod", "cornsilk"),
           outer.margins = list(bottom = grid::unit(2, "lines"),
                                left = grid::unit(2, "lines"),
                                top = grid::unit(2, "lines"),
                                right = grid::unit(2, "lines")),
           outer.labels = NULL,
           outer.rot = c(0, 90),
           gap = 0.05,
           buffer = 0.025,
           uncert.cov = FALSE,
           scatter.pars = list(...),
           density.pars = list(...),
           stripplot.pars = list(...),
           boxplot.pars = list(...),
           barcode.pars = list(...),
           mosaic.pars = list(...),
           axis.pars = list(...),
           diag.pars = list(...),
           ...)

Arguments

res

An object of class "MoEClust" generated by MoE_clust, or an object of class "MoECompare" generated by MoE_compare. Models with a noise component are also supported.

response.type

The type of plot desired for the scatterplots comparing continuous response variables. Defaults to "points". See scatter.pars below.

Points can also be sized according to their associated clustering uncertainty with the option "uncertainty". In doing so, the transparency of the points will also be proportional to their clustering uncertainty, provided the device supports transparency. See also MoE_Uncertainty for an alternative means of visualising observation-specific cluster uncertainties (especially for univariate data). See scatter.pars below, and note that models fitted via the "CEM" algorithm will have no associated clustering uncertainty.

Alternatively, the bivariate "density" contours can be displayed (see density.pars), provided there is at least one Gaussian component in the model. Caution is advised when producing density plots for models with covariates in the expert network; the required number of evaluations of the (multivariate) Gaussian density for each panel (res$G * prod(density.pars$grid.size)) increases by a factor of res$n, thus plotting may be slow (particularly for large data sets). See density.pars below.
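As a rough illustration of the cost described above, the following sketch tallies the number of Gaussian density evaluations per panel. The values of G and n are hypothetical placeholders rather than quantities taken from a fitted model.

```r
# Hypothetical model dimensions, for illustration only
G         <- 2            # number of components (res$G)
n         <- 202          # number of observations (res$n)
grid.size <- c(100, 100)  # default density.pars$grid.size

# Evaluations per panel without expert network covariates
evals_basic  <- G * prod(grid.size)

# With expert network covariates, the count grows by a factor of n
evals_expert <- evals_basic * n

format(c(basic = evals_basic, expert = evals_expert), big.mark = ",")
```

Halving both entries of grid.size would cut either count by a factor of four, which may be a reasonable trade-off for large data sets.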

subset

A list giving named arguments for producing only a subset of panels:

show.map

Logical indicating whether to show panels involving the MAP classification (defaults to TRUE, unless there is only one component, in which case the MAP classification is never plotted).

data.ind

For subsetting response variables: a vector of column indices corresponding to the variables in the columns of res$data which should be shown. Defaults to all. Can be 0, in order to suppress plotting the response variables.

cov.ind

For subsetting covariates: a vector of column indices corresponding to the covariates in the columns of res$net.covs which should be shown. Defaults to all. Can be 0, in order to suppress plotting the covariates.

The result of the subsetting must include at least two variables, whether they be the MAP classification, a response variable, or a covariate, in order to be valid for plotting purposes. The arguments data.ind and cov.ind can also be used to simply reorder the panels, without actually subsetting.
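The subsetting rules above might be exercised as in the following minimal sketch, which keeps only two response variables. The model is the one fitted in the Examples section; the name panel_subset is introduced here purely for illustration, and the call is guarded so that the code still runs when MoEClust is not installed.

```r
# Keep two response variables; drop MAP and covariate panels
panel_subset <- list(show.map = FALSE,    # no panels involving the MAP classification
                     data.ind = c(1, 2),  # first two columns of res$data
                     cov.ind  = 0)        # suppress all covariates

if(requireNamespace("MoEClust", quietly = TRUE)) {
  library(MoEClust)
  data(ais)
  res <- MoE_clust(ais[,3:7], G=2, gating= ~ BMI, expert= ~ sex,
                   network.data=ais, modelNames="EVE")
  MoE_gpairs(res, subset=panel_subset)
}
```

Two variables remain after subsetting here, which is the minimum the text above requires for a valid plot.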

scatter.type

A vector of length 2 (or 1) giving the plot type for the upper and lower triangular portions of the plot, respectively, pertaining to the associated covariates. Defaults to "lm" for covariate vs. response panels and "points" otherwise. Only relevant for models with continuous covariates in the gating &/or expert network. "ci" and "lm" type plots are only produced for panels pairing covariates with response variables, and never for response vs. response or covariate vs. covariate panels. Note that lines &/or confidence intervals will only be drawn for continuous covariates included in the expert network; to also draw them for continuous covariates included only in the gating network, the options "lm2" or "ci2" can be used, but this is not generally advisable. See scatter.pars below.

conditional

A vector of length 2 (or 1) giving the plot type for the upper and lower triangular portions of the plot, respectively, for plots involving a mix of categorical and continuous variables. Defaults to "stripplot" in the upper triangle and "boxplot" in the lower triangle (see panel.stripplot and panel.bwplot). "violin" and "barcode" plots can also be produced. Only relevant for models with categorical covariates in the gating &/or expert network, unless subset$show.map is TRUE. Comparisons of two categorical variables (which can only ever be covariates or the MAP classification) are always displayed via mosaic plots (see strucplot).

All conditional panel types can be customised further; see stripplot.pars, boxplot.pars (for both "boxplot" and "violin" plots), barcode.pars, and mosaic.pars below. Note that when conditional is of length 1, that plot type will be used in both the upper and lower triangular portions of the plot, where relevant.

addEllipses

Controls whether to add MVN ellipses with axes corresponding to the within-cluster covariances for the response data. The options "inner" and "outer" (the default) will colour the axes or the perimeter of those ellipses, respectively, according to the cluster they represent (according to scatter.pars$eci.col). The option "both" will colour both the axes and the perimeter. The "yes" or "no" options merely govern whether the ellipses are drawn, i.e. "yes" draws ellipses without any colouring. Ellipses are only ever drawn for multivariate data, and only when response.type is "points" or "uncertainty".

Ellipses are centred on the posterior mean of the fitted values when there are expert network covariates, otherwise on the posterior mean of the response variables. In the presence of expert network covariates, the component-specific covariance matrices are also (by default, via the argument expert.covar below) modified for plotting purposes via the function expert_covar, in order to account for the extra variability of the means, usually resulting in bigger shapes & sizes for the MVN ellipses.

expert.covar

Logical (defaults to TRUE) governing whether the extra variability in the component means is added to the MVN ellipses corresponding to the component covariance matrices in the presence of expert network covariates. See the function expert_covar. Only relevant when response.type is "points" or "uncertainty" when addEllipses is invoked accordingly, and/or diag.pars$show.dens=TRUE (see below), and only relevant for models with expert network covariates.

border.col

A vector of length 5 (or 1) containing border colours for plots against the MAP classification, response vs. response, covariate vs. response, response vs. covariate, and covariate vs. covariate panels, respectively.

Defaults to c("purple", "black", "brown", "brown", "navy").

bg.col

A vector of length 5 (or 1) containing background colours for plots against the MAP classification, response vs. response, covariate vs. response, response vs. covariate, and covariate vs. covariate panels, respectively.

Defaults to c("cornsilk", "white", "palegoldenrod", "palegoldenrod", "cornsilk").

outer.margins

A list of length 4 with units as components named bottom, left, top, and right, giving the outer margins; the default uses two lines of text. A vector of length 4 with units (ordered properly) will work, as will a vector of length 4 with numeric variables (interpreted as lines).
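The three accepted forms of outer.margins can be sketched as follows. The object names are illustrative only, and the margin values are arbitrary; the plotting call is guarded so the snippet runs without MoEClust installed.

```r
library(grid)

# Three ways of requesting margins in the order bottom, left, top, right
margins_list <- list(bottom = unit(3, "lines"), left  = unit(3, "lines"),
                     top    = unit(2, "lines"), right = unit(2, "lines"))
margins_unit <- unit(c(3, 3, 2, 2), "lines")  # unit vector, ordered properly
margins_num  <- c(3, 3, 2, 2)                 # plain numerics, interpreted as lines

if(requireNamespace("MoEClust", quietly = TRUE)) {
  library(MoEClust)
  data(ais)
  res <- MoE_clust(ais[,3:7], G=2, gating= ~ BMI, expert= ~ sex,
                   network.data=ais, modelNames="EVE")
  # Any of the three objects above could be supplied here
  MoE_gpairs(res, outer.margins=margins_num, outer.labels="all")
}
```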

outer.labels

The default is NULL, for alternating labels around the perimeter. If "all", all labels are printed, and if "none", no labels are printed.

outer.rot

A 2-vector (x, y) rotating the top/bottom outer labels x degrees and the left/right outer labels y degrees. Only works for categorical labels of boxplot and mosaic panels. Defaults to c(0, 90).

gap

The gap between the tiles; defaults to 0.05 of the width of a tile.

buffer

The fraction by which to expand the range of quantitative variables to provide plots that will not truncate plotting symbols. Defaults to 0.025, i.e. 2.5 percent of the range. Particularly useful when ellipses are drawn (see addEllipses) to ensure ellipses are visible in full.

uncert.cov

A logical indicating whether the expansion factor for points on plots involving covariates should also be modified when response.type="uncertainty". Defaults to FALSE, and only relevant for scatterplot and strip plot panels. When TRUE, scatter.pars$uncert.pch is invoked as the plotting symbol for covariate-related scatterplot and strip plot panels, otherwise scatter.pars$scat.pch and stripplot.pars$strip.pch are invoked for such panels.

scatter.pars

A list supplying select parameters for the continuous vs. continuous scatterplots.

NULL is equivalent to:

list(scat.pch=res$classification, uncert.pch=19,
     scat.col=res$classification, scat.size=unit(0.25, "char"), 
     eci.col=1:res$G, noise.size=unit(0.2, "char")),

where scat.pch, scat.col, and scat.size give the plotting symbols, colours, and sizes of the points in scatterplot panels, respectively. Note that eci.col gives both a) the colour of the fitted lines &/or confidence intervals for expert-related panels when scatter.type is one of "ci" or "lm" and b) the colour of the ellipses (if any) when addEllipses is one of "outer", "inner", or "both" and the response data is multivariate. Note that eci.col will inherit a suitable default from scat.col instead if the latter is supplied but the former is not.

Note also that scat.size will be modified on an observation-by-observation level when response.type is "uncertainty". Furthermore, note that the behaviour for plotting symbols when response.type="uncertainty" changes compared to response.type="points" depending on the value of the uncert.cov argument above. uncert.pch gives the plotting symbol used for all scatterplot (and strip plot) panels when response.type="uncertainty" and uncert.cov is TRUE. However, when uncert.cov is FALSE, scat.pch is invoked for scatterplots involving covariates and uncert.pch is used for panels involving only response variables. Finally, noise.size can be used to modify scat.size for observations assigned to the noise component (if any), but only when response.type="points".
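The interplay between response.type, uncert.cov, and the two pch arguments can be summarised with a small decision function. Note that pch_source is a hypothetical helper written for this illustration; it is not part of MoEClust and only restates the rules given above for scatterplot panels.

```r
# Hypothetical helper (not part of MoEClust): which element of scatter.pars
# supplies the plotting symbol for a given scatterplot panel?
pch_source <- function(response.type = c("points", "uncertainty"),
                       uncert.cov = FALSE, involves.covariate = FALSE) {
  response.type <- match.arg(response.type)
  # With response.type="points", scat.pch is used throughout
  if(response.type == "points")         return("scat.pch")
  # With "uncertainty", uncert.pch is used for response-only panels,
  # and for covariate panels too when uncert.cov is TRUE
  if(uncert.cov || !involves.covariate) return("uncert.pch")
  "scat.pch"
}

pch_source("uncertainty", uncert.cov = FALSE, involves.covariate = TRUE)
pch_source("uncertainty", uncert.cov = TRUE,  involves.covariate = TRUE)
```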

density.pars

A list supplying select parameters for visualising the bivariate density contours, only when response.type is "density".

NULL is equivalent to:

list(grid.size=c(100, 100), dcol="grey50",
     nlevels=11, show.labels=TRUE, label.style="mixed"),

where grid.size is a vector of length two giving the number of points in the x & y direction of the grid over which the density is evaluated, respectively (though density.pars$grid.size can also be supplied as a scalar, which will be automatically recycled to a vector of length 2), and dcol is either a single colour or a vector of length nlevels colours (although note that dcol, when not specified, will be adjusted for transparency). Finally, label.style can take the values "mixed", "flat", or "align". Note that density.pars$grid.size[1] is also relevant when diag.pars$show.dens=TRUE (see below).
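A minimal sketch of supplying density.pars with a scalar grid.size follows. The specific values are arbitrary choices for illustration, dens_pars is a name introduced here, and it is assumed that elements left out of the list keep their defaults; the plotting call is guarded so the snippet runs without MoEClust installed.

```r
# Coarser grid and fewer contour levels than the defaults, without labels
dens_pars <- list(grid.size = 50, nlevels = 5, show.labels = FALSE)

# The scalar grid.size is described above as being recycled to length 2,
# i.e. it should behave like the explicit form below
rep_len(dens_pars$grid.size, 2)

if(requireNamespace("MoEClust", quietly = TRUE)) {
  library(MoEClust)
  data(ais)
  res <- MoE_clust(ais[,3:7], G=2, gating= ~ BMI, expert= ~ sex,
                   network.data=ais, modelNames="EVE")
  MoE_gpairs(res, response.type="density", density.pars=dens_pars)
}
```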

stripplot.pars

A list supplying select parameters for continuous vs. categorical panels when one or both of the entries of conditional is "stripplot".

NULL is equivalent to:

list(strip.pch=res$classification, strip.size=unit(0.5, "char"),
     strip.col=res$classification, jitter=TRUE, size.noise=unit(0.4, "char")),

where strip.size and size.noise retain the definitions for the similar arguments under scatter.pars above. However, stripplot.pars$size.noise is invoked regardless of the response.type (in contrast to scatter.pars$noise.size). Notably, strip.col will inherit a suitable default from scatter.pars$scat.col if the latter is supplied but the former is not. Note also that the strip.pch default is modified to scatter.pars$uncert.pch if uncert.cov is TRUE.

boxplot.pars

A list supplying select parameters for continuous vs. categorical panels when one or both of the entries of conditional is "boxplot" or "violin".

NULL is equivalent to:

list(box.pch="|", box.col="black", varwidth=FALSE,
     notch=FALSE, notch.frac=0.5, box.fill=1:res$G).

All of the above are relevant for "boxplot" panels, are passed to panel.bwplot when producing boxplots, and retain the same definitions as the similarly named arguments therein. However, only box.col, varwidth, and box.fill are relevant for "violin" panels, and in both cases box.fill is only invoked for panels where the categorical variable is the MAP classification (i.e. when isTRUE(subset$show.map)). See diag.pars$hist.color for controlling the colours of non-MAP-related boxplot/violin panels. Notably, box.fill will inherit a suitable default from scatter.pars$scat.col if the latter is supplied but the former is not.

barcode.pars

A list supplying select parameters for continuous vs. categorical panels when one or both of the entries of conditional is "barcode". See the help file for barcode::barcode.

NULL is equivalent to:

list(bar.col=res$G:1, nint=0, ptsize=unit(0.25, "char"), 
     ptpch=1, bcspace=NULL, use.points=FALSE),

where bar.col is only invoked for panels where the categorical variable is the MAP classification (i.e. when isTRUE(subset$show.map)) if it is of length greater than 1, otherwise it is used for all relevant panels. See diag.pars$hist.color for controlling the colours of non-MAP-related barcode panels. Notably, bar.col will inherit a suitable default from scatter.pars$scat.col if the latter is supplied but the former is not.

mosaic.pars

A list supplying select parameters for categorical vs. categorical panels (if any).

NULL is equivalent to:

list(shade=NULL, gp_labels=grid::gpar(fontsize=9), 
     gp_args=list(), gp=list(), mfill=TRUE, mcol=1:res$G).

The current default arguments and values thereof are passed through to strucplot for producing mosaic tiles. When shade is not FALSE, mfill is a logical which governs the colouring scheme for panels (if any) involving the MAP classification. When mfill is TRUE (the default), gp is invoked here in such a way that tiles will inherit appropriate interior colours via gp$fill from mcol and a "black" outer colour via gp$col. When mfill is FALSE, or the panel involves two categorical covariates, the outer colours are inherited from mcol and the interior fill colour is inherited from bg.col. See diag.pars$hist.color for controlling the interior fill colour of non-MAP-related mosaic panels. Notably, mcol will inherit a suitable default from scatter.pars$scat.col if the latter is supplied but the former is not.

axis.pars

A list supplying select parameters for controlling the axes.

NULL is equivalent to:

list(n.ticks=5, axis.fontsize=9).

The argument n.ticks will be overwritten for categorical variables with fewer than 5 levels.

diag.pars

A list supplying select parameters for panels along the diagonal.

NULL is equivalent to:

list(diag.fontsize=9, show.hist=TRUE, show.dens=FALSE,
     diagonal=TRUE, hist.color=hist.color, show.counts=TRUE),

where hist.color is a vector of length 4, giving the colours for the response variables, gating covariates, expert covariates, and covariates entering both networks, respectively. By default, diagonal panels for response variables are ifelse(diag.pars$show.dens, "white", "black") and covariates of any kind are "dimgrey". hist.color also governs the outer colour for mosaic panels and the fill colour for boxplot, violin, and barcode panels (except for those involving the MAP classification). However, in the case of response vs. (categorical) covariates boxplots and violin plots, the fill colour is always "white". The MAP classification is always coloured by cluster membership, by default. The argument show.counts is only relevant for categorical variables.

The argument show.dens toggles whether parametric density estimates are drawn over the diagonal panels for each response variable. When show.dens=TRUE, the component densities are shown via thin lines, with colours given by scatter.pars$scat.col, while a thick "black" line is used for the overall mixture density. This argument can be used with or without show.hist also being TRUE, though density curves will appear bigger when show.hist=FALSE. Note that show.dens=TRUE is also affected by the expert.covar argument above. Finally, the grid size when show.dens=TRUE is given by max(res$n, density.pars$grid.size[1]).

When diagonal=TRUE (the default), the diagonal from the top left to the bottom right is used for displaying the marginal distributions of variables (via histograms, with or without overlaid density estimates, or barplots, as appropriate). Specifying diagonal=FALSE will place the diagonal running from the top right down to the bottom left.

...

Catches unused arguments. Alternatively, named arguments can be passed directly here to any/all of scatter.pars, density.pars, stripplot.pars, boxplot.pars, barcode.pars, mosaic.pars, axis.pars, and diag.pars.
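The two routes for setting list-type arguments might be compared as in the sketch below, which passes the same diag.pars settings first as an explicit list and then directly through the dots. The name diag_settings is illustrative, and the calls are guarded so the snippet runs without MoEClust installed.

```r
# Settings intended for diag.pars
diag_settings <- list(show.dens = TRUE, show.hist = FALSE)

if(requireNamespace("MoEClust", quietly = TRUE)) {
  library(MoEClust)
  data(ais)
  res <- MoE_clust(ais[,3:7], G=2, gating= ~ BMI, expert= ~ sex,
                   network.data=ais, modelNames="EVE")

  # Explicit list form
  MoE_gpairs(res, diag.pars=diag_settings)

  # The same settings passed directly as named arguments via '...'
  MoE_gpairs(res, show.dens=TRUE, show.hist=FALSE)
}
```

Both calls are expected to produce the same plot; the second form is the one used throughout the Examples section.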

Value

A generalised pairs plot showing all pairwise relationships between clustered response variables and associated gating &/or expert network continuous &/or categorical variables, coloured according to the MAP classification, with the marginal distributions of each variable along the diagonal.

Note

For MoEClust models with more than one expert network covariate, fitted lines produced in continuous covariate vs. continuous response scatterplots via scatter.type="lm" or scatter.type="ci" will NOT correspond to the coefficients in the expert network (res$expert).

plot.MoEClust is a wrapper to MoE_gpairs which accepts the default arguments, and also produces other types of plots. Caution is advised when producing generalised pairs plots if the dimension of the data is large.

Finally, note that all colour-related defaults in scatter.pars, stripplot.pars, barcode.pars, and mosaic.pars above assume a specific colour palette (see mclust.options("classPlotColors")). Thus, for instance, specifying scatter.pars$scat.col=res$classification will produce different results compared to leaving this argument unspecified. This is especially true for models with a noise component, for which the default is handled quite differently (for one thing, res$G is the number of non-noise components). Similarly, all pch-related defaults in scatter.pars and stripplot.pars above assume a specific set of plotting symbols also (see mclust.options("classPlotSymbols")). Generally, all colour- and symbol-related arguments are strongly recommended to be left at their default values, unless being supplied as a single character string, e.g. "black" for colours. To help in this regard, colour-related arguments sensibly inherit their default from scatter.pars$scat.col if that is supplied and the argument in question is not.

Author(s)

Keefe Murphy - <keefe.murphy@mu.ie>

References

Murphy, K. and Murphy, T. B. (2020). Gaussian parsimonious clustering models with covariates and a noise component. Advances in Data Analysis and Classification, 14(2): 293-325. <doi:10.1007/s11634-019-00373-8>.

Emerson, J. W., Green, W. A., Schloerke, B., Crowley, J., Cook, D., Hofmann, H. and Wickham, H. (2013). The generalized pairs plot. Journal of Computational and Graphical Statistics, 22(1): 79-91.

See Also

MoE_clust, MoE_stepwise, plot.MoEClust, MoE_Uncertainty, expert_covar, panel.stripplot, panel.bwplot, panel.violin, strucplot, mclust.options

Examples

library(MoEClust)
data(ais)
res   <- MoE_clust(ais[,3:7], G=2, gating= ~ BMI, expert= ~ sex,
                   network.data=ais, modelNames="EVE")
MoE_gpairs(res)

# Produce the same plot, but with a violin plot in the lower triangle.
# Colour the outline of the mosaic tiles rather than the interior using mfill.
# Size points in the response vs. response panels by their clustering uncertainty.

MoE_gpairs(res, conditional=c("stripplot", "violin"),
           mfill=FALSE, response.type="uncertainty")

# Instead show the bivariate density contours of the response variables (without labels).
# (Plotting may be slow when response.type="density" for models with expert covariates.)
# Use different colours for histograms of covariates in the gating/expert/both networks.
# Also use different colours for response vs. covariate & covariate vs. response panels.

MoE_gpairs(res, response.type="density", show.labels=FALSE,
           hist.color=c("black", "cyan", "hotpink", "chartreuse"),
           bg.col=c("whitesmoke", "white", "mintcream", "mintcream", "floralwhite"))
           
# Examine the effect of the expert.covar argument in conjunction with show.dens
MoE_gpairs(res, cov.ind=0, expert.covar=TRUE, 
           show.dens=TRUE, show.hist=FALSE, grid.size=1000)
MoE_gpairs(res, cov.ind=0, expert.covar=FALSE, 
           show.dens=TRUE, show.hist=FALSE, grid.size=1000)
           
# Produce a generalised pairs plot for a model with a noise component.
# Reorder the covariates and omit the variables "Hc" and "Hg".
# Use barcode plots for the categorical/continuous pairs.
# Magnify the size of scatter points assigned to the noise component.

resN  <- MoE_clust(ais[,3:7], G=2, gating= ~ SSF + Ht, expert= ~ sex,
                   network.data=ais, modelNames="EEE", tau0=0.1, noise.gate=FALSE)
                   
MoE_gpairs(resN, data.ind=c(1,2,5), cov.ind=c(3,1,2), 
           conditional="barcode", noise.size=grid::unit(0.5, "char"))

MoEClust documentation built on May 29, 2024, 6:44 a.m.