#' Model series plot
#'
#' @description
#' Produce a graphical output to examine the effect of using different model specifications (design)
#' on the predictive performance of these models (a model series). Devised to access the results of
#' [pedometrics::buildModelSeries()] and [pedometrics::statsMS()], but can be easily adapted to
#' work with any model structure and performance measure.
#'
#' @param obj Object of class `data.frame`, generally returned by [pedometrics::statsMS()],
#' containing:
#'
#' 1. a series of performance statistics of several models, and
#' 2. the design information of each model.
#'
#' See \sQuote{Details} for more information.
#'
#' @param grid Vector of integer values or character strings indicating the columns of the
#' `data.frame` containing the design data which will be gridded using the function
#' [lattice::levelplot()]. See \sQuote{Details} for more information.
#'
#' @param line Character string or integer value indicating which of the performance statistics
#' (usually calculated by [pedometrics::statsMS()]) should be plotted using the function
#' [lattice::xyplot()]. See \sQuote{Details} for more information.
#'
#' @param ind Integer value indicating for which group of models the mean rank is to be calculated.
#' See \sQuote{Details} for more information.
#'
#' @param type Vector of character strings indicating some of the effects to be used when plotting
#' the performance statistics using [lattice::xyplot()]. Defaults to `type = c("b", "g")`. See
#' [lattice::panel.xyplot()] for more information on how to set this argument.
#'
#' @param pch Vector with two integer values specifying the symbols to be used to plot points. The
#' first sets the symbol used to plot the performance statistic, while the second sets the symbol
#' used to plot the mean rank of the indicator set using argument `ind`. Defaults to
#' `pch = c(20, 2)`. See [graphics::points()] for possible values and their interpretation.
#'
#' @param size Numeric value specifying the size of the symbols used for plotting the mean rank of
#' the indicator set using argument `ind`. Defaults to `size = 0.5`. See [grid::grid.points()] for
#' more information.
#'
#' @param arrange Character string indicating how the model series should be arranged, which can be
#' in ascending (`"asc"`) or descending (`"desc"`, default) order.
# See [plyr::arrange()] for more information.
#'
#' @param color Vector defining the colors to be used in the grid produced by function
#' [lattice::levelplot()]. If `color = NULL`, defaults to `color = cm.colors(n)`, where `n` is the
#' number of unique values in the columns defined by argument `grid`. See [grDevices::cm.colors()]
#' to see how to use other color palettes.
#'
#' @param xlim Numeric vector of length 2, giving the x coordinates range. If `xlim = NULL` (which
#' is the recommended value), defaults to `xlim = c(0.5, dim(obj)[1] + 0.5)`. This is, so far, the
#' optimum range for adequate plotting.
#'
#' @param ylab Character vector of length 2, giving the y-axis labels. When `obj` is a `data.frame`
#' returned by [pedometrics::statsMS()], and the performance statistic passed to argument
#' `line` is one of those calculated by [pedometrics::statsMS()] (`"candidates"`, `"df"`, `"aic"`,
#' `"rmse"`, `"nrmse"`, `"r2"`, `"adj_r2"`, or `"ADJ_r2"`), the function tries to automatically
#' identify the correct `ylab`.
#'
#' @param xlab Character vector of unit length, the x-axis label. Defaults `xlab = "Model ranking"`.
#'
#' @param at Numeric vector indicating the location of tick marks along the x axis (in native
#' coordinates).
#'
#' @param ... Other arguments for plotting, although most of these have no been tested. Argument
#' `asp`, for example, is not effective since the function automatically identifies the best aspect
#' for plotting based on the dimensions of the design data.
#'
#' @details
#' This section gives more details about arguments `obj`, `grid`, `line`, `arrange`, and `ind`.
#'
#' \subsection{obj}{
#' The argument `obj` usually constitutes a `data.frame` returned by [pedometrics::statsMS()].
#' However, the user can use any `data.frame` object as far as it contains the two basic units of
#' information needed:
#' \enumerate{
#' \item design data passed with argument `grid`
#' \item performance statistic passed with argument `line`
#' }
#' }
#' \subsection{grid}{
#' The argument `grid` indicates the _design_ data which is used to produce the grid output in the
#' top of the model series plot. By _design_ we mean the data that specify the structure of each
#' model and how they differ from each other. Suppose that eight linear models were fit using three
#' types of predictor variables (`a`, `b`, and `c`). Each of these predictor variables is available
#' in two versions that differ by their accuracy, where `0` means a less accurate predictor
#' variable, while `1` means a more accurate predictor variable. This yields 2^3 = 8 total possible
#' combinations. The _design_ data would be of the following form:
#'
#' \verb{
#' > design
#' a b c
#' 1 0 0 0
#' 2 0 0 1
#' 3 0 1 0
#' 4 1 0 0
#' 5 0 1 1
#' 6 1 0 1
#' 7 1 1 0
#' 8 1 1 1
#' }
#' }
#' \subsection{line}{
#' The argument `line` corresponds to the performance statistic that is used to arrange the models
#' in ascending or descending order, and to produce the line output in the bottom of the model
#' series plot. For example, it can be a series of values of adjusted coefficient of determination,
#' one for each model:
#'
#' \verb{
#' adj_r2 <- c(0.87, 0.74, 0.81, 0.85, 0.54, 0.86, 0.90, 0.89)
#' }
#' }
#' \subsection{arrange}{
#' The argument `arrange` automatically arranges the model series according to the performance
#' statistics selected with argument `line`. If `obj` is a `data.frame` returned by
#' [pedometrics::statsMS()], then the function uses standard arranging approaches. For most
#' performance statistics, the models are arranged in descending order. The exception is when
#' `"r2"`, `"adj_r2"`, or `"ADJ_r2"` are used, in which case the models are arranged in ascending
#' order. This means that the model with lowest value appears in the leftmost side of the model
#' series plot, while the models with the highest value appears in the rightmost side of the plot.
#'
#' \verb{
#' > arrange(obj, adj_r2)
#' id a b c adj_r2
#' 1 5 1 0 1 0.54
#' 2 2 0 0 1 0.74
#' 3 3 1 0 0 0.81
#' 4 4 0 1 0 0.85
#' 5 6 0 1 1 0.86
#' 6 1 0 0 0 0.87
#' 7 8 1 1 1 0.89
#' 8 7 1 1 0 0.90
#' }
#'
#' This results suggest that the best performing model is that of `id = 7`, while the model of
#' `id = 5` is the poorest one.
#' }
#' \subsection{ind}{
#' The model series plot allows to see how the design influences model performance. This is achieved
#' mainly through the use of different colors in the grid output, where each unique value in the
#' _design_ data is represented by a different color. For the example given above, one could try to
#' see if the models built with the more accurate versions of the predictor variables have a better
#' performance by identifying their relative distribution in the model series plot. The models
#' placed at the rightmost side of the plot are those with the best performance.
#'
#' The argument `ind` provides another tool to help identifying how the design, more specifically
#' how each variable in the _design_ data, influences model performance. This is done by simply
#' calculating the mean ranking of the models that were built using the updated version of each
#' predictor variable. This very same mean ranking is also used to rank the predictor variables and
#' thus identify which of them is the most important.
#'
#' After arranging the `design` data described above using the adjusted coefficient of
#' determination, the following mean rank is obtained for each predictor variable:
#'
#' \verb{
#' > rank_center
#' a b c
#' 1 5.75 6.25 5.25
#' }
#'
#' This result suggests that the best model performance is obtained when using the updated version
#' of the predictor variable `b`. In the model series plot, the predictor variable `b` appears in
#' the top row, while the predictor variable `c` appears in the bottom row.
#' }
#' @return
#' An object of class `"trellis"` consisting of a model series plot.
#'
#' @references
#' Deepayan Sarkar (2008). _Lattice: Multivariate Data Visualization with R._ Springer, New York.
#' ISBN 978-0-387-75968-5.
#'
#' Roger D. Peng (2008). _A method for visualizing multivariate time series data._ Journal of
#' Statistical Software. v. 25 (Code Snippet), p. 1-17.
#'
#' Roger D. Peng (2012). _mvtsplot: Multivariate Time Series Plot._ R package version 1.0-1.
#' <https://CRAN.R-project.org/package=mvtsplot>.
#'
#' A. Samuel-Rosa, G. B. M. Heuvelink, G. de Mattos Vasques, and L. H. C. dos Anjos, Do more
#' detailed environmental covariates deliver more accurate soil maps?, _Geoderma_, vol. 243–244,
#' pp. 214–227, May 2015, doi: 10.1016/j.geoderma.2014.12.017.
#'
#' @author Alessandro Samuel-Rosa \email{alessandrosamuelrosa@@gmail.com}
#'
#' @section Dependencies:
# The __plyr__ package, provider of tools for splitting, applying and combining data in R, is
# required for [pedometrics::plotModelSeries()] to work. The development version of the __plyr__
# package is available on <https://github.com/hadley/plyr> while its old versions are available on
# the CRAN archive at <https://cran.r-project.org/src/contrib/Archive/plyr/>.
#'
#' The __grDevices__ package, provider of graphics devices and support for colours and fonts in R,
#' is required for [pedometrics::plotModelSeries()] to work.
#'
#' The __grid__ package, a rewrite of the graphics layout capabilities in R, is required for
#' [pedometrics::plotModelSeries()] to work.
#'
#' @note
#' Some of the solutions used to build this function were found in the source code of the R-package
#' __mvtsplot__. As such, the author of that package, Roger D. Peng \email{rpeng@@jhsph.edu}, is
#' entitled \sQuote{contributors} to the R-package __pedometrics__.
#'
#' @section Warning:
#' Use the original functions [lattice::xyplot()] and [lattice::levelplot()] for higher
#' customization.
#'
#' @seealso [lattice::xyplot()] [lattice::levelplot()]
#'
#' @examples
# if (all(require(plyr), require(grDevices), require(grid))) {
#' if (all(require(grDevices), require(grid))) {
#' # This example follows the discussion in section "Details"
#' # Note that the data.frame is created manually
#' id <- c(1:8)
#' design <- data.frame(a = c(0, 0, 1, 0, 1, 0, 1, 1),
#' b = c(0, 0, 0, 1, 0, 1, 1, 1),
#' c = c(0, 1, 0, 0, 1, 1, 0, 1))
#' adj_r2 <- c(0.87, 0.74, 0.81, 0.85, 0.54, 0.86, 0.90, 0.89)
#' obj <- cbind(id, design, adj_r2)
#' p <- plotModelSeries(obj, grid = c(2:4), line = "adj_r2", ind = 1,
#' color = c("lightyellow", "palegreen"),
#' main = "Model Series Plot")
#' }
#' @keywords hplot
#' @importFrom stats update
# FUNCTION #########################################################################################
#' @export
#' @rdname plotModelSeries
plotModelSeries <-
function(obj, grid, line, ind, type = c("b", "g"), pch = c(20, 2), size = 0.5, arrange = "desc",
color = NULL, xlim = NULL, ylab = NULL, xlab = NULL, at = NULL, ...) {
# check if suggested packages are installed
if (!requireNamespace("grDevices")) stop("grDevices package is missing")
# if (!requireNamespace("lattice")) stop("lattice package is missing")
if (!requireNamespace("grid")) stop("grid package is missing")
# if (!requireNamespace("plyr")) stop("plyr package is missing")
# check function arguments
if (missing(obj)) {
stop("'obj' is a mandatory argument")
}
if (missing(grid)) {
stop("'grid' is a mandatory argument")
}
if (missing(line)) {
stop("'line' is a mandatory argument")
}
if (missing(ind)) {
stop("'ind' is a mandatory argument")
}
if (!inherits(obj, "data.frame")) {
stop("'obj' should be of class data.frame")
}
if (!inherits(grid, c("integer", "character", "numeric"))) {
stop("'grid' should be an integer value or a character string")
}
if (!inherits(line, c("integer", "character", "numeric"))) {
stop("'line' should be an integer value or a character string")
}
if (!inherits(ind, c("integer", "numeric")) || round(ind) != ind) {
stop("'ind' should be an integer value")
}
if (inherits(line, c("integer", "numeric"))) {
nam0 <- c("candidates", "df", "aic", "rmse", "nrmse", "r2", "adj_r2", "ADJ_r2")
nam1 <- colnames(obj)[line]
if (!any(colnames(obj)[line] == nam0)) {
stop(paste0("'ylab' should be provided for performance statistics '", nam1, "'"))
}
}
if (!missing(xlab)) {
if (length(xlab) != 1) {
stop("'xlab' should have length equal to 1")
}
}
if (!missing(ylab)) {
if (length(ylab) != 2) {
stop("'ylab' should have length equal to 2")
}
}
if (length(type) != 2) {
stop("'type' should have length equal to 2")
}
if (length(pch) != 2) {
stop("'pch' should have length equal to 2")
}
# prepare data
if (inherits(line, "numeric")) {
line <- colnames(obj)[line]
}
if (any(line == c("r2", "adj_r2", "ADJ_r2"))) {
# obj <- plyr::arrange(obj, plyr::desc(obj[, line]))
idx_arrange <- order(obj[[line]], decreasing = TRUE)
obj <- obj[idx_arrange, ]
} else {
# obj <- plyr::arrange(obj, obj[, line])
idx_arrange <- order(obj[[line]], decreasing = FALSE)
obj <- obj[idx_arrange, ]
}
grid <- as.matrix(obj[, grid])
x <- seq(1, dim(obj)[1], 1)
y <- as.numeric(obj[, line])
if (missing(at)) {
if (max(x) < 100) {
m <- round(max(x) / 10) * 10
at <- c(1, seq(5, m, 5))
} else {
m <- round(max(x) / 10) * 10
at <- c(1, seq(10, m, by = 10))
}
}
if (missing(color)) {
color <- grDevices::cm.colors(length(unique(as.numeric(grid))))
}
if (missing(xlim)) {
xlim <- c(0.5, dim(obj)[1] + 0.5)
}
if (missing(xlab)) {
xlab <- "Model ranking"
}
if (missing(ylab)) {
if (inherits(line, "numeric")) {
line <- colnames(obj)[line]
}
if (line == "candidates") {
yl <- "Candidate predictors"
}
if (line == "df") {
yl <- "Degrees of freedom"
}
if (line == "aic") {
yl <- "AIC"
}
if (line == "rmse") {
yl <- "RMSE"
}
if (line == "nrmse") {
yl <- "NRMSE"
}
if (line == "r2") {
yl <- expression(paste0(R^2))
}
if (any(line == c("adj_r2", "ADJ_r2"))) {
yl <- expression(paste0("Adjusted ", R^2))
}
ylab <- list(c(yl, "Design"))
}
rank_center <- rep(NA, dim(grid)[2])
for (i in seq_along(rank_center)) {
rank_center[i] <- mean(cbind(x, grid)[, 1][which(cbind(x, grid)[, i + 1] == ind)])
}
grid <- grid[, order(rank_center, decreasing = TRUE)]
p1 <- lattice::xyplot(
y ~ x, xlim = rev(grDevices::extendrange(xlim, f = 0)), type = type, pch = pch[1],
scales = list(y = list(rot = 0), x = list(at = at)))
p2 <- lattice::levelplot(
grid, colorkey = FALSE, xlim = rev(grDevices::extendrange(xlim, f = 0)),
col.regions = color, scales = list(y = list(rot = 90)),
panel = function (...) {
lattice::panel.levelplot(...)
grid::grid.points(x = sort(rank_center, decreasing = TRUE),
seq(1, dim(grid)[2], 1), pch = pch[2], size = grid::unit(size, "char"))
})
# Print plot
update(c(p1, p2), layout = c(1, 2), xlab = xlab,
ylab = ylab, aspect = c((dim(grid)[2] * 2) / dim(grid)[1]),
par.settings = list(layout.heights = list(panel = c(0.5, 0.5))), ...)
}
#' @export
#' @rdname plotModelSeries
plotMS <- plotModelSeries
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.