Nothing
#' Model series plot
#'
#' @description
#' Produce a graphical output to examine the effect of using different model specifications (design)
#' on the predictive performance of these models (a model series). Devised to access the results of
#' [pedometrics::buildModelSeries()] and [pedometrics::statsMS()], but can be easily adapted to
#' work with any model structure and performance measure.
#'
#' @param obj Object of class `data.frame`, generally returned by [pedometrics::statsMS()],
#' containing:
#'
#' 1. a series of performance statistics of several models, and
#' 2. the design information of each model.
#'
#' See \sQuote{Details} for more information.
#'
#' @param grid Vector of integer values or character strings indicating the columns of the
#' `data.frame` containing the design data which will be gridded using the function
#' [lattice::levelplot()]. See \sQuote{Details} for more information.
#'
#' @param line Character string or integer value indicating which of the performance statistics
#' (usually calculated by [pedometrics::statsMS()]) should be plotted using the function
#' [lattice::xyplot()]. See \sQuote{Details} for more information.
#'
#' @param ind Integer value indicating for which group of models the mean rank is to be calculated.
#' See \sQuote{Details} for more information.
#'
#' @param type Vector of character strings indicating some of the effects to be used when plotting
#' the performance statistics using [lattice::xyplot()]. Defaults to `type = c("b", "g")`. See
#' [lattice::panel.xyplot()] for more information on how to set this argument.
#'
#' @param pch Vector with two integer values specifying the symbols to be used to plot points. The
#' first sets the symbol used to plot the performance statistic, while the second sets the symbol
#' used to plot the mean rank of the indicator set using argument `ind`. Defaults to
#' `pch = c(20, 2)`. See [graphics::points()] for possible values and their interpretation.
#'
#' @param size Numeric value specifying the size of the symbols used for plotting the mean rank of
#' the indicator set using argument `ind`. Defaults to `size = 0.5`. See [grid::grid.points()] for
#' more information.
#'
#' @param arrange Character string indicating how the model series should be arranged, which can be
#' in ascending (`"asc"`) or descending (`"desc"`, default) order.
# See [plyr::arrange()] for more information.
#'
#' @param color Vector defining the colors to be used in the grid produced by function
#' [lattice::levelplot()]. If `color = NULL`, defaults to `color = cm.colors(n)`, where `n` is the
#' number of unique values in the columns defined by argument `grid`. See [grDevices::cm.colors()]
#' to see how to use other color palettes.
#'
#' @param xlim Numeric vector of length 2, giving the x coordinates range. If `xlim = NULL` (which
#' is the recommended value), defaults to `xlim = c(0.5, dim(obj)[1] + 0.5)`. This is, so far, the
#' optimum range for adequate plotting.
#'
#' @param ylab Character vector of length 2, giving the y-axis labels. When `obj` is a `data.frame`
#' returned by [pedometrics::statsMS()], and the performance statistic passed to argument
#' `line` is one of those calculated by [pedometrics::statsMS()] (`"candidates"`, `"df"`, `"aic"`,
#' `"rmse"`, `"nrmse"`, `"r2"`, `"adj_r2"`, or `"ADJ_r2"`), the function tries to automatically
#' identify the correct `ylab`.
#'
#' @param xlab Character vector of unit length, the x-axis label. Defaults `xlab = "Model ranking"`.
#'
#' @param at Numeric vector indicating the location of tick marks along the x axis (in native
#' coordinates).
#'
#' @param ... Other arguments for plotting, although most of these have no been tested. Argument
#' `asp`, for example, is not effective since the function automatically identifies the best aspect
#' for plotting based on the dimensions of the design data.
#'
#' @details
#' This section gives more details about arguments `obj`, `grid`, `line`, `arrange`, and `ind`.
#'
#' \subsection{obj}{
#' The argument `obj` usually constitutes a `data.frame` returned by [pedometrics::statsMS()].
#' However, the user can use any `data.frame` object as far as it contains the two basic units of
#' information needed:
#' \enumerate{
#' \item design data passed with argument `grid`
#' \item performance statistic passed with argument `line`
#' }
#' }
#' \subsection{grid}{
#' The argument `grid` indicates the _design_ data which is used to produce the grid output in the
#' top of the model series plot. By _design_ we mean the data that specify the structure of each
#' model and how they differ from each other. Suppose that eight linear models were fit using three
#' types of predictor variables (`a`, `b`, and `c`). Each of these predictor variables is available
#' in two versions that differ by their accuracy, where `0` means a less accurate predictor
#' variable, while `1` means a more accurate predictor variable. This yields 2^3 = 8 total possible
#' combinations. The _design_ data would be of the following form:
#'
#' \verb{
#' > design
#' a b c
#' 1 0 0 0
#' 2 0 0 1
#' 3 0 1 0
#' 4 1 0 0
#' 5 0 1 1
#' 6 1 0 1
#' 7 1 1 0
#' 8 1 1 1
#' }
#' }
#' \subsection{line}{
#' The argument `line` corresponds to the performance statistic that is used to arrange the models
#' in ascending or descending order, and to produce the line output in the bottom of the model
#' series plot. For example, it can be a series of values of adjusted coefficient of determination,
#' one for each model:
#'
#' \verb{
#' adj_r2 <- c(0.87, 0.74, 0.81, 0.85, 0.54, 0.86, 0.90, 0.89)
#' }
#' }
#' \subsection{arrange}{
#' The argument `arrange` automatically arranges the model series according to the performance
#' statistics selected with argument `line`. If `obj` is a `data.frame` returned by
#' [pedometrics::statsMS()], then the function uses standard arranging approaches. For most
#' performance statistics, the models are arranged in descending order. The exception is when
#' `"r2"`, `"adj_r2"`, or `"ADJ_r2"` are used, in which case the models are arranged in ascending
#' order. This means that the model with lowest value appears in the leftmost side of the model
#' series plot, while the models with the highest value appears in the rightmost side of the plot.
#'
#' \verb{
#' > arrange(obj, adj_r2)
#' id a b c adj_r2
#' 1 5 1 0 1 0.54
#' 2 2 0 0 1 0.74
#' 3 3 1 0 0 0.81
#' 4 4 0 1 0 0.85
#' 5 6 0 1 1 0.86
#' 6 1 0 0 0 0.87
#' 7 8 1 1 1 0.89
#' 8 7 1 1 0 0.90
#' }
#'
#' This results suggest that the best performing model is that of `id = 7`, while the model of
#' `id = 5` is the poorest one.
#' }
#' \subsection{ind}{
#' The model series plot allows to see how the design influences model performance. This is achieved
#' mainly through the use of different colors in the grid output, where each unique value in the
#' _design_ data is represented by a different color. For the example given above, one could try to
#' see if the models built with the more accurate versions of the predictor variables have a better
#' performance by identifying their relative distribution in the model series plot. The models
#' placed at the rightmost side of the plot are those with the best performance.
#'
#' The argument `ind` provides another tool to help identifying how the design, more specifically
#' how each variable in the _design_ data, influences model performance. This is done by simply
#' calculating the mean ranking of the models that were built using the updated version of each
#' predictor variable. This very same mean ranking is also used to rank the predictor variables and
#' thus identify which of them is the most important.
#'
#' After arranging the `design` data described above using the adjusted coefficient of
#' determination, the following mean rank is obtained for each predictor variable:
#'
#' \verb{
#' > rank_center
#' a b c
#' 1 5.75 6.25 5.25
#' }
#'
#' This result suggests that the best model performance is obtained when using the updated version
#' of the predictor variable `b`. In the model series plot, the predictor variable `b` appears in
#' the top row, while the predictor variable `c` appears in the bottom row.
#' }
#' @return
#' An object of class `"trellis"` consisting of a model series plot.
#'
#' @references
#' Deepayan Sarkar (2008). _Lattice: Multivariate Data Visualization with R._ Springer, New York.
#' ISBN 978-0-387-75968-5.
#'
#' Roger D. Peng (2008). _A method for visualizing multivariate time series data._ Journal of
#' Statistical Software. v. 25 (Code Snippet), p. 1-17.
#'
#' Roger D. Peng (2012). _mvtsplot: Multivariate Time Series Plot._ R package version 1.0-1.
#' <https://CRAN.R-project.org/package=mvtsplot>.
#'
#' A. Samuel-Rosa, G. B. M. Heuvelink, G. de Mattos Vasques, and L. H. C. dos Anjos, Do more
#' detailed environmental covariates deliver more accurate soil maps?, _Geoderma_, vol. 243–244,
#' pp. 214–227, May 2015, doi: 10.1016/j.geoderma.2014.12.017.
#'
#' @author Alessandro Samuel-Rosa \email{alessandrosamuelrosa@@gmail.com}
#'
#' @section Dependencies:
# The __plyr__ package, provider of tools for splitting, applying and combining data in R, is
# required for [pedometrics::plotModelSeries()] to work. The development version of the __plyr__
# package is available on <https://github.com/hadley/plyr> while its old versions are available on
# the CRAN archive at <https://cran.r-project.org/src/contrib/Archive/plyr/>.
#'
#' The __grDevices__ package, provider of graphics devices and support for colours and fonts in R,
#' is required for [pedometrics::plotModelSeries()] to work.
#'
#' The __grid__ package, a rewrite of the graphics layout capabilities in R, is required for
#' [pedometrics::plotModelSeries()] to work.
#'
#' @note
#' Some of the solutions used to build this function were found in the source code of the R-package
#' __mvtsplot__. As such, the author of that package, Roger D. Peng \email{rpeng@@jhsph.edu}, is
#' entitled \sQuote{contributors} to the R-package __pedometrics__.
#'
#' @section Warning:
#' Use the original functions [lattice::xyplot()] and [lattice::levelplot()] for higher
#' customization.
#'
#' @seealso [lattice::xyplot()] [lattice::levelplot()]
#'
#' @examples
# if (all(require(plyr), require(grDevices), require(grid))) {
#' if (all(require(grDevices), require(grid))) {
#' # This example follows the discussion in section "Details"
#' # Note that the data.frame is created manually
#' id <- c(1:8)
#' design <- data.frame(a = c(0, 0, 1, 0, 1, 0, 1, 1),
#' b = c(0, 0, 0, 1, 0, 1, 1, 1),
#' c = c(0, 1, 0, 0, 1, 1, 0, 1))
#' adj_r2 <- c(0.87, 0.74, 0.81, 0.85, 0.54, 0.86, 0.90, 0.89)
#' obj <- cbind(id, design, adj_r2)
#' p <- plotModelSeries(obj, grid = c(2:4), line = "adj_r2", ind = 1,
#' color = c("lightyellow", "palegreen"),
#' main = "Model Series Plot")
#' }
#' @keywords hplot
#' @importFrom stats update
# FUNCTION #########################################################################################
#' @export
#' @rdname plotModelSeries
plotModelSeries <-
function(obj, grid, line, ind, type = c("b", "g"), pch = c(20, 2), size = 0.5, arrange = "desc",
color = NULL, xlim = NULL, ylab = NULL, xlab = NULL, at = NULL, ...) {
# check if suggested packages are installed
if (!requireNamespace("grDevices")) stop("grDevices package is missing")
# if (!requireNamespace("lattice")) stop("lattice package is missing")
if (!requireNamespace("grid")) stop("grid package is missing")
# if (!requireNamespace("plyr")) stop("plyr package is missing")
# check function arguments
if (missing(obj)) {
stop("'obj' is a mandatory argument")
}
if (missing(grid)) {
stop("'grid' is a mandatory argument")
}
if (missing(line)) {
stop("'line' is a mandatory argument")
}
if (missing(ind)) {
stop("'ind' is a mandatory argument")
}
if (!inherits(obj, "data.frame")) {
stop("'obj' should be of class data.frame")
}
if (!inherits(grid, c("integer", "character", "numeric"))) {
stop("'grid' should be an integer value or a character string")
}
if (!inherits(line, c("integer", "character", "numeric"))) {
stop("'line' should be an integer value or a character string")
}
if (!inherits(ind, c("integer", "numeric")) || round(ind) != ind) {
stop("'ind' should be an integer value")
}
if (inherits(line, c("integer", "numeric"))) {
nam0 <- c("candidates", "df", "aic", "rmse", "nrmse", "r2", "adj_r2", "ADJ_r2")
nam1 <- colnames(obj)[line]
if (!any(colnames(obj)[line] == nam0)) {
stop(paste0("'ylab' should be provided for performance statistics '", nam1, "'"))
}
}
if (!missing(xlab)) {
if (length(xlab) != 1) {
stop("'xlab' should have length equal to 1")
}
}
if (!missing(ylab)) {
if (length(ylab) != 2) {
stop("'ylab' should have length equal to 2")
}
}
if (length(type) != 2) {
stop("'type' should have length equal to 2")
}
if (length(pch) != 2) {
stop("'pch' should have length equal to 2")
}
# prepare data
if (inherits(line, "numeric")) {
line <- colnames(obj)[line]
}
if (any(line == c("r2", "adj_r2", "ADJ_r2"))) {
# obj <- plyr::arrange(obj, plyr::desc(obj[, line]))
idx_arrange <- order(obj[[line]], decreasing = TRUE)
obj <- obj[idx_arrange, ]
} else {
# obj <- plyr::arrange(obj, obj[, line])
idx_arrange <- order(obj[[line]], decreasing = FALSE)
obj <- obj[idx_arrange, ]
}
grid <- as.matrix(obj[, grid])
x <- seq(1, dim(obj)[1], 1)
y <- as.numeric(obj[, line])
if (missing(at)) {
if (max(x) < 100) {
m <- round(max(x) / 10) * 10
at <- c(1, seq(5, m, 5))
} else {
m <- round(max(x) / 10) * 10
at <- c(1, seq(10, m, by = 10))
}
}
if (missing(color)) {
color <- grDevices::cm.colors(length(unique(as.numeric(grid))))
}
if (missing(xlim)) {
xlim <- c(0.5, dim(obj)[1] + 0.5)
}
if (missing(xlab)) {
xlab <- "Model ranking"
}
if (missing(ylab)) {
if (inherits(line, "numeric")) {
line <- colnames(obj)[line]
}
if (line == "candidates") {
yl <- "Candidate predictors"
}
if (line == "df") {
yl <- "Degrees of freedom"
}
if (line == "aic") {
yl <- "AIC"
}
if (line == "rmse") {
yl <- "RMSE"
}
if (line == "nrmse") {
yl <- "NRMSE"
}
if (line == "r2") {
yl <- expression(paste0(R^2))
}
if (any(line == c("adj_r2", "ADJ_r2"))) {
yl <- expression(paste0("Adjusted ", R^2))
}
ylab <- list(c(yl, "Design"))
}
rank_center <- rep(NA, dim(grid)[2])
for (i in seq_along(rank_center)) {
rank_center[i] <- mean(cbind(x, grid)[, 1][which(cbind(x, grid)[, i + 1] == ind)])
}
grid <- grid[, order(rank_center, decreasing = TRUE)]
p1 <- lattice::xyplot(
y ~ x, xlim = rev(grDevices::extendrange(xlim, f = 0)), type = type, pch = pch[1],
scales = list(y = list(rot = 0), x = list(at = at)))
p2 <- lattice::levelplot(
grid, colorkey = FALSE, xlim = rev(grDevices::extendrange(xlim, f = 0)),
col.regions = color, scales = list(y = list(rot = 90)),
panel = function (...) {
lattice::panel.levelplot(...)
grid::grid.points(x = sort(rank_center, decreasing = TRUE),
seq(1, dim(grid)[2], 1), pch = pch[2], size = grid::unit(size, "char"))
})
# Print plot
update(c(p1, p2), layout = c(1, 2), xlab = xlab,
ylab = ylab, aspect = c((dim(grid)[2] * 2) / dim(grid)[1]),
par.settings = list(layout.heights = list(panel = c(0.5, 0.5))), ...)
}
#' @export
#' @rdname plotModelSeries
plotMS <- plotModelSeries
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.