R/plotMcl.R
In diversityForest: Innovative Complex Split Procedures in Random Forests Through Candidate Split Sampling

Documented in plotMcl

# -------------------------------------------------------------------------------
#   This file is part of 'diversityForest'.
#
# 'diversityForest' is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# 'diversityForest' is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with 'diversityForest'. If not, see <http://www.gnu.org/licenses/>.
#
#  NOTE: 'diversityForest' is a fork of the popular R package 'ranger', written by Marvin N. Wright.
#  Most R and C++ code is identical with that of 'ranger'. The package 'diversityForest'
#  was written by taking the original 'ranger' code and making any
#  changes necessary to implement diversity forests.
#
# -------------------------------------------------------------------------------

##' This function allows to visualise the (estimated) distributions of one or several variables for each of the classes of the outcomes.
##' This allows to study how exactly variables of interest are associated with the outcome, which is crucial for interpretive purposes.
##' Two types of visualisations are available: density plots and boxplots. See the 'Details' section below for further explanation.
##'
##' For the \code{"density"} plots, kernel density estimates (obtained using the 
##' \code{density()} function from base R) of the within-class distributions are 
##' plotted in the same plot using different colors and, depending on the number 
##' of classes, different line types. To account for the different number of
##' observations per class, each density is multiplied by the proportion of 
##' observations from that class. The resulting scaled densities can be interpreted 
##' in terms of the local density of the observations from each class relative to 
##' those from the other classes. For example, if a scaled density has the largest 
##' value in a particular region, this can be interpreted as the respective class 
##' being the most frequent in that region. Another example: If the scaled density 
##' of class "A" is twice as large as the scaled density of class "B" in a particular 
##' region, this can be interpreted to mean that there are twice as many observations 
##' of class "A" as of class "B" in that region.
##' 
##' In the \code{"density"} plots, only classes represented by at least two 
##' observations are considered. If the number of classes is greater than 7, 
##' the different classes are distinguished using both colors and line styles. 
##' To indicate the absolute numbers of observations in the different regions, 
##' the locations of the observations from the different classes are visualized 
##' using a rug plot on the x-axis, using the same colors and line types as for 
##' the density plots. If the number of observations is greater than 1,000, a 
##' random subset of 1,000 observations is shown in the rug plot instead of all 
##' observations for visual clarity.
##' 
##' The \code{"boxplot"} plots show the (estimated) within-class distributions 
##' side by side using boxplots. All classes are considered, even those represented 
##' by only a single observation. For the \code{plot_type="both"} option, which 
##' displays both \code{"density"} and \code{"boxplot"} plots, the boxplots are 
##' displayed using the same colors and (if applicable) line styles as the kernel 
##' density estimates, for clarity. Boxplots of classes for which no kernel density 
##' estimates were obtained (i.e., those of the classes represented by single 
##' observations) are shown in grey. 
##' 
##' Note that plots are only generated for those variables in \code{varnames} 
##' that have at least as many unique values as there are outcome classes. For 
##' categorical variables, the category labels are printed on the x- or y-axis 
##' of the \code{"density"} or \code{"boxplot"} plots, respectively. The rug plots 
##' of the \code{"density"} plots are produced only for numeric variables.
##' 
##' @title Plots of the (estimated) within-class distributions of variables
##' @param data Data frame containing the variables.
##' @param yvarname Name of outcome variable.
##' @param varnames Names of the variables for which plots should be created.
##' @param plot_type Plot type, one of the following: "both" (the default), "density", "boxplot".  If "density", \code{"density"} plot are produced, if "boxplot", \code{"boxplot"} plots are produced, and if "both", both \code{"density"} plots and \code{"boxplot"} plots are produced. See the 'Details' section below for details.
##' @param addtitles Set to \code{TRUE} (default) to add headings providing the names of the respective variables to the plots.
##' @param plotit This states whether the plots are actually plotted or merely returned as \code{ggplot} objects. Default is \code{TRUE}.
##' @return A list returned invisibly. The list has length equal to the number of elements in \code{varnames}. 
##' Each element corresponds to one variable and contains a list of \code{ggplot2} plots structured as in the output of \code{\link{plotVar}}.
##' @examples
##' \dontrun{
##'
##' ## Load package:
##' 
##' library("diversityForest")
##' 
##' 
##' 
##' ## Plot "density" and "boxplot" plots (default: plot_type = "both") for the 
##' ## first three variables in the "hars" dataset:
##' 
##' data(hars)
##' plotMcl(data = hars, yvarname = "Activity", varnames = c("tBodyAcc.mean...X", 
##'                                                          "tBodyAcc.mean...Y", 
##'                                                          "tBodyAcc.mean...Z"))
##' 
##' 
##' ## Plot only the "density" plots for these variables:
##' 
##' plotMcl(data = hars, yvarname = "Activity", 
##'         varnames = c("tBodyAcc.mean...X", "tBodyAcc.mean...Y", 
##'                      "tBodyAcc.mean...Z"), plot_type = "density")
##' 
##' ## Plot the "density" plots for these variables, but without titles of the
##' ## plots:
##' 
##' plotMcl(data = hars, yvarname = "Activity", varnames = 
##'           c("tBodyAcc.mean...X", "tBodyAcc.mean...Y", "tBodyAcc.mean...Z"), 
##'         plot_type = "density", addtitles = FALSE)
##' 
##' 
##' ## Make density plots for these variables, but only save them in a list "ps"
##' ## without plotting them ("plotit = FALSE"):
##' 
##' ps <- plotMcl(data = hars, yvarname = "Activity", varnames = 
##'                 c("tBodyAcc.mean...X", "tBodyAcc.mean...Y", 
##'                   "tBodyAcc.mean...Z"), plot_type = "density", 
##'               addtitles = FALSE, plotit = FALSE)
##' 
##' 
##' ## The plots can be manipulated later by using ggplot2 functionalities:
##' 
##' library("ggplot2")
##' p1 <- ps[[1]]$dens_pl + ggtitle("First variable in the dataset") + 
##'   labs(x="Variable values", y="my scaled density")
##' 
##' p2 <- ps[[3]]$dens_pl + ggtitle("Third variable in the dataset") + 
##'   labs(x="Variable values", y="my scaled density")
##' 
##' 
##' ## Combine both of the above plots:
##' 
##' library("ggpubr")
##' p <- ggarrange(p1, p2, ncol = 2)
##' p
##' 
##' ## # Save as PDF:
##' ## ggsave(file="mypathtofolder/FigureXY1.pdf", width=14, height=6)
##' 
##' }
##'
##' @author Roman Hornung
##' @references
##' \itemize{
##'   \item Hornung, R. (2022). Diversity forests: Using split sampling to enable innovative complex split procedures in random forests. SN Computer Science 3(2):1, <\doi{10.1007/s42979-021-00920-1}>.
##'   }
##' @seealso \code{\link{plot.multifor}}, \code{\link{plotVar}}
##' @encoding UTF-8
##' @import stats 
##' @import utils
##' @export
plotMcl <- function(data, yvarname, varnames, plot_type=c("both", "density", "boxplot")[1], addtitles = TRUE, plotit = TRUE) {
  
  if (!all(varnames %in% names(data)))
    stop("Not all entries of 'varnames' are found in 'data'.")
  
  datacov <- data[,varnames, drop=FALSE]
  y_outcome <- data[,yvarname]
  
  # Plots are created only for covariates that have at least as many unique values 
  # as there are outcome classes:
  suit_inds <- which(apply(datacov, 2, function(x) length(unique(x)) >= length(unique(y_outcome))))
  
  if (length(suit_inds)==0)
    stop("None of the variables in 'varnames' have at least as many unique values as there are outcome classes. --> Nothing to plot.")
  
  if (length(suit_inds) < ncol(datacov)) {
    varnames_notused <- paste0("\"", varnames[-suit_inds], "\"")
    warning(paste0("Not all variables in 'varnames' have at least as many unique values as there are outcome classes.\n--> For the following variables in 'varnames' no plots were generated:\n", paste(varnames_notused, collapse=", ")))
    datacov <- datacov[,suit_inds]
    varnames <- varnames[suit_inds]
  }
  
  
  # Create the plots and store them in a list:
  
  ps <- list()
  for(i in 1:ncol(datacov)) {
    plot_title <- ""
    if (addtitles)
      plot_title <- varnames[i]
    p <- plotVar(datacov[,i], y_outcome, x_label=varnames[i], y_label=yvarname, plot_title=plot_title, plot_type=plot_type, plotit=plotit)
    if (plotit) {
      if(i < ncol(datacov))
        readline(prompt="Press [enter] for next plot.")
    }
    ps[[i]] <- p
  }
    
  names(ps) <- names(datacov)
  
  
  # Return the list of plots invisibly:
  
  invisible(ps)
  
}

Any scripts or data that you put into this service are public.

diversityForest documentation built on June 8, 2025, 1:23 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

diversityForest
Innovative Complex Split Procedures in Random Forests Through Candidate Split Sampling

R/plotMcl.R
In diversityForest: Innovative Complex Split Procedures in Random Forests Through Candidate Split Sampling

Defines functions plotMcl

Documented in plotMcl

Try the diversityForest package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

diversityForest Innovative Complex Split Procedures in Random Forests Through Candidate Split Sampling

R/plotMcl.R In diversityForest: Innovative Complex Split Procedures in Random Forests Through Candidate Split Sampling

Defines functions plotMcl

Documented in plotMcl

Try the diversityForest package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

diversityForest
Innovative Complex Split Procedures in Random Forests Through Candidate Split Sampling

R/plotMcl.R
In diversityForest: Innovative Complex Split Procedures in Random Forests Through Candidate Split Sampling