plotMcl | R Documentation |
This function visualises the (estimated) distributions of one or more variables for each class of the outcome. This makes it possible to study exactly how the variables of interest are associated with the outcome, which is crucial for interpretation. Two types of visualisations are available: density plots and boxplots. See the 'Details' section below for further explanation.
plotMcl(
data,
yvarname,
varnames,
plot_type = c("both", "density", "boxplot")[1],
addtitles = TRUE,
plotit = TRUE
)
data |
Data frame containing the variables. |
yvarname |
Name of outcome variable. |
varnames |
Names of the variables for which plots should be created. |
plot_type |
Plot type, one of the following: "both" (the default), "density", "boxplot". If "density", |
addtitles |
Set to |
plotit |
This states whether the plots are actually plotted or merely returned as |
For the "density" plots, kernel density estimates (obtained using the
density() function from base R) of the within-class distributions are
plotted in the same plot using different colors and, depending on the number
of classes, different line types. To account for the different number of
observations per class, each density is multiplied by the proportion of
observations from that class. The resulting scaled densities can be interpreted
in terms of the local density of the observations from each class relative to
those from the other classes. For example, if a scaled density has the largest
value in a particular region, this can be interpreted as the respective class
being the most frequent in that region. Another example: If the scaled density
of class "A" is twice as large as the scaled density of class "B" in a particular
region, this can be interpreted to mean that there are twice as many observations
of class "A" as of class "B" in that region.
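The scaling described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the package's internal code: it uses the built-in iris data, and the names props, scaled_dens, dens_mat and most_frequent are hypothetical. All densities are evaluated on a common grid so that they can be compared point by point.

```r
## Sketch only: scaled within-class densities on the built-in iris data.
## Variable names are illustrative and not taken from diversityForest.
x <- iris$Sepal.Length
y <- iris$Species

## Proportion of observations per class
props <- table(y) / length(y)

## Evaluate every class density on the same grid
grid_from <- min(x)
grid_to <- max(x)

scaled_dens <- lapply(levels(y), function(cl) {
  d <- density(x[y == cl], from = grid_from, to = grid_to, n = 512)
  ## Multiply the density by the class proportion
  data.frame(class = cl, x = d$x, y = d$y * as.numeric(props[cl]))
})
names(scaled_dens) <- levels(y)

## Class with the largest scaled density at each grid point, i.e. the
## class that is locally the most frequent
dens_mat <- sapply(scaled_dens, function(d) d$y)
most_frequent <- levels(y)[max.col(dens_mat, ties.method = "first")]
```

The ratio of two columns of dens_mat at a grid point then estimates how many observations of one class there are relative to the other in that region.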
In the "density" plots, only classes represented by at least two
observations are considered. If the number of classes is greater than 7,
the different classes are distinguished using both colors and line styles.
To indicate the absolute numbers of observations in the different regions,
the locations of the observations from the different classes are visualized
using a rug plot on the x-axis, using the same colors and line types as for
the density plots. If the number of observations is greater than 1,000, a
random subset of 1,000 observations is shown in the rug plot instead of all
observations for visual clarity.
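The rug plot subsampling can be illustrated as follows. This is a minimal sketch on simulated data, assuming that the random subset is drawn across all observations; the names max_rug, idx and cols are hypothetical and the package's own implementation may differ.

```r
## Sketch only: show at most 1,000 observations in the rug plot.
set.seed(1)
n <- 5000
x <- rnorm(n)
y <- factor(sample(c("A", "B", "C"), n, replace = TRUE))

max_rug <- 1000  # threshold stated in the text above

## Sample indices (not values) so that class labels stay aligned
idx <- seq_len(n)
if (n > max_rug) idx <- sample(idx, max_rug)

## Draw a density and add the rug, one colour per class
plot(density(x), main = "")
cols <- c(A = "black", B = "red", C = "blue")
for (cl in levels(y)) {
  rug(x[idx][y[idx] == cl], col = cols[cl])
}
```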
The "boxplot" plots show the (estimated) within-class distributions
side by side using boxplots. All classes are considered, even those represented
by only a single observation. For the plot_type="both" option, which
displays both "density" and "boxplot" plots, the boxplots are
displayed using the same colors and (if applicable) line styles as the kernel
density estimates, for clarity. Boxplots of classes for which no kernel density
estimates were obtained (i.e., those of the classes represented by single
observations) are shown in grey.
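The distinction between classes that receive a density estimate and classes that appear only in the boxplots can be sketched as below. The toy outcome vector and the names counts, dens_classes and single_classes are assumptions made for illustration only.

```r
## Sketch only: split classes by the number of observations.
y <- factor(c("A", "A", "A", "B", "B", "C"))
counts <- table(y)

## Classes with at least two observations get a kernel density estimate
dens_classes <- names(counts)[counts >= 2]

## Classes with a single observation appear only in the boxplots (in grey)
single_classes <- names(counts)[counts == 1]
```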
Note that plots are only generated for those variables in varnames
that have at least as many unique values as there are outcome classes. For
categorical variables, the category labels are printed on the x- or y-axis
of the "density" or "boxplot" plots, respectively. The rug plots
of the "density" plots are produced only for numeric variables.
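To check in advance which variables would be plotted, the unique-value criterion can be reproduced by hand. The following is a sketch under stated assumptions: it uses the built-in iris data with an added binary variable is_long (hypothetical), and keep is an illustrative name rather than something returned by plotMcl.

```r
## Sketch only: which variables have at least as many unique values as
## there are outcome classes?
data <- iris
data$is_long <- as.numeric(data$Sepal.Length > 5.8)  # only 2 unique values

yvarname <- "Species"
varnames <- c("Sepal.Length", "Petal.Width", "is_long")

n_classes <- length(unique(data[[yvarname]]))
n_unique <- sapply(data[varnames], function(v) length(unique(v)))

## Variables that satisfy the criterion
keep <- varnames[n_unique >= n_classes]
```

Here is_long is excluded because it takes only two distinct values while the outcome has three classes.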
A list, returned invisibly. The list has length equal to the number of elements in varnames.
Each element corresponds to one variable and contains a list of ggplot2
plots structured as in the output of plotVar.
Roman Hornung
Hornung, R. (2022). Diversity forests: Using split sampling to enable innovative complex split procedures in random forests. SN Computer Science 3(2):1, <doi:10.1007/s42979-021-00920-1>.
plot.multifor, plotVar
## Not run:
## Load package:
library("diversityForest")
## Plot "density" and "boxplot" plots (default: plot_type = "both") for the
## first three variables in the "hars" dataset:
data(hars)
plotMcl(data = hars, yvarname = "Activity", varnames = c("tBodyAcc.mean...X",
"tBodyAcc.mean...Y",
"tBodyAcc.mean...Z"))
## Plot only the "density" plots for these variables:
plotMcl(data = hars, yvarname = "Activity",
varnames = c("tBodyAcc.mean...X", "tBodyAcc.mean...Y",
"tBodyAcc.mean...Z"), plot_type = "density")
## Plot the "density" plots for these variables, but without titles of the
## plots:
plotMcl(data = hars, yvarname = "Activity", varnames =
c("tBodyAcc.mean...X", "tBodyAcc.mean...Y", "tBodyAcc.mean...Z"),
plot_type = "density", addtitles = FALSE)
## Make density plots for these variables, but only save them in a list "ps"
## without plotting them ("plotit = FALSE"):
ps <- plotMcl(data = hars, yvarname = "Activity", varnames =
c("tBodyAcc.mean...X", "tBodyAcc.mean...Y",
"tBodyAcc.mean...Z"), plot_type = "density",
addtitles = FALSE, plotit = FALSE)
## The plots can be manipulated later by using ggplot2 functionalities:
library("ggplot2")
p1 <- ps[[1]]$dens_pl + ggtitle("First variable in the dataset") +
labs(x="Variable values", y="my scaled density")
p2 <- ps[[3]]$dens_pl + ggtitle("Third variable in the dataset") +
labs(x="Variable values", y="my scaled density")
## Combine both of the above plots:
library("ggpubr")
p <- ggarrange(p1, p2, ncol = 2)
p
## # Save as PDF:
## ggsave(filename="mypathtofolder/FigureXY1.pdf", plot=p, width=14, height=6)
## End(Not run)