knitr::opts_chunk$set(echo = FALSE)
Supervisor:
+-------------+------------------------------+-------+----------------------+
| Uli Niemann | Otto von Guericke University | M.Sc. | uli.niemann\@ovgu.de |
+-------------+------------------------------+-------+----------------------+
Team Members:
+--------------+------------------------------+-------+--------------------------+
| Alisha Mehta | Otto von Guericke University | M-DKE | alisha.mehta\@st.ovgu.de |
+--------------+------------------------------+-------+--------------------------+
| Ashish Soni  | Otto von Guericke University | M-DKE | ashish.soni\@st.ovgu.de  |
+--------------+------------------------------+-------+--------------------------+
`ggplot2` [@wickham2016ggplot2] is one of the most popular R packages for data visualization. It is an implementation of the "grammar of graphics" [@wilkinson2012grammar], a tool to concisely describe the components of a graphic by decomposing it into multiple layers. For example, a scatterplot can be thought of as a combination of four layers: points, axes, a coordinate system, and text annotations such as the plot title. As most of these layers are plot type agnostic, `ggplot2` makes it easy to combine multiple generic layers into a custom data visualization. As a result, `ggplot2` provides more flexibility than R's built-in plotting functions like `hist()` or `boxplot()`.
Various extensions^1 have been implemented, including "geoms" (geometric objects) for Sankey diagrams, tree maps, mosaic charts, and radar charts. The goal of this project was to design and implement^2 a `geom` layer for the radial barchart visualization.
These `geoms` are the fundamental building blocks of `ggplot2` and represent what is actually visible in the plot. They can be divided into individual and collective geoms. An individual geom draws a distinct graphical object for each observation (row). For example, the point geom, `geom_point()`, draws one point per row. A collective geom displays multiple observations with one geometric object. This object may be the result of a statistical summary, as in `geom_boxplot()`. Lines and paths fall somewhere in between: each line is composed of a set of straight segments, but each segment represents two points [@wickham2016ggplot2].
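The distinction between individual and collective geoms can be checked on the built data of a plot. The following is a minimal sketch that uses the `mpg` data set shipped with `ggplot2`; the object names `p_points` and `p_box` are chosen for illustration only.

```r
library(ggplot2)

# Individual geom: one graphical object (a point) per row of the data
p_points <- ggplot(mpg, aes(displ, hwy)) + geom_point()

# Collective geom: one box summarises all observations of a class
p_box <- ggplot(mpg, aes(class, hwy)) + geom_boxplot()

# layer_data() returns the data that the geom actually draws
n_points <- nrow(layer_data(p_points))  # equals nrow(mpg)
n_boxes  <- nrow(layer_data(p_box))     # equals the number of distinct classes
```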
`ggplot2` uses the `ggproto` system of object-oriented programming, introduced in `ggplot2` v2.0.0, to facilitate portable extension classes. The `ggproto` style guide can be summarized as follows:

- `ggproto` classes are used selectively: `ggproto` objects are never created from scratch; instead, you subclass one of the main `ggproto` classes provided by `ggplot2`.
- `ggproto` classes are stateless, which means that they do not change after they are constructed.
- `ggproto` classes have simple inheritance: as they are stateless, it is possible to call methods from other classes inside a method instead of inheriting directly from the class.

\newpage
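As a minimal sketch of this style, the following subclasses an existing `ggproto` class instead of building one from scratch. The class `GeomRedPoint` is hypothetical and not part of `ggradialbar`; it only overrides the default aesthetics of `GeomPoint` and inherits everything else.

```r
library(ggplot2)

# Hypothetical subclass for illustration only; not part of ggradialbar.
# Only default_aes is overridden, all methods come from GeomPoint.
GeomRedPoint <- ggproto("GeomRedPoint", GeomPoint,
  default_aes = aes(shape = 19, colour = "red", size = 1.5,
                    fill = NA, alpha = NA, stroke = 0.5)
)

# The subclass keeps the class of its parent
is_point_geom <- inherits(GeomRedPoint, "GeomPoint")

# Use the subclass in a layer; stat and position are passed explicitly
p <- ggplot(mtcars, aes(wt, mpg)) +
  layer(geom = GeomRedPoint, stat = "identity", position = "identity")
```

Because the class object itself is never modified after construction, the same `GeomRedPoint` object can safely be reused in any number of layers.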
The visualization was proposed in the research paper "Phenotyping chronic tinnitus patients using self report questionnaire data: cluster analysis and visual comparison" [@Niemann:SREP_Pheno2020]; see Figure \@ref(fig:radial-barchart).
(ref:radial-barchart) Radial barcharts visualizing two clusters of tinnitus patients. (From [@Niemann:SREP_Pheno2020]).
knitr::include_graphics("radial-barchart.png")
The barchart provides a graphical representation of a subpopulation. In particular, the height of a bar depicts the subpopulation average for a feature. Each feature is z-score normalized. The radial layout distributes the bars around a circle, where each bar starts at the black 0 line, which represents the feature average over the whole population. Because of this scaling, bars pointing outward represent feature averages above the overall population mean, and bars pointing inward represent feature averages below it.
This interpretation can optionally be visually supported by color-coded bars. Feature names can optionally be shown on top of each bar. All values are depicted in terms of standard deviation away from the population mean. For example, a value of -1 indicates that the subpopulation average is 1 standard deviation smaller than the overall population average. The standard deviation within a subpopulation is represented as grey error lines facing the colored inner circle. To facilitate quick feature localization, a custom feature categorization can be provided (see inner circle), alongside the subpopulation title and the number of instances within that subpopulation.
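The quantities behind the bars and error lines can be sketched in a few lines of base R. This is an illustration under assumptions, not the package's code: the `mtcars` data set stands in for the questionnaire data, and the number of cylinders is used as a hypothetical cluster label.

```r
# Toy data: three numeric features and a hypothetical cluster label
features <- mtcars[, c("mpg", "hp", "wt")]
cluster  <- mtcars$cyl

# z-score normalization over the whole population:
# every feature gets mean 0 and standard deviation 1
z <- as.data.frame(scale(features))

# Bar heights: subpopulation means of the z-scores
bar_height <- aggregate(z, by = list(cluster = cluster), FUN = mean)

# Error lines: standard deviation within each subpopulation
bar_sd <- aggregate(z, by = list(cluster = cluster), FUN = sd)
```

A positive entry in `bar_height` corresponds to a bar pointing outward from the 0 line, a negative entry to a bar pointing inward.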
A graphical overview of the cluster solutions and the visualizations is available as an interactive demo at https://unmnn.de/phs/app/. Radar charts were augmented with interactive components: by hovering over a bar or a feature label, additional cluster summaries and compact feature descriptions are shown as tooltips. Clicking on a feature invokes an additional chart which shows the (normalised) distribution of the selected feature stratified by cluster and, if selected, also after treatment. Continuous features are shown using semi-transparent boxplots placed on violin plot layers, whereas for nominal features, category proportions alongside their 95% confidence intervals are displayed as points and error lines, respectively [@Niemann:SREP_Pheno2020].\newpage
ggradialbar
The `ggradialbar` package provides two geom layer functions: `geom_rbar`, a standard geom, and `geom_rbar_interactive`, an interactive geom. These geoms extend the functionality of `ggplot2` for visualizing high-dimensional clusters and their evolution.
We have created two geom subclasses named `GeomRbar` and `GeomRbarInteractive`. In these classes we have overridden the following three methods: `setup_params()`, `setup_data()`, and `draw_group()`.
`setup_params()` and `setup_data()` are used to do early checks and modifications of the parameters and data [@wickham2016ggplot2]. Warnings and abort messages are also communicated to the user if the inputs are not understood by the Geom object. The `draw_group()` method renders the plot.
The last parts of the new classes are the `required_aes` and `default_aes` fields:

- `required_aes`: a character vector that gives the names of the aesthetics that the user must provide to the geom.
- `default_aes`: defines the aesthetics that the geom understands, together with their default values.
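How these methods and fields fit together can be sketched with a small geom. The class `GeomSketchBar` and its `width` parameter are hypothetical and chosen for illustration; this is not the implementation of `GeomRbar`, and it draws plain rectangles in Cartesian coordinates rather than radial bars.

```r
library(ggplot2)
library(grid)

# Hypothetical geom for illustration only; not the GeomRbar implementation
GeomSketchBar <- ggproto("GeomSketchBar", Geom,
  required_aes = c("x", "y"),
  default_aes = aes(fill = "grey40", colour = NA, alpha = NA),

  # Early modification of parameters: fill in a default bar width
  setup_params = function(data, params) {
    if (is.null(params$width)) params$width <- 0.9
    params
  },

  # Early checks and modification of data: bars start at the 0 line
  setup_data = function(data, params) {
    if (!is.numeric(data$y)) stop("`y` must be numeric.", call. = FALSE)
    data$xmin <- data$x - params$width / 2
    data$xmax <- data$x + params$width / 2
    data$ymin <- pmin(data$y, 0)
    data$ymax <- pmax(data$y, 0)
    data
  },

  # Rendering: transform to panel coordinates and return a grid grob
  draw_group = function(data, panel_params, coord, width = 0.9) {
    coords <- coord$transform(data, panel_params)
    rectGrob(
      x = coords$xmin, y = coords$ymin,
      width = coords$xmax - coords$xmin,
      height = coords$ymax - coords$ymin,
      just = c("left", "bottom"),
      default.units = "native",
      gp = gpar(fill = coords$fill, col = coords$colour)
    )
  }
)

df <- data.frame(x = 1:3, y = c(-1, 0.5, 2))
p <- ggplot(df, aes(x, y)) +
  layer(geom = GeomSketchBar, stat = "identity", position = "identity",
        params = list(width = 0.5))
built <- layer_data(p)
```

The columns `xmin`, `xmax`, `ymin`, and `ymax` in `built` are the ones added by `setup_data()`, and the `fill` column comes from `default_aes`.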
Some of the newly defined parameters for the layer function include:

- `cluster_assignment`: an integer vector with the cluster membership assignment.
- `cluster_idx`: a length-one integer vector giving the index of the cluster of interest.
- `cluster_phase`: a character vector with the time point of the recording.
- `cluster_name`: the name of the cluster.
- `group_names`: a character vector with group names of features, which are displayed in the inner circle.
- `unique_id`: a numeric vector with unique identifiers for each observation (optional).

The `ggproto` objects are abstracted away into constructor functions that make up the `ggplot2` API.
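A constructor of this kind can be sketched as follows. The names `GeomSketch` and `geom_sketch()` are hypothetical, and only the `cluster_idx` parameter from the list above is mirrored; the actual signature of `geom_rbar` may differ.

```r
library(ggplot2)

# Hypothetical geom that accepts an extra, non-drawing parameter
GeomSketch <- ggproto("GeomSketch", GeomPoint,
  extra_params = c("na.rm", "cluster_idx")
)

# Hypothetical constructor: validates its input and hides the ggproto object
geom_sketch <- function(mapping = NULL, data = NULL, stat = "identity",
                        position = "identity", ..., cluster_idx = 1L,
                        na.rm = FALSE, show.legend = NA, inherit.aes = TRUE) {
  if (!is.numeric(cluster_idx) || length(cluster_idx) != 1) {
    stop("`cluster_idx` must be a length-one integer vector.", call. = FALSE)
  }
  layer(
    geom = GeomSketch, stat = stat, data = data, mapping = mapping,
    position = position, show.legend = show.legend, inherit.aes = inherit.aes,
    params = list(cluster_idx = cluster_idx, na.rm = na.rm, ...)
  )
}

# Users only ever call the constructor, never the ggproto object directly
lyr <- geom_sketch(cluster_idx = 2L)
p <- ggplot(mtcars, aes(wt, mpg)) + lyr
```

Keeping the validation in the constructor means that invalid input is rejected when the layer is created, before any plot is built.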
For more information, see the GitHub repository <https://github.com/Ashish-Soni08/ggradialbar>.
\newpage
Limitations
Some of the limitations of the package are as follows:

* The visualization does not support categorical features internally. The data transformation has to be done by the user.
* The data has to be scaled.
Future Scope
From our research, we can suggest the following ways in which the project can be improved upon:

* A `Stat` that transforms the data input by the user before it is passed to the geom.
* An `addin` for the package, so that the package is available in `RStudio` via the Addins menu.