knitr::opts_chunk$set(echo = FALSE)

Supervisor:

+-------------+------------------------------+-------+----------------------+
| Uli Niemann | Otto von Guericke University | M.Sc. | uli.niemann\@ovgu.de |
+-------------+------------------------------+-------+----------------------+

Team Members:

+--------------+------------------------------+-------+--------------------------+
| Alisha Mehta | Otto von Guericke University | M-DKE | alisha.mehta\@st.ovgu.de |
+--------------+------------------------------+-------+--------------------------+
| Ashish Soni  | Otto von Guericke University | M-DKE | ashish.soni\@st.ovgu.de  |
+--------------+------------------------------+-------+--------------------------+

Problem Description

ggplot2 [@wickham2016ggplot2] is one of the most widely used R packages for data visualization. It implements the "grammar of graphics" [@wilkinson2012grammar], a tool for concisely describing the components of a graphic by decomposing it into multiple layers. For example, a scatterplot can be thought of as a combination of four layers: points, axes, a coordinate system, and text annotations such as the plot title. As most of these layers are agnostic of the plot type, ggplot2 makes it easy to combine multiple generic layers into a custom data visualization. As a result, ggplot2 offers more flexibility than R's built-in plot functions such as hist() or boxplot().
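The layered approach can be sketched as follows. This is a minimal illustration using the built-in `mtcars` data set; the data set, the variables, and the title are our own choices and are not part of the project.

```r
library(ggplot2)

# Each `+` adds one generic component to the plot
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +  # data plus the mappings behind the axes
  geom_point() +                             # points layer
  coord_cartesian() +                        # coordinate system
  labs(title = "Fuel economy by weight")     # text annotation: the plot title

# print(p) renders the plot on the active graphics device
```

Swapping `geom_point()` for another geom changes the plot type while the remaining components stay the same.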

Various extensions^1 have been implemented, including "geoms" (geometric objects) for Sankey diagrams, tree maps, mosaic charts, and radar charts. The goal of this project was to design and implement^2 a geom layer for the radial barchart visualization.

Geoms are the fundamental building blocks of ggplot2 and represent what is actually visible in the plot. They can be divided into individual and collective geoms. An individual geom draws a distinct graphical object for each observation (row). For example, geom_point() draws one point per row. A collective geom displays multiple observations with one geometric object, for example the result of a statistical summary, as in geom_boxplot(). Lines and paths fall somewhere in between: each line is composed of a set of straight segments, but each segment represents two points [@wickham2016ggplot2].
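The difference between individual and collective geoms can be made visible by inspecting the data each layer draws. The following sketch uses the built-in `mtcars` data set as an arbitrary example.

```r
library(ggplot2)

# Individual geom: one point per row of the data
p_point <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) + geom_point()

# Collective geom: one box per group, each summarising many rows
p_box <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) + geom_boxplot()

# layer_data() returns the data frame that a layer actually draws
n_point <- nrow(layer_data(p_point))  # one row per observation
n_box   <- nrow(layer_data(p_box))    # one row per cylinder group
```

Here `n_point` equals the number of rows in `mtcars`, whereas `n_box` equals the number of distinct `cyl` values.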

ggplot2 uses the ggproto system of object-oriented programming, introduced in ggplot2 v2.0.0, to facilitate portable extension classes. The ggproto style guide can be summarized as follows:

  1. ggproto classes are used selectively: ggproto objects are never created from scratch; rather, you subclass one of the main ggproto classes provided by ggplot2.

  2. ggproto classes are stateless, which means that they do not change after they are constructed.

  3. ggproto classes have simple inheritance: as they are stateless, it is possible to call methods from other classes inside a method instead of inheriting directly from the class.

\newpage
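As a minimal sketch of these three points, the hypothetical class `GeomBigPoint` below subclasses the existing `GeomPoint` and delegates the drawing to the parent method via `ggproto_parent()`. The class name and the size doubling are illustrative assumptions and are not taken from ggradialbar.

```r
library(ggplot2)

# Hypothetical subclass: never built from scratch, always derived from a ggplot2 class
GeomBigPoint <- ggproto("GeomBigPoint", GeomPoint,
  draw_panel = function(self, data, panel_params, coord, na.rm = FALSE) {
    # Modify the data, not the class: the ggproto object itself stays unchanged
    data$size <- data$size * 2
    # Call the method of another class (here: the parent) inside this method
    ggproto_parent(GeomPoint, self)$draw_panel(
      data, panel_params, coord, na.rm = na.rm
    )
  }
)

# Use the class directly through layer(); a constructor function would normally wrap this
p <- ggplot(mtcars, aes(wt, mpg)) +
  layer(
    geom = GeomBigPoint, stat = "identity", position = "identity",
    params = list(na.rm = FALSE)
  )
```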

Background

The visualization was proposed in the research paper "Phenotyping chronic tinnitus patients using self-report questionnaire data: cluster analysis and visual comparison" [@Niemann:SREP_Pheno2020]; see Figure \@ref(fig:radial-barchart) for an example.

(ref:radial-barchart) Radial barcharts visualizing two clusters of tinnitus patients. (From [@Niemann:SREP_Pheno2020]).

knitr::include_graphics("radial-barchart.png")

The barchart provides a graphical representation of a subpopulation. In particular, the height of a bar depicts the subpopulation average for a feature. Each feature is z-score normalized. The radial layout distributes the bars around a circle, where each bar starts at the black 0 line, which represents the feature average over the whole population. Due to feature scaling, bars pointing outward represent feature averages above the overall population mean, and bars pointing inward represent feature averages below it.

This interpretation can optionally be supported visually by color-coded bars. Feature names can optionally be shown on top of each bar. All values are expressed in standard deviations from the population mean. For example, a value of -1 indicates that the subpopulation average is one standard deviation below the overall population average. The standard deviation within a subpopulation is represented as grey error lines facing the colored inner circle. To facilitate quick feature localization, a custom feature categorization can be provided (see inner circle), alongside the subpopulation title and the number of instances within that subpopulation.
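The quantities behind the bars and error lines can be computed along the following lines. This is a sketch under assumptions: the built-in `iris` data stands in for the patient data, with species playing the role of clusters, and the within-cluster standard deviation is taken on the z-scored features.

```r
# Stand-in data: iris species play the role of subpopulations (clusters)
features <- scale(iris[, 1:4])   # z-score each feature over the whole population
clusters <- iris$Species

# Bar height: subpopulation mean of each z-scored feature
cluster_mean <- aggregate(
  as.data.frame(features), by = list(cluster = clusters), FUN = mean
)

# Error line: standard deviation within each subpopulation
cluster_sd <- aggregate(
  as.data.frame(features), by = list(cluster = clusters), FUN = sd
)
```

After scaling, every feature has a population mean of 0, so a negative entry in `cluster_mean` corresponds to a bar pointing inward.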

A graphical overview of the cluster solutions and the visualizations is available as an interactive demo at https://unmnn.de/phs/app/. The radial barcharts were augmented with interactive components: by hovering over a bar or a feature label, additional cluster summaries and compact feature descriptions are shown as tooltips. Clicking on a feature invokes an additional chart which shows the (normalized) distribution of the selected feature stratified by cluster and, if selected, also after treatment. Continuous features are shown using semi-transparent boxplots placed on violin plot layers, whereas for nominal features, category proportions alongside their 95% confidence intervals are displayed as points and error lines, respectively [@Niemann:SREP_Pheno2020].

\newpage

Description of the Package : ggradialbar

The ggradialbar package provides two geom layer functions: geom_rbar, a standard geom, and geom_rbar_interactive, an interactive geom. These geoms extend the functionality of ggplot2 for visualizing high-dimensional clusters and their evolution.

We created two geom subclasses named GeomRbar and GeomRbarInteractive. In these classes we overrode the following three methods: setup_params(), setup_data(), and draw_group().

  1. setup_params() and setup_data() perform early checks and modifications of the parameters and data [@wickham2016ggplot2]. They also issue warnings and abort messages to the user when inputs are not understood by the Geom object.

  2. draw_group() renders the plot.

The last part of each new class consists of the required_aes and default_aes fields:

  - required_aes is a character vector naming the aesthetics that the user must provide to the geom.

  - default_aes defines default values for the optional aesthetics that the geom understands.
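How these pieces fit together can be sketched with a deliberately simplified geom. `GeomSketchBar`, its `bar_width` parameter, and the example data are hypothetical and serve illustration only; this is not the actual GeomRbar code, and it draws plain Cartesian bars rather than a radial layout.

```r
library(ggplot2)

GeomSketchBar <- ggproto("GeomSketchBar", Geom,
  required_aes = c("x", "y"),                                  # must be mapped by the user
  default_aes = aes(fill = "grey40", colour = NA, alpha = 1),  # defaults for optional aesthetics
  draw_key = draw_key_rect,

  # Early parameter check: warn and fall back on an unusable bar width
  setup_params = function(data, params) {
    if (is.null(params$bar_width)) params$bar_width <- 0.9
    if (!is.numeric(params$bar_width) || params$bar_width <= 0) {
      warning("`bar_width` must be a positive number; using 0.9", call. = FALSE)
      params$bar_width <- 0.9
    }
    params
  },

  # Early data check and derived columns: abort on non-numeric bar heights
  setup_data = function(data, params) {
    if (!is.numeric(data$y)) stop("`y` must be numeric", call. = FALSE)
    data$xmin <- data$x - params$bar_width / 2
    data$xmax <- data$x + params$bar_width / 2
    data$ymin <- pmin(data$y, 0)  # every bar starts at the 0 line
    data$ymax <- pmax(data$y, 0)
    data
  },

  # Rendering: one rectangle per row of the group
  draw_group = function(data, panel_params, coord, bar_width = 0.9) {
    coords <- coord$transform(data, panel_params)
    grid::rectGrob(
      x = coords$xmin, y = coords$ymin,
      width = coords$xmax - coords$xmin,
      height = coords$ymax - coords$ymin,
      just = c("left", "bottom"), default.units = "native",
      gp = grid::gpar(fill = alpha(coords$fill, coords$alpha), col = coords$colour)
    )
  }
)

# Made-up z-scored feature averages for one subpopulation
df <- data.frame(feature = 1:4, z = c(0.8, -0.5, 1.2, -1.0))
p <- ggplot(df, aes(x = feature, y = z)) +
  layer(
    geom = GeomSketchBar, stat = "identity", position = "identity",
    params = list(na.rm = FALSE, bar_width = 0.8)
  )
```

Because `bar_width` appears as an argument of `draw_group()`, ggplot2 recognizes it as a parameter of the geom and passes it through `setup_params()` and `setup_data()`.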

Some of the newly defined parameters for the function layer include

The ggproto objects are abstracted away behind constructor functions, which make up the user-facing ggplot2 API.
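A constructor of this kind typically follows the pattern below. The function name `geom_sketch_point` is hypothetical, and the built-in `GeomPoint` stands in for a custom ggproto class; the actual constructors in ggradialbar may take different arguments.

```r
library(ggplot2)

# Hypothetical constructor: users call this function and never touch the ggproto object
geom_sketch_point <- function(mapping = NULL, data = NULL,
                              stat = "identity", position = "identity",
                              ..., na.rm = FALSE, show.legend = NA,
                              inherit.aes = TRUE) {
  layer(
    geom = GeomPoint,            # stand-in for a custom class such as GeomRbar
    mapping = mapping, data = data,
    stat = stat, position = position,
    show.legend = show.legend, inherit.aes = inherit.aes,
    params = list(na.rm = na.rm, ...)  # extra arguments become layer parameters
  )
}

p <- ggplot(mtcars, aes(wt, mpg)) + geom_sketch_point(colour = "steelblue")
```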

For more information, see the GitHub repository: <https://github.com/Ashish-Soni08/ggradialbar>

\newpage

Conclusion

Limitations

Some of the limitations of the package are as follows:

  * The visualization does not support categorical features internally. The data transformation has to be done by the user.

  * The data has to be scaled.
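One possible way to carry out both preparation steps in base R is sketched below. The data frame and its column names are made up, and one-hot encoding followed by z-scoring is only one of several reasonable transformations; the package itself does not prescribe it.

```r
# Made-up data with one categorical and one numeric feature
df <- data.frame(
  sex   = factor(c("f", "m", "f", "m", "f")),
  score = c(12, 30, 18, 25, 15)
)

# One-hot encode the factor; `- 1` drops the intercept so every level gets a column
dummies <- model.matrix(~ sex - 1, data = df)

# Combine with the numeric feature and z-score every column
prepared <- scale(cbind(dummies, score = df$score))
```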

Future Scope

From our research, we can suggest the following ways in which the project can be improved upon:

References {.unnumbered}



Ashish-Soni08/ggradialbar documentation built on April 15, 2021, 4:11 a.m.