plot_gausOverlayData | R Documentation |
To compare histograms, it is best practice to facet each feature using ggplot. To judge whether a distribution resembles the normal (Gaussian) distribution, overlaying a Gaussian curve can help. Of course, a Q-Q plot is the more appropriate tool for comparing a sample distribution with the normal distribution.
plot_gausOverlayData(features, binwidth = 1, ratio = 5, times_sd = 3)
features | list of vectors (features) or a data frame |
binwidth | bin width of the histogram, default binwidth = 1 |
ratio | how many more data points than bins are generated (resolution), default ratio = 5 |
times_sd | value range of the curve, calculated as mean +/- times_sd standard deviations, default times_sd = 3 |
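A normal density integrates to one, but geom_histogram() plots counts by default. A density curve therefore only lines up with the bars after it is scaled by the number of observations N and the bin width w. The documentation does not state this scaling explicitly, so treat it as an assumption. Under that assumption, for a feature with sample mean μ and standard deviation σ, the overlay would be:

$$
y(x) = N\,w\,\frac{1}{\sigma\sqrt{2\pi}}\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),
\qquad \mu - k\sigma \le x \le \mu + k\sigma
$$

Here w = binwidth and k = times_sd. The range spans about 2kσ/w histogram bins, so with ratio = r the curve would be drawn from roughly r · 2kσ/w points.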
The first argument is a (named) list of vectors or a (named) data frame. The normal distribution is calculated for each vector, and the vector's name is used as the feature name and as the facet in the histogram. If no names are provided, the features are named by paste("list_element", 1:length(features)). The second argument is the bin width of the histogram. The third argument is the multiplier of data points: it sets how many more data points are generated than bins are plotted in the histogram. This ensures that enough data points are available when the bin count is small. To generate a faceted plot, set the same bin width in the ggplot call, e.g. ggplot() + geom_histogram(aes(...), binwidth = 1). Check the examples for further information.
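The details above describe what the overlay contains but not how it is computed. The base R sketch below shows one plausible computation. It assumes three things: the curve spans mean +/- times_sd standard deviations, it uses ratio points per histogram bin, and it is scaled to counts by N * binwidth. gaus_overlay_sketch() is a hypothetical name and not the package's implementation. It returns a plain data.frame instead of a tibble to avoid dependencies.

```r
# Hypothetical sketch, NOT the actual code of plot_gausOverlayData():
# one way to build a normal curve scaled to histogram counts per feature.
gaus_overlay_sketch <- function(features, binwidth = 1, ratio = 5, times_sd = 3) {
  features <- as.list(features)  # a data frame becomes a named list of columns
  if (is.null(names(features))) {
    # same default naming as described in the documentation
    names(features) <- paste("list_element", seq_along(features))
  }
  curves <- lapply(names(features), function(nm) {
    v <- features[[nm]]
    v <- v[!is.na(v)]
    m <- mean(v)
    s <- sd(v)
    # assumed value range: mean +/- times_sd standard deviations
    lo <- m - times_sd * s
    hi <- m + times_sd * s
    # bins covering the range, times `ratio` points per bin (at least 2 points)
    n_bins <- ceiling((hi - lo) / binwidth)
    x <- seq(lo, hi, length.out = max(2, ratio * n_bins))
    # scale the density to histogram counts: N * binwidth * dnorm()
    y <- length(v) * binwidth * dnorm(x, mean = m, sd = s)
    data.frame(x = x, y = y, feature = nm, stringsAsFactors = FALSE)
  })
  do.call(rbind, curves)
}

# usage: two named features, bin width 0.5
set.seed(1)
features <- list(a = rnorm(500, mean = 2, sd = 1),
                 b = rnorm(300, mean = -1, sd = 0.5))
overlay <- gaus_overlay_sketch(features, binwidth = 0.5, ratio = 5)
head(overlay)
```

Each feature gets its own block of rows, so geom_line() together with facet_grid(feature ~ .) draws one curve per facet.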
Returns a tibble with the coordinates x and y and the feature name.
library(tidyverse)
# create some data to test
set.seed(42)  # make the random example data reproducible
n <- 1000
df_data <- tibble(y1 = rnorm(n, mean = 7.8, sd = 2.1),
                  y2 = rnorm(n, mean = -3, sd = 1.2),
                  # y3 is bimodal: 800 values around 8 followed by 200 values around 0
                  y3 = c(rnorm(n - 200, mean = 8, sd = 1.2),
                         rnorm(200, mean = 0, sd = 0.8)))
# create a long (transformed) version of the data to plot
df_transformed <- df_data |>
  pivot_longer(cols = starts_with("y"), names_to = "feature") |>
  mutate(feature = factor(feature)) |>
  group_by(feature)
# find the min and max values of all data and create a vector of breaks with the defined bin width
binwidth <- 1
c_bins <- seq(floor(min(df_transformed$value)), ceiling(max(df_transformed$value)), binwidth)
# ratio = 10: ten times more curve points than histogram bins
df_gaus <- plot_gausOverlayData(df_data, binwidth = binwidth, ratio = 10)
# faceted histograms with the normal distribution line overlaid
df_transformed |>
  ggplot() +
  geom_histogram(aes(value), binwidth = binwidth, fill = "lightgreen", color = "black") +
  geom_line(data = df_gaus, aes(x, y), color = "blue", linewidth = 1.2) +
  facet_grid(feature ~ .) +
  scale_x_continuous(breaks = c_bins)
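The description notes that a Q-Q plot is the more appropriate tool for checking normality. The minimal sketch below uses ggplot2's geom_qq() and geom_qq_line(), faceted per feature like the histogram example. To keep the block runnable on its own, it simulates data that mirrors the example above. The data frame df_long is an illustrative assumption, not output of plot_gausOverlayData().

```r
library(ggplot2)

set.seed(42)  # reproducible simulated data
n <- 1000
df_long <- data.frame(
  feature = rep(c("y1", "y2", "y3"), each = n),
  value = c(rnorm(n, mean = 7.8, sd = 2.1),
            rnorm(n, mean = -3, sd = 1.2),
            # bimodal feature: expected to depart from the reference line
            rnorm(n - 200, mean = 8, sd = 1.2), rnorm(200, mean = 0, sd = 0.8))
)

# sample quantiles against theoretical normal quantiles, one panel per feature;
# free y scales because the features cover different value ranges
p <- ggplot(df_long, aes(sample = value)) +
  geom_qq() +
  geom_qq_line(color = "blue") +
  facet_wrap(~ feature, scales = "free_y")
print(p)
```

Points close to the line suggest approximately normal data. The bimodal y3 should show a visible departure that is harder to judge from a histogram overlay alone.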