knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
While you will probably work with GGenemy
's main feature, the shiny app launched by GGenemy()
, it is also beneficial to have a sound understanding of how the package's most important functions work. All of these functions are called during your workflow inside the app.
Start off by loading and attaching GGenemy
.
library(GGenemy)
diamonds11
datasetTo demonstrate how the core functions of the package work, we are going to use the diamonds11
dataset, which is included in GGenemy
. Load the dataset and make yourself familiar with it.
data("diamonds11") head(diamonds11)
As you can see, the data is almost identical to diamonds
from ggplot2
, but includes an additional column called area
. This variable is a factor that indicates where the diamonds were found. Each category represents a country, namely
Since the variable is not yet saved as a factor, we have to correct this before moving on.
diamonds11$area <- as.factor(diamonds11$area)
describe()
Now that we have finished the groundwork, we can take a closer look at the data structure and some unconditional summary statistics of diamonds11
. This can easily be done with the describe()
function. It takes a dataset as its first argument and a character vector with a choice of summary statistics to display for numerical variables as a second argument.
describe(dataset, num.desc = c("min", "quantile0.25", "median", "mean", "quantile0.75", "max", "var", "sd", "valid.n"))
Using describe()
on diamonds11
gives us a deeper understanding of the dataset's numerical variables and factors.
describe(diamonds11, num.desc = c("min", "mean", "median", "max", "valid.n"))
describe()
calculates the choice of unconditional summary statistics for the numerics. For factors, it counts the occurences of each category, computes their share in the dataset and displays the mode for each factor.
So far, we have only considered unconditional summary statistics. But GGenemy
's main functionality is to tackle the issue of conditionality.
sum_stats()
The sum_stats()
function enables the calculation of conditional means, variances, skewness and kurtosis for numerical variables given another numeric variable. The command is of the following form:
sum_stats(dataset, given_var, stats = c("mean", "var", "skewness", "kurtosis"), n_quantiles = 5)
Its first argument is a dataset, which can include numerics, factors and logical variables. However, factors and logicals will be ignored for the conditional calculation. As a second input, you need to specify the name of a numerical variable from the dataset as a string. This will be the variable the summary statistics are conditoned on. Thirdly, the stats
argument controls which conditional statistics will be computed. You can specify your choice either with a character vector (see above), or with a numerical vector with a (sub)set of c(1, 2, 3, 4)
.
Lastly, n_quantiles
defines how many quantiles the given_var
will be split into. You can choose any integer between 1 and 10.
Executing the command for diamonds11
results into the following output:
diamonds_stats <- sum_stats(diamonds11, "carat", stats = c("mean", "var"), n_quantiles = 5) diamonds_stats
However, this output is not particularly nice to look at. You can fix this by using print_sum_stats(x, given_var)
, which takes an object created by sum_stats()
as its first argument and the name of the given variable as the second.
print_sum_stats(diamonds_stats, "carat")
plot_sum_stats()
While it is great to be able to compute conditional summary statistics with sum_stats()
, the amount of information can be difficult to take in. plot_sum_stats()
provides a graphical alternative to the tables, which displays the values for each variable in a line plot. The function takes the same arguments as sum_stats()
.
plot_sum_stats(diamonds11, "carat", stats = c("mean", "var"), n_quantiles = 5)
The last main function integrated in GGenemy
is called plot_GGenemy()
. It enables the user to investigate the conditional distributions of both factors and numerical variables. It's syntax is a little more extensive than that of the previous functions:
plot_GGenemy(dataset, given_var, var_to_plot = NULL, n_quantiles = 5, boxplot = FALSE, selfrange = NULL, remaining = TRUE)
While dataset
, given_var
and n_quantiles
are used in the same way as earlier, there are some new arguments that can be specified:
plot_GGenemy()
var_to_plot
allows you to state which variables you want to create outputs for. Per default, there will be one plot for each variable, regardless of its class.boxplot = FALSE
indicates that none of the numerical variables should be depicted as a boxplot instead of a density. If set to TRUE
, all numerics will be shown as boxplots. You can also specify a selective character vector with the names of the numerical variables you want to create boxplots for.Here are some examples on how to work with plot_GGenemy()
:
price
is plotted with its conditional density.plot_GGenemy(diamonds11, "carat", var_to_plot = "price")
price
is printed as a boxplot.plot_GGenemy(diamonds11, "carat", var_to_plot = "price", boxplot = TRUE)
selfrange
and remaining
selfrange = NULL
: per default, the given_var
will be split into n_quantiles
which contain equal amounts of data. If selfrange
is set to a vector or a matrix that specifies individualized ranges, n_quantiles
will be ignored and the selfrange
s will be used instead.remaining = TRUE
: when setting selfrange
s, the remaining
argument defines whether all data points not included in the ranges should be collected in an additional subset or be excluded completely.Here are some more examples for the usage of the advanced arguments:
plot_GGenemy(diamonds11, "carat", var_to_plot = "price", selfrange = c(0, 1, 1, 3, 3, 5), remaining = FALSE)
remaining
quantileplot_GGenemy(diamonds11, "carat", var_to_plot = "price", selfrange = c(0, 1, 1, 2, 2, 3), remaining = TRUE)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.