knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  echo = TRUE
)
library(dplyr)
library(magrittr)
We will need the package ggplot2. Make sure that ggplot2 is installed, then load it:
library(ggplot2)
We also need the "fruits" data:
data("fruits", package = "ReMUSE")
The base function for bar plots is barplot:
barplot(table(fruits$groupe))
With colors:
barplot(table(fruits$groupe), col = 1:4)
ggplot(data = fruits, aes(x = groupe, fill = groupe)) + geom_bar()
STOP!
ggplot: create an empty canvas
aes: declare aesthetic parameters (position, color, width, shape, opacity, etc.)
geom_bar: use a geometry
Data (data)
The data used to create the graph. Each row represents an object to add to the graph.
Geometry (geom_)
How to represent the objects: points, lines, surfaces, etc.
Aesthetics (aes())
Aesthetic parameters of the shapes: position, color, shape, size, etc.
Scale (scale_)
Functions that control how the shapes are created from the objects and the aesthetic parameters. For example, the function scale_color_manual allows users to pick their own colors.
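As a minimal sketch of how these four components fit together, the following uses a small made-up data frame (`toy`, with hypothetical columns `x`, `y` and `grp`; it is not the fruits data) and picks the colors by hand with scale_color_manual.

```r
library(ggplot2)

# Hypothetical toy data: each row is one object (here, one point) to draw.
toy <- data.frame(
  x   = c(1, 2, 3, 4),
  y   = c(2, 4, 3, 5),
  grp = c("a", "a", "b", "b")
)

# Data + aesthetics + geometry + scale.
p <- ggplot(data = toy, aes(x = x, y = y, color = grp)) +
  geom_point(size = 3) +
  scale_color_manual(values = c(a = "steelblue", b = "tomato"))

# Inspect the color that the scale assigned to each point.
point_colors <- unname(as.character(layer_data(p)$colour))
```

Printing `p` draws the graph; `point_colors` only serves to check which color each row received.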
Reproduce the graph on the right:
ggplot(***, aes(***, fill = Sucres > 10)) + geom_***()
We are going to see together some particular geometries used to create "classic" graphs.
geom_bar: bar plot on non-aggregated data
geom_col: bar plot on existing counts
geom_histogram: histogram of a quantitative variable
geom_boxplot: box plot (Tukey diagram)
geom_violin: "violin" plot
geom_point: scatter plot
geom_line: line plot
We already know how to do it:
ggplot(fruits, aes(cut(Eau, c(0, 84.2, 100)))) + geom_bar(fill = "steelblue")
Use geom_col when you already have counts:
dat.count <- data.frame(
  Fruit = c("Ananas", "Durian"),
  Nb = c(10, 20)
)
ggplot(data = dat.count, aes(x = Fruit, y = Nb)) +
  geom_col()
Add colors to the previous bar plot!
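To make the link between geom_bar and geom_col concrete, here is a small sketch on made-up raw data (the data frame `raw` is hypothetical). It lets geom_bar count the rows, computes the same counts beforehand for geom_col, and compares the resulting bar heights.

```r
library(ggplot2)

# Hypothetical raw data: one row per observation.
raw <- data.frame(fruit = c("Ananas", "Durian", "Durian", "Ananas", "Durian"))

# geom_bar() counts the rows of each category itself.
p_bar <- ggplot(raw, aes(x = fruit)) + geom_bar()

# geom_col() expects the counts to be computed beforehand.
counts <- as.data.frame(table(fruit = raw$fruit))
p_col <- ggplot(counts, aes(x = fruit, y = Freq)) + geom_col()

# Both plots end up with the same bar heights.
bar_heights <- as.numeric(layer_data(p_bar)$y)
col_heights <- as.numeric(layer_data(p_col)$y)
```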
ggplot(fruits, aes(Sucres)) + geom_bar()
ggplot(fruits, aes(Sucres)) + geom_histogram()
geom_bar plots counts, whereas geom_histogram plots counts or densities.
With a histogram, it is very important to choose the intervals!
ggplot(fruits, aes(Sucres)) + geom_histogram()
To create a histogram, one needs to distribute values into classes.
hist does it automatically with an algorithm (Sturges by default, but the user can choose the Scott or Freedman-Diaconis algorithms). If n is specified, the function will choose a value close to n that gives pretty intervals. To force the classes, use breaks.
geom_histogram creates 30 classes by default; it is the user's job to specify the classes or the number of classes they want.
ggplot(fruits, aes(Sucres)) + geom_histogram(breaks = seq(0, 75, 5))
ggplot(fruits, aes(Sucres)) + geom_histogram(breaks = seq(0, 75, 5), fill = "steelblue")
ggplot(fruits, aes(Sucres)) + geom_histogram(breaks = seq(0, 75, 5), fill = "steelblue", color = "white")
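The effect of forcing the classes can be checked without drawing anything. The sketch below uses simulated values (the data frame `toy` is an assumption standing in for a quantitative variable such as Sucres) and counts the classes produced when breaks is given to hist and to geom_histogram.

```r
library(ggplot2)

set.seed(1)
# Hypothetical values between 0 and 75, standing in for a real variable.
toy <- data.frame(value = runif(200, min = 0, max = 75))

# Base R: plot = FALSE returns the classes without drawing the histogram.
h_forced <- hist(toy$value, breaks = seq(0, 75, 5), plot = FALSE)

# ggplot2: breaks forces the same classes.
p_forced <- ggplot(toy, aes(value)) +
  geom_histogram(breaks = seq(0, 75, 5))

# Number of classes and total count in each approach.
n_classes_hist   <- length(h_forced$counts)
n_classes_ggplot <- nrow(layer_data(p_forced))
total_ggplot     <- sum(layer_data(p_forced)$count)
```

Both approaches use the 15 classes defined by `seq(0, 75, 5)`, so the two histograms are directly comparable.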
ggplot(data=fruits, aes(x = Sucres)) + geom_boxplot()
ggplot(data=fruits, aes(x=groupe, y=Sucres)) + geom_boxplot()
ggplot(data=fruits, aes(x = Sucres, y = 1)) + geom_violin()
ggplot(data=fruits, aes(x = groupe, y = Sucres)) + geom_violin()
Complete the code to obtain the graph on the right:
ggplot(fruits, aes(x = Fibres > 1.5, y = Proteines, fill = ***)) + geom_***()
Themes are pre-defined functions that change the appearance of ggplots:
Examples (theme_***()):
theme_bw() for a black and white theme,
theme_minimal() for a minimalist theme,
theme_void() for an empty theme.
theme_bw()
ggplot(fruits, aes(Fibres)) + geom_histogram() + theme_bw()
theme_minimal()
ggplot(fruits, aes(Fibres)) + geom_histogram() + theme_minimal()
theme_void()
ggplot(fruits, aes(Fibres)) + geom_histogram() + theme_void()
Open the help page of theme_bw with the command ?theme_bw
ggplot(fruits, aes(y = Fibres)) + geom_boxplot() + theme_***()
Use ggtitle, xlab and ylab to set the title and the axis labels, or use the wrapper function labs to go even faster:
labs(
  title = "Graph title",
  subtitle = "Graph subtitle",
  x = "x-axis title",
  y = "y-axis title",
  color = "Color legend title",
  shape = "Shape legend title"
)
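As a short sketch of the two routes, the code below labels the same plot once with labs and once with the individual helpers. The data frame `toy` and the label texts are made up for the illustration.

```r
library(ggplot2)

# Hypothetical toy data.
toy <- data.frame(x = c(1, 2, 3), y = c(3, 1, 2), grp = c("a", "b", "c"))
base_plot <- ggplot(toy, aes(x = x, y = y, color = grp)) + geom_point()

# Everything in one call to labs().
p_labs <- base_plot +
  labs(title = "Toy graph", x = "x value", y = "y value", color = "Group")

# Same title and axis labels with the individual helpers;
# the legend title still goes through labs().
p_helpers <- base_plot +
  ggtitle("Toy graph") +
  xlab("x value") +
  ylab("y value") +
  labs(color = "Group")
```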
With the function theme(), each element has to be defined according to its nature:
element_text(size = , colour = "", family = "") (e.g. titles)
element_line(colour = "", size = ) (e.g. major and minor grids)
element_rect(fill = "") (e.g. background)
theme()
axis.title, axis.title.x, axis.title.y: size, font, color, ...
axis.text, axis.text.x, axis.text.y: size, font, color, ...
axis.ticks, axis.ticks.x, axis.ticks.y
axis.line, axis.line.x, axis.line.y
panel.background: color
panel.grid.major, panel.grid.minor: color, size
legend.text: size, font, color
legend.position
plot.title: size, font, color
geom_point
This geometry needs the $x$ and $y$ aesthetic parameters, and optionally accepts size, color and shape.
ggplot(fruits, aes(x = Phosphore, y = Calcium, size = Magnesium)) + geom_point()
When they are specified in aes(), aesthetic parameters map values (from the dataset) to a characteristic of the objects that are drawn on the graph.
color or colour: color (of the point)
fill: color (inside a shape)
size: size
shape: shape
alpha: opacity
linetype: type of line
label: labels
Specified outside of aes(), they apply the same fixed value to every object!
ggplot(fruits, aes(x = Phosphore, y = Calcium, color = Magnesium)) + geom_point() + theme(legend.position = "bottom")
ggplot(fruits, aes(x = Phosphore, y = Calcium)) + geom_point(color = "limegreen")
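The difference between the two calls above can be verified on a small example. This sketch uses a made-up data frame (`toy`, hypothetical) and counts how many distinct colors the points receive in each case.

```r
library(ggplot2)

# Hypothetical toy data with a two-level grouping variable.
toy <- data.frame(x = 1:4, y = c(2, 4, 3, 5), grp = c("a", "a", "b", "b"))

# Mapped inside aes(): the color depends on the value of grp.
p_mapped <- ggplot(toy, aes(x, y, color = grp)) + geom_point()

# Set outside aes(): one fixed color for every point.
p_fixed <- ggplot(toy, aes(x, y)) + geom_point(color = "limegreen")

n_colors_mapped <- length(unique(layer_data(p_mapped)$colour))
n_colors_fixed  <- length(unique(layer_data(p_fixed)$colour))
```

The mapped version yields one color per level of `grp` and a legend; the fixed version yields a single color and no legend.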
Complete the code to obtain the graph on the right:
ggplot(fruits, aes(x = Sucres, y = Proteines, *** = Magnesium, *** = ***)) + geom_***() + ***(title = "Fruits", x = "Sucres (g/100 g)", y = "Protéines, N x 6.25 (g/100 g)", size = "Magnésium\n(mg/100 g)", ***= "Groupe") + theme_***()
Don't panic, use opacity (aka alpha):
ggplot(fruits, aes(x = Phosphore, y = Calcium, color = groupe)) + geom_point(alpha = 0.5, size = 2) + theme_bw() + theme(legend.position = "bottom")
scale_*** functions
They allow the user to customize a scale (in $x$ or $y$, but not only)!
scale_x_log10() changes the $x$ scale to a logarithmic scale,
scale_y_log10() changes the $y$ scale to a logarithmic scale,
scale_color_manual() customizes the colors,
scale_fill_manual() customizes the colors inside shapes,
scale_x_continuous() customizes the $x$ scale for a continuous variable,
scale_y_continuous() customizes the $y$ scale for a continuous variable,
scale_x_discrete() customizes the $x$ scale for a discrete variable,
scale_y_discrete() customizes the $y$ scale for a discrete variable.
Complete the code to obtain the graph on the right:
ggplot(fruits, aes(Phosphore, Calcium)) + geom_point(*** = "white") + scale_***() + scale_***() + labs(x = "log10(Phosphore)", y = "log10(Calcium)") + theme_dark()
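As one more illustration of the scale_*** family, here is a sketch of scale_x_continuous with hand-picked breaks and labels. The data frame `toy` and the label texts are hypothetical.

```r
library(ggplot2)

# Hypothetical toy data.
toy <- data.frame(x = c(1, 2, 3, 4, 5), y = c(2, 4, 3, 5, 1))

# A continuous x scale with custom tick positions and tick labels.
x_scale <- scale_x_continuous(
  breaks = c(1, 3, 5),
  labels = c("low", "mid", "high")
)

p <- ggplot(toy, aes(x, y)) + geom_point() + x_scale

# The scale object stores the customization it was given.
x_breaks <- x_scale$breaks
x_labels <- x_scale$labels
```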
coord_*** functions
They allow the user to change the coordinate system after applying all the scaling transformations (with scale_*** functions). For example:
coord_fixed to fix the ratio between the units on the $y$ axis and the units on the $x$ axis,
coord_equal when the ratio is set to 1,
coord_flip to flip the axes,
coord_polar to get a plot in the polar coordinate system.
*lim* functions
They allow the user to specify the limits (minimum and maximum) on a given axis. Caution: the values outside the limits are removed from the graph!
xlim, ylim or lims to change the range,
expand_limits to extend the range.
To "zoom in" without losing data, use coord_cartesian or scale_***
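The caution about removed values matters most when a statistic is computed from the data. The sketch below, on a made-up data frame (`toy`, hypothetical), compares the median drawn by geom_boxplot when the range is restricted with ylim and when the view is zoomed with coord_cartesian.

```r
library(ggplot2)

# Hypothetical toy data with one extreme value.
toy <- data.frame(g = "a", y = c(1, 2, 3, 4, 100))
base_plot <- ggplot(toy, aes(x = g, y = y)) + geom_boxplot()

# ylim() discards the value 100 before the box plot statistics are computed
# (ggplot2 warns about the removed row, hence suppressWarnings below).
p_lim <- base_plot + ylim(0, 10)

# coord_cartesian() only zooms: all five values enter the statistics.
p_zoom <- base_plot + coord_cartesian(ylim = c(0, 10))

median_lim  <- suppressWarnings(layer_data(p_lim)$middle)
median_zoom <- layer_data(p_zoom)$middle
```

With ylim the median is computed on 1, 2, 3, 4 only, whereas coord_cartesian keeps the median of all five values.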
facet_wrap
Used to divide the graph into panels.
Careful about the syntax: it is based on vars.
To divide a graph g into several panels according to the value of a factor fac:
g + facet_wrap(facets = vars(fac))
One can also use a "formula":
g + facet_wrap(~ fac)
ggplot(fruits, aes(x = Phosphore, y = Calcium, color = groupe)) + geom_point() + facet_wrap(vars(Sucres > 10)) + theme_bw() + theme(legend.position = "bottom")
ggplot(fruits, aes(x = Phosphore, y = Calcium, color = groupe)) + geom_point() + facet_wrap(vars(Sucres > 10)) + theme_linedraw() + theme(legend.position = "bottom")
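The two syntaxes can be compared on a small example. This sketch uses a made-up data frame (`toy`, hypothetical, with a three-level variable `fac`) and a helper `n_panels`, also introduced only for this illustration, that counts the panels containing data.

```r
library(ggplot2)

# Hypothetical toy data with a three-level grouping variable.
toy <- data.frame(
  x   = 1:6,
  y   = c(2, 4, 3, 5, 1, 6),
  fac = rep(c("u", "v", "w"), each = 2)
)
g <- ggplot(toy, aes(x, y)) + geom_point()

# vars() syntax and formula syntax.
g_vars    <- g + facet_wrap(facets = vars(fac))
g_formula <- g + facet_wrap(~ fac)

# An expression also works inside vars(), as with Sucres > 10 above.
g_expr <- g + facet_wrap(vars(y > 3))

# Illustrative helper: number of panels that contain at least one point.
n_panels <- function(p) length(unique(layer_data(p)$PANEL))
```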
facet_grid
It is used the same way as facet_wrap.
To divide a graph g into several panels according to the value of a factor factorow for the rows and factocol for the columns:
g + facet_grid(rows = vars(factorow), cols = vars(factocol))
One can also use a "formula":
g + facet_grid(factorow ~ factocol)
A PIECE OF ADVICE: when using faceting, be careful about the levels of the categorical variables that you are going to use.
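One way to follow this advice is to cross-tabulate the two faceting variables before plotting. The sketch below uses a made-up data frame (`toy`, hypothetical) in which one combination of levels has no observation, so facet_grid draws an empty panel for it.

```r
library(ggplot2)

# Hypothetical toy data: the combination (r2, c3) never occurs.
toy <- data.frame(
  x        = 1:6,
  y        = c(2, 4, 3, 5, 1, 6),
  factorow = c("r1", "r1", "r1", "r2", "r2", "r2"),
  factocol = c("c1", "c2", "c3", "c1", "c2", "c2")
)

# Check the level combinations before faceting: a zero means an empty panel.
combos <- table(toy$factorow, toy$factocol)

g <- ggplot(toy, aes(x, y)) +
  geom_point() +
  facet_grid(rows = vars(factorow), cols = vars(factocol))

n_cells            <- length(combos)     # cells in the 2 x 3 grid
n_empty_cells      <- sum(combos == 0)   # combinations without data
n_panels_with_data <- length(unique(layer_data(g)$PANEL))
```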
Usage example:
g <- ggplot(fruits, aes(groupe)) +
  geom_bar()
ggsave(filename = "mongraphe.png", plot = g)
The extension given in filename is used automatically to save the graph in the correct format!
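A self-contained variant of this call is sketched below. It assumes a made-up data frame (`toy`) and writes to a temporary directory; the path `out_file` and the dimensions are illustrative choices, and the .pdf extension is used here only to show that the format follows the file name.

```r
library(ggplot2)

# Hypothetical toy data.
toy <- data.frame(groupe = c("a", "b", "b"))
g <- ggplot(toy, aes(groupe)) + geom_bar()

# Illustrative output path; the .pdf extension selects the PDF format.
out_file <- file.path(tempdir(), "mongraphe.pdf")

# Giving width, height and units makes the saved size explicit.
ggsave(filename = out_file, plot = g, width = 6, height = 4, units = "in")
```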
ggplot2 is very complete.