If you attended the Introduction to R course with us, you will already be familiar with some of the basic ggplot2 concepts. This practical serves as a reminder of some of those concepts, whilst also introducing some new ones. If you didn't attend that course, use this practical as an introduction to the basic concepts of ggplot2. Some of these plots aren't particularly useful; we are just using them for illustration purposes.
\newthought{To begin with}, load the ggplot2 package^[The ggplot2 package is automatically installed with jrGgplot2.]
library("ggplot2")
\noindent Next we load the beauty data set:^[Details of the beauty data set can be found at the end of this practical.]
library("jrGgplot2")
data(Beauty, package = "jrGgplot2")
\noindent When loading in data, it's always a good idea to carry out a sanity check. I tend to use the commands
head(Beauty)
colnames(Beauty)
dim(Beauty)
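A couple of further checks can be useful at this stage. The sketch below is only an illustration: `beauty_demo` is a hypothetical stand-in built from the five rows shown in Table 2, so that it runs without the jrGgplot2 package, and the column types in the real data set may differ.

```r
# Hypothetical stand-in: the first five rows of Beauty, copied from Table 2.
beauty_demo = data.frame(
  tenured = c(0, 1, 1, 1, 0),
  minority = c(1, 0, 0, 0, 0),
  age = c(36, 59, 51, 40, 31),
  evaluation = c(4.3, 4.5, 3.7, 4.3, 4.4),
  gender = c("Female", "Male", "Male", "Female", "Female"),
  students = c(43, 20, 55, 46, 48),
  beauty = c(0.202, -0.826, -0.660, -0.766, 1.421)
)

str(beauty_demo)             # column types at a glance
summary(beauty_demo$age)     # are the values in a plausible range?
colSums(is.na(beauty_demo))  # number of missing values per column
```

The same three calls should work unchanged on the full `Beauty` data frame.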
Scatter plots are created using the point geom. Let's start with a basic scatter plot:
ggplot(data = Beauty) + geom_point(aes(x = age, y = beauty))
\noindent To save typing, we can also store the plot as a variable:^[In this practical, we are creating the plots in a slightly verbose way.]
g = ggplot(data = Beauty)
g1 = g + geom_point(aes(x = age, y = beauty))
\noindent To view this plot, type `g1`.
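Storing a plot only builds an object; nothing is drawn until that object is printed. The following sketch, which assumes a small hypothetical data frame `demo` taken from the `age` and `beauty` columns of Table 2, shows how a layer is added to the stored object and how to print it explicitly.

```r
library("ggplot2")

# Hypothetical stand-in for Beauty: age and beauty from the rows in Table 2.
demo = data.frame(
  age = c(36, 59, 51, 40, 31),
  beauty = c(0.202, -0.826, -0.660, -0.766, 1.421)
)

g = ggplot(data = demo)                        # data only, no layers yet
g1 = g + geom_point(aes(x = age, y = beauty))  # the same plot plus a point layer

length(g$layers)   # number of layers in the empty plot
length(g1$layers)  # number of layers after adding geom_point

# Typing g1 at the console prints it. Inside functions, loops and sourced
# scripts there is no automatic printing, so call print() yourself.
print(g1)
```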
The arguments `x` and `y` are called aesthetics. For `geom_point`, these parameters are required. This particular geom has other aesthetics: `shape`, `colour`, `size` and `alpha`.^[These aesthetics are usually available for most geoms.] Here are some things to try out.
g + geom_point(aes(x = age, y = beauty, colour = gender))
\noindent or
g + geom_point(aes(x = age, y = beauty, colour = gender, alpha = evaluation))
- Some aesthetics, like `shape`, must be discrete, so we have to transform the variable into a character or factor, for example `shape = factor(tenured)`.
- Are there any differences between numeric values like `tenured` and characters like `gender` for some aesthetics? What happens if you convert `tenured` to a factor in the `colour` aesthetic, for example `colour = factor(tenured)`?
- What happens if you set `colour` (or some other aesthetic) outside of the `aes` function? For example, compare
g + geom_point(aes(x = age, y = beauty, colour = "blue"))
\noindent to
g + geom_point(aes(x = age, y = beauty), colour = "blue")
- Now set an aesthetic to a constant numeric value inside `aes`, for example `colour = 2`. What happens if you put this argument outside of the `aes` function?

The box plot geom has the following aesthetics: `x`, `y`, `colour`, `fill`, `linetype`, `weight`, `size` and `alpha`. We can create a basic boxplot using the following commands:
g + geom_boxplot(aes(x = gender, y = beauty))
\noindent Similar to the point geom, we can add in aesthetics:
g + geom_boxplot(aes(x = gender, y = beauty, colour = factor(tenured)))
\noindent Why do you think we have to convert `tenured` to a discrete factor?
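As before, experiment with the different aesthetics. For some of the aesthetics, you will need to convert the continuous variables to discrete variables. For example, this will not work as intended; depending on your version of ggplot2, you will get either an error or a warning that the colour aesthetic has been dropped: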
g + geom_boxplot(aes(x = gender, y = beauty, colour = tenured))
\noindent while this is OK^[Why?]
g + geom_boxplot(aes(x = gender, y = beauty, colour = factor(tenured)))
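One way to explore the difference is to count how many boxes ggplot2 computes. By default, groups are formed from the discrete variables in the plot, so a factor can split each gender into separate boxes whereas a numeric variable cannot. The sketch below uses a synthetic, hypothetical data frame `demo` rather than the real data, together with `layer_data()`, which returns the statistics computed for a layer.

```r
library("ggplot2")

# Synthetic stand-in for Beauty (hypothetical values, not the real data).
set.seed(1)
demo = data.frame(
  gender = rep(c("Female", "Male"), each = 20),
  tenured = rep(c(0, 1), times = 20),
  beauty = rnorm(40)
)
g = ggplot(data = demo)

p_plain = g + geom_boxplot(aes(x = gender, y = beauty))
p_factor = g + geom_boxplot(aes(x = gender, y = beauty, colour = factor(tenured)))

# layer_data() gives one row of summary statistics per box.
nrow(layer_data(p_plain))   # one box per gender
nrow(layer_data(p_factor))  # one box per gender and tenured combination
```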
\noindent Make sure you play about with the different aesthetics.
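When experimenting, it can help to check what value an aesthetic actually ended up with. The sketch below is one possible way to do this, again with a small hypothetical data frame `demo` based on Table 2. It compares mapping a constant inside `aes` with setting it outside, and inspects the result with `layer_data()`.

```r
library("ggplot2")

# Hypothetical stand-in for Beauty: age and beauty from the rows in Table 2.
demo = data.frame(
  age = c(36, 59, 51, 40, 31),
  beauty = c(0.202, -0.826, -0.660, -0.766, 1.421)
)
g = ggplot(data = demo)

# Inside aes(): "blue" is treated as data and passed through a colour scale.
p_mapped = g + geom_point(aes(x = age, y = beauty, colour = "blue"))
# Outside aes(): "blue" is used directly as the colour of every point.
p_set = g + geom_point(aes(x = age, y = beauty), colour = "blue")

unique(layer_data(p_mapped)$colour)  # a colour picked by the default scale
unique(layer_data(p_set)$colour)     # the literal value we supplied
```

The mapped version also produces a legend, because ggplot2 treats the constant as a variable with a single level.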
The key idea with ggplot2 is to think in terms of layers not in terms of plot "types".^[In the lectures we will discuss what this means.] For example,
g + geom_boxplot(aes(x = gender, y = beauty, colour = factor(tenured))) + geom_point(aes(x = gender, y = beauty))
- What happens if you swap the order of the `geom_boxplot` and `geom_point` function calls?
- In this plot, `geom_point` isn't that great. Try using `geom_jitter`^[We have a bit too much data for `geom_jitter`, but you get the point.]:
g + geom_boxplot(aes(x = gender, y = beauty, colour = factor(tenured))) + geom_jitter(aes(x = gender, y = beauty))
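To make the idea of layers more concrete, the sketch below builds a similar plot on a synthetic, hypothetical data frame `demo` and then lists the layers stored in the plot object. It also shows a less verbose style than the one used in this practical: aesthetics shared by all layers are given once in `ggplot()` and inherited by each geom.

```r
library("ggplot2")

# Synthetic stand-in for Beauty (hypothetical values, not the real data).
set.seed(1)
demo = data.frame(
  gender = rep(c("Female", "Male"), each = 20),
  tenured = rep(c(0, 1), times = 20),
  beauty = rnorm(40)
)

# x and y are shared by both layers; colour applies to the boxplot only.
p = ggplot(demo, aes(x = gender, y = beauty)) +
  geom_boxplot(aes(colour = factor(tenured))) +
  geom_jitter()

# Layers are stored, and drawn, in the order in which they were added.
# geom_jitter() is a point layer that uses a jitter position adjustment.
sapply(p$layers, function(layer) class(layer$geom)[1])
sapply(p$layers, function(layer) class(layer$position)[1])
```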
The bar geom has the following aesthetics: `x`, `colour`, `fill`, `size`, `linetype`, `weight` and `alpha`. Here is a command to get started:
g + geom_bar(aes(x = factor(tenured)))
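As before, try the other aesthetics, converting variables to discrete versions where necessary with `factor(...)`.
\noindent Next, round each age to one significant figure (here, the nearest ten years) and store the result as a factor:
Beauty$dec = factor(signif(Beauty$age, 1))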
\noindent then plot:
g = ggplot(data = Beauty)
g + geom_bar(aes(x = gender, fill = dec))
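The `dec` variable relies on `signif()`, which rounds to a given number of significant figures. As a quick illustration, here it is applied to the five ages shown in Table 2; the vector `age` is just a stand-in for `Beauty$age`.

```r
age = c(36, 59, 51, 40, 31)    # ages from the first five rows in Table 2

signif(age, 1)                 # one significant figure: the nearest ten here
dec = factor(signif(age, 1))   # discrete version, suitable for the fill aesthetic
levels(dec)                    # the distinct decades, in increasing order
table(dec)                     # how many ages fall in each decade
```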
\noindent We can adjust the layout of this bar plot using ggplot2's `position` adjustments. The five possible adjustments are listed in Table 1. The default adjustment is `stack`:
g + geom_bar(aes(x = gender, fill = dec), position = "stack")
g + geom_bar(aes(x = gender, fill = dec), position = "dodge")
\begin{table}[t]
\centering
\begin{tabular}{@{}ll@{}}
\toprule
Adjustment & Description \\
\midrule
\texttt{dodge} & Adjust position by dodging overlaps to the side \\
\texttt{fill} & Stack overlapping elements; standardise stack height \\
\texttt{identity} & Do nothing \\
\texttt{jitter} & Jitter points \\
\texttt{stack} & Stack overlapping elements \\
\bottomrule
\end{tabular}
\caption{Position adjustments - table 4.5 in the ggplot2 book.}
\label{T1}
\end{table}
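The effect of a position adjustment can also be checked numerically. The sketch below is an illustration only: `demo` is a small hypothetical data frame and `max_top()` is a helper written for this example, not part of ggplot2. It reports the height of the tallest bar (or stack of bars) under three of the adjustments.

```r
library("ggplot2")

# Hypothetical data: six female and four male lecturers with a decade label.
demo = data.frame(
  gender = rep(c("Female", "Male"), times = c(6, 4)),
  dec = factor(c(30, 30, 40, 40, 40, 50, 40, 50, 50, 60))
)
g = ggplot(data = demo)

# Hypothetical helper: the highest point reached by any bar in the plot.
max_top = function(position) {
  p = g + geom_bar(aes(x = gender, fill = dec), position = position)
  max(layer_data(p)$ymax)
}

max_top("stack")  # bars are stacked, so this is the largest gender total
max_top("fill")   # stacks are rescaled to a common height
max_top("dodge")  # bars sit side by side, so this is the largest single count
```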
\newpage
\begin{table}[!t]
\centering
\caption{The first five rows of the beauty data set. There are a total of 463 course evaluations.}
\begin{tabular}{@{}llllll r@{.}l@{}}
\toprule
tenured & minority & age & evaluation & gender & students & \multicolumn{2}{l}{beauty} \\
\midrule
0 & 1 & 36 & 4.3 & Female & 43 & 0&202 \\
1 & 0 & 59 & 4.5 & Male & 20 & -0&826 \\
1 & 0 & 51 & 3.7 & Male & 55 & -0&660 \\
1 & 0 & 40 & 4.3 & Female & 46 & -0&766 \\
0 & 0 & 31 & 4.4 & Female & 48 & 1&421 \\
\bottomrule
\end{tabular}
\label{T2}
\end{table}
This data set is from a study where researchers were interested in whether a lecturer's attractiveness affected their course evaluation.\cite{Hamermesh2003} This is a cleaned version of the data set and contains the following variables:
- `evaluation` - the questionnaire result.
- `tenured` - does the lecturer have tenure; 1 == Yes. In R, this value is continuous.
- `minority` - does the lecturer come from an ethnic minority (in the USA).
- `age`.
- `gender`.
- `students` - number of students in the class.
- `beauty` - each of the lecturers’ pictures was rated by six undergraduate students: three women and three men. The raters were told to use a 10 (highest) to 1 rating scale, to concentrate on the physiognomy of the professor in the picture, to make their ratings independent of age, and to keep 5 in mind as an average. The scores were then normalised.

Table 2 gives the first few rows of the data set.
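Because `tenured` is stored as a number, it often needs converting before it can be used with discrete aesthetics. A minimal sketch of one way to do this, using the five values from Table 2 in place of `Beauty$tenured`; the labels "No" and "Yes" are our own choice, following the coding 1 == Yes.

```r
tenured = c(0, 1, 1, 1, 0)  # first five values of tenured, from Table 2

# Convert the 0/1 coding into a factor with readable labels.
tenured_f = factor(tenured, levels = c(0, 1), labels = c("No", "Yes"))

is.numeric(tenured)  # the original variable is continuous as far as R is concerned
levels(tenured_f)    # the factor has two discrete levels
table(tenured_f)     # counts per level
```

Labelled levels also give more readable legends than `factor(tenured)` alone.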