$\$

# makes sure you have all the packages we have used in class installed
#SDS230::update_installed_packages()


# install.packages("gapminder")
# install.packages("latex2exp")

library(latex2exp)

library(dplyr)


#options(scipen=999)


knitr::opts_chunk$set(echo = TRUE)

set.seed(123)

$\$

Overview

$\$

Part 1: Visualizations with ggplot

We can use the ggplot2 package, which is part of the tidyverse, to create much nicer looking graphics than using base R graphics. The ggplot2 library is modeled on Leland Wilkinson's "grammar of graphics" which creatse graphics from a combination of basic visual elements.

In the exercises below, we will learn how to use ggplot using the motor trends cars data set (mtcars) that comes with base R installation, and also the gapminder data.

A few resources to learn more about ggplot are:

$\$

Part 1.1: Scatter plots

Let's create plots of the number of miles per gallon (mpg) cars get as a function of the weight of the car.

# install.packages("ggplot2")

library(ggplot2)



# base R





# ggplot 






# ggplot - mapping in the geom

$\$

Part 1.2: Adding labels to plots

We can add labels to the plots using the labs() function. Arguments to labs() function include: - x: the label on the x-axis (you can also use the xlab() function) - y: the label on the y-axis (you can also use the ylab() function) - title: the title of the plot (you can also use the ggtitle() function) - subtitle: the title of the plot (you can also use the ggtitle() function)

If you just want to add x and y labels or a title, you can also use the xlab("label1"), ylab("label2") and/or the ggtitle() functions.


Remember, if you don't want exes label your axes

$\$

Part 1.3: Adding annotations to plots

We can add annotations to plots using the annotate("text", x = , y = , label = ) function.


$\$

Part 1.4: More aesthetic mappings

We can use other aesthetic mappings beyond position including: - color: different color scales are used for quantitative and categorical (factor) data - shape: should be categorical data - size: should be quantitative data

# add color based on the transmission type (is automatic or not)




# it is better to treat the transmission type as a categorical variable?





# can also try mapping transmission type to shape or size 

Question: When adding the variable automatic/manual transmission (am) to the scatter plot mpg vs. weight, do you think it is best to map am on to...?

a. color b. shape c. size

$\$

Part 1.5: Attributes vs. Aesthetics

Setting an aesthetic mapping maps a variable to a glyph property. This is done inside the aes() function.

Setting an attribute set a glyph property to a fixed value. This is done outside the aes() function.

# setting an aesthetic mapping




# setting an attribute

$\$

Part 1.6: Facets

Beyond comparing variables based on aesthetic mappings, you can compare categorical variables by splitting a plot into subplots, called facets, using facet_wrap()

# separate subplots for the two transmission types





# One can also do facets in two dimensions

$\$

Part 1.7: Overplotting

Sometimes points overlap making it hard to estimate the number of points at a particular range of values.

We can control the transparency of points by changing their alpha values.

library(gapminder)


# a lot of overplotting




# changing the transparency levels makes it a bit easier to see how many points are at a given x, y location

$\$

Part 1.8: Changing scales

Each visual attribute that has an aesthetic mapping has a default scales. We can change the scales used for each mapping using functions that start with scale_.

For example, we can change the x-scale from liner to logarithmic using scale_x_continuous(trans='log10'). Likewise we can change the color scale using scale_color_manual().

# changing the scale on the x-axis





# mapping continents to colors, and adding my own color scale

$\$

Part 1.9: geoms

We can use different geoms to create other types of plots.

# Let's create a plot that shows the GDP in the United States as a function of the year using the geom geom_line()





# Let's create a plot that shows the GDP in the United States as a function of the year using the geom geom_col()





# Let's MPG as a function of weight using the names of cars rather than just points for each car





# Let's plot a histogram of the weights of cars




# Let's create a boxplot of the weights of cars



# Let's create a side-by-side boxplot of the weights of cars depending on the number of cylinders the engine has

$\$

Part 1.10: geoms continued

Violin and Joy plots are other ways to view distributions of data

# violin plot 




library("ggridges")

# joy plot

Question: Can you figure out where the name "joy plot" comes from?

$\$

Part 1.11: geoms continued

We can also have multiple geom layers on a single graph by using the + symbol E.g ggplot(…) + geom_type1() + geom_type2()

# Create a scatter plot of miles per gallon as a function of weight and then add a smoothed line using geom_smooth() and a vertical lines using geom_vline()

$\$

Part 1.12: Themes

We can also use different themes to change the appearance of our plot.

# Add theme_classic() to our plot



# install.packages("ggthemes")


# library(ggthemes)

$\$

Part 1.13: Exeriment yourself

Try to create some interesting visualizations from either a data set we have used in the class, or a new data set you found.


$\$



emeyers/SDS230 documentation built on Jan. 18, 2024, 1:01 a.m.