library(tidyverse)
library(PPBDS.data)
library(learnr)
library(shiny)
library(ggthemes)
library(viridis)
library(nycflights13)

knitr::opts_chunk$set(echo = FALSE, message = FALSE)
options(tutorial.exercise.timelimit = 60, tutorial.storage = "local")
``` {r name, echo=FALSE}
question_text(
  "Student Name:",
  answer(NULL, correct = TRUE),
  incorrect = "Ok",
  try_again_button = "Modify your answer",
  allow_retry = TRUE
)
```

## Email

###

``` {r email, echo=FALSE}
question_text(
  "Email:",
  answer(NULL, correct = TRUE),
  incorrect = "Ok",
  try_again_button = "Modify your answer",
  allow_retry = TRUE
)
```
## geom_point()

Let's begin by making this scatterplot using `geom_point()`.
ggplot(data = qscores, mapping = aes(x = hours, y = enrollment, color = term)) +
  scale_y_sqrt() +
  geom_point(alpha = 0.5)
The data set used here is `qscores`. Put `hours` on the x axis and `enrollment` on the y axis. Then map the `term` variable to the `color` aesthetic.
Now change the scale to square root using `scale_y_sqrt()`. (Remember that you need to add a layer here.) Also, set `alpha` to 0.5.
Remember the alpha argument goes inside geom_point().
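The same pattern can be sketched on other data. The example below is only an illustration: it assumes ggplot2's built-in `mpg` tibble as a stand-in for `qscores`, and the object name `p_point` is made up for this sketch.

```r
library(ggplot2)

# Stand-in data: ggplot2's built-in mpg tibble, not the tutorial's qscores.
p_point <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) +
  scale_y_sqrt() +          # a scale is added as its own layer with +
  geom_point(alpha = 0.5)   # alpha is a fixed setting, so it goes inside the geom

p_point
```

Note that `color` is mapped to a variable inside `aes()`, while `alpha` is a single fixed value and so sits inside `geom_point()`.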
## geom_jitter()

Let's make this scatterplot using `geom_jitter()`.
ggplot(data = diamonds, mapping = aes(x = price, y = carat)) +
  scale_x_log10() +
  geom_jitter(alpha = 0.2, height = 0.25, size = 1) +
  geom_smooth(method = "lm")
Use the `diamonds` tibble, and put `carat` on the y-axis and `price` on the x-axis.
Nice... but our plot looks a little sloppy. Try to reduce the overplotting by setting `size` to 1, `height` to 0.25, and `alpha` to 0.2.
That looks a bit better! Now, try adding a trendline using `geom_smooth()`, with `method` set to "lm". Let's also use the `scale_x_log10()` function to make the x-axis scale logarithmic.
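Here is a rough sketch of how these three layers fit together. It assumes ggplot2's built-in `mpg` tibble as stand-in data, and the jitter amounts are arbitrary values chosen for illustration.

```r
library(ggplot2)

# Stand-in data: mpg from ggplot2. The jitter amounts below are illustrative.
p_jitter <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_jitter(width = 0.1, height = 0.25, size = 1, alpha = 0.2) +  # spread out overlapping points
  geom_smooth(method = "lm") +                                      # straight-line trend
  scale_x_log10()                                                   # logarithmic x axis

p_jitter
```

Each piece is added with `+`, and the fixed settings (`size`, `height`, `alpha`) go inside `geom_jitter()` rather than inside `aes()`.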
Good stuff. Now, use what you've learned to recreate the plot below. Make sure you set `width` = 1 and `alpha` = 0.2. Length is recorded as the variable `x`.
ggplot(data = diamonds) + geom_jitter(aes(x = x, y = price, color = carat), width = 1, alpha = 0.2) + labs(title = "Length and Price in Cut Diamonds", x = "Length (Millimeters)", y = "Price (US Dollars)")
# Note that this is a bit tricky. There are two --- completely separate! ---
# meanings of `x` in this problem. First, `x` is the name of a variable in the
# `diamonds` dataset which measures the length of each diamond. (There is
# nothing wrong with using `x` as a variable name, after all.) Second, `x` is
# the name of the first argument to the `aes()` function. You can still use
# both, because R keeps the two meanings separate based on where each one
# appears in the code you write.
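One way to see the two meanings of `x` side by side is to inspect the mapping stored in the plot object. This is just a sketch: `p_x` is a name invented here, and `rlang::as_label()` is used only to print the mapped expression.

```r
library(ggplot2)

p_x <- ggplot(data = diamonds) +
  geom_jitter(aes(x = x, y = price), width = 1, alpha = 0.2)

# Left of `=`: the name of the aesthetic (an argument to aes()).
names(p_x$layers[[1]]$mapping)

# Right of `=`: the column that will be looked up in diamonds.
rlang::as_label(p_x$layers[[1]]$mapping$x)
```

The name on the left of `=` is always the aesthetic, and the name on the right is always looked up in the data.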
## geom_line()

Let's make this line plot using `geom_line()`.
ggplot(data = economics, aes(x = date, y= unemploy)) + geom_line() + scale_y_log10()+ ylim(0,20000)
The line plot uses the data set `economics`. Put `date` on the x-axis and `unemploy` on the y-axis. Change the scale on the y-axis of the graph below to log base 10.
ggplot(data = economics, aes(x = date, y = unemploy)) + geom_line()
Use scale_y_log10()
Great! Now add a limit to the y axis with a lower bound of 0 and an upper bound of 20,000.
Use ylim().
Make a line plot using the `population` tibble, which annually records the populations of different nations. Run this example code for Canada.
population %>% filter(country == "Canada") %>% ggplot(aes(x = year, y = population)) + geom_line()
Nice! Now try making a plot using `x = year` and `y = population`, but with data from "Australia".
population %>% filter(country == ...) %>% ggplot(aes(x = ..., y = ...)) + geom_line()
Great, now let's try plotting the data from both countries on the same graph. Let's also map the `country` variable to `color`.
population %>% filter(country == ... | country == ...) %>% ggplot(aes(x = ..., y = ..., color = ...)) + geom_line()
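When the list of countries grows, chaining `|` gets long. As a sketch, `%in%` selects the same rows. This assumes `population` is the tibble shipped with the tidyr package, and the two countries are arbitrary picks for illustration.

```r
library(dplyr)
library(tidyr)  # assumed source of the population tibble

# Two ways to keep the rows for either of two countries.
with_or <- population %>%
  filter(country == "Brazil" | country == "Mexico")

with_in <- population %>%
  filter(country %in% c("Brazil", "Mexico"))

# Both filters should return the same rows in the same order.
identical(with_or, with_in)
```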
Great. Now, use what you've learned to recreate the plot below.
population %>% filter(country == "Germany" | country == "France" | country == "Italy") %>% ggplot(aes(x = year, y = population, linetype = country, color = country)) + geom_line() + labs(title = "Population in Selected European Countries", x = "Year", y = "Population")
Use `linetype = ...`
You are going to need to add `linetype = country` inside the `aes()` function.
## geom_histogram()

Let's keep working with our `qscores` histogram plot, which you can see below. See the message R gave us about bin widths? Change the bin width by setting the number of bins to 20.
ggplot(data = qscores, aes(x = rating)) + geom_histogram(fill = "red4", color = "white") + labs(title = "Course Rating Distribution", y = "Count", x = "Rating")
ggplot(data = qscores, aes(x = rating)) + geom_histogram(fill = "red4", color = "white") + labs(title = "Course Rating Distribution", y = "Count", x = "Rating")
Set the argument bins in geom_histogram() to 20.
Nice! Change the plot so that the bins are 0.25 wide. (To do this, you must delete the `bins` argument you set in the previous question.)
Set the argument binwidth in geom_histogram() to .25
Now facet the plot by `term` and set the number of columns to 2.
Use facet_wrap() and set ncol to 2.
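To compare the two ways of controlling bins, here is a small sketch. It assumes ggplot2's built-in `mpg` tibble as stand-in data, with a bin width of 2 chosen arbitrarily for that variable.

```r
library(ggplot2)

# Option 1: ask for a number of bins and let ggplot2 work out the width.
p_bins <- ggplot(data = mpg, aes(x = hwy)) +
  geom_histogram(bins = 20, fill = "red4", color = "white")

# Option 2: fix the width of each bin instead, then split into panels.
p_width <- ggplot(data = mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 2, fill = "red4", color = "white") +
  facet_wrap(~ drv, ncol = 2)

p_width
```

`bins` and `binwidth` are alternatives, so use one or the other rather than both.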
ggplot(data = nhanes, aes(x = height)) + geom_histogram(color = "white", fill = "black", bins = 30) + facet_wrap(~ gender, ncol = 1)
Recreate the above histogram. It was made with the `nhanes` data set, with `height` on the x axis and faceted by `gender`.
Use facet_wrap() and set ncol to 1.
Well done. Finally, use what you've learned about `geom_histogram()` to replicate the plot below.
mpg %>% ggplot(aes(x = hwy, color = class)) + geom_histogram(fill = "white", bins = 5, position = "dodge") + labs(title = "Highway Fuel Economy by Car Type", x = "Miles Per Gallon (Highway)", y = "Number of Car Models")
Within geom_histogram, use the arguments: fill= "white", bins = 5, position = "dodge"
## geom_boxplot()

For data set `diamonds`, create a boxplot with `clarity` on the x axis and `price` on the y axis.

Try changing the scale of the y axis to log base 10.
Use the scale_y_log10() function
Great! Now, use `coord_flip()` to convert the plot into a horizontal boxplot.
Remember that you are adding a layer here
Great! Now, let's zoom in on our plot by setting `ylim` to the vector `c(500, 7000)`. Remember that `ylim` should be placed inside of the `coord_` function, in this case `coord_flip()`.
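Here is a minimal sketch of zooming through the coordinate system, leaving out the log scale to keep it short. Limits set inside a `coord_` function only change the visible window, so the boxes are still computed from all of the data. By contrast, adding `ylim()` as its own layer sets scale limits and drops the points outside them before the boxplot statistics are computed.

```r
library(ggplot2)

# Zoom by passing ylim to the coord function: no rows are removed.
p_zoom <- ggplot(data = diamonds, aes(x = clarity, y = price)) +
  geom_boxplot() +
  coord_flip(ylim = c(500, 7000))

p_zoom
```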
## geom_violin()

Let's make this plot using `geom_violin()`.
ggplot(data = mpg, aes(x = class, y = displ)) +
  geom_violin() +
  geom_jitter(aes(color = manufacturer), height = 0, width = 0.2) +
  labs(title = "Engine Size by Car Type")
Let's use the `mpg` tibble to make a plot with `geom_violin()`. Let's map our x-axis to `class` and our y-axis to `displ`.
Good. Now, let's add some labels with `labs()`. Let's title our plot "Engine Size by Car Type".
Nice work. Now, let's use `geom_jitter()` to add a new layer to our plot, and map `color` to `manufacturer`, so we can get a sense of how car brand relates to engine size. Let's set `height` to 0 and `width` to 0.2.
Cool. Now use what you've learned to replicate the plot below.
ggplot(data = mpg, aes(x = class, y = cty)) + geom_violin() + geom_jitter(aes(color = hwy, shape = drv), height = 0, width = 0.2)
ggplot(data = mpg, aes(x = class, y = cty)) + geom_violin() + geom_jitter(aes(color = ..., shape = ...), height = ..., width = ...)
## geom_bar()

Let's make the following dodged bar plot using `geom_bar()`.
ggplot(data = sps, aes(x = education, fill = sex)) +
  geom_bar(position = "dodge")
Using the data set `sps`, which comes from a public health experiment in Mexico, make a dodged bar plot with `education` on the x axis. Map the `sex` variable to the `fill` aesthetic.

Within geom_bar(), set position = "dodge".
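As a sketch of the same idea on other data, the example below assumes ggplot2's built-in `mpg` tibble as a stand-in for `sps`. It builds the default stacked version and the dodged version so you can compare them.

```r
library(ggplot2)

# Default: bars for each fill group are stacked on top of each other.
p_stack <- ggplot(data = mpg, aes(x = class, fill = drv)) +
  geom_bar()

# position = "dodge" places the groups side by side instead.
p_dodge <- ggplot(data = mpg, aes(x = class, fill = drv)) +
  geom_bar(position = "dodge")

p_dodge
```

Spelling matters here: ggplot2 does not recognize a mistyped argument name such as `postion`, so the bars would stay stacked.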
Let's make a barplot using `trains`, a tibble that records attitudes towards immigration on a Boston train platform before and after an experiment. Map our x variable to `att_start`.
Great. Now, map `fill` to `party`. Notice any patterns?
Good work. Now, let's use `facet_wrap()` to divide our graph by the `liberal` variable.
facet_wrap( ~ ...)
Well done. Now, use what you've learned to recreate the plot below.
ggplot(data = trains, mapping = aes(x = att_end, fill = gender)) + geom_bar() + facet_wrap( ~ treatment) + labs(title = "Attitudes Towards Immigration After Experimentation", x = "Attitude Towards Immigration (Higher Means More Conservative)", y = "Number of People")
## geom_col()

Let's recreate the plot below using `geom_col()`.
eng_qscores <- qscores %>% filter(department == "ENGLISH")

ggplot(data = eng_qscores, aes(x = number, y = hours)) +
  geom_col() +
  theme(axis.text.x = element_text(size = 5)) +
  labs(title = "Hours of Work in Harvard English Classes")
Use the data set `eng_qscores`, which was made from the data set `qscores` and filtered to only include rows where `department == "ENGLISH"`.
eng_qscores <- qscores %>% filter(department == "ENGLISH")
ggplot(data = eng_qscores, aes(x = number, y = hours)) + geom_col() + labs(title = "Hours of Work in Harvard English Classes")

eng_qscores <- qscores %>% filter(department == "ENGLISH")

eng_qscores <- qscores %>% filter(department == "ENGLISH")
ggplot(data = eng_qscores, aes(x = number, y = hours)) + geom_col() + labs(title = ...)
Let's make a plot recording the fuel efficiency of different Toyota models with the `toyota_mpg` tibble (made from `mpg`) and `geom_col()`. Set `x = hwy` and `y = cty`.
toyota_mpg <- mpg %>% filter(manufacturer == "toyota")
ggplot(data = toyota_mpg, ...)
toyota_mpg <- mpg %>% filter(manufacturer == "toyota")
Nicely done. Because `geom_col()` uses the stacked position by default, it's lumping all the different car models with the same name on the same bar. Let's set our `position` to "dodge" and continue. Let's also set `color` to "white" to make a white outline around our bars.
toyota_mpg <- mpg %>% filter(manufacturer == "toyota")
## geom_smooth()

Let's create this graph using `geom_smooth()`.
ggplot(data = trains, mapping = aes(x =att_start, y = att_end, color= treatment)) + geom_smooth(method = "gam", alpha= .75)
The graph above uses the data set `trains`, which includes data on an experiment about attitudes towards immigration. Use `geom_smooth()` and set `method` to "gam". Put `att_start` on the x axis and `att_end` on the y axis.
ggplot(data = ..., mapping = aes(x = ..., y = ...)) + geom_smooth(method = "...")
Nice! Now map the `color` aesthetic to the `treatment` variable and set `alpha` to .75.
Because color is an aesthetic, set it inside of aes(). Set alpha inside of geom_smooth().
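The difference between mapping and setting can be sketched as follows. The example assumes ggplot2's built-in `mpg` tibble as stand-in data and uses `method = "lm"` only to keep it light.

```r
library(ggplot2)

p_smooth <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) +
  geom_smooth(method = "lm", alpha = 0.75)

# Mapped: color varies with a variable, so it lives in aes().
names(p_smooth$mapping)

# Set: alpha is one fixed value, so it is stored as a parameter of the layer.
p_smooth$layers[[1]]$aes_params
```

Anything that should change with the data goes inside `aes()`, and anything that is one constant value goes inside the geom.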
Now let's use the `iris` tibble, which records the dimensions of samples of iris flowers, and `geom_smooth()` to make a plot with `x = Petal.Length` and `y = Petal.Width`.
Nice. Now, let's map `color` to `Species` and set `method` to "lm".
Good work. Now, use what you've learned to recreate the graph below. You are going to need to use the argument `se = FALSE` in the `geom_smooth()` function.
ggplot(data = iris, mapping = aes(x = Sepal.Width, y = Sepal.Length, linetype = Species)) +
  geom_smooth(se = FALSE, color = "purple") +
  labs(title = "Sepal Width and Length in Iris Species", x = "Sepal Width", y = "Sepal Length")
## geom_density()

Let's make a plot using `geom_density()` and the `qscores` tibble. Also, map `linetype` to `term` within `aes()`.
Nice. Now, let's use `labs()` to label our x-axis "Rating" and our y-axis "Density of Classes".
Let's also add an `xlim()` function to set the range of x values on our plot to (2.5, 5).
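As a rough sketch of these pieces working together, the example below assumes ggplot2's built-in `diamonds` tibble as stand-in data, with labels and limits invented for illustration. Keep in mind that `xlim()` sets the limits of the x scale, so observations outside the range are dropped (with a warning) before the density is estimated.

```r
library(ggplot2)

# Stand-in data: diamonds from ggplot2. Labels and limits are illustrative.
p_density <- ggplot(data = diamonds, aes(x = carat, linetype = cut)) +
  geom_density() +
  xlim(0, 3) +
  labs(x = "Carat", y = "Density of Diamonds")

p_density
```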
Good stuff. Now, use what you've learned about `geom_density()` to recreate the plot below.
ggplot(data = qscores, aes(x = enrollment, color = term)) + geom_density() + xlim(0, 200) + labs(title = "Distribution of Students Enrolled in Harvard Classes", x = "Number of Students Enrolled", y = "Density of Classes")
Congrats on finishing your first Gov 50 tutorial! You're on your way to being a master in data visualization and wrangling! :)
submission_ui
submission_server()