library(tidyverse)
library(PPBDS.data)
library(learnr)
library(shiny)
library(ggthemes)
library(viridis)
library(nycflights13)

knitr::opts_chunk$set(echo = FALSE, message = FALSE)
options(tutorial.exercise.timelimit = 60, tutorial.storage="local")  

## Name
###

``` {r name, echo=FALSE}
question_text(
  "Student Name:",
  answer(NULL, correct = TRUE),
  incorrect = "Ok",
  try_again_button = "Modify your answer",
  allow_retry = TRUE
)
```

## Email
###

``` {r email, echo=FALSE}
question_text(
  "Email:",
  answer(NULL, correct = TRUE),
  incorrect = "Ok",
  try_again_button = "Modify your answer",
  allow_retry = TRUE
)
```

Advanced Plotting - geom_point()

Let's begin by making this scatterplot using geom_point().

ggplot(data = qscores, mapping = aes(x = hours, y = enrollment, color = term)) +
  geom_point(alpha = 0.5) +
  scale_y_sqrt()

Exercise 1

The data set used here is qscores. Put hours on the x axis and enrollment on the y axis. Then map the term variable to the color aesthetic.


Exercise 2

Now change the y-axis to a square-root scale using scale_y_sqrt(). (Remember that you need to add a layer here.) Also, set alpha to 0.5.


Remember the alpha argument goes inside geom_point().
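The rule behind that hint is general: a variable you map goes inside aes(), while a fixed value you set for every point (such as alpha = 0.5) goes directly in the geom. Here is a minimal sketch of the same pattern. It uses ggplot2's built-in mpg data rather than qscores (our choice, so the exercise answer is still yours to write):

```r
library(ggplot2)

# Variables are mapped inside aes(); color changes with drv.
# Constant settings go in the geom: every point is 50% transparent.
p_sqrt <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) +
  geom_point(alpha = 0.5) +
  scale_y_sqrt()  # square-root transformation of the y-axis

p_sqrt
```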

Advanced Plotting - geom_jitter()

Let's make this scatterplot using geom_jitter().

ggplot(data = diamonds, mapping = aes(x = price, y = carat)) +
  geom_jitter(alpha = 0.5, height = 0.25, size = 1) +
  geom_smooth(method = "lm") +
  scale_x_log10()

Exercise 1

Use the diamonds tibble, and put price on the x-axis and carat on the y-axis.


Exercise 2

Nice, but our plot looks a little cluttered. Reduce the overplotting by setting size to 1, height to 0.25, and alpha to 0.2.


Exercise 3

That looks a bit better! Now, try adding a trend line using geom_smooth(), with method set to "lm". Let's also use scale_x_log10() to put the x-axis on a logarithmic scale.
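Under the hood, geom_jitter() is geom_point() with a small random shift added to each point. width and height set the largest horizontal and vertical shift in the units of each axis, and the noise is added in both the positive and negative directions. A minimal sketch on mpg, where whole-number cty and hwy values make exact overlaps common (the data set and settings are illustrative choices, not part of the exercise):

```r
library(ggplot2)

# cty and hwy are whole numbers, so many cars sit on exactly the same spot.
# geom_jitter() shifts each point randomly by up to +/- 0.3 in each direction;
# the shift is redrawn every time the plot is rendered.
p_jit <- ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_jitter(width = 0.3, height = 0.3, size = 1, alpha = 0.2) +
  geom_smooth(method = "lm", formula = y ~ x) +  # straight-line (linear model) fit
  scale_x_log10()                                # log base 10 x-axis

p_jit
```

Supplying formula = y ~ x only silences the message geom_smooth() otherwise prints about the formula it chose.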


Exercise 4

Good stuff. Now, use what you've learned to recreate the plot below. Make sure you set width = 1 and alpha = 0.2. Each diamond's length is recorded in the variable x.

ggplot(data = diamonds) +
  geom_jitter(aes(x = x, y = price, color = carat), width = 1, alpha = 0.2) +
  labs(title = "Length and Price in Cut Diamonds",
       x = "Length (Millimeters)",
       y = "Price (US Dollars)")

# Note that this is a bit tricky. There are two completely separate
# meanings of `x` in this problem. First, `x` is the name of a variable in the
# `diamonds` dataset, which measures the length of each diamond. (There is
# nothing wrong with using `x` as a variable name.) Second, `x` is the
# name of the first argument to the `aes()` function. You can still use
# both, because R keeps the two meanings separate: the `x` to the left of `=`
# is the argument name, while the `x` to the right of `=` is looked up as a
# variable in the data.
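The same thing can be seen on a tiny data frame. This is a sketch with made-up numbers: the data frame toy and its values are hypothetical and exist only to show that aes(x = x) draws the column named x on the x-axis.

```r
library(ggplot2)

# Hypothetical toy data: the first column happens to be called x.
toy <- data.frame(x = c(3.9, 4.3, 5.1, 6.2), price = c(350, 600, 900, 1800))

# Left of `=`: the aes() argument (the x position).
# Right of `=`: the column `x` in toy, looked up when the plot is built.
p_toy <- ggplot(data = toy, aes(x = x, y = price)) +
  geom_point()

p_toy
```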

Advanced Plotting - geom_line()

Let's make this line plot using geom_line().

ggplot(data = economics, aes(x = date, y = unemploy)) +
  geom_line() +
  scale_y_log10() +
  ylim(0, 20000)

Exercise 1

The line plot uses the economics data set. Put date on the x-axis and unemploy on the y-axis, then change the y-axis of the graph below to a log base 10 scale.

ggplot(data = economics, aes(x = date, y = unemploy)) +
  geom_line()

Use scale_y_log10().

Exercise 2

Great! Now add a limit to the y axis with a lower bound of 0 and an upper bound of 20,000.


Use ylim().
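One caveat about combining these two steps: ylim(a, b) is shorthand for scale_y_continuous(limits = c(a, b)), so adding it after scale_y_log10() replaces the log scale with a linear one (ggplot2 prints a message saying the existing y scale is being replaced). A log scale also cannot show 0, because log10(0) is -Inf. If you want to keep the log axis, a minimal sketch is to pass the limits to the log scale itself. Here NA means "use the data's own minimum", and the upper bound of 20,000 comes from the exercise:

```r
library(ggplot2)

# Limits go inside scale_y_log10(), so there is only one y scale and it stays
# logarithmic. NA keeps the lower limit at the smallest unemploy value.
p_log <- ggplot(data = economics, aes(x = date, y = unemploy)) +
  geom_line() +
  scale_y_log10(limits = c(NA, 20000))

p_log
```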

Exercise 3

Make a line plot using the population tibble, which annually records the populations of different nations. Run this example code for Canada.

population %>% 
  filter(country == "Canada") %>% 
  ggplot(aes(x = year, y = population)) +
    geom_line()

Exercise 4

Nice! Now make the same kind of plot, with x = year and y = population, but using the data for "Australia".

population %>% 
  filter(country == ...) %>% 
  ggplot(aes(x = ..., y = ...)) +
    geom_line()

Exercise 5

Great! Now let's plot the data for both countries on the same graph. Let's also map the country variable to the color aesthetic.

population %>% 
  filter(country == ... | country == ...) %>% 
  ggplot(aes(x = ..., y = ..., color = ...)) +
    geom_line()
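When the list of values grows, as in the next exercise, base R's %in% operator is a shorter way to write a chain of == conditions joined by |. A minimal sketch using ggplot2's built-in economics_long data instead of population (an illustrative substitute, with two of its series chosen arbitrarily):

```r
library(ggplot2)
library(dplyr)

# variable %in% c("psavert", "uempmed") keeps the same rows as
# variable == "psavert" | variable == "uempmed".
p_two <- economics_long %>%
  filter(variable %in% c("psavert", "uempmed")) %>%
  ggplot(aes(x = date, y = value, color = variable)) +
  geom_line()

p_two
```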

Exercise 6

Great. Now, use what you've learned to recreate the plot below.

population %>% 
  filter(country == "Germany" | country == "France" | country == "Italy") %>% 
  ggplot(aes(x = year, 
             y = population, 
             linetype = country, 
             color = country)) +
    geom_line() +
    labs(title = "Population in Selected European Countries",
       x = "Year",
       y = "Population")

Use `linetype = ...`

You will need to add `linetype = country` inside the `aes()` function.


Advanced Plotting - geom_histogram()

Exercise 1

Let's keep working with our qscores histogram plot, which you can see below. See the message R gave us about bin widths? Change the bin width by setting the number of bins to 20.

ggplot(data = qscores, aes(x = rating)) +
  geom_histogram(fill = "red4", color = "white") +
  labs(title = "Course Rating Distribution", y = "Count", x = "Rating")

Set the argument bins in geom_histogram() to 20.

Exercise 2

Nice! Now change the plot so that each bin is 0.25 wide. (To do this, delete the bins argument you set in the previous exercise.)


Set the argument binwidth in geom_histogram() to .25
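The difference between the two arguments: bins sets how many bars are drawn, while binwidth sets how wide each bar is, measured in the units of the x variable. A minimal sketch of both on mpg's highway fuel economy (the data and the values 10 and 2 are illustrative assumptions, not the exercise's):

```r
library(ggplot2)

# bins: ggplot2 picks the width so that hwy is split into 10 bars.
hist_bins <- ggplot(data = mpg, aes(x = hwy)) +
  geom_histogram(bins = 10, fill = "red4", color = "white")

# binwidth: every bar covers exactly 2 miles per gallon.
hist_width <- ggplot(data = mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 2, fill = "red4", color = "white")

hist_width
```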

Exercise 3

Now facet the plot by term and set the number of columns to 2.


Use facet_wrap() and set ncol to 2.
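facet_wrap() takes a one-sided formula, ~ variable, and draws one small panel per value of that variable; ncol controls how many panels sit side by side before wrapping to a new row. A minimal sketch with mpg, faceting by drive type instead of term (an illustrative substitution):

```r
library(ggplot2)

# drv has three values, so with ncol = 2 the panels fill two columns
# and wrap onto a second row.
p_facet <- ggplot(data = mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 2, fill = "red4", color = "white") +
  facet_wrap(~ drv, ncol = 2)

p_facet
```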

Exercise 4

ggplot(data = nhanes, aes(x = height)) +
    geom_histogram(color = "white", fill = "black", bins = 30) +
    facet_wrap(~ gender, ncol = 1)

Recreate the histogram above. It uses the nhanes data set, with height on the x-axis, and is faceted by gender.


Use facet_wrap() and set ncol to 1.

Exercise 5

Well done. Finally, use what you've learned about geom_histogram() to replicate the plot below.

mpg %>% ggplot(aes(x = hwy, color = class)) +
  geom_histogram(fill = "white", bins = 5, position = "dodge") +
  labs(title = "Highway Fuel Economy by Car Type",
       x = "Miles Per Gallon (Highway)",
       y = "Number of Car Models")

Within geom_histogram(), use the arguments fill = "white", bins = 5, and position = "dodge".

Advanced Plotting - geom_boxplot()

Exercise 1

Using the diamonds data set, create a boxplot with clarity on the x-axis and price on the y-axis.


Exercise 2

Try changing the y-axis to a log base 10 scale.


Use the scale_y_log10() function

Exercise 3

Great! Now, use coord_flip() to convert the plot into a horizontal boxplot.


Remember that you are adding a layer here

Exercise 4

Great! Now, let's zoom in on our plot by setting the ylim argument to the vector c(500, 7000). Note that ylim here is an argument of the coord_ function, in this case coord_flip(), rather than the standalone ylim() function.
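Why put the limits in the coord_ function? Coordinate limits only crop the view, so every observation is still used to compute the boxes. The standalone ylim() instead sets scale limits, which drop observations outside the range (with a warning) before the medians and quartiles are calculated, so the boxes themselves change. Also note that after coord_flip(), ylim still refers to the y aesthetic even though it is now drawn horizontally. A minimal sketch of both versions on mpg (the data and the range 15 to 30 are illustrative assumptions):

```r
library(ggplot2)

# Zoom: all cars are used for the box statistics; only the view is cropped.
box_zoom <- ggplot(data = mpg, aes(x = class, y = hwy)) +
  geom_boxplot() +
  coord_flip(ylim = c(15, 30))  # ylim still means hwy, now on the horizontal axis

# Filter: cars with hwy outside 15-30 are removed before the boxes are computed.
box_filter <- ggplot(data = mpg, aes(x = class, y = hwy)) +
  geom_boxplot() +
  ylim(15, 30)

box_zoom
```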


Advanced Plotting - geom_violin()

Let's make this plot using geom_violin().

ggplot(data = mpg, aes(x = class, y = displ)) +
  geom_violin() +
  geom_jitter(aes(color = manufacturer), height = 0, width = 0.2) +
  labs(title = "Engine Size by Car Type")

Exercise 1

Let's use the mpg tibble to make a plot with geom_violin(). Map class to the x-axis and displ to the y-axis.


Exercise 2

Good. Now, let's add some labels with labs(). Let's title our plot "Engine Size by Car Type".


Exercise 3

Nice work. Now, let's use geom_jitter() to add a new layer to our plot, and map color to manufacturer, so we can get a sense of how engine size varies by car brand. Let's set height to 0 and width to 0.2.
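Two details make this layering work. An aesthetic mapped inside geom_jitter()'s own aes() applies only to that layer, so the violins are left alone. Setting height = 0 keeps every point at its true y value, while width = 0.2 spreads points sideways within their category. A minimal sketch on mpg with different variables than the exercise (hwy by drive type, colored by class, chosen only for illustration):

```r
library(ggplot2)

# The violins use only the shared x/y mapping; color exists only in the jitter layer.
p_violin <- ggplot(data = mpg, aes(x = drv, y = hwy)) +
  geom_violin() +
  geom_jitter(aes(color = class), height = 0, width = 0.2)

p_violin
```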


Exercise 4

Cool. Now use what you've learned to replicate the plot below.

ggplot(data = mpg, aes(x = class, y = cty)) +
  geom_violin() +
  geom_jitter(aes(color = hwy, shape = drv), height = 0, width = 0.2)

ggplot(data = mpg, aes(x = class, y = cty)) +
  geom_violin() +
  geom_jitter(aes(color = ..., shape = ...), height = ..., width = ...)

Advanced Plotting - geom_bar()

Let's make the following dodged bar plot using geom_bar().

ggplot(data = sps, aes(x = education, fill = sex)) +
  geom_bar(position = "dodge")

Exercise 1

Using the sps data set, which comes from a public health experiment in Mexico, make a dodged bar plot with education on the x-axis. Map the sex variable to the fill aesthetic.


Within geom_bar(), set position = "dodge".
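By default geom_bar() stacks the bars for each fill group on top of one another; position = "dodge" places them side by side instead. Spell the argument carefully: ggplot2 ignores an unknown argument name with only a warning, which leaves the default stacked bars in place. A minimal sketch of both positions on mpg (the variables are illustrative choices):

```r
library(ggplot2)

# Default position: the drv groups are stacked into one bar per class.
bar_stack <- ggplot(data = mpg, aes(x = class, fill = drv)) +
  geom_bar()

# Dodged: one bar per drv group, side by side within each class.
bar_dodge <- ggplot(data = mpg, aes(x = class, fill = drv)) +
  geom_bar(position = "dodge")

bar_dodge
```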

Exercise 2

Let's make a bar plot using trains, a tibble that records attitudes towards immigration on a Boston train platform before and after an experiment. Map att_start to the x-axis.


Exercise 3

Great. Now, map fill to party. Notice any patterns?


Exercise 4

Good work. Now, let's use facet_wrap() to divide our graph by the liberal variable.

facet_wrap( ~ ...)

Exercise 5

Well done. Now, use what you've learned to recreate the plot below.

ggplot(data = trains, mapping = aes(x = att_end, fill = gender)) +
  geom_bar() +
  facet_wrap( ~ treatment) +
  labs(title = "Attitudes Towards Immigration After Experimentation",
       x = "Attitude Towards Immigration (Higher Means More Conservative)",
       y = "Number of People")

Advanced Plotting - geom_col()

Let's recreate the plot below using geom_col().

eng_qscores <- qscores %>%
  filter(department == "ENGLISH")

ggplot(data = eng_qscores, aes(x = number, y = hours)) +
  geom_col() +
  theme(axis.text.x = element_text(size = 5)) +
  labs(title = "Hours of Work in Harvard English Classes")

Exercise 1

Use the eng_qscores data set, which was created by filtering qscores to include only the rows where department == "ENGLISH".

eng_qscores <- qscores %>%
  filter(department == "ENGLISH")

ggplot(data = eng_qscores, aes(x = number, y = hours)) +
  geom_col() +
  labs(title = "Hours of Work in Harvard English Classes")

eng_qscores <- qscores %>%
  filter(department == "ENGLISH")

ggplot(data = eng_qscores, aes(x = number, y = hours)) +
  geom_col() +
  labs(title = ...)

Exercise 2

Let's make a plot of the fuel efficiency of different Toyota models using geom_col() and the toyota_mpg tibble (made by filtering mpg). Set x = hwy and y = cty.

toyota_mpg <- mpg %>% 
  filter(manufacturer == "toyota")

ggplot(data = toyota_mpg, ...)

Exercise 3

Nicely done. Because geom_col() stacks bars by default, all the rows that share an x value are piled into a single bar. Let's set position to "dodge" and continue. Let's also set color to "white" to draw a white outline around the bars.

toyota_mpg <- mpg %>% 
  filter(manufacturer == "toyota")
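Unlike geom_bar(), which counts rows for you, geom_col() uses the y values you supply as bar heights. Dodging places bars from different groups, such as the groups created by a fill mapping, side by side. A minimal sketch with one row per class and drive-type combination (the count() summary and the fill = drv grouping are illustrative assumptions, not part of the exercise):

```r
library(ggplot2)
library(dplyr)

# One row per class/drv pair, with the number of car models in column n.
class_drv <- mpg %>%
  count(class, drv)

# Bar heights come straight from n; position = "dodge" puts the drv bars
# next to each other within each class, and color = "white" outlines them.
col_dodge <- ggplot(data = class_drv, aes(x = class, y = n, fill = drv)) +
  geom_col(position = "dodge", color = "white")

col_dodge
```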

Advanced Plotting - geom_smooth()

Let's create this graph using geom_smooth().

ggplot(data = trains, mapping = aes(x = att_start, y = att_end, color = treatment)) +
  geom_smooth(method = "gam", alpha = 0.75)

Exercise 1

The graph above uses the trains data set, which includes data from an experiment on attitudes towards immigration. Make a plot using geom_smooth(), with method set to "gam". Put att_start on the x-axis and att_end on the y-axis.


ggplot(data = ..., mapping = aes(x = ..., y = ...)) +
  geom_smooth(method = "...")

Exercise 2

Nice! Now map the treatment variable to the color aesthetic and set alpha to 0.75.


Because color is an aesthetic, set it inside of aes(). Set alpha inside of geom_smooth().

Exercise 3

Now let's use the iris tibble, which records the dimensions of samples of iris flowers, and geom_smooth() to make a plot with x = Petal.Length, and y = Petal.Width.


Exercise 4

Nice. Now, let's map color to Species and set method to "lm".
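Mapping color to a grouping variable splits the data into groups, and geom_smooth() fits a separate curve to each one. With method = "lm", each curve is a straight least-squares line. A minimal sketch on base R's mtcars, with cylinder count as the group (the data set is an illustrative substitute for iris):

```r
library(ggplot2)

# factor(cyl) turns the numeric cylinder count into three groups (4, 6, 8),
# so three separate straight lines are fitted and drawn.
p_smooth <- ggplot(data = mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_smooth(method = "lm", formula = y ~ x)

p_smooth
```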


Exercise 5

Good work. Now, use what you've learned to recreate the graph below. You will need to use the argument 'se = FALSE' in geom_smooth().

ggplot(data = iris, mapping = aes(x = Sepal.Width, y = Sepal.Length, linetype = Species)) +
    geom_smooth(se = FALSE, color = "purple") +
    labs(title = "Sepal Width and Length in Iris Species",
         x = "Sepal Width",
         y = "Sepal Length")

Advanced Plotting - geom_density()

Exercise 1

Let's make a plot using geom_density() and the qscores tibble, with rating on the x-axis. Also, map linetype to term within aes().


Exercise 2

Nice. Now, let's use labs() to label our x-axis "Rating", and our y-axis "Density of Classes".

Let's also add xlim() to restrict the x-axis to values between 2.5 and 5.
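Keep in mind that xlim() sets the scale limits, so observations outside the range are dropped (ggplot2 warns about the removed rows) and the density curves are estimated from the remaining data only. A minimal sketch of linetype, xlim(), and labs() working together, on mpg instead of qscores (the variable and the range 15 to 35 are illustrative assumptions):

```r
library(ggplot2)

# One density curve per drive type, drawn with a different line pattern.
# Cars with hwy outside 15-35 are removed before the densities are estimated.
dens_xlim <- ggplot(data = mpg, aes(x = hwy, linetype = drv)) +
  geom_density() +
  xlim(15, 35) +
  labs(x = "Highway MPG", y = "Density of Car Models")

dens_xlim
```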


Exercise 3

Good stuff. Now, use what you've learned about geom_density() to recreate the plot below.

ggplot(data = qscores, aes(x = enrollment, color = term)) +
  geom_density() +
  xlim(0, 200) +
  labs(title = "Distribution of Students Enrolled in Harvard Classes",
       x = "Number of Students Enrolled",
       y = "Density of Classes")

Submit

Congrats on finishing your first Gov 50 tutorial! You're on your way to being a master in data visualization and wrangling! :)

submission_ui
submission_server()


davidkane9/PPBDS.data documentation built on Nov. 18, 2020, 1:17 p.m.