knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
library(Stat231)
library(openintro)
library(tidyverse)
We are going to use the gpa_study_hours data set from the openintro package. Before we view the data set, let's open its help file in the Directory pane: click the 'Help' tab and search for 'gpa_study_hours'. Now let's view the data set. You can either run the code chunk below or type the command into your Console.
View(gpa_study_hours)
Below are the code and output for calculating the average GPA, first in base R and then in Tidyverse syntax. There are two ways to do this in base R: with and without a pipe operator. We'll refer to the base R pipe as the native pipe.
# Base R syntax
mean(gpa_study_hours$gpa)

# Native pipe syntax
gpa_study_hours$gpa |> mean()
mean() is a function in R that calculates the average of a numeric variable. Its argument is written as dataset$variable, which extracts the variable from the data set. With the native pipe, the variable on the left is passed in as that argument, so we only need to name the function.
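As a minimal sketch of the two base R forms, the chunk below uses a small made-up data frame (`toy`, a stand-in and not the real `gpa_study_hours` data) so the arithmetic can be checked by hand. It assumes R 4.1 or later, where the native pipe is available.

```r
# Hypothetical toy data standing in for gpa_study_hours
toy <- data.frame(gpa = c(3.0, 3.5, 4.0), study_hours = c(10, 15, 20))

# dataset$variable extracts one column as a numeric vector
toy$gpa

# Both calls hand that vector to mean() as its first argument
base_mean <- mean(toy$gpa)
pipe_mean <- toy$gpa |> mean()

# The two forms give the same result
identical(base_mean, pipe_mean)
```

Both forms are the same function call underneath; the pipe only changes where the input is written.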
The Tidyverse pipe operator, %>%, comes from the magrittr package. It is a key part of Tidyverse syntax. It tells R to take whatever is on the left side of the operator "and then" perform the next line of code. The main difference between this pipe and the native pipe is that, by default, the native pipe can only pipe data into the first argument of a function (R 4.2 and later also accept a _ placeholder, but only for a named argument), while %>% can send the data to any argument with its . placeholder. We'll cover this difference in more depth in a later lab.
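The difference in argument placement can be sketched as follows. This is an illustration only: the strings and vectors are made up, and it assumes the magrittr package is installed and that R is version 4.1 or later.

```r
library(magrittr)

# Piping into the first argument works the same way with both pipes
first_native   <- c(1, 4, 9) |> sqrt()
first_magrittr <- c(1, 4, 9) %>% sqrt()

# magrittr's dot placeholder sends the left side to any argument position
label_magrittr <- "gpa" %>% paste("mean of", .)

# With the native pipe, one option is to wrap the call in an anonymous function
label_native <- "gpa" |> (\(x) paste("mean of", x))()

identical(label_magrittr, label_native)
```

The anonymous-function wrapper is just one workaround; for everyday Tidyverse code, where the data frame is the first argument, the two pipes behave alike.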
# Tidyverse syntax
gpa_study_hours %>%
  summarize(mean(gpa))
The chunk of code above tells R to take the gpa_study_hours data set "and then" summarize the data by calculating the mean of the gpa variable.
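A small variation that may be useful is to name the summary column so the output is easier to read. The sketch below uses a made-up tibble (`toy`) rather than the real data, and the column name `mean_gpa` is our own choice, not something the lab requires. It assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical toy data standing in for gpa_study_hours
toy <- tibble(gpa = c(3.0, 3.5, 4.0), study_hours = c(10, 15, 20))

# Take the data "and then" summarize it; name = expression labels the result
gpa_summary <- toy %>%
  summarize(mean_gpa = mean(gpa))

gpa_summary
```

The result is a one-row tibble with a single column called mean_gpa, instead of a column labelled with the raw expression.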
For the remainder of our labs we will primarily use Tidyverse syntax as it is a bit more fluid.
Here we will sketch a histogram of the gpa of these students.
ggplot(data = gpa_study_hours, aes(x = gpa)) +
  geom_histogram()
ggplot is a function from the ggplot2 package, which is part of the Tidyverse. The name stands for "Grammar of Graphics plot". The package contains a vast array of tools for data visualization. The geom_* layer is the geometry that we want to graph, in this case a histogram. We'll delve deeper into these in lab #3. Notice how it's hard to tell where some of the bars should be split, as well as what the class width is (see the note that popped up above the chart). We can define those explicitly in our geom.
ggplot(data = gpa_study_hours, aes(x = gpa)) +
  geom_histogram(color = "White", binwidth = 0.25)
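To confirm what class width a histogram is actually using, one option is to look at the bins ggplot2 computed. The sketch below does this on a small made-up data frame (`toy`, not the real data) with ggplot2's layer_data() helper. It assumes the ggplot2 package is installed.

```r
library(ggplot2)

# Hypothetical toy data standing in for gpa_study_hours
toy <- data.frame(gpa = c(2.6, 2.9, 3.1, 3.2, 3.4, 3.5, 3.7, 3.8, 4.0))

p <- ggplot(data = toy, aes(x = gpa)) +
  geom_histogram(color = "white", binwidth = 0.25)

# layer_data() returns the computed bins for the first layer,
# including each bar's left edge (xmin), right edge (xmax), and count
bins <- layer_data(p, 1)
bin_widths <- bins$xmax - bins$xmin

# Every bar should span the binwidth we asked for
unique(round(bin_widths, 10))
```

Every observation falls in exactly one bin, so the counts add up to the number of rows in the data.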