library(tidyverse)
This is the code chunk that imports data. We will discuss how this code works in the next few sessions. All you need to know for now is that this imports an R object
into your environment. Look to the right at the "Environment" tab and see a new object called heart
.
heart <- readRDS(here::here("data", "heart.Rds"))
glimpse
into the kinds of data in the dataframeThe heart
object is a table (also called a dataframe), that has a series of columns and rows. The function glimpse()
perfroms a quick peek at the data. We can see that this table has 15 columns (called 'Variables'), and 1025 rows (called 'Observations'). The name of each Variable is listed below (e.g. 'id', etc.). Importantly, R cares about capitalization, so id
is NOT the same as ID
. The type of variable is listed next (e.g.
glimpse(heart)
Another way to View the heart
table object is to run the view function. This is one of the odd functions in R that starts with a capital letter. Viewing the table will open a new tab in Rstudio to see all the data. This works great for small tables (like this one), but will crash R with big tables (100,000 x 100,000). To view this table, type in the console below: View(heart)
Understanding where your data come from, and how your data are structured is one of the most important things in Data Science. This table was downloaded from: https://www.kaggle.com/johnsmith88/heart-disease-dataset/version/2. Go to a web browser to look at the original source.
Now we will make your first graph. Run the code chunk below to generate the image, and then continue reading about what each part of the code does.
ggplot
is the function
geom_point
is the type of graph - scatterplot with points, in this case
labs
is the layer for labels
theme_minimal
is an easy layer to change several of the visual features of the graph
NULL
doesn't add anything to the graph, but allows you to comment out (#) layers without breaking the graph; if you break the graph and R is not responding, it's likely that R
is waiting for another command; just hit esc
on your keyboard to start over.
ggplot(heart, aes(x = sex, y = chol)) + geom_point(color = "darkblue", size = 3) + labs(x = "sex", y = "Cholesterol", title = "Cholesterol values from the heart disease dataset", caption = "Data from Kaggle | Plot from @matthewhirschey") + theme_minimal() + NULL
#Q: Which line is responsible for plotting the data on the graph? #A:
{r first_graph}
code chunk. Copying and pasting is OK! Look at the different variables that are available for plotting by inspecting the {r glimpse}
code chunk. The data dictionary for these variables is located at: https://www.kaggle.com/johnsmith88/heart-disease-dataset/version/2. Feel free to experiment and plot different information.
change the variable, if you'd like; the original plot compared heart id # vs. Basepairs, but 12 other variables are unused.
change the color of the points (be resourceful to find color names useable in R
)
change the size of the points to something bigger
finalize the labels (title, axes, caption) based on what you changed, and to reflect that you made it!
* get creative; use cheatsheets or other resources to change other elements of this graph
#your code goes here
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.