RStudio and GitHub Introduction

library(learnr)
library(tutorial.helpers)
library(tidyverse)
library(knitr)
library(usethis)
library(palmerpenguins)
library(gert)
library(gitcreds)

knitr::opts_chunk$set(echo = FALSE)
knitr::opts_chunk$set(out.width = '90%')
options(tutorial.exercise.timelimit = 60, 
        tutorial.storage = "local")


Introduction

This tutorial mostly covers material which is not actually in R for Data Science (2e) by Hadley Wickham, Mine Çetinkaya-Rundel, and Garrett Grolemund. But the material is in keeping with the spirit of that book, especially Chapter 28 Quarto. The material covered in Chapter 4 Workflow: code style appears in the "RStudio and Code" tutorial in this package.

Professionals store their work on Github, or a similar "source control" tool. If your computer blows up, you don't want to lose your work. Github is like Google Drive --- both live in the cloud --- but for your computational work rather than your documents.

This tutorial will introduce you to Github and to Git, a program for keeping track of changes in your code.

The most useful reference for Git/Github/RStudio is Happy Git and GitHub for the useR. Refer to that book whenever you have a problem.

Terminal Overview

The RStudio Terminal is how you directly communicate with the computer using commands, whereas the RStudio Console is how you talk to R.

This section will help introduce you to the Terminal so that you'll be up to speed for the commands needed later.

Exercise 1

Start by clicking on the Terminal tab in RStudio. This is next to the Console tab, within the Console pane.

You should see some text, which is a portion of the path to the current working directory, followed by a prompt such as a dollar sign. Copy (by highlighting, then right clicking and selecting the "copy" option) and paste the entire prompt into the box below.

We abbreviate the instructions copy/paste the command/response as CP/CR throughout this tutorial.

question_text(NULL,
    answer(NULL, correct = TRUE),
    allow_retry = TRUE,
    try_again_button = "Edit Answer",
    incorrect = NULL,
    rows = 2)

Your answer should look something like:

Davids-MBP:projects dkane$ 

The path begins with the hostname of my computer: Davids-MBP. After a :, we have the current working director, which happens to be named projects in this case. After a space, we have my username. The entire prompt ends with a dollar sign, which indicates that I am logged in as regular user.

On Windows, your answer would be more like:

C:\Users\Alan\Documents\R\Projects

Exercise 2

Type the command pwd in your Terminal and hit return. CP/CR.

question_text(NULL,
    answer(NULL, correct = TRUE),
    allow_retry = TRUE,
    try_again_button = "Edit Answer",
    incorrect = NULL,
    rows = 2)

Your answer should look like:

Davids-MacBook-Pro:projects dkane$ pwd
/Users/dkane/Desktop/projects
Davids-MacBook-Pro:projects dkane$ 

Your answer will be different, not least because your username is unlikely to be dkane. This is the location on your computer where the R program started up when you opened RStudio.

Going forward, make sure to pay special attention to whether given commands are supposed to be run in the Console or the Terminal.

Exercise 3

What are the names of the four default panes in RStudio?

question_text(NULL,
    message = "Come on! The answer is right in the link: Source, Console, Environment, and Output.",
    answer(NULL, correct = TRUE),
    allow_retry = FALSE,
    incorrect = NULL,
    rows = 6)

Recall:

knitr::include_graphics("images/rs_panes.jpg")

That's all you need to know about the Terminal for this tutorial. The r4ds.tutorials package includes another tutorial entirely dedicated to using the Terminal, which is also sometimes referred to as the "command line."

Setting up GitHub

GitHub is an online drive for all your R code and projects. In the professional world, what you have on your GitHub account is more important than what you have on your resume. It is a verifiable demonstration of your abilities.

Exercise 1

Install Git by following the instructions of the Install Git chapter in Happy Git and GitHub for the useR.

After you install Git, you should quit and then restart RStudio so that it has a chance to "recognize" that Git is installed. (Note that restarting RStudio is not the same as restarting your R session within your RStudio instance. To quit RStudio you need to go to the RStudio -> Quit RStudio menu item.)

After restarting RStudio, click on the Terminal tab. Run git --version in the Terminal to make sure that Git is installed and accessible. CP/CR.

question_text(NULL,
    answer(NULL, correct = TRUE),
    allow_retry = TRUE,
    try_again_button = "Edit Answer",
    incorrect = NULL,
    rows = 2)

Your answer should look like:

Davids-MacBook-Pro:r4ds.tutorials dkane$ git --version
git version 2.39.3 (Apple Git-145)
Davids-MacBook-Pro:r4ds.tutorials dkane$ 

The details will differ. Don't worry if you have a different version of Git installed.

Pro Git is the best reference book for Git and Github.

Exercise 2

The next step is to create a GitHub account by following the instructions at the GitHub homepage. Follow this advice when choosing your username. We recommend using a permanent email address for this account, not one which you lose access to when, for example, you change schools or jobs.

Copy your GitHub account URL in the field below.

question_text(NULL,
    answer(NULL, correct = TRUE),
    allow_retry = TRUE,
    try_again_button = "Edit Answer",
    incorrect = NULL,
    rows = 2)

Your answer should look like this:

https://github.com/your-username

Git is "software for tracking changes in any set of files, usually used for coordinating work among programmers collaboratively developing source code during software development."

Exercise 3

Now that you have your GitHub account, you need to connect it to RStudio. The first step of doing this is to give RStudio your GitHub account user name and email.

In the Console, run this command, substituting the email and user name you used for your GitHub account:

usethis::use_git_config(user.name = "your-username", user.email = "your@email.org")

Note that this will not return anything. As long as you don't get an error message, it (probably) worked.

Run git config --global user.name in the Terminal to make sure your computer remembers your GitHub username. CP/CR.

question_text(NULL,
    answer(NULL, correct = TRUE),
    allow_retry = TRUE,
    try_again_button = "Edit Answer",
    incorrect = NULL,
    rows = 2)

You answer should look like this:

Davids-MBP:project-2 dkane$ git config --global user.name
davidkane9
Davids-MBP:project-2 dkane$ 

Of course, it should be your username, not mine!

Exercise 4

Run git config --global user.email in the Terminal to make sure your GitHub email is stored too. CP/CR.

question_text(NULL,
    answer(NULL, correct = TRUE),
    allow_retry = TRUE,
    try_again_button = "Edit Answer",
    incorrect = NULL,
    rows = 3)

Your answer should look like:

Davids-MacBook-Pro:r4ds.tutorials dkane$ git config --global user.email
dave.kane@gmail.com
Davids-MacBook-Pro:r4ds.tutorials dkane$ 

Now, your GitHub username and email are stored so that your computer can automate a lot of the tedious steps required to communicate with GitHub.

Exercise 5

Note: Make sure you received the expected outputs of the name and email associated with your Github account in the previous two exercises.

If GitHub is your Google Drive, then GitHub repositories, or "repos," are the Google folders, the location in which we store our work. Let's make a practice repo!

To begin, sign into GitHub and go to the homepage: www.github.com. Click the green "New" button on the upper left side of the page.

include_graphics("images/github_new_repo.png")

Name your repository First_Repo. Then, select the "public" option for your repo and check the box saying "Add a README file". README is a document where programmers explain details of their project. When in doubt follow these instructions.

After you've done that, go ahead and click "Create repository" to create your first repo!

include_graphics("images/repo_detail.png")

GitHub should have directed you to a page called the project page after creating the repo. Copy and paste the URL of the project page below.

question_text(NULL,
    answer(NULL, correct = TRUE),
    allow_retry = TRUE,
    try_again_button = "Edit Answer",
    incorrect = NULL,
    rows = 2)

Your answer should look like:

github.com/davidkane9/First_Repo

The "public" option means that anyone will be able to view the repository. But only you can edit it. You might have noticed "Add .gitignore" as another option, but we will never check this box because RStudio will automatically create a .gitignore file for us when we connect the Github repo to an R project on our computer.

Note that things work slightly differently on Posit Cloud. See our discussion if you are working there.

Exercise 6

The next step will be to create the exact same project in your local computer through the process of cloning. By having a local and a GitHub version, you can edit your project on your computer and send all the changes you've made to GitHub. This process ensures that the GitHub version is synced up with your local version.

To clone the repo, click on the green button that says "Code". Then, copy the link shown. You can use the clipboard button on the right to automatically copy it.

include_graphics("images/repo_clone.png")

Paste the link below.

question_text(NULL,
    answer(NULL, correct = TRUE),
    allow_retry = TRUE,
    try_again_button = "Edit Answer",
    incorrect = NULL,
    rows = 2)

It should look something like:

https://github.com/davidkane9/First_Repo.git

This link points to your project folder on GitHub. It is slightly different from the URL of the project page because the .git suffix tells you that it's a special GitHub object and not a webpage.

Exercise 7

To connect your Github repo to an R project, we first make a new project. While you are making a new project, you will not be able to access this tutorial.

Read all instructions before making your project.

This last step will cause RStudio to restart. Go ahead and follow these steps now. Then, return to the tutorial.

Exercise 8

Reminder: You just completed these steps:

Let's now confirm that your setup is correct. In the Terminal, run git remote -v. CP/CR.

question_text(NULL,
    answer(NULL, correct = TRUE),
    allow_retry = TRUE,
    try_again_button = "Edit Answer",
    incorrect = NULL,
    rows = 2)

Your answer should look something like:

Davids-MBP:First_Repo dkane$ git remote -v
origin  https://github.com/davidkane9/First_Repo.git (fetch)
origin  https://github.com/davidkane9/First_Repo.git (push)
Davids-MBP:First_Repo dkane$ 

Don't worry too much about the details. You've successfully linked a repo to an RStudio Project!

Exercise 9

You have linked your GitHub repo to your R project, but you haven't proven that you are someone with editing access to the project. If we just let anyone with the GitHub link edit our repo, that's obviously going to lead to problems. This is why we use something called a personal access token, or a PAT.

A PAT is just a special computer-generated password between your computer and GitHub that lets GitHub connect your GitHub account and your computer. If you want to learn more, see "Personal access token for HTTPS" from Happy Git and GitHub for the useR

Create a PAT by using the usethis::create_github_token() function in the Console. This should redirect you to a GitHub page about creating a PAT.

Set the "Note" field to "My First PAT" and keep the scopes the same as the default. It should look something like the picture below (Doesn't have to be exact).

include_graphics("images/pat.png")

Press "Generate Token" at the bottom of the page to finalize your PAT. If you have trouble, the function usethis::gh_token_help() may be helpful.

Exercise 10

Now that we've created our PAT, first temporarily copy it somewhere you have easy access to. We will need to store it somewhere so that RStudio can "see" it.

Run gitcreds::gitcreds_set() in the Console and follow the prompts, providing your token when asked.

Restart your R session with Session -> Restart R. (This is not the same thing as restarting RStudio.) You could also use Cmd/Ctrl + Shift + 0 to restart R. Using the shortcut key is always a good idea. (On Windows, this would be Ctrl + Shift + F10.)

Run usethis::git_sitrep() in the Console. CP/CR.

question_text(NULL,
    answer(NULL, correct = TRUE),
    allow_retry = TRUE,
    try_again_button = "Edit Answer",
    incorrect = NULL,
    rows = 12)

The key "Github user" section should look something like:

── GitHub user 
• Default GitHub host: 'https://github.com'
• Personal access token for 'https://github.com': '<discovered>'
• GitHub user: 'davidkane9'

See "Managing Git(Hub) Credentials" for more details.

Updating .gitignore

The .gitignore file determines which files git should ignore. You should be able to see your .gitignore file in the Files tab in the Output pane.

(To see any files which are "hidden" --- those that begin with a period . --- you may need to click on the gear symbol in the Files tab (next to the "Rename" button) and then select "Show Hidden Files".)

Exercise 1

From the Console, run list.files(). CP/CR.

question_text(NULL,
    answer(NULL, correct = TRUE),
    allow_retry = TRUE,
    try_again_button = "Edit Answer",
    incorrect = NULL,
    rows = 3)
list.files()

In the "RStudio and Code" tutorial, we saw that an R project begins with just one file, the Rproj file, which is First_Repo.Rproj in this case. But, because this R project started by connecting to a Github repo --- and because we initialized that repo with a README --- we also have a README.md file.

Exercise 2

From the Console, run list.files(all.files = TRUE). CP/CR.

question_text(NULL,
    answer(NULL, correct = TRUE),
    allow_retry = TRUE,
    try_again_button = "Edit Answer",
    incorrect = NULL,
    rows = 3)
list.files(all.files = TRUE)

Because .gitignore is a "hidden" file, since its name begins with a ., we need to set all.files = TRUE in list.files() if we want it to be returned. Don't worry about hidden directories like .git and .Rproj.user. We won't be using them.

Exercise 3

The purpose of .gitignore is to list all the files that you don't want to be uploaded to GitHub. This is especially useful when you are working with big datasets or files with private information.

Open the .gitignore and add a new line with First_Repo.Rproj. This tells Git to ignore any file named First_Repo.Rproj. Save the file, but make sure that the last line in the file is blank.

In the Console, run:

tutorial.helpers::show_file(".gitignore")

CP/CR.

question_text(NULL,
    answer(NULL, correct = TRUE),
    allow_retry = TRUE,
    try_again_button = "Edit Answer",
    incorrect = NULL,
    rows = 3)

This should output all of the lines in your .gitignore file, which should include First_Repo.Rproj.

We could also have used *.Rproj. The * tells your computer that we want to prevent all files ending in Rproj from being uploaded.

Exercise 4

Navigate to the Git tab in the Environment pane. You should see .gitignore listed but not First_Repo.Rproj. The Git tab shows all the changes you have made on your local computer. Since you added First_Repo.Rproj in .gitignore, First_Repo.Rproj should not show up as one of the changed files that the GitHub version needs to be synced with.

If you do see First_Repo.Rproj, try clicking on the refresh button --- a small clockwise swirl --- in the upper right. Or make sure you have it spelled correctly.

include_graphics("images/updated_gitignore.png")

Exercise 5

Now that we've updated our .gitignore file, we want to upload this new version to GitHub. Otherwise, GitHub doesn't know that we want to hide our First_Repo.Rproj file.

To do so, go to the Git tab in the Environment pane toward the top right of your screen. Click the check box next to the .gitignore file and then click on the "Commit" button.

include_graphics("images/commit.png")

This will open a new window where you will write a commit message. The message is meant to note what you're adding/changing within the repo. And yes, it's mandatory. Just a few (sensible) words are enough.

Exercise 6

In the Console, run:

gert::git_log() |> select(-commit, -merge, -files) |> dplyr::slice(1)

CP/CR.

This returns the author, time, and message of the last commit.

question_text(NULL,
    answer(NULL, correct = TRUE),
    allow_retry = TRUE,
    try_again_button = "Edit Answer",
    incorrect = NULL,
    rows = 2)

If you get an error about a function like slice or select not being found, then load the dplyr package with library(dplyr) and try again.

Exercise 7

Next, press Push, i.e., the green arrow on the top right. This pushes or uploads the changes to GitHub. Open up GitHub and refresh the project page to see the commit message you just made at the top of your repo page.

Run gert::git_ahead_behind()$ahead in the Console. CP/CR.

question_text(NULL,
    answer(NULL, correct = TRUE),
    allow_retry = TRUE,
    try_again_button = "Edit Answer",
    incorrect = NULL,
    rows = 2)

This should return 0, showing that you are 0 commits out of sync with the GitHub version.

Publishing to R Pubs

Let's publish a file from our project onto the internet using R Pubs.

Exercise 1

First let's make a new Quarto document. Select File -> New File -> Quarto Document .... Change the title to "Example". You are the author. Keep the other fields as the default values. Save the file with Cmd/Ctrl + S and name the file example. RStudio will automatically add the .qmd extension.

Run list.files(). CP/CR.

question_text(NULL,
    answer(NULL, correct = TRUE),
    allow_retry = TRUE,
    try_again_button = "Edit Answer",
    incorrect = NULL,
    rows = 2)

You should see example.qmd somewhere in the output. Recall that the title inside of the document ("Example") has no necessary connection to the name of the file (example.qmd). However, it is often the case that they will be similar, as here.

Exercise 2

Render this Quarto document by pressing the "Render" button or, better, with Cmd/Ctrl + Shift + K. Run list.files() in the Console. CP/CR.

question_text(NULL,
    answer(NULL, correct = TRUE),
    allow_retry = TRUE,
    try_again_button = "Edit Answer",
    incorrect = NULL,
    rows = 2)

example.html should be one of the files outputted.

Exercise 3

Notice the blue button at the top of your QMD. Click that and select RPubs. (If you completed the "RStudio and Code" tutorial, you should already have an RPubs account.) If you do not have an account, you will need to create one. Always choose the free option.

Create a slug --- the file name for the last part of the URL --- like "my-project" and publish. Put the URL of your RPubs Page below:

question_text(NULL,
    answer(NULL, correct = TRUE),
    allow_retry = TRUE,
    try_again_button = "Edit Answer",
    incorrect = NULL,
    rows = 2)

If you did not select a slug, then RPubs provided a meaningless slug, most likely a string of numbers. That does not look cool! Much better to have a slug which describes the file.

Exercise 4

In the Git tab in the top right corner, check the boxes next to "example.qmd", and "example.html" to stage them. Do not check the box next to example_files. Press the commit button and add the commit message "published webpage".

Now, press the Push button with the green arrow. Your changes have now been saved to Github! No need to worry about example_files right now. Quarto always creates a directory named filename_files and uses it for the tools/residues of the rendering process. Nothing in such files is worth saving.

Run gert::git_ahead_behind()$ahead in the Console. This should return "0" just like before. CP/CR.

question_text(NULL,
    answer(NULL, correct = TRUE),
    allow_retry = TRUE,
    try_again_button = "Edit Answer",
    incorrect = NULL,
    rows = 2)

Nice work! You've created an RPubs page. All your work is backed up to GitHub.

We now understand the entire data science work cycle. Begin with Github repository, connect to an RStudio project, create a Quarto document (including a plot or other analysis), publish your results to RPubs, and, finally, commit/push all your work to Github. Let's practice these skills in some new projects.

Summary

This tutorial mostly covered material which is not actually in R for Data Science (2e) by Hadley Wickham, Mine Çetinkaya-Rundel, and Garrett Grolemund. But the material is in keeping with the spirit of that book, especially Chapter 28 Quarto. The material covered in Chapter 4 Workflow: code style appears in the RStudio and Code tutorial in this package.

You should now have Git working on your computer. You should have a Github account and know how to connect Github repositories to RStudio projects.

The most useful reference for Git/Github/RStudio is Happy Git and GitHub for the useR. Bookmark it!




Try the r4ds.tutorials package in your browser

Any scripts or data that you put into this service are public.

r4ds.tutorials documentation built on April 3, 2025, 5:50 p.m.