library(learnr) library(tutorial.helpers) library(tidyverse) library(knitr) library(usethis) library(palmerpenguins) library(gert) library(gitcreds) knitr::opts_chunk$set(echo = FALSE) knitr::opts_chunk$set(out.width = '90%') options(tutorial.exercise.timelimit = 60, tutorial.storage = "local")
This tutorial mostly covers material which is not actually in R for Data Science (2e) by Hadley Wickham, Mine Çetinkaya-Rundel, and Garrett Grolemund. But the material is in keeping with the spirit of that book, especially Chapter 28 Quarto. The material covered in Chapter 4 Workflow: code style appears in the "RStudio and Code" tutorial in this package.
Professionals store their work on Github, or a similar "source control" tool. If your computer blows up, you don't want to lose your work. Github is like Google Drive --- both live in the cloud --- but for your computational work rather than your documents.
This tutorial will introduce you to Github and to Git, a program for keeping track of changes in your code.
The most useful reference for Git/Github/RStudio is Happy Git and GitHub for the useR. Refer to that book whenever you have a problem.
The RStudio Terminal is how you directly communicate with the computer using commands, whereas the RStudio Console is how you talk to R.
This section will help introduce you to the Terminal so that you'll be up to speed for the commands needed later.
Start by clicking on the Terminal tab in RStudio. This is next to the Console tab, within the Console pane.
You should see some text, which is a portion of the path to the current working directory, followed by a prompt such as a dollar sign. Copy (by highlighting, then right clicking and selecting the "copy" option) and paste the entire prompt into the box below.
We abbreviate the instructions copy/paste the command/response as CP/CR throughout this tutorial.
question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, try_again_button = "Edit Answer", incorrect = NULL, rows = 2)
Your answer should look something like:
Davids-MBP:projects dkane$
The path begins with the hostname of my computer: Davids-MBP
. After a :
, we have the current working director, which happens to be named projects
in this case. After a space, we have my username. The entire prompt ends with a dollar sign, which indicates that I am logged in as regular user.
On Windows, your answer would be more like:
C:\Users\Alan\Documents\R\Projects
Type the command pwd
in your Terminal and hit return. CP/CR.
question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, try_again_button = "Edit Answer", incorrect = NULL, rows = 2)
Your answer should look like:
Davids-MacBook-Pro:projects dkane$ pwd /Users/dkane/Desktop/projects Davids-MacBook-Pro:projects dkane$
Your answer will be different, not least because your username is unlikely to be dkane
. This is the location on your computer where the R program started up when you opened RStudio.
Going forward, make sure to pay special attention to whether given commands are supposed to be run in the Console or the Terminal.
What are the names of the four default panes in RStudio?
question_text(NULL, message = "Come on! The answer is right in the link: Source, Console, Environment, and Output.", answer(NULL, correct = TRUE), allow_retry = FALSE, incorrect = NULL, rows = 6)
Recall:
knitr::include_graphics("images/rs_panes.jpg")
That's all you need to know about the Terminal for this tutorial. The r4ds.tutorials package includes another tutorial entirely dedicated to using the Terminal, which is also sometimes referred to as the "command line."
GitHub is an online drive for all your R code and projects. In the professional world, what you have on your GitHub account is more important than what you have on your resume. It is a verifiable demonstration of your abilities.
Install Git by following the instructions of the Install Git chapter in Happy Git and GitHub for the useR.
After you install Git, you should quit and then restart RStudio so that it has a chance to "recognize" that Git is installed. (Note that restarting RStudio is not the same as restarting your R session within your RStudio instance. To quit RStudio you need to go to the RStudio -> Quit RStudio
menu item.)
After restarting RStudio, click on the Terminal tab. Run git --version
in the Terminal to make sure that Git is installed and accessible. CP/CR.
question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, try_again_button = "Edit Answer", incorrect = NULL, rows = 2)
Your answer should look like:
Davids-MacBook-Pro:r4ds.tutorials dkane$ git --version git version 2.39.3 (Apple Git-145) Davids-MacBook-Pro:r4ds.tutorials dkane$
The details will differ. Don't worry if you have a different version of Git installed.
Pro Git is the best reference book for Git and Github.
The next step is to create a GitHub account by following the instructions at the GitHub homepage. Follow this advice when choosing your username. We recommend using a permanent email address for this account, not one which you lose access to when, for example, you change schools or jobs.
Copy your GitHub account URL in the field below.
question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, try_again_button = "Edit Answer", incorrect = NULL, rows = 2)
Your answer should look like this:
https://github.com/your-username
Git is "software for tracking changes in any set of files, usually used for coordinating work among programmers collaboratively developing source code during software development."
Now that you have your GitHub account, you need to connect it to RStudio. The first step of doing this is to give RStudio your GitHub account user name and email.
In the Console, run this command, substituting the email and user name you used for your GitHub account:
usethis::use_git_config(user.name = "your-username", user.email = "your@email.org")
Note that this will not return anything. As long as you don't get an error message, it (probably) worked.
Run git config --global user.name
in the Terminal to make sure your computer remembers your GitHub username. CP/CR.
question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, try_again_button = "Edit Answer", incorrect = NULL, rows = 2)
You answer should look like this:
Davids-MBP:project-2 dkane$ git config --global user.name davidkane9 Davids-MBP:project-2 dkane$
Of course, it should be your username, not mine!
Run git config --global user.email
in the Terminal to make sure your GitHub email is stored too. CP/CR.
question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, try_again_button = "Edit Answer", incorrect = NULL, rows = 3)
Your answer should look like:
Davids-MacBook-Pro:r4ds.tutorials dkane$ git config --global user.email dave.kane@gmail.com Davids-MacBook-Pro:r4ds.tutorials dkane$
Now, your GitHub username and email are stored so that your computer can automate a lot of the tedious steps required to communicate with GitHub.
Note: Make sure you received the expected outputs of the name and email associated with your Github account in the previous two exercises.
If GitHub is your Google Drive, then GitHub repositories, or "repos," are the Google folders, the location in which we store our work. Let's make a practice repo!
To begin, sign into GitHub and go to the homepage: www.github.com
. Click the green "New" button on the upper left side of the page.
include_graphics("images/github_new_repo.png")
Name your repository First_Repo
. Then, select the "public" option for your repo and check the box saying "Add a README file". README is a document where programmers explain details of their project. When in doubt follow these instructions.
After you've done that, go ahead and click "Create repository" to create your first repo!
include_graphics("images/repo_detail.png")
GitHub should have directed you to a page called the project page after creating the repo. Copy and paste the URL of the project page below.
question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, try_again_button = "Edit Answer", incorrect = NULL, rows = 2)
Your answer should look like:
github.com/davidkane9/First_Repo
The "public" option means that anyone will be able to view the repository. But only you can edit it. You might have noticed "Add .gitignore" as another option, but we will never check this box because RStudio will automatically create a .gitignore
file for us when we connect the Github repo to an R project on our computer.
Note that things work slightly differently on Posit Cloud. See our discussion if you are working there.
The next step will be to create the exact same project in your local computer through the process of cloning. By having a local and a GitHub version, you can edit your project on your computer and send all the changes you've made to GitHub. This process ensures that the GitHub version is synced up with your local version.
To clone the repo, click on the green button that says "Code". Then, copy the link shown. You can use the clipboard button on the right to automatically copy it.
include_graphics("images/repo_clone.png")
Paste the link below.
question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, try_again_button = "Edit Answer", incorrect = NULL, rows = 2)
It should look something like:
https://github.com/davidkane9/First_Repo.git
This link points to your project folder on GitHub. It is slightly different from the URL of the project page because the .git
suffix tells you that it's a special GitHub object and not a webpage.
To connect your Github repo to an R project, we first make a new project. While you are making a new project, you will not be able to access this tutorial.
Read all instructions before making your project.
Click the File
dropdown menu in the top left of the R Studio window.
Click New Project ...
.
Instead of clicking New Directory
, as we did previously, click on Version Control
.
Click on Git
.
Paste the link you copied from GitHub into Repository URL
. Make sure your project is in the projects
directory you made in the "RStudio and Code" tutorial in order to maintain a single location in which you have placed all your R projects. If it is not, click on "Browse...", and select your projects
folder.
Hit the tab
key. This will copy the name of the repo into the box for the "Project directory name." It is a good idea for your repos and project names to be identical.
Click "Create Project."
This last step will cause RStudio to restart. Go ahead and follow these steps now. Then, return to the tutorial.
Reminder: You just completed these steps:
File -> New Project ... -> Version Control -> Git
Paste repo URL. Hit tab
key.
Click "Create Project."
Let's now confirm that your setup is correct. In the Terminal, run git remote -v
. CP/CR.
question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, try_again_button = "Edit Answer", incorrect = NULL, rows = 2)
Your answer should look something like:
Davids-MBP:First_Repo dkane$ git remote -v origin https://github.com/davidkane9/First_Repo.git (fetch) origin https://github.com/davidkane9/First_Repo.git (push) Davids-MBP:First_Repo dkane$
Don't worry too much about the details. You've successfully linked a repo to an RStudio Project!
You have linked your GitHub repo to your R project, but you haven't proven that you are someone with editing access to the project. If we just let anyone with the GitHub link edit our repo, that's obviously going to lead to problems. This is why we use something called a personal access token, or a PAT.
A PAT is just a special computer-generated password between your computer and GitHub that lets GitHub connect your GitHub account and your computer. If you want to learn more, see "Personal access token for HTTPS" from Happy Git and GitHub for the useR
Create a PAT by using the usethis::create_github_token()
function in the Console. This should redirect you to a GitHub page about creating a PAT.
Set the "Note" field to "My First PAT" and keep the scopes the same as the default. It should look something like the picture below (Doesn't have to be exact).
include_graphics("images/pat.png")
Press "Generate Token" at the bottom of the page to finalize your PAT. If you have trouble, the function usethis::gh_token_help()
may be helpful.
Now that we've created our PAT, first temporarily copy it somewhere you have easy access to. We will need to store it somewhere so that RStudio can "see" it.
Run gitcreds::gitcreds_set()
in the Console and follow the prompts, providing your token when asked.
Restart your R session with Session -> Restart R
. (This is not the same thing as restarting RStudio.) You could also use Cmd/Ctrl + Shift + 0
to restart R. Using the shortcut key is always a good idea. (On Windows, this would be Ctrl + Shift + F10
.)
Run usethis::git_sitrep()
in the Console. CP/CR.
question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, try_again_button = "Edit Answer", incorrect = NULL, rows = 12)
The key "Github user" section should look something like:
── GitHub user • Default GitHub host: 'https://github.com' • Personal access token for 'https://github.com': '<discovered>' • GitHub user: 'davidkane9'
See "Managing Git(Hub) Credentials" for more details.
The .gitignore
file determines which files git should ignore. You should be able to see your .gitignore
file in the Files tab in the Output pane.
(To see any files which are "hidden" --- those that begin with a period .
--- you may need to click on the gear symbol in the Files tab (next to the "Rename" button) and then select "Show Hidden Files".)
From the Console, run list.files()
. CP/CR.
question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, try_again_button = "Edit Answer", incorrect = NULL, rows = 3)
list.files()
In the "RStudio and Code" tutorial, we saw that an R project begins with just one file, the Rproj
file, which is First_Repo.Rproj
in this case. But, because this R project started by connecting to a Github repo --- and because we initialized that repo with a README --- we also have a README.md
file.
From the Console, run list.files(all.files = TRUE)
. CP/CR.
question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, try_again_button = "Edit Answer", incorrect = NULL, rows = 3)
list.files(all.files = TRUE)
Because .gitignore
is a "hidden" file, since its name begins with a .
, we need to set all.files = TRUE
in list.files()
if we want it to be returned. Don't worry about hidden directories like .git
and .Rproj.user
. We won't be using them.
The purpose of .gitignore
is to list all the files that you don't want to be uploaded to GitHub. This is especially useful when you are working with big datasets or files with private information.
Open the .gitignore
and add a new line with First_Repo.Rproj
. This tells Git to ignore any file named First_Repo.Rproj
. Save the file, but make sure that the last line in the file is blank.
In the Console, run:
tutorial.helpers::show_file(".gitignore")
CP/CR.
question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, try_again_button = "Edit Answer", incorrect = NULL, rows = 3)
This should output all of the lines in your .gitignore
file, which should include First_Repo.Rproj
.
We could also have used *.Rproj
. The * tells your computer that we want to prevent all files ending in Rproj
from being uploaded.
Navigate to the Git tab in the Environment pane. You should see .gitignore
listed but not First_Repo.Rproj
. The Git tab shows all the changes you have made on your local computer. Since you added First_Repo.Rproj
in .gitignore
, First_Repo.Rproj
should not show up as one of the changed files that the GitHub version needs to be synced with.
If you do see First_Repo.Rproj
, try clicking on the refresh button --- a small clockwise swirl --- in the upper right. Or make sure you have it spelled correctly.
include_graphics("images/updated_gitignore.png")
Now that we've updated our .gitignore
file, we want to upload this new version to GitHub. Otherwise, GitHub doesn't know that we want to hide our First_Repo.Rproj
file.
To do so, go to the Git tab in the Environment pane toward the top right of your screen. Click the check box next to the .gitignore
file and then click on the "Commit" button.
include_graphics("images/commit.png")
This will open a new window where you will write a commit message. The message is meant to note what you're adding/changing within the repo. And yes, it's mandatory. Just a few (sensible) words are enough.
In the Console, run:
gert::git_log() |> select(-commit, -merge, -files) |> dplyr::slice(1)
CP/CR.
This returns the author, time, and message of the last commit.
question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, try_again_button = "Edit Answer", incorrect = NULL, rows = 2)
If you get an error about a function like slice
or select
not being found, then load the dplyr package with library(dplyr)
and try again.
Next, press Push
, i.e., the green arrow on the top right. This pushes or uploads the changes to GitHub. Open up GitHub and refresh the project page to see the commit message you just made at the top of your repo page.
Run gert::git_ahead_behind()$ahead
in the Console. CP/CR.
question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, try_again_button = "Edit Answer", incorrect = NULL, rows = 2)
This should return 0
, showing that you are 0 commits out of sync with the GitHub version.
Let's publish a file from our project onto the internet using R Pubs.
First let's make a new Quarto document. Select File -> New File -> Quarto Document ...
. Change the title to "Example". You are the author. Keep the other fields as the default values. Save the file with Cmd/Ctrl + S
and name the file example
. RStudio will automatically add the .qmd
extension.
Run list.files()
. CP/CR.
question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, try_again_button = "Edit Answer", incorrect = NULL, rows = 2)
You should see example.qmd
somewhere in the output. Recall that the title inside of the document ("Example") has no necessary connection to the name of the file (example.qmd
). However, it is often the case that they will be similar, as here.
Render this Quarto document by pressing the "Render" button or, better, with Cmd/Ctrl + Shift + K
. Run list.files()
in the Console. CP/CR.
question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, try_again_button = "Edit Answer", incorrect = NULL, rows = 2)
example.html
should be one of the files outputted.
Notice the blue button at the top of your QMD. Click that and select RPubs. (If you completed the "RStudio and Code" tutorial, you should already have an RPubs account.) If you do not have an account, you will need to create one. Always choose the free option.
Create a slug --- the file name for the last part of the URL --- like "my-project" and publish. Put the URL of your RPubs Page below:
question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, try_again_button = "Edit Answer", incorrect = NULL, rows = 2)
If you did not select a slug, then RPubs provided a meaningless slug, most likely a string of numbers. That does not look cool! Much better to have a slug which describes the file.
In the Git tab in the top right corner, check the boxes next to "example.qmd", and "example.html" to stage them. Do not check the box next to example_files
. Press the commit button and add the commit message "published webpage".
Now, press the Push
button with the green arrow. Your changes have now been saved to Github! No need to worry about example_files
right now. Quarto always creates a directory named filename_files
and uses it for the tools/residues of the rendering process. Nothing in such files is worth saving.
Run gert::git_ahead_behind()$ahead
in the Console. This should return "0" just like before. CP/CR.
question_text(NULL, answer(NULL, correct = TRUE), allow_retry = TRUE, try_again_button = "Edit Answer", incorrect = NULL, rows = 2)
Nice work! You've created an RPubs page. All your work is backed up to GitHub.
We now understand the entire data science work cycle. Begin with Github repository, connect to an RStudio project, create a Quarto document (including a plot or other analysis), publish your results to RPubs, and, finally, commit/push all your work to Github. Let's practice these skills in some new projects.
This tutorial mostly covered material which is not actually in R for Data Science (2e) by Hadley Wickham, Mine Çetinkaya-Rundel, and Garrett Grolemund. But the material is in keeping with the spirit of that book, especially Chapter 28 Quarto. The material covered in Chapter 4 Workflow: code style appears in the RStudio and Code tutorial in this package.
You should now have Git working on your computer. You should have a Github account and know how to connect Github repositories to RStudio projects.
The most useful reference for Git/Github/RStudio is Happy Git and GitHub for the useR. Bookmark it!
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.