les <- 1
knitr::opts_chunk$set(echo = TRUE, class.source = "Rchunk", class.output = "Rout")
In this lesson, you will learn how to keep track of your file changes using Git. After this lesson:
Before you continue with this lesson, make sure you have done the following:
A reminder of the exercise and assignment directives.
Within the lesson you will encounter a number of exercises.
Please answer all the assignments and exercises in an RMarkdown document. You learned how to make one in DAUR1. Save it somewhere in your course project.
Most lessons have portfolio assignments! Save those for now as separate .Rmd files in a portfolio-project.
To explain why Git and GitHub are so important, let us look at an example. Suppose you are analyzing some data for your internship using R. You have written an R script to perform the data analysis and the script is working. However, one day you decide to make some changes to the script in order to make the computations a bit faster. You work on the code all day, and at the end of the day you test whether it still works as expected. Unfortunately, it does not, even though it was working before. It would be great if you could go back to the previous version of the code, or if you could easily see the changes you made, to find out why your script stopped working.
The example shows that it is important to keep track of your changes. Of course you could have made your changes in a copy of the original script. But in this way, you would end up with a lot of files on your computer. In addition, it is not easy to see what you changed. This is why we use version control software: software that keeps track of file changes and allows you to inspect the changes and to go back to previous versions of the files.
Git is the most popular version control software used today. It is designed to track changes to text files (files you can open in a simple text editor, such as R scripts and markdown files). Git can easily be used to track changes to files on your computer.
However, you often want to work together with others (for example, fellow students or colleagues) on the same scripts. GitHub is a web-based platform that was created to promote collaboration. GitHub is based on the Git software. The platform allows you to create an online copy of your files and the Git version history. This copy can then be shared with others, who can then contribute to the code themselves. The online copy on GitHub is also useful for yourself, for example as a backup of your code in case your computer crashes or you accidentally remove some folders on your computer.
Some people are far better at explaining things than at making visual animations. Watch the first five and a half minutes of this video.
You are hopefully convinced that Git and GitHub are the way to go. But how do you start? How can you use Git and GitHub to keep track of the file changes in your scripts? In this section, you will learn the different steps of the Git workflow. We will focus on the concepts. In part two of this lesson, we will give you a detailed protocol for how to do things by yourself.
Before you can use Git and GitHub to track your file changes, you need to set up your own Git repository. A Git repository is a folder in which the version control takes place. That means that all the changes of files that are located in that folder (and its subfolders) are tracked. It is best practice to initiate the Git repository on GitHub and then download it to your computer, as will be explained in part two of this lesson. This results in a folder on your computer. All files that you place in this folder will be under version control by Git.
Just as you can have multiple folders on your computer, you can also have multiple Git repositories on your computer (and on GitHub). It is recommended to create a separate Git repository for each project you are doing. For example, if you create scripts and markdown files for your internship, you create a Git repository that only contains the work of this internship. If you performed several different research projects in your internship, you can create separate Git repositories for each research project.
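To make the idea of "one repository per project" concrete, here is a minimal command-line sketch. It assumes Git 2.28 or newer (for the `-b` option), and the folder names are hypothetical. The lesson itself creates repositories on GitHub; `git init` is used here only because it works without a GitHub account.

```shell
set -eu
# Hypothetical scratch location and project names, used only for this sketch.
workdir="$(mktemp -d)"

# Turn two project folders into two independent Git repositories.
git init --quiet -b main "$workdir/internship-project-a"
git init --quiet -b main "$workdir/internship-project-b"

# Each repository keeps its version history in its own hidden .git folder.
ls -d "$workdir"/*/.git
```

Because each folder has its own `.git` directory, commits made in one project never show up in the history of the other.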
Once you have placed files in the Git repository folder, you can start to track the file changes. It is important to note that Git does not automatically track the file changes. If you do not tell Git to save the version history, it will not do anything! Hence, you need to actively tell Git to save the file changes. The action of saving a new version of your files in the Git repository's file history is called a commit. It is recommended to commit your changes regularly, preferably several times a day.
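The following sketch illustrates that nothing is saved until you commit. It assumes Git 2.28 or newer; the scratch folder, the file name `analysis.R`, and the name and email address are placeholders chosen for this example.

```shell
set -eu
repo="$(mktemp -d)"              # hypothetical scratch folder
git init --quiet -b main "$repo"
cd "$repo"
# Identity for this repository only (placeholder values).
git config user.name "Your Name"
git config user.email "you@example.com"

echo 'x <- 1' > analysis.R       # hypothetical script
git status --short               # "?? analysis.R": Git sees the file but has not saved it

git add analysis.R               # select the file for the next commit
git commit --quiet -m "Add first version of analysis script"

echo 'x <- 1; y <- 2' > analysis.R   # edit the file again
git status --short               # " M analysis.R": modified, but not yet committed
git log --oneline                # the history still contains only one commit
```

The second edit exists only in the file on disk. It becomes part of the version history only after another `git add` and `git commit`.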
When you commit your changes, the Git repository on your computer will be updated: a new version of your files will be added to the version history of the Git repository on your computer. However, the online copy of your Git repository on GitHub does not get updated, unless you tell Git to do so. The process of updating your Git repository on GitHub based on local changes (new versions on your computer) is called pushing (you 'push' the changes from your computer to GitHub). You can push your changes each time you commit your changes, but it is also possible to push all the commits at once at the end of the day. However, it is recommended to push your changes at least once per day, to keep the copy of your Git repository on GitHub up to date.
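The difference between committing and pushing can be tried out without a GitHub account. In the sketch below, a local "bare" repository stands in for the online copy on GitHub; this stand-in, the folder names, and the file name are assumptions made for the example (Git 2.28 or newer is assumed for `-b`).

```shell
set -eu
workdir="$(mktemp -d)"
# A bare repository plays the role of the online copy on GitHub (assumption).
git init --quiet --bare -b main "$workdir/online-copy.git"

git init --quiet -b main "$workdir/local-copy"
cd "$workdir/local-copy"
git config user.name "Your Name"
git config user.email "you@example.com"
git remote add origin "$workdir/online-copy.git"

# Two commits are made locally during the day.
echo 'x <- 1' > analysis.R
git add analysis.R
git commit --quiet -m "Add analysis script"
echo 'y <- 2' >> analysis.R
git commit --quiet -am "Add second variable"

# The online copy does not know about these commits yet: this prints nothing.
git ls-remote --heads origin

# One push sends all commits made so far.
git push --quiet origin main
git ls-remote --heads origin     # now lists refs/heads/main
```

Pushing once at the end of the day, as in this sketch, transfers both commits in one go.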
If you work together with other people, it may be the case that the Git repository on GitHub has been changed by your collaborators/colleagues. These commits are then present in the Git repository on GitHub, but not in your local Git repository. In other words, the Git repository on your computer is lagging behind the Git repository on GitHub. In this case, you want to update the Git repository on your computer based on the changes on GitHub. This update process is the reverse of pushing, hence it is called pulling (you 'pull' the changes from GitHub to your computer).
Pulling is only important if other people contributed to the repository. In part two of this lesson, we will focus on the simple situation where you work alone on your files and you are the only person contributing to the Git repository on GitHub. In part three, we will explore the best practices for working together on the same files.
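Pulling can be sketched in the same way, with a second folder playing the role of a collaborator. The local bare repository standing in for GitHub, the folder names `you` and `colleague`, and the identities are all assumptions for this example (Git 2.28 or newer is assumed for `-b`).

```shell
set -eu
workdir="$(mktemp -d)"
git init --quiet --bare -b main "$workdir/online-copy.git"   # stand-in for GitHub

# Your copy, with a first commit pushed to the shared repository.
git init --quiet -b main "$workdir/you"
cd "$workdir/you"
git config user.name "Your Name"
git config user.email "you@example.com"
git remote add origin "$workdir/online-copy.git"
echo 'x <- 1' > analysis.R
git add analysis.R
git commit --quiet -m "Add analysis script"
git push --quiet origin main

# A collaborator clones the repository, commits a change, and pushes it.
git clone --quiet "$workdir/online-copy.git" "$workdir/colleague"
cd "$workdir/colleague"
git config user.name "A Colleague"
git config user.email "colleague@example.com"
echo 'y <- 2' >> analysis.R
git commit --quiet -am "Add second variable"
git push --quiet origin main

# Your copy is now one commit behind; pulling brings it up to date.
cd "$workdir/you"
git pull --quiet --ff-only origin main
git log --oneline                # shows both commits
```

The `--ff-only` option is a cautious choice for this sketch: the pull only succeeds when your copy can simply catch up with the online copy.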
In this section, we will give you a detailed protocol for working with Git using RStudio. This protocol assumes that you are the only person who contributes to the Git repository. If you work together with other people, you will need tools beyond the ones presented here. You will learn more about these tools in the next lessons.
Before you can track your file changes using Git, you first need to set up your Git repository. It is best practice to create your Git repository on GitHub and then download a copy of the Git repository to your computer. We explain these two steps in detail below.
To set up your Git repository, you go to the GitHub login page and log in with your credentials. You are then taken to the home page. On the left side, you can find a panel with your repositories (if you have any). Here, you can click on 'New' to create a new repository (see figure below, top panel).
Once you click on 'New', you will be taken to a menu where you can specify the details of your new Git repository. The menu consists of the following parts (see also figure below, bottom panel):
knitr::include_graphics('images/lesson22_1_createRepo.png')
Let's try the Git workflow on GitHub first.
Follow the tutorial here starting from "commit your first change".
Once you have created a Git repository on GitHub, you can download the repository to your computer. Downloading a Git repository is called cloning. The steps for cloning a Git repository are as follows:
RStudio will now create a new R Project and clone the Git repository from GitHub.
As you learned in the DAUR1 course, it is best practice to use R Projects if you are working with RStudio. Using the above steps, you can easily couple the R Project to the Git repository. All files in the R Project folder will now be tracked for version control. These files can be R scripts, (R)markdown files, but also (among others) bash scripts.
knitr::include_graphics('images/lesson22_2_downloadRepo.png')
OK, you are ready to go! You have created your first Git repository and you can now start to add files and track their changes. In this part, we will show you how to commit your changes and how to push your changes to the Git repository on GitHub.
GitHub uses personal access tokens. You will have to generate one on GitHub and tell RStudio / Git on your computer what it is.
Copy the following to the console:
install.packages("gitcreds")
library(gitcreds)
gitcreds_set()
RStudio asks for a GitHub token. Let it wait for a bit. Go to your browser. Generate your token by following these steps.
Copy your token (you only get to see it once!) and paste it in RStudio (which is still waiting for your reply).
You will probably have to give RStudio your GitHub user name and password later as well. We'll get to that.
We will start by creating a new R script and saving it in the directory of the Git repository. In the upper right corner of RStudio, you can find a submenu called 'Git'. If you open this menu, you can see the changes you have made to the repository (see figure below). These changes may be new files that you just added to the repository, but they can also be changes to the content of pre-existing files.
knitr::include_graphics('images/lesson22_3_gitmenu.PNG')
In this Git menu, you can perform all your Git actions, including committing your changes, pushing your changes to GitHub or pulling GitHub changes to your computer. We will now discuss these steps in detail.
You just created a new R script. In addition, by cloning your Git repository from GitHub, you created a new R Project file. These changes need to be stored in the Git repository. In other words, you need to commit the changes to the Git repository. You can do this by clicking on the 'Commit' button in the Git menu. A new window will then open. In this window, you can select the changes you want to commit to the Git repository (see figure below, upper left corner). Note that by selecting a file, you can see the changes in the bottom panel of the window (lines in green are new/added lines and lines in red are old/removed lines). After selecting the files you want to commit, you can write a commit message in the upper right panel. You need this commit message to tell the different commits apart later on GitHub. It is therefore important to write commit messages that clearly indicate the changes you made in the commit. Finally, you can commit your changes by pressing the 'Commit' button. You can now close the commit window (and the other window that pops up after pressing the 'Commit' button).
knitr::include_graphics('images/lesson22_4_commitmenu.PNG')
After committing your changes, you have to update the GitHub repository. You can do this by pushing the recent commit(s) to GitHub using the 'Push' button (with the green, upward pointing arrow) in the Git menu.
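The first time you push commits to GitHub from a repository in RStudio, you may be prompted for your user name and password. GitHub no longer accepts your account password for Git operations, so paste your personal access token when asked for a password. RStudio will likely save your credentials, so that you will not have to enter them again. Git also needs to know which name and email address to record in your commits. If it complains that it does not know who you are, type the following in the terminal (with your own email address and name):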
git config --global user.email "you@example.com"
git config --global user.name "Your Name"
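If you want to try these settings without changing your global configuration, the same keys can be set for a single repository by leaving out `--global`. The sketch below does this in a scratch repository (a hypothetical folder, Git 2.28 or newer assumed for `-b`) and reads the values back.

```shell
set -eu
repo="$(mktemp -d)"              # hypothetical scratch folder
git init --quiet -b main "$repo"
cd "$repo"

# Without --global, the settings apply to this repository only.
git config user.email "you@example.com"
git config user.name "Your Name"

# Read the values back to check what Git will record in new commits.
git config user.name             # prints: Your Name
git config user.email            # prints: you@example.com
```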
After pushing your changes, you can see the new/changed files in GitHub (you may need to refresh your page to see the changes). GitHub also offers a nice interface to see the exact file changes for each commit. In the main menu of your repository, you can click on the 'commits' link and subsequently select your most recent commit. You will then see all file changes that were committed during that commit (see figure below).
knitr::include_graphics('images/lesson22_5_commitsGitHub.png')
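The per-commit view on GitHub has a command-line counterpart. The sketch below is one possible way to list commits, inspect what changed, and bring back an earlier version of a file, as in the example at the start of this lesson. The file name, its contents, and the commit messages are made up for the illustration (Git 2.28 or newer is assumed for `-b`).

```shell
set -eu
repo="$(mktemp -d)"              # hypothetical scratch folder
git init --quiet -b main "$repo"
cd "$repo"
git config user.name "Your Name"
git config user.email "you@example.com"

echo 'result <- mean(c(1, 2, 3))' > analysis.R
git add analysis.R
git commit --quiet -m "Add working analysis script"

# A change that breaks the script (the closing parenthesis is missing).
echo 'result <- mean(c(1, 2, 3)' > analysis.R
git commit --quiet -am "Try to speed up the computation"

git log --oneline                # one line per commit, newest first
git show --stat HEAD             # which files the most recent commit touched
git diff HEAD~1 HEAD             # removed lines start with "-", added lines with "+"

# Bring back the file as it was one commit earlier and save that as a new commit.
git checkout HEAD~1 -- analysis.R
git commit --quiet -m "Restore working version of analysis script"
```

Restoring the file in a new commit keeps the broken attempt in the history, so you can still look at it later.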
Some file changes do not have to be tracked by Git. These include changes to files that are generated automatically by R (such as .RData and .Rhistory files) or by other software. You can tell Git to ignore these files by creating a gitignore file in the Git repository. This is just a simple text file that is saved as '.gitignore' in the Git repository. In this file, you store the names of the files you want to ignore. You can also ignore entire directories or files with a certain file extension (see this page for the syntax).
As indicated above, you can easily create gitignore files on GitHub when you create your Git repository. On GitHub, you can select your programming language and then create a gitignore file that ignores system files of that programming language. You can always add other files and directories to the gitignore file yourself later.
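As a small illustration of how a gitignore file works, the sketch below creates empty stand-ins for the files that R generates and then ignores them. The scratch folder and the script name are hypothetical (Git 2.28 or newer is assumed for `-b`).

```shell
set -eu
repo="$(mktemp -d)"              # hypothetical scratch folder
git init --quiet -b main "$repo"
cd "$repo"

echo 'x <- 1' > analysis.R
touch .RData .Rhistory           # stand-ins for files that R generates automatically
git status --short               # lists .RData, .Rhistory and analysis.R as untracked

# One pattern per line in a plain text file named .gitignore.
printf '%s\n' '.RData' '.Rhistory' > .gitignore
git status --short               # now lists only .gitignore and analysis.R
```

Note that the `.gitignore` file itself shows up as a new file: it is a normal file in the repository that you commit like any other.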
In the text above, we explained how to use Git in combination with RStudio. However, you may also want to use Git with other programming languages, such as Python or Bash, outside RStudio. Of course this is possible. In fact, Git was designed to be used independently, as a simple command line tool. When you install Git on a Windows computer, you also install the Git Bash command line tool. You can use this tool in every directory of your computer to initiate Git repositories or to maintain them. Below, we will show you how to do this and teach you the most important commands.
Suppose you want to clone the Git repository outside RStudio in a directory on your computer. You can do this by browsing on your computer to the destination folder. In this folder, you can right click with your mouse and select 'Git Bash Here' (see figure below). A command line will open, which is basically a Linux command line.
knitr::include_graphics('images/lesson22_6_gitCommandLine.png')
You can now clone the Git repository from GitHub using the following command (of course replacing the link with the link of your own repository):
```{bash echo=TRUE, eval=FALSE}
git clone https://github.com/basvangestel/test.git
```

A new folder will now be created. In this folder, the files of your Git repository are located. Now you can go to this folder and add/change files. With the following command you can see which files have been changed:

```{bash echo=TRUE, eval=FALSE}
git status
```

You can commit the file changes with the following command:

```{bash echo=TRUE, eval=FALSE}
git add --all; git commit -m 'your-message'
```

This command may look a bit difficult. The first part (`git add --all`) corresponds to selecting (all) the files in the RStudio commit window. Note that with this command you commit all file changes, unless they are ignored as indicated by the gitignore file. The second part (`git commit -m 'your-message'`) corresponds to writing your commit message and pressing the commit button in the RStudio commit window. Finally, you can push and pull changes using the following commands:

```{bash echo=TRUE, eval=FALSE}
git push origin main
git pull origin main
```
Click for the answer
b) Add a line with `data/**` to the gitignore file. The files in the data directory are removed from the Git menu.
c) In the gitignore file, change the line `data/**` to `data/*.xls`. Now only the Excel files are ignored, but the csv files appear in the window again.
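The effect of the two patterns can also be checked from the command line with `git check-ignore`, which prints the paths that Git would ignore. The file names in this sketch are hypothetical (Git 2.28 or newer is assumed for `-b`).

```shell
set -eu
repo="$(mktemp -d)"              # hypothetical scratch folder
git init --quiet -b main "$repo"
cd "$repo"
mkdir data
touch data/measurements.xls data/measurements.csv   # hypothetical file names

# Pattern from answer b): everything inside the data directory is ignored.
echo 'data/**' > .gitignore
git check-ignore data/measurements.xls data/measurements.csv   # prints both paths

# Pattern from answer c): only the Excel files are ignored.
echo 'data/*.xls' > .gitignore
git check-ignore data/measurements.xls data/measurements.csv   # prints only the .xls path
```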
If things get so messy that you are not able to upload a changed file, just download a new clone of your repo (File > New Project > Version Control > etc.). Then copy your changed file into this new folder, commit and push.
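On the command line, this rescue could look roughly like the sketch below. A local bare repository stands in for GitHub, and the folder names `messy` and `fresh`, the file name, and the identity are assumptions for this example (Git 2.28 or newer is assumed for `-b`).

```shell
set -eu
workdir="$(mktemp -d)"
git init --quiet --bare -b main "$workdir/online-copy.git"   # stand-in for GitHub

# The "messy" local copy: one commit on the remote, plus a change you want to keep.
git init --quiet -b main "$workdir/messy"
cd "$workdir/messy"
git config user.name "Your Name"
git config user.email "you@example.com"
git remote add origin "$workdir/online-copy.git"
echo 'x <- 1' > analysis.R
git add analysis.R
git commit --quiet -m "Add analysis script"
git push --quiet origin main
echo 'x <- 1; y <- 2' > analysis.R    # the changed file

# Start over from a fresh clone and copy the changed file into it.
git clone --quiet "$workdir/online-copy.git" "$workdir/fresh"
cp "$workdir/messy/analysis.R" "$workdir/fresh/analysis.R"

# Commit and push from the fresh clone.
cd "$workdir/fresh"
git config user.name "Your Name"
git config user.email "you@example.com"
git commit --quiet -am "Update analysis script"
git push --quiet origin main
```

Once the push has succeeded, the old folder is no longer needed and you can continue working in the fresh clone.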
You could also check whether your mess-up is listed here or, if you prefer, on the same site without swear words here.
Complete assignments 1a and 1b of the portfolio assignments.
CC BY-NC-SA 4.0 This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Unless it was borrowed (there will be a link), in which case, please use their license.