knitr::opts_chunk$set(echo = FALSE, class.source="Rchunk", class.output="Rout") les <- 2
In the previous lesson you learned how to use Git if you are the only person working on a project. However, often you are working together with other persons. Git offers some additional tools to promote collaboration. We will introduce these additional tools in this part.
before you do anything, make a local backup of your github repositories now Most likely, you will have the local clone(s) on your computer. Just backup this folder. Github collaborating can be tricky and you will be messing around in github today. While you can most of the time recover previous work, you will want to have the peace of mind that you won't lose anything while learning. This gives you the opportunity to really try and see what happens if you mess things up while working with 2 people on the same repo, and see if you can fix it. We mean it, try to break stuff! We are here now to help you fix it, and you have the time now to learn. You don't want to have your first github mess-up when working on a deadline in your internship...
Remember this if you really mess up or git is telling you something like "you need to do A before you can do B, and remember you need to do B before you can do A".
In this lesson you will get to know many useful tools for collaboration in Github. You are required to apply them in your Data Science Projecticum
as well as in your workflows portfolio
.
After this lesson you can do the following tasks with github and Rstudio:
Branching and merging
Issues
fixes #<issue_number>
]Kanban & automation
Pull requests
main
Before we start branching, merging and collaborating, let's create and solve a simple conflict in case you haven't gotten one yet.
(1)
First, if you have no practice github repository yet do the following: - If you haven't got one yet: Create a private github repository for the normal (=not portfolio) exercises in the lessons. - Create a new version controlled R project linked to this github, and copy your exercise results from the previous lessons to this folder. - commit and push the changes.
(2)
Everyone now has a repository to save your regular (not portfolio) exercises, and practise with github. Now let's create a conflict:
(3)
Go to github online and open README.md. Make a change on the exact same line you just edited locally, and commit this change online.
(4)
Back in Rstudio, push your local changes. Git will give you a warning and will not change the file!
(5)
Click Pull to download the changes from Github
Git will tell you there is a conflict and that you will have to resolve it first. It will open README.md (if not, open it yourself.) It looks somewhat like this:
# your repo name <<<<<<< HEAD aThis is the bookdown course etc... # the local change ======= bThis is the bookdown course etc... # the online change >>>>>>> 884f9b4fb3730bbefaa12b079b3cb132d767e78a
Choose which lines to keep and delete all the rest, including the <<<< and >>>> lines. So you keep for instance:
# your repo name aThis is the bookdown course etc... # the local change
Save README.md, commit and push. Now it will update!
(If it won't and give you an error about unfinished commits, try adding another line in the README.md locally, commit and push.)
Suppose Bas and Marc are working together on a (data analysis) project. For this project, Bas created one shared Git repository on GitHub in which Bas and Marc can store the scripts for the project. Bas and Marc both cloned this repository to their laptops. Every now and then Bas and Marc make changes to the scripts. You can imagine that this may go wrong: Bas and Marc are both adding/deleting things from the Git repository at the same time from different computers and this can result in conflicts. For example, Bas may decide to add an extra line to script1 and commit/push the changes to GitHub. Marc, who still works on the old version, then removes some lines from script1. When Marc now commits/pushes the changes to GitHub an error will occur, because Marc's changes are not in line with Bas' changes.
Clearly, if you are working together with other persons, the workflow presented previously is too simple and can give rise to conflicts. Fortunately, Git offers additional tools for working together. The most important tool is a branch. A branch can be seen as a development environment in Git in which you store your commits. When you create a Git repository in GitHub, you automatically create one branch, the main branch. In the default state, you store all commits in this main branch. If you are working with multiple persons on the same project, it is best practice to have a development branch for each person. You can create such a development branch anytime during the project. When you create a development branch, you basically create a copy of the main branch (you 'branch off') in which you can work without affecting the content of the main branch. In this development branch you can commit/push as you normally would do in the main branch. After you made your changes and you know that the scripts are working as expected, you can add your changes to the main branch by merging your development branch with the main branch. You can most easily perform this merge on GitHub (where it is called a pull request), because of GitHub's user-friendly interface. After that, you can pull the updates in the main branch to your computer.
In summary, the workflow for projects with multiple persons consists of the following steps:
Below, we will explain these steps in detail and we will show how you can perform the steps in RStudio and GitHub.
IMPORTANT: you need to actually try all these things below. Not just scan our examples and believe us that this is how it works...
If you are committing your changes to the Git repository, the default is to commit the changes to the main branch. If you are in RStudio, you can easily see on which branch you are working by checking the Git menu:
knitr::include_graphics('images/lesson23_1_mainbranch.png')
You can create a development branch by clicking on the purple 'New Branch' symbol in the Git menu:
knitr::include_graphics('images/lesson23_2_newbranch.png')
Make sure that you pull once before creating your development branch, to make sure that your main branch is up to date. You can pull using the blue, downward pointing pull arrow in the Git menu. Once you click on the 'New Branch' symbol, a new menu will open. In this menu, you can choose the name for your development branch. For the other options, you can accept the default settings (see figure below).
knitr::include_graphics('images/lesson23_3_newbranch2.png')
After creating the development branch, you can see that you automatically switched from the main branch to the newly created development branch.
Now that you are on the new development branch, you can start changing files. Change some files in any way you like. After you made your changes, you can commit your changes and push the changes to GitHub as explained in the previous section. You can also see the changes you made in GitHub by selecting the correct branch. You can do this in the 'code' menu of your repository:
knitr::include_graphics('images/lesson23_4_branchesGitHub.png')
Once you finished making your changes, you can add the changes to the main branch. The easiest way to do this is to create a pull request on GitHub. On the GitHub website go to the pull request menu (again: do all this. don't just read about it):
knitr::include_graphics('images/lesson23_5_pullrequest.png')
Usually, you will see a message in this menu that states that your development branch had some recent pushes. You can then click on the 'Compare and pull request' button. If this message is not visible, you can click on the 'New pull request' button. In both cases, you will go to a new page that looks like this:
knitr::include_graphics('images/lesson23_6_pullrequest2.png')
Here, you have to check if the branches that are going to be merged, are correct. In this case it is correct: the development branch 'Bas3' will be merged into the main branch. GitHub also checks for you if there will be any conflicts when merging. If everyone in your team follows the workflow as indicated in this lesson, there will rarely be conflicts. If everything is OK, you can create the pull request by clicking on the 'Create pull request' button.
After you created the pull request, you will be automatically guided to the page where you can resolve the merge request:
knitr::include_graphics('images/lesson23_7_pullrequest3.png')
Click on 'Merge pull request' and subsequently on 'Confirm merge'. If the merge was successful, you will see the following:
knitr::include_graphics('images/lesson23_8_pullrequest4.png')
You can now delete the development branch from GitHub by clicking 'Delete branch'.
Your Git repository on GitHub is now up to date again, but the main branch of your local Git repository (on your laptop) still needs to be updated. For this, you first switch to the main branch using the Git menu in RStudio:
knitr::include_graphics('images/lesson23_1_mainbranch.png')
Then you pull the changes from GitHub to your local Git repository using the blue, downward pointing arrow in the GitHub menu.
You have updated the Git repository on GitHub and on your laptop. You have also removed the development branch from GitHub. However, the branch is still present on your laptop. You can remove the branch by opening the Terminal in RStudio and typing the command git branch -d Bas3
(of course you have to use the name of your development branch instead of 'Bas3'). Try this.
The circle is closed: you have made some changes to the project using a development branch and merged this branch into the main branch. If you want to make some new changes, you can create a new branch and follow the same steps for these new changes.
Click for the answer
c) The changes disappear, because the changes are only present in the development branch and not in the main branch. If you go back to your development branch, the content reappears.
d) After resolving the pull request on GitHub and pulling the changes to your computer, you should be able to see the file changes in the main branch.
e) You can delete the branch on your computer using the command git branch -d <branch-name>
.
Today there will be an online demo. In this demo we will focus on the good practices when working in an Agile/SCRUM team and learn how to apply the basics in your own project work. If you have reached this point in the lesson and have not seen the demo yet, let the teachers know! We can do the demo a little earlier (or later) than we planned. In the meanwhile, you can just continue working, you don't yet really need the demo. You may see the terms "agile" or "scrum" sometimes though.
The workflow proposed below is documented here:
You can have several projects running within one repository. For the projecticum, you will probably have just one project. We will try to be consist in calling github projects github projects, as you should also be working in a project in Rstudio (if not, do so now!).
Github issues are the core for collaboration with people within and outside your code project. Issues can be used for external people to report a bug, a feature request or a question regarding functionality or implementation. People using for example your package can ask questions or report problems in the form of an issue. Below, an example is shown for the {dplyr}
R-package, which has a huge user community.
knitr::include_graphics( here::here( "images", "dplyr_issues.png" ) )
Posting an issue when you encounter an error or bug helps developers create robuster code and is the core driving force for developing new features or resolving problems in open source software. Opening issues is an easy way to start contributing to the open source community.
We have shown you before how to make issues if you find any mistakes in the reader. This serves as our todo-list. Now we will use issues as a development-todo list for the projecticum.
When collaborating together on project repository, you can create issues to stay on track for the different actions that need to be undertaken. Issues can be related to adding functionality (code) to the project, but also to report a bug or to request the addition or improvement of documentation. You can create an issue by opening the Issues
tab and creating a new issue by clicking the green New issue
button. A form will open where you can add a title and further remarks to document your issue. The formatting of the Leave a comment
field can be entered using Markdown formatting. When creating an issue, it is important to assign (preferable one) person to the issue. You can also choose a label, and if needed you can customize the label categories. When a Kanban/Project is also linked to the organization or repo, you can also add the issue to a Kanban board. More about Kanban here \@ref(kanban-and-automation). When creating a new issue, you can also add a Milestone
. We will discuss this later \@ref(milestones)
fixes #<issue_number>
and new branch
] {-}In the exercise above we saw how to open, fix and automatically close an issue. But what defines an issue and how would you prevent working on the same piece of code when working together.
In the main resource used for this piece in the course, we can read this about branches (as was already explained previously):
Branching is a core concept in Git, and the entire GitHub flow is based upon it. There's only one rule: anything in the main branch is always deployable.
Because of this, it's extremely important that your new branch is created off of main when working on a feature or a fix. Your branch name should be descriptive (e.g., refactor-authentication, user-content-cache-key, make-retina-avatars), so that others can see what is being worked on.
When working together in git/github, this flow is extremely important to agree upon and than stick to it.
There will be an interactive assignment today
Adapted from Wikipedia - The free Ecyclopedia
In software development, agile (here written as Agile) practices approach discovering requirements and developing solutions through the collaborative effort of self-organizing and cross-functional teams. Their customer(s)/end user(s) play a pivotal role in the work process and are involved in reviews at regular intervals. It advocates adaptive planning, evolutionary development, early delivery, and continual improvement, and it encourages flexible responses to change.
The Agile way of working is also very suitable for Scientific projects that have elements of Data Science and Software development. Because it is short-cycle development and prototyping, it is very useful in projects where different tasks need to be integrated in a tool or workflow or analysis. During the Data Science Projecticum, we will require you (within you project team, to apply the Agile way of working).
In practical sense, scrum is the collection of meetings and the tools used to administer the workload. The central tool that is being used to monitor all activities is called the 'Project Backlog'. You can look at this 'Project Backlog as the collection of all the work that needs to be done to complete the full project. Because the Agile way of work does not work with long term planning, but works according short cycles of developing a product, a service or perform a study, or involve stakeholders, the 'Project Backlog' is where all the team members go regularly together to agree on what needs to be done next.
The short cycles of working on developing new features, running computations, performing experiments, interviewing customers and stake-holders and acquiring new tools are called 'Sprints'. In each sprint, which typically lasts 3 working weeks, a number of items from the Backlog are 'moved' to the so-called 'Sprint Backlog'. During regular weekly sessions discussion take place which items need to go from the 'Project Backlog' to the 'Sprint Backlog' in order to add value to the project. These sessions are called 'Sprint planning'. At the start of each sprint a single
We will have practiced Agile using an interactive assignment.
Image: Scrum primer.
We have discussed the Agile workflow in the demo.
Milestones in Gihub can be considered as Agile/SCRUM sprints. So each sprint is associated with one or multiple milestones that need to be resolved at the end of each sprint.
When using a github project, you can setup a project to track issues. Project boards are a github Kanban board, and will keep track of the issues, pull requests and notes, and will display them as cards in columns, similar to for instance trello or MSteams planners. But they are in fact the same issues as we have seen and used before, Issues on the Kanban are also resolved when you close an issue, either manually or through a pull request.
To create a new backlog item, just create a new issue.
Once a new issue has been created, assign it the right labels and/or assign it to a sprint (milestone).
Issues allow you to have a conversation about the item and even allow you to create task lists inside the issue using GitHub's markdown.
Add the following labels to your repository:
Priorities
priority
labels allow you to prioritize items in your backlog e.g.:
priority: lowest
priority: low
priority: medium
priority: high
priority: highest
Types
type
labels allow you to easily filter items (issues) in the dashboard e.g.:
type:bug
: bugtype:chore
: chore, maintenance worktype:feature
: new featuretype:infrastructure
: infrastructure relatedtype:performance
: performance relatedtype:refactor
: refactor (opruimen van code)type:test
: test relatedOther
You can define and assign custom labels that you need within your workflow or organization.
You can create a milestone for every sprint and add items (issues) from the backlog to a milestone.
This allows you to group items in sprints and track them by milestone in your issue dashboard.
The backlog then consists of all items (issues) that have no milestone
attached to it.
TIP: Use no:milestone
in the search field on your issue dashboard to find backlog items.
At the end of this course, your portfolio assignments will have to be bundled into one portfolio website, hosted by github.
The {bookdown}
R package contains functions to bundle a collection of RMarkdown files in a website or a book, or a similar collection of information. It is based on RMarkdown and it's outpyut options (html, pdf, etc), but adds the possibility to make multi-page html websites and books sections, as well as cross-referencing across different .Rmd files. This website is rendered with bookdown. See some examples here. Bookdown is used for any document that consists of multiple pages that are meant to be read in sequence, so books, readers, but also handouts, a thesis, graduation paper, a software manual, etc. And it can very well be used for multipage stuff that is not necessarily read sequentially, such as websites or a portfolio.
A bookdown book will consist of different chapters, each in their own .Rmd file (this is why you had to do this for your portfolio). The home page, or first page, is always index.Rmd, which contains the YAML header.
Chapters are rendered in alphabetical + numerical order of their .Rmd file name. So if you want 3 chapters, you will have files like ìndex.Rmd
, 01_introduction.Rmd
, 02_cv.Rmd
and 03_firstproject.Rmd
. Go ahead and check the files in the repo of this course if you like, and compare them to the reader online.
File names that start with an underscore are skipped (in case you want to include files in the repo but not in the book for some reason).
Each R Markdown file must start immediately with the chapter title using the first-level heading, e.g., # Chapter Title
. So chapters will look somewhat like this:
# About me Stuff ## Paragaph title More stuff ## Paragraph I don't want numbered {-} bla ### subparagraph be here more bla
The build book
button is calling a function: bookdown::render_book(). You need to check some boxes before this actually works. If you want to use the build book
option in the future:
site: bookdown::bookdown_site
But you know our opinion on having to click anything at all, so you can use the console instead, for instance:
bookdown::render_book(here::here("index.Rmd"), "bookdown::gitbook") bookdown::render_book("index.Rmd", "bookdown::pdf_book")
or simply try (really, try it. does it work?)
bookdown::render_book(".")
These options work if your chapter .Rmd files are in the project root directory (the one you see when you type here::here()
in the console.)
If you really want to keep your chapter files in a subdirectory, which we don't recommend, you can still render the book with:
xfun::in_dir('mapje/nogeenmapje', bookdown::render_book('index.Rmd'))
IMPORTANT NOTE ON CONFIDENTIALITY
When hosting you portfolio on a public web address, please make sure that there is no confidential information visable in your project. Data that falls under a non-disclosure agreement for example from your projecticum may not be shared. Also, be sure not to publish any personal secrets such as passwords and tokens, inside a github repo or webpage.
Each account on github can be associated to one or multiple websites that are hosted on Github.com. To setup a website on Github (also called github-pages
or gh-pages
), we need to create a repository with a special name. For this course we will require you to upload results from the portfolio assignments to you personal github pages. This is a hard requirement for passing this course. If you have no or an empty portfolio, you cannot pass this course You may add any Rmd's with the general exersizes in this course if you feel like you would like to showcase them in your portfolio.
Maak je portfolio beschikbaar via GitHub pages. Volg hiervoor deze instructies.
Start daarnaast met de vrije opdracht, d.w.z. opdracht 2 van de portfolio-opdrachten. Zorg ervoor dat je al bepaalt welke dagen je aan de vrije opdracht gaat werken (32 uur in totaal).
For a complete manual on working with github pages, see the gh-pages docs
If you want to go for a full continuous integration workflow (which is highly recommended), you can follow these steps in this blog
https://github.com/jvandemo/github-scrum-workflow https://docs.github.com/en/free-pro-team@latest/github/managing-your-work-on-github/managing-your-work-with-issues-and-pull-requests
CC BY-NC-SA 4.0 This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Unless it was borrowed (there will be a link), in which case, please use their license.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.