Kaiaulu's interface to the GitHub API relies heavily on gh, a minimalistic client for GitHub’s REST and GraphQL APIs. In essence, Kaiaulu defines only the few API endpoints needed for its current uses and parses the returned JSON into a table, keeping only the fields of interest; more endpoints can be added later. Please see Kaiaulu's Docs Function API to see what is currently available.
In this vignette, I will show how to replicate Aleksander Konnerup's data acquisition pipeline.
At the time this vignette was created, GitHub limited unauthenticated API calls to 60 per hour per IP address. You can check the current rates at its website.
With a personal access token from a free GitHub account, the limit increases to 5000 API calls per hour. It is therefore recommended that you create a personal token by following the GitHub Documentation instructions. The process should not take more than 2 minutes.
The functions in Kaiaulu assume you have a token available, which can be passed as a parameter.
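As a minimal sketch of one way to make a token available, the hypothetical helper below (not part of Kaiaulu) looks for the token in an environment variable and falls back to a plain-text file. The function name `read_github_token`, the `GITHUB_PAT` variable name, and the default path are assumptions made for illustration.

```r
# Hypothetical helper, not part of Kaiaulu: returns a GitHub token from an
# environment variable if it is set, otherwise from the first line of a file.
read_github_token <- function(env_var = "GITHUB_PAT",
                              path = "~/.ssh/github_token") {
  token <- Sys.getenv(env_var, unset = "")
  if (nzchar(token)) {
    return(token)
  }
  path <- path.expand(path)
  if (!file.exists(path)) {
    stop("No token found in ", env_var, " or at ", path)
  }
  # Trim whitespace so stray spaces around the token are not passed along
  trimws(readLines(path, n = 1, warn = FALSE))
}
```

Keeping the token outside the vignette and the project folder reduces the chance of committing it by accident.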
rm(list = ls())
require(kaiaulu)
require(data.table)
require(jsonlite)
require(knitr)
The feature of interest in this notebook is the capability GitHub introduced for project owners to assign issues to issue commenters (i.e. those without merge access), as discussed in the following blog post: https://github.blog/2019-06-25-assign-issues-to-issue-commenters.
The goal of the following steps is to obtain data showing when a project started assigning issues to issue commenters without merge access.
To use the pipeline, you must specify the organization and project of interest, and your token.
conf <- yaml::read_yaml("../conf/kaiaulu.yml")
# Has to match the GitHub organization (e.g. github.com/sailuh)
owner <- conf[["issue_tracker"]][["github"]][["owner"]]
# Has to match the GitHub repository (e.g. github.com/sailuh/perceive)
repo <- conf[["issue_tracker"]][["github"]][["repo"]]
# Path where you wish to save all raw data. A folder with the repo name and sub-folders will be created.
save_path <- path.expand(conf[["issue_tracker"]][["github"]][["replies"]])
# Your file github_token contains the GitHub API token obtained in the steps above
token <- scan("~/.ssh/github_token",what="character",quiet=TRUE)
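The lookups above assume the configuration file nests the GitHub settings under `issue_tracker` and then `github`. As a rough sketch, the list below mirrors the shape the parsed configuration would need for those lookups to work. Only the key names come from the code above; the values, the `_sketch` variable names, and the save path are placeholders, not the contents of any real configuration file.

```r
# Placeholder values; only the nesting of the keys is taken from the vignette code
conf_sketch <- list(
  issue_tracker = list(
    github = list(
      owner = "sailuh",
      repo = "perceive",
      replies = "~/rawdata/github/perceive"  # hypothetical save path
    )
  )
)

# The same nested lookups used in the pipeline
owner_sketch <- conf_sketch[["issue_tracker"]][["github"]][["owner"]]
repo_sketch <- conf_sketch[["issue_tracker"]][["github"]][["repo"]]
save_path_sketch <- path.expand(conf_sketch[["issue_tracker"]][["github"]][["replies"]])
```

If any of the three keys is missing from the file, the corresponding lookup returns `NULL`, so it may be worth checking the values before making API calls.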
In this section we obtain the raw data (.json) containing all information the GitHub API endpoint provides. We parse the information of interest in the subsequent section.
dir.create(paste0(save_path))
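Note that `dir.create()` warns if the folder already exists and, by default, does not create missing parent folders. A minimal sketch of a more forgiving variant using only base R follows; the helper name `ensure_dir` is hypothetical and not part of Kaiaulu.

```r
# Hypothetical helper: create a folder (and any missing parents) only if needed.
# Returns TRUE invisibly when the folder exists afterwards.
ensure_dir <- function(path) {
  if (!dir.exists(path)) {
    dir.create(path, recursive = TRUE, showWarnings = FALSE)
  }
  invisible(dir.exists(path))
}
```

With such a helper, re-running the notebook would not emit warnings for folders created in an earlier run.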
First we obtain all issue events of the project, so we may later subset issue assignments.
save_path_issue_event <- paste0(save_path,"/issue_event/")
gh_response <- github_api_project_issue_events(owner,repo,token)
dir.create(save_path_issue_event)
github_api_iterate_pages(token,gh_response,save_path_issue_event,prefix="issue_event")
Next we download commit data from the GitHub API. This will be used to determine which users in the issue events do or do not have merge permissions.
save_path_commit <- paste0(save_path,"/commit/")
gh_response <- github_api_project_commits(owner,repo,token)
dir.create(save_path_commit)
github_api_iterate_pages(token,gh_response,save_path_commit,prefix="commit")
To parse the raw data, we use the associated endpoint parser functions. Keep in mind these functions only parse a subset of the information in the JSON ("column wise"). Please consult the GitHub API documentation or inspect the raw data directly to see all the information that is available.
all_issue_event <- lapply(list.files(save_path_issue_event,full.names = TRUE),read_json)
all_issue_event <- lapply(all_issue_event,github_parse_project_issue_events)
all_issue_event <- rbindlist(all_issue_event,fill=TRUE)
all_issue_event[,issue_body:=NULL] # removing column just to prevent html rendering error
kable(head(all_issue_event))
all_commit <- lapply(list.files(save_path_commit,full.names = TRUE),read_json)
all_commit <- lapply(all_commit,github_parse_project_commits)
all_commit <- rbindlist(all_commit,fill=TRUE)
all_commit[,commit_message:=NULL] # removing column just to prevent html rendering error
kable(head(all_commit))
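With the two tables above, we select the issue assignment events whose assignee does not appear among the project's committers (i.e. users presumed to lack merge access) and show them below.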
all_issue_event_assigned <- all_issue_event[event == "assigned"]
assigned_users_without_merge_access <- which(!(all_issue_event_assigned$issue_assignee_login %in% unique(all_commit$committer_login)))
kable(all_issue_event_assigned[assigned_users_without_merge_access])
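To make the filter concrete, here is a small self-contained sketch of the same logic on made-up data, using base R data frames rather than data.table. All logins and the `issue_number` column are invented for illustration. Note that this heuristic treats anyone who never appears as a committer as lacking merge access, so the result is an approximation.

```r
# Invented issue events: only rows with event == "assigned" are of interest
toy_events <- data.frame(
  event = c("assigned", "closed", "assigned", "assigned"),
  issue_number = c(1, 1, 2, 3),
  issue_assignee_login = c("alice", NA, "bob", "carol"),
  stringsAsFactors = FALSE
)

# Invented commits: alice and carol have committed, bob has not
toy_commits <- data.frame(
  committer_login = c("alice", "alice", "carol"),
  stringsAsFactors = FALSE
)

# Keep assignment events, then flag assignees never seen as committers
toy_assigned <- toy_events[toy_events$event == "assigned", ]
no_merge_access <- !(toy_assigned$issue_assignee_login %in%
                       unique(toy_commits$committer_login))
toy_result <- toy_assigned[no_merge_access, ]
toy_result
```

In this toy data only the assignment of issue 2 to `bob` remains, since `bob` is the only assignee absent from the committer list.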