Introduction

Kaiaulu's interface to the GitHub API relies heavily on gh, a minimalistic client for GitHub's REST and GraphQL APIs. In essence, Kaiaulu defines only the few API endpoints it currently uses and parses the returned JSON into a table, keeping only the fields of interest. More endpoints can be added later. See the function reference in Kaiaulu's documentation for what is currently available.

In this vignette, I will show how to replicate Aleksander Konnerup's data acquisition pipeline.

Create a Personal Token

At the time this vignette was created, GitHub limited API calls without a token to 60 per hour per IP address. You can check the current rate limits on its website.

With a personal access token from a free GitHub account, the limit increases to 5000 API calls per hour. It is therefore recommended that you create a personal token by following the instructions in the GitHub documentation. The process should take no more than 2 minutes.
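To make the difference between the two limits concrete, here is a rough back-of-the-envelope estimate. The request count is a hypothetical figure chosen for illustration, and the calculation assumes every request counts against the hourly limit; it is a sketch, not a statement about how GitHub schedules requests.

```r
# Rate limits quoted above (requests per hour).
limit_without_token <- 60
limit_with_token <- 5000

# Hypothetical project size: number of API requests (pages) to download.
n_requests <- 1200

# Approximate hours needed when limited to `limit_per_hour` requests.
hours_needed <- function(n, limit_per_hour) n / limit_per_hour

hours_without_token <- hours_needed(n_requests, limit_without_token)
hours_with_token <- hours_needed(n_requests, limit_with_token)

print(hours_without_token) # 20
print(hours_with_token)    # 0.24
```

Under these assumptions, a download that would stretch over most of a day without a token fits well within a single hour with one.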

The functions in Kaiaulu assume you have a token available, which is passed as a parameter.

rm(list = ls())
require(kaiaulu)
require(data.table)
require(jsonlite)
require(knitr)

GitHub Feature: Assign Issue to Issue Commenters

The feature of interest in this notebook is the capability, introduced by GitHub, for project owners to assign issues to issue commenters (i.e. those without merge access), as discussed in the following blog post: https://github.blog/2019-06-25-assign-issues-to-issue-commenters.

The goal of the following steps is to obtain data showing when a project started assigning issues to issue commenters without merge access.

Necessary Parameters

To use the pipeline, you must specify the organization and project of interest, and your token.

conf <- parse_config("../conf/kaiaulu.yml")
owner <- get_github_owner(conf, "project_key_1") # Has to match github organization (e.g. github.com/sailuh)
repo <- get_github_repo(conf, "project_key_1") # Has to match github repository (e.g. github.com/sailuh/perceive)
save_path_issue_or_pr_comments <- path.expand(get_github_issue_or_pr_comment_path(conf, "project_key_1"))
save_path_issue_event <- get_github_issue_event_path(conf, "project_key_1")
save_path_commit <- get_github_commit_path(conf, "project_key_1")
# the file github_token contains the GitHub API token obtained in the steps above
token <- scan("~/.ssh/github_token",what="character",quiet=TRUE)
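Because every later call depends on the token, it can help to fail early when the token file is missing or empty. The helper below is a minimal sketch in base R; `read_github_token` is a hypothetical name and is not part of Kaiaulu, and the demonstration uses a throwaway file with a fake token.

```r
# Hypothetical helper (not part of Kaiaulu): read a token from a file and
# stop with a clear message if the file is missing or does not hold exactly
# one non-empty token.
read_github_token <- function(path) {
  path <- path.expand(path)
  if (!file.exists(path)) {
    stop("Token file not found: ", path)
  }
  token <- scan(path, what = "character", quiet = TRUE)
  if (length(token) != 1 || !nzchar(token)) {
    stop("Expected exactly one non-empty token in: ", path)
  }
  token
}

# Demonstration with a temporary file holding a fake token.
token_file <- tempfile()
writeLines("not_a_real_token", token_file)
demo_token <- read_github_token(token_file)
```

In the pipeline above, the same helper could be pointed at the token path in place of the direct `scan()` call.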

Collecting Data via GitHub API

In this section we obtain the raw data (.json) containing all information the GitHub API endpoint provides. We parse the information of interest in the subsequent section.

Issue Events

First we obtain all issue events of the project, so we may later subset issue assignments.

gh_response <- github_api_project_issue_events(owner,repo,token)
github_api_iterate_pages(token,gh_response,save_path_issue_event,prefix="issue_event")

Commits

Next we download commit data from the GitHub API. This will be used to determine which users in the issue events have merge permissions and which do not.

gh_response <- github_api_project_commits(owner,repo,token)
github_api_iterate_pages(token,gh_response,save_path_commit,prefix="commit")

Parsing Raw Data to CSV

To parse the raw data, we use the parser function associated with each endpoint. Keep in mind that these functions parse only a subset of the fields in the JSON (i.e. a subset of the possible columns). Consult the GitHub API documentation or inspect the raw data directly to see all the information that is available.

all_issue_event <- lapply(list.files(save_path_issue_event,full.names = TRUE),read_json)
all_issue_event <- lapply(all_issue_event,github_parse_project_issue_events)
all_issue_event <- rbindlist(all_issue_event,fill=TRUE)
all_issue_event[,issue_body:=NULL] # removing column just to prevent html rendering error
kable(head(all_issue_event))

all_commit <- lapply(list.files(save_path_commit,full.names = TRUE),read_json)
all_commit <- lapply(all_commit,github_parse_project_commits)
all_commit <- rbindlist(all_commit,fill=TRUE)
all_commit[,commit_message:=NULL] # removing column just to prevent html rendering error

kable(head(all_commit))
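To illustrate what parsing only a subset of the fields means, the sketch below flattens two hand-written records into a table with a few columns. The records imitate the nested shape of GitHub's issue events response but contain invented values, and `parse_event` and `field_or_na` are hypothetical helpers written for this example; they are not Kaiaulu's parser functions.

```r
# Two hand-written records shaped like entries returned by the issue events
# endpoint (toy values, not real project data).
raw_events <- list(
  list(id = 1, event = "assigned", created_at = "2019-07-01T10:00:00Z",
       actor = list(login = "alice"), assignee = list(login = "carol"),
       issue = list(number = 12, title = "Fix parser")),
  list(id = 2, event = "closed", created_at = "2019-07-02T09:30:00Z",
       actor = list(login = "alice"),
       issue = list(number = 12, title = "Fix parser"))
)

# Return NA instead of NULL when a nested field is absent.
field_or_na <- function(x) if (is.null(x)) NA_character_ else as.character(x)

# Hypothetical parser: keep a handful of fields and drop everything else.
parse_event <- function(e) {
  data.frame(
    event = field_or_na(e$event),
    created_at = field_or_na(e$created_at),
    actor_login = field_or_na(e$actor$login),
    assignee_login = field_or_na(e$assignee$login),
    issue_number = field_or_na(e$issue$number),
    stringsAsFactors = FALSE
  )
}

# One row per event; fields not selected above (e.g. id, issue title) are lost.
events_table <- do.call(rbind, lapply(raw_events, parse_event))
print(events_table)
```

Any field left out of the parser is simply absent from the resulting table, which is why inspecting the raw JSON is worthwhile before deciding which columns you need.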

Obtaining Issue Assignments from Non-Committers

With the two tables above, we subset the issue events to assignments whose assignee never appears as a committer. The resulting issue assignments are shown below.

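# Keep only the events where an issue was assigned to someone
all_issue_event_assigned <- all_issue_event[event == "assigned"]
# Row indices of assignees who never appear as a committer (used here as a proxy for lacking merge access)
assigned_users_without_merge_access <- which(!(all_issue_event_assigned$issue_assignee_login %in%
                                                 unique(all_commit$committer_login)))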

kable(all_issue_event_assigned[assigned_users_without_merge_access])
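The filter above can be checked on a small example. The tables below reuse the column names from the pipeline but contain invented logins, so the expected result is easy to verify by eye; the `toy_` objects are assumptions made for this sketch only.

```r
library(data.table)

# Toy tables using the column names from the pipeline (invented values).
toy_issue_event <- data.table(
  event = c("assigned", "closed", "assigned", "assigned"),
  issue_assignee_login = c("alice", NA, "carol", "dave"),
  issue_number = c(1, 1, 2, 3)
)
toy_commit <- data.table(committer_login = c("alice", "bob", "alice"))

# Keep only assignment events.
toy_assigned <- toy_issue_event[event == "assigned"]

# TRUE for assignees who never appear as a committer.
is_non_committer <- !(toy_assigned$issue_assignee_login %in%
                        unique(toy_commit$committer_login))

non_committer_assignments <- toy_assigned[is_non_committer]
print(non_committer_assignments)
```

Here alice is dropped because she appears in the commit table, while carol and dave remain. Note that this treats having authored a commit as evidence of merge access, which is an approximation.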


sailuh/kaiaulu documentation built on Dec. 10, 2024, 3:14 a.m.