In sailuh/kaiaulu: Kaiaulu

rm(list = ls())
seed <- 1
set.seed(seed)

require(kaiaulu)
require(data.table)
require(knitr, quietly = T)
require(jsonlite)
require(gt)

Introduction

This Notebook showcases how to obtain JIRA issues and comments using Kaiaulu with and without authentication. We also demonstrate how you can use the refresher capability: I.e. by calling the function again, Kaiaulu will only download more recent data in a folder, provided this folder issues were also obtained with the refresh function. This is useful for server-side deployment, in combination with CRON jobs, but can also be used locally to just get more recent data conveniently.

While you have flexibility in choosing what type of fields should be included on the downloader JSON data, bear in mind only the default fields specified on the downloaders are available on the parser. These, however, are easily extensible.

Project Configuration File and Setup

In this notebook, we will obtain data from the JIRA issue tracker. We will demonstrate it on Kaiaulu JIRA sandbox. Refer to the conf/ folder on Kaiaulu's git repository for Geronimo and other project configuration files. Using a different project configuration file should only require changing the path for the configuration, provided they do not require different authentication files.

You should also have the folder structure created on your computer manually. Specifically, in the "issues" and "issue_comments" fields, you should specify the folder path where the data will be saved. We recommend you adopt the convention adopted in Kaiaulu if you are collaborating with others, as other sections of the configuration file used in other Notebooks will also adopt this overall structure.

In general, projects such as Apache Software Foundation do not require authentication to obtain issue data. Authentication is only required to obtain more sensitive data. If you are using the Kaiaulu tool for an internal JIRA account, however, authentication may be useful for you. The free version of JIRA (e.g. https://sailuh.atlassian.net/) used in this Notebook also requires authentication. In the case of the free JIRA version, your API username is your account e-mail, and your password is an API token you create. We explain how to load your username and password below.

To try out this Notebook, try the geronimo configuration file, and follow along the instructions below on how to remove the authentication parameters required for Kaiaulu.

First, we will load the Kaiaulu configuration file:

conf <- parse_config("../conf/kaiaulu.yml")

# Project domain
# Specify project_key_index in get_jira_domain() (e.g. "project_key_1")
issue_tracker_domain <- get_jira_domain(conf, "project_key_1")

# Project key
# Specify project_key_index in get_jira_project_key_name() (e.g. "project_key_1")
issue_tracker_project_key <- get_jira_project_key_name(conf, "project_key_1")

# Altered save paths. Important for naming conventions
# Specify project_key_index in get_jira_issues_path() (e.g. "project_key_1")
save_path_issue_tracker_issues <- get_jira_issues_path(conf, "project_key_1")

# Specify project_key_index in get_jira_issues_comments_path() (e.g. "project_key_1")
save_path_issue_tracker_issue_comments <- get_jira_issues_comments_path(conf, "project_key_1")

# Unaltered save paths from config file for use with refresh function
# Specify project_key_index in get_jira_issues_path() (e.g. "project_key_1")
refresh_issues <- get_jira_issues_path(conf, "project_key_1")

If authentication is needed, save your username (e-mail) and password (API token) in a file, e.g. atlassian_credentials, where the first line is the username, and the second the API token, e.g.

jondoe@jondoe.com
jondoespassword

And remove the eval = FALSE from the code block below (note you do not want to use this code block if you are accessing a JIRA instance that you do not have an account such as Apache Software Foundation, or you will have authentication errors):

if(file.exists("~/.ssh/atlassian_credentials")){
  credentials <- scan("~/.ssh/atlassian_credentials", what = "character", quiet = TRUE)  
  username <- credentials[1]
  password <- credentials[2]
}

Create the directories specified in the config files

Before proceeding, make sure you created the folder structure you specified on the project configuration file. This Notebook will not create the folders automatically.

Downloading Issues

Kaiaulu offers three ways to download issues with or without comment: By Date, by Issue Key, and Custom. The Date and Issue Key range functions wrap around the Custom function, serving as examples on how to customize it.

Issues by Date Range

Using the download_jira_issues_by_date(), you can download issues based on their 'created' dates. Th acceptable formats are:

"yyyy/MM/dd HH:mm"
"yyyy-MM-dd HH:mm"
"yyyy/MM/dd"
"yyyy-MM-dd"

If both date range parameters are set to NULL, then all issues will be retrieved. Alternatively, if for example date_lower_bound is set to "2000-01-01" and date_upper_bound is set to "2005-01-01", then only issues created between these two dates will be retrieved.

If you would like to retrieve issues only after a certain date, set date_upper_bound=NULL and date_lower_bound to the date.

If you would like to retrieve only issues before a certain date, set date_lower_bound=NULL and date_upper_bound to the date.

Note in the subsequent code block we specified the fields from the issue we are interested in downloading. Again, you may specify any fields of interest for the downloader, including custom fields of your JIRA instance, but the parser currently only support those listed here (in addition to comments data, not exemplified yet).

Beware that even if only 3 issues exist in a JIRA, a large time range will still request several API calls (in contrast to the issue endpoint below). Therefore, it is advisable to use the issue key query instead which is explained in the next sub-section.

file.exists(save_path_issue_tracker_issues)

# e.g. date_lower_bound <- "1970/01/01". 
date_lower_bound <- "2023/11/16 21:00"
date_upper_bound <- "2023/11/17 21:00"

all_issues <- download_jira_issues_by_date(issue_tracker_domain,
                           jql_query = paste0("project='",issue_tracker_project_key,"'"), 
                           fields = c("summary", 
                           "description",
                           "creator",
                           "assignee",
                           "reporter",
                           "issuetype",
                           "status",
                           "resolution",
                           "components",
                           "created",
                           "updated",
                           "resolutiondate",
                           "priority",
                           "votes",
                           "watches",
                           "versions",
                           "fixVersions",
                           "labels"), 
                           username = username,
                           password = password,
                           save_folder_path = save_path_issue_tracker_issues, 
                           max_results = 50, 
                           max_total_downloads = 60,
                           date_lower_bound = date_lower_bound,
                           date_upper_bound = date_upper_bound,
                           verbose = TRUE)

Note if authentication is not required, you should also comment the username and password parameters, or the information will be used to authenticate a JIRA instance you do not have an account, resulting in authorization errors. The downloaded data can also parsed in Kaiaulu:

parsed_jira_issues <- parse_jira(save_path_issue_tracker_issues)

head(parsed_jira_issues[["issues"]])  %>%
  gt(auto_align = FALSE)

Issues by Key Range

A way to download issues requiring less API calls is using the issue keys, using download_jira_issues_by_issue_key().

Instead of specifying the date range, you can instead specify the issue range using issue_key_lower_bound and issue_key_upper_bound to retrieve only JIRA issues with the issue key between these bounds. These parameters need to be specified in accordance with the issue key format relevant to your project. The default format is -. An example is if issue_key_lower_bound is set to "GERONIMO-740" and issue_key_upper_bound is set to "GERONIMO-6000", then only issues with issue keys between these two issue keys will be retrieved.

Some special cases are also possible: If both lower and upper bound are set to NULL, then all issues will be retrieved as the request will not be bounded.

If issues with issue key greater than an issue key value are desired, set issue_key_upper_bound to NULL and issue_key_lower_bound to the issue key value.

If issues with issue key less than an issue key value are desired, set issue_key_upper_bound to the issue key and issue_key_lower_bound to NULL.

In the subsequent codeblock, note we also include a new field, comment. This inclusion makes it so comments data is also downloaded. The comment field can also be included on any downloader or discussed in this Notebook. We also use a different folder path here, save_folder_path = save_path_issue_tracker_issue_comments, which you should have specified on your project configuration file. We separate issues with and without comments for simplicity and clarity into separate folders.

# eg issueKey_lower_bound <- "GERONIMO-740"
#issue_key_lower_bound <- "GERONIMO-5000"
#issue_key_upper_bound <- "GERONIMO-5010"

issue_key_lower_bound <- "SAILUH-1"
issue_key_upper_bound <- "SAILUH-7"

#issue_key_lower_bound <- "CAMEL-1"
#issue_key_upper_bound <- "CAMEL-800"

all_issues <- download_jira_issues_by_issue_key(domain = issue_tracker_domain,
                           jql_query = paste0("project='",issue_tracker_project_key,"'"), 
                           fields = c("summary", 
                           "description",
                           "creator",
                           "assignee",
                           "reporter",
                           "issuetype",
                           "status",
                           "resolution",
                           "components",
                           "created",
                           "updated",
                           "resolutiondate",
                           "priority",
                           "votes",
                           "watches",
                           "versions",
                           "fixVersions",
                           "labels",
                           "comment"), 
                           username = username,
                           password = password,
                           save_folder_path = save_path_issue_tracker_issue_comments, 
                           max_results = 500, 
                           max_total_downloads = 500,
                           issue_key_lower_bound = issue_key_lower_bound,
                           issue_key_upper_bound = issue_key_upper_bound,
                           verbose = TRUE)

names(parsed_jira_issues)

Observe the parser function returns a named list of two tables. We showed before how to access the issues information when we downloaded data by date range. Here, we show the comments table.

parsed_jira_issues <- parse_jira(save_path_issue_tracker_issue_comments)


head(parsed_jira_issues[["comments"]])  %>%
  gt(auto_align = FALSE)

Downloading Issues with Customizations

If you require more flexibility when using the downloader, you can use the download_jira_issues. This function does not append the time range or date range when requesting the data. You can study the previous two functions code to see how they use download_jira_issues to query specific date ranges and time ranges. For details, see the function documentation.

all_issues <- download_jira_issues(issue_tracker_domain,
                           credentials, 
                           jql_query = paste0("project='",issue_tracker_project_key,"'"), 
                           fields = c("summary", 
                           "description",
                           "creator",
                           "assignee",
                           "reporter",
                           "issuetype",
                           "status",
                           "resolution",
                           "components",
                           "created",
                           "updated",
                           "resolutiondate",
                           "priority",
                           "votes",
                           "watches",
                           "versions",
                           "fixVersions",
                           "labels"), 
                           username = username,
                           password = password,
                           save_folder_path = save_path_issue_tracker_issue_comments, 
                           max_results = 50, 
                           max_total_downloads = 5000,
                           search_query = NULL,
                           verbose = TRUE)

Refresh Issue and Comments Data

There are a few instances in which downloading the issue data with comments does not capture all the issues:

The JIRA rest API rate limit (currently 5000/hour) is reached.
Function ends before completing.
The max_downloads parameter was set to a value lower than total issues.
More issues were since posted after downloading.

Given a folder using any of the downloaders in this Notebook (with or without comments), you can use the refresh_jira_issues function to add additional issue data since the last issue was downloaded. This function relies on the naming convention the downloaders utilize on the file to perform this operation. For details, see the function documentation.

refresh_jira_issues(issue_tracker_domain,
                     credentials, 
                     jql_query = paste0("project='",issue_tracker_project_key,"'"), 
                     fields = c("summary", 
                     "description",
                     "creator",
                     "assignee",
                     "reporter",
                     "issuetype",
                     "status",
                     "resolution",
                     "components",
                     "created",
                     "updated",
                     "resolutiondate",
                     "priority",
                     "votes",
                     "watches",
                     "versions",
                     "fixVersions",
                     "labels",
                     "comment"), 
                     save_path_issue_tracker_issues, 
                     max_results = 50, 
                     max_total_downloads = 5000,
                     refresh_issues,
                     verbose = TRUE)