build, export, parse, and transform functions for Scitools Understand have been added. #308
github_api_project_issue_search has been added; it makes the search/issues endpoint API calls. github_api_project_issue_or_pr_comments_by_date and github_api_project_issue_by_date have been added to download issue data and comments by date ranges. github_parse_search_issues_refresh has been added; it parses the issue data downloaded from the search endpoint in the refresh_issues folder. github_api_project_issue_refresh and github_api_project_issue_or_pr_comment_refresh were added to download issue data or comments, respectively, that have not already been downloaded. format_created_at_from_file was added to retrieve the greatest date from a JSON file. See the Reference Docs on GitHub section for more details. #282
config.R
now contains a set of getter functions that centralize the gathering of configuration data; these getters are used to refactor how configuration file information is gathered. For example, loading configuration file information with variable assignment is written as git_repo_path <- config_file[["version_control"]][["log"]], whereas with a config.R getter function it becomes git_repo_path <- get_git_repo_path(config_file). #230
refresh_jira_issues()
has been added. It is a wrapper function for the previous downloader and downloads only issues greater than the greatest key already downloaded. #275
download_jira_issues(), download_jira_issues_by_issue_key(), and download_jira_issues_by_date() have been added. They allow downloading Jira issues without the use of JirAgileR, and specifying issue Id and created ranges. They also interact with parse_jira_latest_date() to implement a refresh capability. #275
make_jira_issue() and make_jira_issue_tracker() no longer create fake issues following the JirAgileR format, but instead the raw data obtained from the JIRA API. This is compatible with the new parser function for JIRA. #277
parse_jira() now parses folders containing raw JIRA JSON files without depending on JirAgileR. #276
parse_jira_latest_date() has been added. This function returns the file name of the downloaded JIRA JSON containing the latest date, for use by download_jira_issues() to implement a refresh capability. #276
weight_scheme_cum_temporal()
weight_scheme_pairwise_cum_temporal() when all time lag edges are used, or the existing weight schemes can also be used when using a single lag. The all lag weight schemes reproduce the same behavior as Codeface's paper. See the issue for details. #229
make_jira_issue() and make_jira_issue_tracker() have been added, alongside examples and unit tests for parse_jira(). #228
make_mbox_reply and make_mbox_mailing_list for unit testing and tool comparison. #238
git_create_sample_log
which can only create a fixed example, by adding new commands to the git interface (which can also be used for other purposes). The new example.R module can now be used to document examples, using the extended git.R interface, that reflect edge cases on raw data. The unit tests can then rely on the example functions to temporarily create the fake minimal example, test parser functionality against it, and subsequently delete it. In essence, this allows unit testing of parser data and, consequently, automated checks that the behavior of third-party tools Kaiaulu may rely on remains consistent across their updates for the features we care about. Examples are in fact a way to create the minimal reproducible examples often requested on Stack Overflow. Old unit tests which rely on git_create_sample_log() will be updated in a subsequent commit to rely on the new interface via example datasets. #227
dv8_mdsmb_to_flaws()
function now offers an optional boolean parameter, is_file_only_metric, which can be used to compute file metrics more efficiently. Note that it should not be used if the intent is to aggregate the file metrics, as they may be counted twice or more if the files participate in the same flaw pattern id. See the causal flaws notebook for an example of how to use it. #246
jira.R along with a test-jira.R which identifies a bug in parse-jira.R (the function parse_jira() will be moved to jira.R at a later date for consistency). The bug is still present, so the test should fail on GitHub Actions. A solution to the bug will be added in a subsequent commit to pass the test. #244
io_create_folder(), io_make_sample_file(), git_init(), git_add(), git_commit() in R/git.R and create test cases with unit tests in R/example.R and testthat/test-parser.R. #227
gitlog_entity_showcase_parallel.Rmd
for details. #231
parse_jira_rss_xml(), which enables reusing the full 26 projects dataset of our prior TSE work. #218
metric_file_bug_frequency(), metric_file_non_bug_frequency(), metric_file_bug_churn(), metric_file_non_bug_churn(), metric_file_churn() to R/metric.R. #214
src_text_showcase.Rmd for details. #206
.r and .R files are also now captured (previously only one of the two was specified, but R accepts both). #235
parse_dv8_architectural_flaws so users can specify the dsm that should be used when reconstructing the architectural flaw instances per file. #222
causal_flaws.Rmd notebook. #220
filter_by_commit_size()
has been added. This filter mitigates outlier co-change resulting from git log projections, which may lead to an "all-vs-all" explosion of edges. For example, the Apache Geronimo SVN to Git migration contains a commit which modifies 1522 files. Those 1522 files would be co-changed with each other, generating 1522 choose 2 = 1,157,481 edges from that commit alone, which does not accurately reflect actual "co-change". Use of this filter is strongly encouraged for graph_to_dsmj or any operations that require git log projection. #209
github_parse_project_issue(), github_parse_project_pull_request()
so bug count can also be computed from the GitHub API. #216
parse_dv8_architectural_flaws(). Each tick tracks one folder of flaws (progressBar auto resets the tick to 0 on loop completion, so an instance progress bar requires further function refactoring and is deferred for now). #209
/ remaining in relative filepath of parse_dependencies(). #219
parse_bugzilla_rest_issues, parse_bugzilla_rest_comments, download_bugzilla_rest_issues_comments, and parse_bugzilla_rest_issues_comments. #164
download_bugzilla_rest_issues and download_bugzilla_rest_comments to download project data from a bugzilla site using the REST API. #177
download_bugzilla_perceval_traditional_issue_comments, download_bugzilla_perceval_rest_issue_comments, parse_bugzilla_perceval_traditional_issue_comments, and parse_bugzilla_perceval_rest_issue_comments to download and parse project data from a bugzilla site using perceval. #155
graph_to_dsmj, transform_dependencies_to_sdsmj, transform_gitlog_to_hdsmj, transform_temporal_gitlog_to_adsmj to convert a dsm into a json format. #184
dv8_clsxb_to_clsxj, parse_dv8_clusters, dependencies_to_sdsmj, and gitlog_to_hdsmj for DV8 integration with Kaiaulu. #168
dv8_gitlog_to_gitnumstat, dv8_gitnumstat_to_hdsmb, dv8_hsdsmb_to_decoupling_level, dv8_hsdsmb_to_hierclsxb, dv8_hsdsmb_drhier_to_excel, parse_dv8_metrics_decoupling_level. #169
issue_social_smell_showcase.Rmd
vignette for details. #144
download_mod_mbox_per_month() function which allows the intermediate mbox downloaded files to be saved to the chosen folder (as opposed to tmp). The function is showcased in the download_mod_mbox.Rmd vignette. #141
download_github_comments.Rmd now include author and committer name and e-mail to support identity matching. #133
parse_gitlog. The branch parameter, which is also used later in the notebook to reset the branch after performing git checkout to calculate line metrics, is now a project configuration file parameter. #132
social_smell_showcase.Rmd Notebook. Moreover, both download_jira_data.Rmd and download_github_comments.Rmd have been standardized to provide the raw json data, whereas parse_jira_replies() and parse_github_replies() provide the same formatted reply table as parse_mbox(), which allows combining the various sources simply by using the native rbind() function. #133
download_github_comments.Rmd for example usage. #130
output_dir folder for more flexibility. #168
social_smell_showcase.Rmd. #133
body.plain or body.plain.simple. The parse_mbox() function now handles both cases. #133
social_smell_showcase.Rmd, the variables i_commit_hash
and j_commit_hash were subject to the ordering of the rows as input. The code now correctly chooses the earliest date and latest date within a time window, instead of assuming the first row and last row are such. In turn, the correct commit hash interval is now reported in the final table, and the correct git checkouts are applied to line metrics. #126
smells.R, used by social_smell_showcase.Rmd, smell_organizational_silo, smell_missing_links, and smell_radio_silence mapping of text to numerical identities was incorrect or missing. One side effect of this error, as reported in the issue, is that different orderings of the rows provided as input to the function caused different metric values, although the metric should be independent of the ordering. This fix addresses the ordering side effect and corrects the metric value. #126
download_jira_data.Rmd, the jira issue downloader's output contained a mismatch between column names and values when converting the json to table. The conversion is now done in Kaiaulu instead of the external package, and the external package is only used to obtain the json. In addition, parse_jira_comments() has been refactored into parse_jira(), which handles both issues and/or comments jsons obtained from the external package. #120
get_date_from_commit_hash() and filter_by_file_extensions(). #154
mailinglist_showcase.Rmd
has been renamed to reply_communication_showcase.Rmd to account for issue tracker network communication. Likewise, transform_mbox_to_bipartite_network has been renamed to transform_reply_to_bipartite_network to reflect accepting both mbox and jira reply data as parameter. The notebook also now presents how to load jira issue comment networks (obtained using download_jira_data.Rmd) and how to combine the networks. A new function, parse_jira_comments(), was also added to standardize the input to conform to Kaiaulu nomenclature of communication data. #113
download_mod_mbox.Rmd. #112
download_jira_data.Rmd and bug_count.Rmd, and one project configuration file, geronimo.yml, now demonstrate how JIRA issue data can be downloaded and used to calculate file bug count using existing Kaiaulu functionality and an external JIRA API R package. In combination with the existing gitlog_vulnerabilities_showcase.Rmd, Kaiaulu can now download and parse both software vulnerabilities (CVEs) and issue IDs. The download_jira_data.Rmd can also be used to obtain issue comment data, which may be used to construct communication networks in combination with mailing list data. #110
recolor_network_by_community. #94
download_mod_mbox() function to download.R module, allowing the composition of .mbox files from Apache mod_mbox archives. #99
download_pipermail() and convert_pipermail_to_mbox functions. #93
graph.R bipartite_graph_projection(). #75
parser.R and network.R
API now abide by a standardized nomenclature for the data columns, instead of using third party software nomenclature, which led to multiple names when data overlapped among third party software. The Network module function prefix was also changed from parse_*_network to transform_*_network. Various transformation functions were also renamed to explicitly indicate that they generate bipartite networks (previously the names did not say so) rather than temporal ones. The network functions that transform git logs, be it bipartite or temporal, now account for all types of networks (i.e. author-file, author-entity, committer-file, committer-entity, etc). The "mode" parameter is also more explicit on what types of functions it can create. #43
Parser functions no longer normalize the timezone to UTC. Normalization is instead exemplified in all Notebooks for when time slices are needed. It is therefore now possible to implement the socio-technical metric num.tz. To minimize the risk that timestamps are no longer aligned, datetimes are left as strings instead of being parsed as POSIXct objects. #89
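Since datetimes are now left as strings, normalization is up to the caller. The following is a minimal base R sketch of that step, not code taken from the notebooks: the sample values, the variable names, and the "YYYY-MM-DD HH:MM:SS +HHMM" layout are assumptions made for illustration, so adjust the format string to whatever the parser actually returns.

```r
# Hypothetical datetime strings, each carrying its own UTC offset
# (illustrative values only, not output of a Kaiaulu parser).
author_datetimetz <- c(
  "2021-03-04 10:15:00 -1000",
  "2021-03-04 21:30:00 +0100",
  "2021-03-05 08:00:00 -1000"
)

# Normalize to UTC only when time slices are needed.
# %z reads the signed offset, and tz = "UTC" sets the display timezone.
utc_datetime <- as.POSIXct(author_datetimetz,
                           format = "%Y-%m-%d %H:%M:%S %z",
                           tz = "UTC")

# The original strings still hold the offsets, so distinct timezones
# can be counted, which is the kind of input a metric such as num.tz
# would presumably need.
tz_offset <- sub("^.* ", "", author_datetimetz)
n_timezones <- length(unique(tz_offset))

print(format(utc_datetime, "%Y-%m-%d %H:%M:%S"))
print(n_timezones)
```

Keeping the string column alongside the converted one means the offset information is never lost to the conversion.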
All notebooks now use the new identity match interface from #56. Consequently, for both bipartite and temporal transformations, users can now choose whether nodes are displayed with names and e-mails or with their ids, which protects the privacy of the project's developers when publishing information online. #90
Fixes the column naming for parse_dependencies(). The columns were previously src and dest, and are now from and to, consistent with other networks derived from graph.R. #75
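For data or scripts that still use the old column names, the rename can be applied by hand. This is a minimal base R sketch with a hypothetical edgelist; the file paths and the `rename_map` helper are illustrative and are not part of Kaiaulu.

```r
# Hypothetical dependency edgelist using the old column names.
dependencies <- data.frame(
  src = c("R/parser.R", "R/network.R"),
  dest = c("R/io.R", "R/graph.R"),
  stringsAsFactors = FALSE
)

# Map old names to new ones; columns not listed are left untouched.
rename_map <- c(src = "from", dest = "to")
old_names <- names(dependencies)
is_renamed <- old_names %in% names(rename_map)
names(dependencies)[is_renamed] <- unname(rename_map[old_names[is_renamed]])

print(names(dependencies))
```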
Fixes tools.yml to use the correct undir and dir of OSLOM (previously the paths were inverted). #75
gitlog_showcase.Rmd. #110
commit_message_id_coverage. #110
download_mod_mbox missing leading zeros. #107
parse_gitlog_entity(), and associated network functions to visualize both author-entity bipartite network parse_gitlog_entity_network() and temporal parse_gitlog_entity_temporal_network(). It is therefore now possible to compare networks at file or any type of entity of interest, with different network construction methods. See vignettes/gitlog_entity_showcase.Rmd for details. #79
parse_gitlog_temporal_network() which provides a directed network for collaboration at file level. #78
parse_line_type() to parse_line_type_file() to take as input information from git history instead of a local computer file, so it can be used to analyze git log changes. #2
git_blame() wrapper and parser. #68
git_head() and checkout to a particular commit git_checkout(), the latter required to analyze multiple intervals with static code analysis such as parse_line_metrics() and parse_dependencies(). A vignette will be added at a later date showcasing the functions. #62
parse_line_type(). See line_type_showcase.Rmd for usage. #60
parse_line_metrics(). See line_metrics_showcase.Rmd for example usage. #59
parse_java_code_refactoring_json(). See refactoringminer_showcase.Rmd for example usage. #57
parse_nvdfeed() and parse_cve_cwe_file_network(). See gitlog_showcase.Rmd for example usage. #51
commit_message_id_coverage(). See example on gitlog_showcase.Rmd. #46
parse_commit_message_id_network(). Examples of interesting labels are issue ids and cve-ids. You can now also specify them directly in the config files (see conf folder). vignettes/gitlog_showcase.Rmd has been updated to showcase a cve-id network. #46
metric_churn_per_commit_interval()
metric_churn_per_commit_per_file() logic was substantially simplified, and can now be used with interval/R. #19
interval.R/interval_commit_metric() and parsers.R/filter_by_commit_interval(). #44
filter_by_file_extension(), filter_by_filepath_substring() for files not relevant for metrics. The config file schema has also been extended to provide parameters to the filters. #30
assign_identity(), which assigns a single id to authors who use different names and emails in parse_gitlog(), parse_mbox(), or across both data. This allows parse_gitlog_network() and parse_mbox_network() to be merged into a single network. See vignettes/merging_networks_showcase.Rmd for details. A normalized edit distance function, normalized_levenshtein(), was also added for future implementation of partial matching. #31
commit -s. #36
metric_churn() and metric_commit_interval_churn(). See vignettes/churn_metrics.Rmd for details. #19
parse_gitlog(), and edgelist export for network libraries parse_gitlog_network(). See vignettes/gitlog_showcase.Rmd for details. #1
parse_mbox(), and edgelist export for network libraries parse_mbox_network(). See vignettes/mailinglist_showcase.Rmd for details. #4
parse_dependencies(), and edgelist export for network libraries parse_dependencies_network(). See vignettes/depends_showcase.Rmd for details. #8
parse_log_*() functions now provide edgelist instead of igraph objects. Vignettes were adjusted to showcase usage. #14