parse_gitlog_entity: Parse Git log entities by line additions

parse_gitlog_entity R Documentation

Parse Git log entities by line additions

Description

Refines the parsed git log with information about which entities a developer changed in each commit. A changed entity is detected by checking whether a changed line falls between the start and end line of any Universal Ctags entity whose type is listed in 'kinds'.

Universal Ctags defines and detects entities on a per-language basis. The currently available 'kinds' are Classes ('c'), Functions ('f'), and Methods ('m'), which can be passed to the parameter 'kinds' as follows:

list( java=c('c','m'), python=c('c','f'), cpp=c('c','f'), c=c('f') )

For example, if the kind is 'f', the output will contain all line additions to functions per commit in the project. If the kind is 'c', all line additions to classes per commit will be provided.
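The matching rule described above (a changed line counts toward an entity when it lies between the entity's start and end line and the entity's type is among the requested kinds) can be sketched in base R as follows. The `entities` table and the `entities_at_line` helper are hypothetical and only illustrate the idea; they are not part of the package and do not reproduce its internals.

```r
# Hypothetical entity table, similar to what could be derived from
# Universal Ctags output for a single file (illustration only).
entities <- data.frame(
  name       = c("Parser", "parse", "helper"),
  kind       = c("c", "m", "f"),
  line_start = c(1L, 3L, 20L),
  line_end   = c(15L, 9L, 25L),
  stringsAsFactors = FALSE
)

# Return the entities of the requested kinds whose line range contains `line`.
entities_at_line <- function(line, entities, kinds) {
  hit <- entities$kind %in% kinds &
    entities$line_start <= line &
    line <= entities$line_end
  entities[hit, , drop = FALSE]
}

# A line added at position 5, looking only at methods and functions.
changed <- entities_at_line(5L, entities, kinds = c("m", "f"))
```

Restricting `kinds` to `"c"` in the same call would instead report the enclosing class, which is why the choice of kinds changes the output.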

Any combination of types can be provided per language. The output then contains the union of all changes per commit made by developers to these entities. Note that Ctags assigns a single type to each changed line: if a change is made to a method of a class, the changed line is assigned to the method only, not to both the method and the class.
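One way to picture the single-assignment rule is to give each changed line to the narrowest range that contains it. The sketch below does this in base R; the `ranges` table and the `assign_entity` helper are hypothetical, and picking the narrowest range is an assumption made for illustration, not a description of how Ctags or this function resolves nesting.

```r
# Hypothetical entity ranges: a class that contains one method.
ranges <- data.frame(
  name       = c("Account", "deposit"),
  kind       = c("c", "m"),
  line_start = c(1L, 4L),
  line_end   = c(30L, 10L),
  stringsAsFactors = FALSE
)

# Assign a line to exactly one entity: the narrowest range containing it.
# Returns NA when the line lies outside every entity.
assign_entity <- function(line, ranges) {
  inside <- ranges[ranges$line_start <= line & line <= ranges$line_end, ,
                   drop = FALSE]
  if (nrow(inside) == 0) return(NA_character_)
  span <- inside$line_end - inside$line_start
  inside$name[which.min(span)]
}

in_method <- assign_entity(6L, ranges)   # line inside the method body
in_class  <- assign_entity(20L, ranges)  # class body, outside any method
```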

The enumerated 'kinds' are only used as needed, so it is fine to list languages that do not occur in the project to save time. However, every file analyzed must have its language specified. Therefore, ensure filter_by_file_extension has been properly applied to the parameter 'project_git_log'. This decision is by design: 'kinds' vary per language and may substantially change the output of this function, and hence the analysis. For this reason no default settings are provided, to encourage documenting both the filter_by_file_extension and 'kinds' settings in a project configuration file, which facilitates reproducibility.
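A quick pre-check along these lines can reveal files whose language has no entry in 'kinds' before the long-running parse starts. This is a minimal sketch: the file paths are made up, and it assumes that the names of 'kinds' are compared against lower-cased file extensions, which may not match the function's actual lookup.

```r
# Kinds for two languages whose name equals the file extension.
kinds <- list(java = c("c", "m"), c = c("f"))

# Hypothetical file paths, as they might appear in a parsed git log.
files <- c("src/Main.java", "lib/util.c", "docs/README.md")

# Assumption: each name in `kinds` is matched against the file extension.
ext <- tolower(tools::file_ext(files))

# Extensions present in the log that have no entry in `kinds`.
uncovered <- unique(ext[!ext %in% names(kinds)])
```

A non-empty `uncovered` suggests tightening the extension filter or adding the missing languages to 'kinds'.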

Other entity types will be added in a later version.

Please note this function runs git blame on every file in the git log to parse the data. Even for a 200 MB project git log, this can take one or more hours. Also, because this function relies on git blame, only line additions are captured; line deletions are not. For example, if a developer removes a line of a function in a commit, that change will not appear in this function's output.

See Joblin'17 Chapter 3.1.1.1 for background and conceptual details.

Usage

parse_gitlog_entity(
  git_repo_path,
  utags_path,
  project_git_log,
  kinds,
  progress_bar = FALSE
)

Arguments

git_repo_path

Path to the git repository (ends in .git).

utags_path

The path to the utags binary.

project_git_log

A parsed project git log, as returned by parse_gitlog.

kinds

A named list of character vectors of the form: list(extension_1 = c('type_i','type_j',...), extension_2 = c('type_i','type_k')). See examples.

progress_bar

A boolean specifying whether a progress bar should be shown.

References

Mitchell Joblin (2017). Structural and Evolutionary Analysis of Developer Networks. (Doctoral dissertation, University of Passau, Germany).

Examples

## Not run: 
# Obtain additions to functions and methods (plus classes for C++)
kinds <- list(
  java = c('m'),
  python = c('f'),
  cpp = c('c', 'f'),
  c = c('f')
)
# Parse Project Git Log
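# perceval_path, git_repo_path, utags_path, file_extensions and
# substring_filepath are assumed to be defined beforehand
# (e.g., read from the project configuration file)
library(magrittr)  # provides the %>% pipe
project_git_log <- parse_gitlog(perceval_path, git_repo_path)
# Filter Files
project_git_log <- project_git_log %>%
  filter_by_file_extension(file_extensions, "file") %>%
  filter_by_filepath_substring(substring_filepath, "file")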
# Parse Function Additions
changed_functions <- parse_gitlog_entity(git_repo_path,
                                        utags_path,
                                        project_git_log,
                                        kinds)

## End(Not run)

sailuh/kaiaulu documentation built on Dec. 10, 2024, 3:14 a.m.