parse_gitlog_entity | R Documentation |
Refines the parsed git log to include information of what entities a developer changed when performing a commit. Changed entities are obtained by examining if a changed line is within the start and end line of any of the available Universal Ctags types specified in 'kinds'.
An entity is defined and detected by Universal Ctags by language. The list of available 'kinds' is currently Classes ('c'), Functions ('f'), and Methods ('m'), which can be specified to the parameter 'kinds' as follows:
list(
java=c('c','m'),
python=c('c','f'),
cpp=c('c','f'),
c=c('f')
)
For example, if the kind is 'f', the output will be all line addition changes to functions per commit in the project. If the kind is 'c', then all changes to classes per commit will be provided.
Any combination of types can be provided per language, which will result in the output containing the union of all changes per commit made by developers to these entities. Note that because Ctags assigns a single type to each changed line, a change made to a method of a class is assigned only to the method, not to both the method and the class.
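The line-range matching described above can be illustrated with a minimal sketch. The `entities` table and the helper `entity_for_line` are hypothetical and are not part of this package; the rule that the narrowest enclosing range wins is an assumption used here to mimic a line in a method being attributed to the method rather than its class.

```r
# Hypothetical entity table for one file: each entity has a kind
# ('c' = class, 'm' = method) and a start/end line.
entities <- data.frame(
  name  = c("Account", "deposit", "withdraw"),
  kind  = c("c", "m", "m"),
  start = c(1, 5, 12),
  end   = c(20, 10, 18),
  stringsAsFactors = FALSE
)

# Return the name of the entity of the requested kinds that contains
# `line`, or NA if the line falls outside every entity.
entity_for_line <- function(line, entities, kinds) {
  hits <- entities[entities$kind %in% kinds &
                     entities$start <= line &
                     entities$end >= line, ]
  if (nrow(hits) == 0) return(NA_character_)
  # Assumption: the narrowest enclosing range wins, so a line inside a
  # method is attributed to the method and not to its class.
  hits$name[which.min(hits$end - hits$start)]
}

entity_for_line(7, entities, c("c", "m"))   # line inside the method 'deposit'
entity_for_line(3, entities, c("c", "m"))   # line in the class body only
entity_for_line(25, entities, c("c", "m"))  # line outside every entity
```

In this sketch, line 7 resolves to the method, line 3 to the class, and line 25 to no entity.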
The enumerated 'kinds' will be used as needed, and therefore, to save time, it is fine to specify languages not included in the project. However, every file analyzed must have its language specified. Therefore, ensure filter_by_file_extension is properly used on the parameter 'project_git_log'.
This decision is by design: 'kinds' vary per language and may substantially impact the output of this function, affecting the analysis. Therefore, no default settings are provided, to encourage documenting both the filter_by_file_extension and 'kinds' parameters in a project configuration file to facilitate reproducibility.
Other entity types will be added in a later version.
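Since no defaults are provided and only three kinds are currently listed, it may help to check a 'kinds' list before starting a long-running parse. The following is a minimal sketch; `validate_kinds` and `supported_kinds` are hypothetical names introduced for illustration and are not functions of this package.

```r
# Kinds listed as currently available: classes, functions, methods.
supported_kinds <- c("c", "f", "m")

# Hypothetical helper: stop early if `kinds` is not a named list whose
# elements contain only supported kinds.
validate_kinds <- function(kinds) {
  stopifnot(is.list(kinds), length(kinds) > 0)
  if (is.null(names(kinds)) || any(names(kinds) == "")) {
    stop("every element of 'kinds' must be named")
  }
  for (language in names(kinds)) {
    unknown <- setdiff(kinds[[language]], supported_kinds)
    if (length(unknown) > 0) {
      stop("unsupported kind(s) for '", language, "': ",
           paste(unknown, collapse = ", "))
    }
  }
  invisible(TRUE)
}

kinds <- list(java = c("c", "m"), python = c("c", "f"))
validate_kinds(kinds)
```

A list that is unnamed, or that contains a kind other than 'c', 'f', or 'm', raises an error instead of being passed on.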
Please note this function will blame every file in a git log to parse the data. Even for a 200 MB project git log this can take one or more hours. Also, because this function relies on git blame, only line addition changes will be captured. Line deletions will -not- be captured. For example, if a developer removes a line of a function through a commit, this data will not be available in this function output.
See Joblin'17 Chapter 3.1.1.1 for background and conceptual details.
parse_gitlog_entity(
git_repo_path,
utags_path,
project_git_log,
kinds,
progress_bar = FALSE
)
git_repo_path | path to git repo (ends in .git) |
utags_path | The path to utags binary. |
project_git_log | A parsed git project by |
kinds | A named list of character vectors of the form: list(extension_1 = c('type_i','type_j',...), extension_2 = c('type_i','type_k')). See examples. |
progress_bar | a boolean specifying if a progress bar should be shown. |
Mitchell Joblin (2017). Structural and Evolutionary Analysis of Developer Networks. (Doctoral dissertation, University of Passau, Germany).
## Not run:
# Obtain additions only to functions
kinds <- list(
java = c('m'),
python = c('f'),
cpp = c('c', 'f'),
c = c('f')
)
# Parse Project Git Log
project_git_log <- parse_gitlog(perceval_path, git_repo_path)
# Filter Files
project_git_log <- project_git_log %>%
filter_by_file_extension(file_extensions, "file") %>%
filter_by_filepath_substring(substring_filepath, "file")
# Parse Function Additions
changed_functions <- parse_gitlog_entity(git_repo_path,
utags_path,
project_git_log,
kinds)
## End(Not run)