require(kaiaulu) require(data.table) require(knitr) require(magrittr)
This notebook showcases Kaiaulu interface to the SCC tool. In order to use this Notebook, you must download SCC and specify the SCC path in tools.yml and the project being analyzed. See README.md for third party tools setup details.
Notebooks in Kaiaulu use project configuration files. You can always directly hardcode paths in your notebook to bypass them, but using the config files helps ensure reproducibility of your analysis.
This Notebook uses the APR config file on the Kaiaulu repo. This is the relevant portion used in this Notebook:
version_control: # Where is the git log located locally? # This is the path to the .git of the project repository you are analyzing. # The .git is hidden, so you can see it using `ls -a` log: ../../rawdata/git_repo/APR/.git # From where the git log was downloaded? log_url: https://github.com/apache/apr # List of branches used for analysis branch: - 1.7.4 filter: keep_filepaths_ending_with: - cpp - c - h - java - js - py - cc remove_filepaths_containing: - test
We load the necessary information in this code block:
tool <- parse_config("../tools.yml") conf <- parse_config("../conf/apr.yml") scc_path <- get_tool_project("scc", tool) git_repo_path <- get_git_repo_path(conf) git_branch <- get_git_branches(conf)[1] # Filters file_extensions <- get_file_extensions(conf) substring_filepath <- get_substring_filepath(conf)
The loaded variables are just strings or numbers. You can always inspect any variable to double check what is being used in the analysis. For instance, we are interested on analyzing a specific release of APR here:
git_branch
We must therefore git checkout the APR repo to said release before computing file metrics. We can do this using Kaiaulu Git interface:
git_checkout(git_branch,git_repo_path)
Since git_checkout is a simple interface to the actual git checkout
command, we can also specify a commit hash, or a branch of interest. Storing this information is helpful for reproducibility, as even if the APR git log is updated locally, the Notebook will still adjust to the right point in time when redoing the analysis.
With the source code on local disk adjusted to the release of interest, we can call parse_line_metrics
with the scc_path. This function highlights Kaiaulu approach to third party tools: The dependency to the external tool is explicitly made by receiving its path as parameter. The function has the single responsibility of knowing how to interface with the tool, requesting the data, formatting it and providing it back in R as a data.table object.
A sample of rows is shown below:
line_metrics_dt <- parse_line_metrics(scc_path,git_repo_path) kable(head(line_metrics_dt,10))
The column names are preserved to the original tool to facilitate referencing. Future versions of Kaiaulu will define a common data schema and mapping to the original tool for clarity. The obtained table can be combined to other analysis in Kaiaulu. See for example depends_showcase.Rmd
for dependency analysis, or gitlog_showcase.Rmd
.
As shown above, the line metrics include all types of files, including the LICENSE and make files. We may only be interested in looking at the source code. To do so, we can use Kaiaulu filters. If we specify on the project configuration files the extensions and prefixes of interest to be kept:
file_extensions
And removed:
substring_filepath
We can then subset the resulting table column Location
like so:
filtered_line_metrics_dt <- line_metrics_dt %>% filter_by_file_extension(file_extensions,"Location") %>% filter_by_filepath_substring(substring_filepath,"Location") kable(head(filtered_line_metrics_dt))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.