## NOTE:    Vignettes are built within their own environment. 
##          Therefore, changing the library paths (libpaths) here will not change the global libpaths.

# Get test library path
testLibPath <- tempdir()

# Get current library paths
origLibPaths <- .libPaths()

# Create new library paths for TESTING
.libPaths(new = c(testLibPath, origLibPaths))

# Create packages within new library
pkgnet:::.BuildTestLib(targetLibPath = testLibPath)

knitr::opts_chunk$set(
  collapse = FALSE,
  comment = "#>",
  fig.align = 'center',
  out.width='100%'
)

pkgnet is an R package designed for the analysis of R packages! The goal of the package is to build graph representations of a package's various types of dependencies. This can inform a variety of activities, including:

Below is a brief tour of pkgnet and its features.


Packages as a Graph

pkgnet represents aspects of R packages as graphs. The two default reporters, which we will discuss in this vignette, model their respective aspects as directed graphs: a package's dependencies on other packages, and the interdependencies of functions within a package. Before we look at the output of pkgnet, here are few core concepts to keep in mind.

nodes <- data.frame(
  id = 1:4
  , label = LETTERS[1:4]
  , x = c(1,3,2,4)
  , y = c(1,1,2,3)
  , level = c(1,1,2,3)
  )

edges <- data.frame(
  from = c(3,3,4,4)
  , to = c(1,2,3,2)
  )

g <- visNetwork::visNetwork(nodes = nodes
                       , edges = edges
                       , width = "100%"
                       )
g <- visNetwork::visNodes(graph = g
                          , shape = "circle"
                          , font = list(size = 25
                                        , bold = TRUE
                                        , align = 'center'
                                        )
                       )
g <- visNetwork::visEdges(graph = g
                          , arrows = "to"
                       )
g <- visNetwork::visHierarchicalLayout(graph = g
                      , direction = "DU"
                      , sortMethod = "directed"
                      )
g <- visNetwork::visInteraction(graph = g
                                , dragNodes = TRUE
                                , dragView = TRUE
                                , zoomView = FALSE)
g

Dependency

Units of the analysis are represented as nodes, and their dependency relationships are represented as edges (a.k.a. arcs or arrows). In pkgnet, the nodes could be functions in the package you are examining, or other packages that the package depends on. The direction of edges point in the direction of dependency---the tail node depends on the head node.[^1]

[^1]: This follows the Unified Modeling Language (UML) framework, a widely used standard for software system modeling.

In the example dependency graph above:

Following the direction of the edges allows you to figure out the dependencies of a node---the other nodes that it depends on. On the flip side, tracing the edges backwards allows you to figure out the reverse dependencies (i.e., dependents) of a node---the other nodes that depend on it.


Running pkgnet

pkgnet can analyze any R package locally installed. (Run installed.packages() to see the full list of packages installed on your system.) For this example, let's say we are analyzing a custom built package, baseballstats.

To analyze baseballstats, run the following two lines of code:

library(pkgnet)
report1 <- CreatePackageReport(pkg_name = "baseballstats")

THAT'S IT! You have generated a lot of valuable information with that one call for an installed package.

However, if the full source repository for the package is available on your system, you can supplement this report with other information such as code coverage from covr. To do so, specify the path to the repository in CreatePackageReport.

library(pkgnet)
report2 <- CreatePackageReport(
  pkg_name = "baseballstats"
  , pkg_path = <path to the repo>
)

The HTML Report & Returned Object

CreatePackageReport() creates an HTML report with the pertinent information, and it also returns an object with the report information and more. The location of the HTML report is specified in the messages in the terminal, but it should render automatically in your browser.

Report Sections

These will display in the HTML report, and their content will also be attached as public bindings in the PackageReport object returned from CreatePackageReport().

Package Summary

SummaryReporter: This section displays general information about the package. The returned object contains basic information like the package name and path.

Dependency Network

DependencyReporter: This section displays information regarding the packages upon which the current package under analysis depends. This includes both base and third-party R packages. The returned object contains graph visualizations, graph measures and data tables among other methods.

Function Network

FunctionReporter: This section displays information regarding the functions within the current package under analysis and their interdependence network. The returned object contains graph visualizations, graph measures and data tables among other methods.

Class Inheritance Network (Optional)

InheritanceReporter: While not generated by default (as not all packages are object oriented), this reporter is very useful when investigating the parent-child structure of R6, S4 or Reference Class (a.k.a. "R5") objects. The inheritance graph is displayed in the report along with other information. The returned object contains graph visualizations, graph measures and data tables among other methods.

Network Based Section Detail

Aside from the Package Summary section and its returned object, each reporter is based around a graph structure. Let's look at the FunctionReporter from baseballstats in more detail.

Visualizations

Here's how the Function Network Visualization looks for baseballstats. Note, its appearance differs based on if pkg_path is specified in CreatePackageReport():

:::: {style="content: ''; display: table; clear: both;"}

::: {style="float:left;width: 48%;text-align: justify;text-justify: inter-word;margin-right: 4%;"}

Default

pkgnet:::silence_logger()
funcReporter1 <- pkgnet::FunctionReporter$new()
funcReporter1$set_package('baseballstats')
funcReporter1$layout_type <- "layout_as_tree"
g <- visNetwork::visHierarchicalLayout(
    graph = funcReporter1$graph_viz
    , direction = "UD"
    , sortMethod = "directed"
    , edgeMinimization = FALSE
)
g <- visNetwork::visInteraction(graph = g
                                , dragNodes = TRUE
                                , dragView = TRUE
                                , zoomView = FALSE)
g

All functions and their dependencies are visible. For example, we can see that both batting_avg and slugging_avg functions depend upon the at_bats function.

We also see that nothing depends on the on_base_pct function. This might be valuable information to an R package developer.

:::

::: {style="float: right; width: 48%; text-align: justify; text-justify: inter-word;"}

With Coverage Information

pkgnet:::silence_logger()
funcReporter2 <- pkgnet::FunctionReporter$new()
funcReporter2$layout_type <- "layout_as_tree"
funcReporter2$set_package(
    pkg_name = "baseballstats"
    , pkg_path = system.file('baseballstats',package="pkgnet")
)
funcReporter2$calculate_default_measures()
g <- visNetwork::visHierarchicalLayout(
    graph = funcReporter2$graph_viz
    , direction = "UD"
    , sortMethod = "directed"
    , edgeMinimization = FALSE
)
g <- visNetwork::visInteraction(graph = g
                                , dragNodes = TRUE
                                , dragView = TRUE
                                , zoomView = FALSE)
g

Same as the default visualization except we can see coverage information as well (Pink = 0%, Green = 100%).

It appears the function with the most dependencies, at_bats, is well covered. However, no other functions are covered by unit tests. :::

::::

Node Measures

Metrics for the nodes (either packages, functions, or classes depending on the reporter) are contained in a table:

# We initialized just the reporters because we didn't want to actually generate the full html report. So we'll put funcReporter2 into a list to mock the interface for the example
report2 <- list(FunctionReporter = funcReporter2)
colSubset <- c('node','type','betweenness','outDegree','inDegree','numRecursiveDeps')
report2$FunctionReporter$nodes[,..colSubset]

Note, a few of these metrics provided by default are from the field of Network Theory. You can leverage the Network Graph Model Object described below to derive many more.

Network Measures

Network-level measures are contained in a network_measures list.

report2$FunctionReporter$network_measures

Network Graph Model Object

The network model object itself is contained in the pkg_graph attribute. The igraph formatted object itself is directly accessible via pkg_graph$igraph.

report2$FunctionReporter$pkg_graph$node_measures(c('hubScore', 'authorityScore'))
report2$FunctionReporter$pkg_graph$igraph

A Deeper Look

With the reports and objects produced by pkgnet by default, there is plenty to inform us on the inner workings of an R package. However, we may want to know MORE! Since the igraph objects are available, we can leverage those graphs for further analysis.

In this section, let's examine a larger R package, such as lubridate.

If you would like to follow along with the examples in this section, run these commands in your terminal to download and install lubridate[^2].

[^2]: Examples from version 1.7.3 of Lubridate

# Create a temporary workspace
mkdir -p ~/pkgnet_example && cd ~/pkgnet_example

# Grab the lubridate source code
git clone https://github.com/tidyverse/lubridate
cd lubridate

# If you want the examples to match exactly
git reset --hard 9797d69abe1574dd89310c834e52d358137669b8

# Install it
R CMD install .

Coverage of Most Depended-on Functions

Let's examine lubridate's functions through the lens of each function's total number of dependents (i.e., the other functions that depend on it) and its code's unit test coverage. In our graph model for the FunctionReporter, the subgraph of paths leading into a given node is the set of functions that directly or indirectly depend on the function that node represents.

# Run pkgnet
library(pkgnet)
report2 <- CreatePackageReport(
    pkg_name = "lubridate"
    , pkg_path = "~/pkgnet_example/lubridate"
)

# Extract Nodes Table
funcNodes <- report2$FunctionReporter$nodes

# List Coverage For Most Depended-on Functions
mostRef <- funcNodes[order(numRecursiveRevDeps, decreasing = TRUE),
                     .(node, numRecursiveRevDeps, coverageRatio, totalLines)
                     ][1:10]
#>             node numRecursiveRevDeps coverageRatio totalLines
#>  1:        month                  81             1          1
#>  2:           tz                  79             1          1
#>  3: reclass_date                  68             1          1
#>  4:         date                  67             1          1
#>  5:      is.Date                  60             1          1
#>  6:    is.POSIXt                  57             1          1
#>  7:         wday                  56             1          1
#>  8:   is.POSIXct                  55             1          1
#>  9:  .deprecated                  55             0         10
#> 10:      as_date                  52             1          1

Inspecting results such as these can help an R package developer decide which function to cover with unit tests next.

In this case, check_duration, one of the most depended-on functions (either directly or indirectly), is not covered by unit tests. However, it appears to be a simple one line function that may not be necessary to cover in unit testing. check_interval, on the other hand, might benefit from some unit test coverage as it is a larger, uncovered function with a similar number of dependencies.

Discovering Similar Functions

Looking at that same large package, let's say we want to explore options for consolidating functions. One approach might be to explore consolidating functions that share the same dependencies. In that case, we could use the igraph object to highlight functions with the same out-neighborhood via Jaccard similarity.

# Get igraph object
funcGraph <- report2$FunctionReporter$pkg_graph$igraph
funcNames <- igraph::vertex_attr(funcGraph, name = "name")

# Jaccard Similarity
sim <- igraph::similarity(graph = funcGraph
                          , mode = "out"
                          , method = "jaccard")
diag(sim) <- 0
sim[sim < 1] <- 0

simGraph <- igraph::graph_from_adjacency_matrix(adjmatrix = sim, mode = "undirected")

# Find groups with same out-neighbors (similarity == 1)
sameDeps <- igraph::max_cliques(graph = simGraph
                                , min = 2
                                )

# Write results
for (i in seq_along(sameDeps)) {
    cat(paste0("Group ", i, ": "))
    cat(paste(funcNames[as.numeric(sameDeps[[i]])], collapse = ", "))
    cat("\n")
}
cat("Group 1: divisible_period, make_date
Group 2: parse_date_time2, fast_strptime
Group 3: .deprecated_fun, .deprecated_arg
Group 4: stamp_date, stamp_time
Group 5: epiweek, isoweek
Group 6: ms, hm
Group 7: quarter, semester
Group 8: am, .roll_hms
Group 9: modulo_interval_by_duration, modulo_interval_by_period
Group 10: .difftime_from_pieces, .duration_from_units
Group 11: divide_period_by_period, xtfrm.Period
Group 12: int_diff, %--%
Group 13: isoyear, epiyear
Group 14: nanoseconds, microseconds, picoseconds, milliseconds
Group 15: period_to_seconds, check_period, multiply_period_by_number, format.Period, divide_period_by_number, add_period_to_period
Group 16: myd, dmy, yq, ymd, dym, mdy, ydm
Group 17: hours, weeks, minutes, years, days, months.numeric, seconds, seconds_to_period
Group 18: C_force_tz, hour.default, mday.default, c.POSIXct, .mklt, yday.default, year.default, minute.default, second.default
Group 19: ehours, emilliseconds, eyears, eseconds, epicoseconds, enanoseconds, eminutes, olson_time_zones, edays, emicroseconds, eweeks
Group 20: dmy_h, ydm_hms, ymd_hms, dmy_hm, ymd_h, ydm_hm, ydm_h, dmy_hms, ymd_hm, mdy_hms, mdy_hm, mdy_h"
)

Now, we have identified twenty different groups of functions within lubridate that share the exact same dependencies. We could explore each group of functions for potential consolidation.

utils::remove.packages(
    pkgs = c('baseballstats', 'sartre', 'pkgnet')
    , lib = testLibPath
)

# Just in case 
.libPaths(new = c(origLibPaths))
unlink(testLibPath)

More Information

Want to know even more about the pkgnet package?!

Run pkgnet on itself!

install.packages("pkgnet")
pkgnetObj <- CreatePackageReport("pkgnet", c(DefaultReporters(), InheritanceReporter$new()))

Want to see pkgnet reports for other packages?

Check out the pkgnet Gallery.

Want to ship a pkgnet report with your R package?

Include it a vignette() in your package. See Publishing Your pkgnet Package Report.



UptakeOpenSource/pkgnet documentation built on May 5, 2024, 9:49 p.m.