knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)

crandep

The goal of crandep is to provide functions for analysing the dependencies of CRAN packages using social network analysis.

Installation

You can install crandep from github with:

# install.packages("devtools")
devtools::install_github("clement-lee/crandep")
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(crandep)
library(dplyr)
library(ggplot2)
library(igraph)

Overview

The functions and example dataset can be divided into the following categories:

  1. For obtaining data frames of package dependencies, use get_dep(), get_dep_all_packages().
  2. For obtaining igraph objects of package dependencies, use get_graph_all_packages() and df_to_graph().
  3. For modelling the number of dependencies, use *pol() and *mix2().
  4. There is also an example data set cran_dependencies.

One or multiple types of dependencies

To obtain the information about various kinds of dependencies of a package, we can use the function get_dep() which takes the package name and the type of dependencies as the first and second arguments, respectively. Currently, the second argument accepts a character vector of one or more of the following words: Depends, Imports, LinkingTo, Suggests, Enhances, Reverse_depends, Reverse_imports, Reverse_linking_to, Reverse_suggests, and Reverse_enhances, or any variations in their letter cases, or if the underscore "_" is replaced by a space.

get_dep("dplyr", "Imports")
get_dep("MASS", c("depends", "suggests"))

For more information on different types of dependencies, see the official guidelines and https://r-pkgs.org/description.html.

In the output, the column type is the type of the dependency converted to lower case. Also, LinkingTo is now converted to linking to for consistency.

get_dep("xts", "LinkingTo")
get_dep("xts", "linking to")

For the reverse dependencies, the substring "reverse_" will not be shown in type; instead the reverse column will be TRUE. This can be illustrated by the following:

get_dep("abc", c("depends", "reverse_depends"))
get_dep("xts", c("linking to", "reverse linking to"))

Theoretically, for each forward dependency

data.frame(from = "A", to = "B", type = "c", reverse = FALSE)

there should be an equivalent reverse dependency

data.frame(from = "B", to = "A", type = "c", reverse = TRUE)

Aligning the type in the forward dependency and the reverse dependency enables this to be checked easily.

To obtain all types of dependencies, we can use "all" in the second argument, instead of typing a character vector of all words:

df0.abc <- get_dep("abc", "all")
df0.abc
df0.rstan <- get_dep("rstan", "all") # too many rows to display
dplyr::count(df0.rstan, type, reverse) # hence the summary using count()
df0.all <- get_dep_all_packages()
df1.all <- df0.all |> dplyr::count(from, type, reverse) |> dplyr::count(from)
v9.all <- dplyr::filter(df1.all, n == 9L)$from
v0.all <- dplyr::filter(df1.all, n == 10L)$from

As of r Sys.Date(), there are r length(v0.all) packages that have all 10 types of dependencies, and r length(v9.all) packages that have 9 types of dependencies: r paste(v9.all, collapse = ", ").

Building and visualising a dependency network

To build a dependency network, we have to obtain the dependencies for multiple packages. For illustration, we choose the core packages of the tidyverse, and find out what each package Imports. We put all the dependencies into one data frame, in which the package in the from column imports the package in the to column. This is essentially the edge list of the dependency network.

df0.imports <- rbind(
  get_dep("ggplot2", "Imports"),
  get_dep("dplyr", "Imports"),
  get_dep("tidyr", "Imports"),
  get_dep("readr", "Imports"),
  get_dep("purrr", "Imports"),
  get_dep("tibble", "Imports"),
  get_dep("stringr", "Imports"),
  get_dep("forcats", "Imports")
)
head(df0.imports)
tail(df0.imports)

All types of dependencies, in a data frame

The example dataset cran_dependencies contains all dependencies as of 2020-05-09.

data(cran_dependencies)
cran_dependencies
dplyr::count(cran_dependencies, type, reverse)

This is essentially a snapshot of CRAN. We can obtain all the current dependencies using get_dep_all_packages(), which requires no arguments:

df0.cran <- get_dep_all_packages()
head(df0.cran)
dplyr::count(df0.cran, type, reverse) # numbers in general larger than above

Network of one type of dependencies, as an igraph object

We can build dependency network using get_graph_all_packages(). Furthermore, we can verify that the forward and reverse dependency networks are (almost) the same, by looking at their size (number of edges) and order (number of nodes).

g0.depends <- get_graph_all_packages(type = "depends")
g0.depends

We could obtain essentially the same graph, but with the direction of the edges reversed, by specifying type = "reverse depends":

# Not run
g0.rev_depends <- get_graph_all_packages(type = "reverse depends")
g0.rev_depends

The dependency words accepted by the argument type is the same as in get_dep(). The two networks' size and order should be very close if not identical to each other. Because of the dependency direction, their edge lists should be the same but with the column names from and to swapped.

For verification, the exact same graphs can be obtained by filtering the data frame for the required dependency and applying df_to_graph():

g1.depends <- df0.cran |>
  dplyr::filter(type == "depends" & !reverse) |>
  df_to_graph(nodelist = dplyr::rename(df0.cran, name = from))
g1.depends # same as g0.depends


clement-lee/rackage documentation built on March 28, 2024, 7:05 p.m.