
Analyzing Dependency Heaviness of R Packages


When developing R packages, we should try to avoid depending directly on "heavy packages". The "heaviness" of a package is the number of additional dependency packages it brings in. If your package directly depends on a heavy package, there are several consequences:

  1. Users need to install many additional packages when installing your package. If any of them fails to install, your package cannot be installed either.
  2. The number of namespaces loaded into your R session after loading your package will be huge (you can see the loaded namespaces with sessionInfo()).
  3. Your package will be "heavy" as well, and it may take a long time to load.

In the DESCRIPTION file of your package, the "direct dependency packages" are those listed in the Depends, Imports and LinkingTo fields. The "indirect dependency packages" are found by recursively following the dependencies of each direct dependency package. Here, "dependency packages" means the union of the direct and indirect dependency packages.
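
The union of direct and indirect dependencies can be illustrated with a small base R sketch. The package names and the `direct_deps` table below are made up for illustration, and `all_dependencies()` is a hypothetical helper, not part of pkgndep. For real packages, `tools::package_dependencies()` with `recursive = TRUE` does a comparable lookup against a repository index.

```r
# Toy dependency table: each hypothetical package maps to the packages
# listed in its Depends, Imports and LinkingTo fields.
direct_deps = list(
    pkgA = c("pkgB", "pkgC"),
    pkgB = c("pkgD"),
    pkgC = c("pkgD", "pkgE"),
    pkgD = character(0),
    pkgE = character(0)
)

# Collect the direct and indirect dependencies of `pkg` by walking the
# table breadth-first; packages already seen are skipped.
all_dependencies = function(pkg, deps) {
    found = character(0)
    queue = deps[[pkg]]
    while (length(queue) > 0) {
        current = queue[1]
        queue = queue[-1]
        if (!(current %in% found)) {
            found = c(found, current)
            queue = c(queue, deps[[current]])
        }
    }
    sort(found)
}

all_dependencies("pkgA", direct_deps)
```

Here pkgB and pkgC are direct dependencies of pkgA, while pkgD and pkgE are reached only indirectly.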

Packages can also be listed in the Suggests and Enhances fields of the DESCRIPTION file, but these are not required to be installed when installing your package. Of course, they also have "indirect dependency packages". To get rid of heavy packages that are not often used in your package, it is better to move them to the Suggests/Enhances fields and to load/install them only when they are needed.
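
One common way to use a suggested package only when it is needed is to check for it with `requireNamespace()` at the point of use. The sketch below assumes a hypothetical helper `check_suggested()` and a hypothetical function `cluster_rows()`; the base package stats merely stands in for a heavy package that would be listed in Suggests.

```r
# Stop with an informative message if an optional package is not installed.
# `pkg_name` is the name of a package listed under Suggests.
check_suggested = function(pkg_name) {
    if (!requireNamespace(pkg_name, quietly = TRUE)) {
        stop("Package '", pkg_name, "' is required for this function. ",
             "Please install it first.", call. = FALSE)
    }
    invisible(TRUE)
}

# Hypothetical function in your package that relies on an optional dependency.
# The package is checked only when the function is actually called.
cluster_rows = function(x) {
    check_suggested("stats")   # 'stats' stands in for a heavy suggested package
    stats::hclust(stats::dist(x))
}

cluster_rows(matrix(1:20, nrow = 5))
```

Calling functions with the `pkg::fun()` form keeps the optional package out of the namespaces that are loaded together with your own package.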

The pkgndep package checks the heaviness of the dependency packages of your package. For each package listed in the Depends, Imports, LinkingTo and Suggests/Enhances fields of the DESCRIPTION file, pkgndep checks how many additional packages your package requires. The summary of the dependencies is visualized as a customized heatmap.

As an example, I am developing a package called cola which depends on a lot of other packages. Its dependency heatmap looks as follows:

[image: dependency heatmap of cola]

In the heatmap, rows are the packages listed in the Depends, Imports and Suggests fields, and columns are the additional dependency packages required by each row package. The barplots on the right show the number of required packages, the number of imported functions/methods/classes (parsed from the NAMESPACE file) and the quantitative measure "heaviness" (the definition of heaviness will be introduced later).

We can see that if all the packages are put in the Depends or Imports fields (i.e. moving all suggested packages to Imports), 248 packages are required in total, which is really a lot. In fact, some of the heavy packages such as WGCNA, clusterProfiler and ReactomePA (the last three packages in the heatmap rows) are not used very frequently in cola. Moving them to the Suggests field and using them only when they are needed greatly reduces the heaviness of cola: the number of required packages is reduced to only 64.
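
The effect of moving a heavy package out of Imports can be sketched with toy data. Everything below is hypothetical (the package names, the `deps` table and the helper `n_required()`); it only illustrates how the number of required packages is compared before and after the move, and it is not how pkgndep computes its heaviness measure.

```r
# Hypothetical direct dependencies of two imported packages and of the
# utility packages they pull in.
deps = list(
    lightPkg = c("utilA"),
    heavyPkg = c("utilA", "utilB", "utilC", "utilD"),
    utilA = character(0), utilB = character(0),
    utilC = character(0), utilD = character(0)
)

# Count all packages required for a given set of imported packages,
# including everything they depend on recursively.
n_required = function(imports, deps) {
    required = character(0)
    todo = imports
    while (length(todo) > 0) {
        required = union(required, todo[1])
        # queue the dependencies of the current package, skipping known ones
        todo = setdiff(c(todo[-1], deps[[todo[1]]]), required)
    }
    length(required)
}

n_all = n_required(c("lightPkg", "heavyPkg"), deps)   # everything in Imports
n_light = n_required("lightPkg", deps)                # heavyPkg moved to Suggests
n_all - n_light                                       # packages no longer required
```

Packages shared with a remaining import (utilA here) stay required, so only the packages that heavyPkg alone brings in are saved.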

Citation

Gu Z. et al., pkgndep: a tool for analyzing dependency heaviness of R packages. Bioinformatics 2022. https://doi.org/10.1093/bioinformatics/btac449

Installation

Prior to installing this package, you'll need to install the Bioconductor package ComplexHeatmap by

if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("ComplexHeatmap")

The pkgndep package can be installed from CRAN by

install.packages("pkgndep")

Usage

To use this package:

library(pkgndep)
pkg = pkgndep("package-name")
plot(pkg)

or

pkg = pkgndep("path-of-the-package")
plot(pkg)

An executable example:

library(pkgndep)
pkg = pkgndep("ComplexHeatmap")
pkg
## ComplexHeatmap, version 2.9.4
## 30 additional packages are required for installing 'ComplexHeatmap'
## 117 additional packages are required if installing packages listed in all fields in DESCRIPTION
plot(pkg)

[image: dependency heatmap of ComplexHeatmap]

License

MIT @ Zuguang Gu



jokergoo/pkgndep documentation built on Aug. 22, 2022, 2:57 a.m.