Check the Heaviness of Package Dependencies

When developing R packages, we should try to avoid depending directly on "heavy packages". The "heaviness" of a package is the number of additional packages it brings in as dependencies. If your package depends directly on a heavy package, there are several consequences:

  1. Users need to install many additional packages when they install yours, which raises the risk that one of those installations fails and your package cannot be installed either.
  2. Many namespaces are loaded into the R session after loading your package with library(your-pkg) (you can list the loaded namespaces with sessionInfo()).
  3. Your package becomes "heavy" as well, and it may take a long time to load.

In the DESCRIPTION file of your package, these "directly dependent packages" are listed in the "Depends" or "Imports" fields. To get rid of heavy packages that are not often used in your package, it is better to move them to the "Suggests" field and load them only when they are needed.
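To see which packages fall into each field, the DESCRIPTION of an installed package can be read with base R. The following is a minimal sketch, not part of pkgndep; the helper names parse_dep_field() and direct_deps() are hypothetical and introduced only for illustration.

```r
# Hypothetical helper: split one DESCRIPTION field value into bare package names.
parse_dep_field = function(x) {
    if (is.null(x) || is.na(x)) return(character(0))
    x = unlist(strsplit(x, ","))
    x = trimws(gsub("\\([^)]*\\)", "", x))  # drop version constraints such as (>= 3.5.0)
    x[nzchar(x) & x != "R"]                 # drop empty entries and the R version entry
}

# Hypothetical helper: list the direct dependencies of an installed package by field.
direct_deps = function(pkg, fields = c("Depends", "Imports", "Suggests")) {
    desc = utils::packageDescription(pkg, fields = fields)
    stats::setNames(lapply(fields, function(f) parse_dep_field(desc[[f]])), fields)
}

# Example with a package that ships with R
direct_deps("stats")
```

Packages that appear under "Depends" or "Imports" are installed and loaded together with your package, while those under "Suggests" are not.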

The pkgndep package checks the heaviness of the packages that your package depends on. For each package listed in the "Depends", "Imports" and "Suggests" fields of the DESCRIPTION file, it opens a new R session, loads the package and counts the number of namespaces that are loaded. The dependencies are summarized in a customized heatmap.
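The idea of counting namespaces in a new R session can be sketched with base R alone. This is only an illustration of the approach described above, not the actual implementation in pkgndep; the function name loaded_namespaces_fresh() is hypothetical, and the sketch assumes that the Rscript executable shipped with the running R installation can be launched.

```r
# Hypothetical helper: load `pkg` in a fresh R process and return the
# namespaces that are loaded there afterwards.
loaded_namespaces_fresh = function(pkg) {
    # accept only a single, plain package name
    stopifnot(is.character(pkg), length(pkg) == 1,
              grepl("^[A-Za-z][A-Za-z0-9.]*$", pkg))
    expr = sprintf(
        "suppressPackageStartupMessages(library(%s)); writeLines(loadedNamespaces())",
        pkg)
    rscript = file.path(R.home("bin"), "Rscript")
    # stdout = TRUE captures the printed namespace names, one per line
    system2(rscript, c("-e", shQuote(expr)), stdout = TRUE)
}

ns = loaded_namespaces_fresh("grid")
length(ns)
```

Note that the result also contains the namespaces that every fresh session loads by default (e.g. base), so comparing it against a session where nothing extra is loaded would give only the additional namespaces.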

As an example, I am developing a package, cola, which depends on many other packages. Its dependency heatmap looks like this (the figure in its original size is here):

In the heatmap, rows are the packages listed in the "Depends", "Imports" and "Suggests" fields, and columns are the namespaces that are loaded when each package alone is loaded into a new R session. The barplots on the right show the number of namespaces imported by each package and the time taken to load that package alone into R.

We can see that if all the packages were put in the "Imports" field, 166 namespaces would be loaded after library(cola). Some of the heavy packages, such as WGCNA and clusterProfiler, are not used very frequently in cola; moving them to the "Suggests" field and loading them only when they are needed helps to speed up loading cola. The number of namespaces loaded after library(cola) is now reduced to only 25.
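Loading a package from "Suggests" only when it is needed is commonly done with requireNamespace(). The following is a minimal sketch of that pattern, not code taken from cola; the helper call_optional() is hypothetical, and the stats package is used in the example only because it is always installed.

```r
# Hypothetical helper: call a function from an optional ("Suggests") package,
# failing with a clear message if that package is not installed.
call_optional = function(pkg, fun, ...) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
        stop("Package '", pkg, "' is needed for this feature. Please install it.",
             call. = FALSE)
    }
    # getExportedValue() looks up an exported object, like pkg::fun
    f = getExportedValue(pkg, fun)
    f(...)
}

call_optional("stats", "median", c(1, 5, 9))
```

With this pattern the namespace of the optional package is loaded on the first call that needs it, not when your own package is loaded.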

Usage

To use this package:

library(pkgndep)
x = pkgndep("package-name")
plot(x)

or

x = pkgndep("path-to-the-package")
plot(x)

Executable examples:

library(pkgndep)
x = pkgndep("ComplexHeatmap")
x
pdf(NULL)
size = plot(x)
invisible(dev.off())
width = as.numeric(size$width)
height = as.numeric(size$height)
plot(x)

By default, the heatmap has a fixed size, so when there are too many dependencies the heatmap might extend beyond the figure area. In this case, set fix_size = FALSE in plot() so that the heatmap is adjusted to fit the figure area.

plot(x, fix_size = FALSE)

If fix_size is set to TRUE (the default), the size of the whole heatmap can be obtained by:

size = plot(x)  

where size is a unit object with the width and height of the whole heatmap. If you want to save the plot to, e.g., a PDF file that has the same size as the heatmap, you need to make the plot twice. The first pass draws the plot into a null graphics device, just to obtain its size; the second pass draws it into a PDF device of that size:

pdf(NULL) # a null device
size = plot(x)
dev.off()

width = as.numeric(size[1])
height = as.numeric(size[2])

pdf(..., width = width, height = height)
plot(x)
dev.off()

Statistics

I ran pkgndep on all packages installed on my computer. The table of the number of loaded namespaces as well as the dependency heatmaps are available at https://jokergoo.github.io/pkgndep/stat/.

For a quick look, the top 10 packages with the most loaded namespaces are:

| Package | # Namespaces | # Namespaces (also loading Suggests) | Heatmap |
|:--------|-------------:|-------------------------------------:|--------:|
| ReportingTools | 125 | 131 | view |
| TCGAbiolinks | 118 | 209 | view |
| epik | 116 | 116 | view |
| minfiData | 109 | 109 | view |
| minfiDataEPIC | 109 | 109 | view |
| ggbio | 108 | 119 | view |
| FlowSorted.Blood.450k | 108 | 108 | view |
| IlluminaHumanMethylation450kanno.ilmn12.hg19 | 108 | 108 | view |
| IlluminaHumanMethylation450kmanifest | 108 | 108 | view |
| IlluminaHumanMethylationEPICanno.ilm10b2.hg19 | 108 | 108 | view |

And the top 10 packages with the most loaded namespaces when the packages in "Suggests" are also loaded are:

| Package | # Namespaces | # Namespaces (also loading Suggests) | Heatmap |
|:--------|-------------:|-------------------------------------:|--------:|
| TCGAbiolinks | 118 | 209 | view |
| cola | 25 | 174 | view |
| broom | 29 | 171 | view |
| GSEABase | 29 | 135 | view |
| sesame | 73 | 134 | view |
| ReportingTools | 125 | 131 | view |
| GenomicRanges | 17 | 128 | view |
| ensembldb | 57 | 126 | view |
| AER | 36 | 126 | view |
| BiocGenerics | 8 | 125 | view |




pkgndep documentation built on March 5, 2021, 5:06 p.m.