library(pkgndep)
pkgndep:::load_all_pkg_dep()
When developing R packages, we should try to avoid depending directly on "heavy packages". The "heaviness" of a package is the number of additional dependency packages it brings in. If your package depends directly on a heavy package, there are consequences: more packages must be installed together with your package, and more namespaces are loaded into the R session (you can see the loaded namespaces with sessionInfo()).
In the DESCRIPTION file of your package, the "direct dependency packages" are listed in the Depends, Imports and LinkingTo fields. There are also "indirect dependency packages", which can be found recursively for each of the direct dependency packages. Here, "dependency packages" means the union of the direct and indirect dependency packages.
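The distinction between direct and indirect dependencies can be illustrated with a small sketch. The package names and the `all_deps()` helper below are hypothetical and are not part of pkgndep; the graph is a toy stand-in for what the DESCRIPTION files would declare.

```r
# Toy dependency graph (hypothetical package names, not real CRAN data).
# Each element lists the direct dependencies (Depends/Imports/LinkingTo)
# of one package.
direct_deps <- list(
  myPkg = c("A", "B"),
  A     = c("C"),
  B     = c("C", "D"),
  C     = character(0),
  D     = c("E"),
  E     = character(0)
)

# Collect direct and indirect dependencies with a breadth-first traversal.
all_deps <- function(pkg, graph) {
  visited <- character(0)
  queue <- graph[[pkg]]
  while (length(queue) > 0) {
    current <- queue[1]
    queue <- queue[-1]
    if (current %in% visited) next
    visited <- c(visited, current)
    queue <- c(queue, setdiff(graph[[current]], visited))
  }
  sort(visited)
}

direct_deps[["myPkg"]]          # direct dependencies only
all_deps("myPkg", direct_deps)  # union of direct and indirect dependencies
```

In this toy graph, myPkg declares only two packages but ends up requiring five, because B pulls in D, which in turn pulls in E.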
Packages can also be listed in the Suggests and Enhances fields of the DESCRIPTION file, but they are not required to be installed when your package is installed. Of course, they also have "indirect dependency packages". To get rid of heavy packages that are rarely used in your package, it is better to move them into the Suggests/Enhances fields and to install and load them only when they are needed.
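One common way to use a package from Suggests only when it is needed is to guard the call with `requireNamespace()`. The sketch below is a minimal illustration; `call_suggested()` is a hypothetical helper, not a pkgndep function, and stats stands in for a suggested package so that the example runs on any installation.

```r
# Hypothetical helper: call a function from a package listed in Suggests,
# failing with an informative message if the package is not installed.
call_suggested <- function(pkg, fun, ...) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop(sprintf("Package '%s' is needed for this feature. Please install it.", pkg),
         call. = FALSE)
  }
  f <- getExportedValue(pkg, fun)  # look up the exported function by name
  f(...)
}

# 'stats' stands in for a suggested package here.
call_suggested("stats", "median", c(1, 5, 9))
```

With this pattern, the namespace of the suggested package is loaded only when the guarded function is actually called.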
The pkgndep package checks the heaviness of the dependency packages of your package. For each package listed in the Depends, Imports, LinkingTo and Suggests/Enhances fields of the DESCRIPTION file, pkgndep checks how many additional packages your package would require. The summary of the dependencies is visualized as a customized heatmap.
As an example, I am developing a package called cola, which depends on a lot of other packages. Its dependency heatmap looks as follows (please drag the figure to a new tab to see it at its actual size):
x = pkgndep:::ENV$all_pkg_dep[["cola"]]
pdf(NULL)
size = dependency_heatmap(x, help = FALSE)
invisible(dev.off())
dependency_heatmap(x)
In the heatmap, rows are the packages listed in the Depends, Imports and Suggests fields, and columns are the additional dependency packages required by each row package. The barplots on the right show the number of required packages, the number of imported functions/methods/classes (parsed from the NAMESPACE file) and the quantitative measure "heaviness" (defined later).
We can see that if all the packages were put in the Depends or Imports field (i.e. moving all suggested packages to Imports), r x$n_by_all packages would be required in total, which is really a lot. Some of the heavy packages, such as WGCNA, clusterProfiler and ReactomePA (the last three packages in the heatmap rows), are not used very frequently in cola. Moving them to the Suggests field and using them only when they are needed greatly reduces the dependencies of cola: the number of required packages drops to only r x$n_by_strong.
To use this package:
library(pkgndep)
pkg = pkgndep("package-name")  # if the package is already installed
dependency_heatmap(pkg)
or
pkg = pkgndep("path-of-the-package")  # if the package has not been installed yet
dependency_heatmap(pkg)
The argument of pkgndep() can be: 1. the name of a CRAN/Bioconductor package, 2. the name of an installed package, 3. the path of a local package, 4. the URL of a GitHub repository.
Executable examples:
if(grepl("devel", R.version$status)) {
    pkgndep = function(...) {
        pkgndep::pkgndep(..., online = FALSE)
    }
}
library(pkgndep)
pkg = pkgndep("ComplexHeatmap")
pkg
pkgndep() first needs to retrieve the package databases from both remote repositories and local libraries, as you can see from the messages printed by the code above. This only happens once; the database is saved internally and reused.
We can directly use the dependency_heatmap() function to create the dependency heatmap:
pdf(NULL)
size = dependency_heatmap(pkg, help = FALSE)
invisible(dev.off())
dependency_heatmap(pkg)
You can set the file argument to save the image directly to a file, where the figure size is calculated automatically. Supported image formats are png/jpg/svg/pdf.
dependency_heatmap(pkg, file = "test.png")
The heaviness_report() function generates an HTML report of the dependency heaviness analysis for the package.
heaviness_report(pkg)
The heaviness of package dependency can be measured quantitatively. pkgndep provides two measures: the absolute measure and the relative measure.
The heaviness of a dependency package is calculated as follows. If package B is in the Depends/Imports/LinkingTo fields of package A, i.e. package B is directly required by package A, denote v1 as the total number of required packages of package A, and denote v2 as the total number of required packages if B is moved to Suggests in package A (i.e. B is no longer enforced to be installed for package A). The absolute measure of heaviness is simply v1 - v2 and the relative measure is (v1 + a)/(v2 + a), where a is a small constant, e.g. 10.
So the absolute heaviness of package B on package A is the number of additional packages that package B uniquely brings in.
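The definition above can be traced on a toy graph. The sketch below is an illustration under stated assumptions: the package names and the `n_required()` helper are hypothetical, and this is not how pkgndep computes the values internally.

```r
# Toy graph with hypothetical package names; each entry holds the
# directly required (Depends/Imports/LinkingTo) packages.
graph <- list(
  A = c("B", "C"),
  B = c("D", "E", "F"),
  C = c("F"),
  D = character(0), E = character(0), F = character(0)
)

# Number of packages reachable from a set of direct dependencies.
n_required <- function(direct, graph) {
  seen <- character(0)
  todo <- direct
  while (length(todo) > 0) {
    p <- todo[1]
    todo <- todo[-1]
    if (p %in% seen) next
    seen <- c(seen, p)
    todo <- c(todo, graph[[p]])
  }
  length(seen)
}

a  <- 10                                             # small constant from the text
v1 <- n_required(graph[["A"]], graph)                # B stays a direct dependency
v2 <- n_required(setdiff(graph[["A"]], "B"), graph)  # B moved to Suggests

v1 - v2              # absolute heaviness of B on A
(v1 + a) / (v2 + a)  # relative heaviness of B on A
```

Here B accounts for three packages (B itself, D and E). F does not count towards the heaviness of B because it is still required through C.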
In the second scenario, where package B is in the Suggests/Enhances fields of package A, v2 is the total number of required packages if B is moved to Imports in package A. The absolute measure of heaviness is then v2 - v1 and the relative measure is (v2 + a)/(v1 + a).
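Both scenarios compare the same two quantities: the number of required packages when B is enforced and when it is not. A minimal sketch of the two formulas, assuming a hypothetical helper `heaviness_from_counts()` (not a pkgndep function) and made-up counts:

```r
# Hypothetical helper: heaviness of B on A from two package counts.
#   n_with:    required packages when B is enforced (Depends/Imports/LinkingTo)
#   n_without: required packages when B is not enforced (Suggests/Enhances)
heaviness_from_counts <- function(n_with, n_without, a = 10) {
  c(absolute = n_with - n_without,
    relative = (n_with + a) / (n_without + a))
}

# Second scenario with made-up numbers: B is currently in Suggests,
# so v1 = 20 is the count without B and v2 = 50 is the count with B.
heaviness_from_counts(n_with = 50, n_without = 20)
```

In the first scenario the roles are swapped: v1 plays the part of `n_with` and v2 the part of `n_without`.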
The heaviness score can be calculated with the heaviness() function:
heaviness(pkg)
heaviness(pkg, rel = TRUE)
tools::package_dependencies()
Package dependencies are derived from the "package database", which is normally retrieved by available.packages().
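The database returned by available.packages() is a character matrix with one row per package, in which fields such as Depends and Imports are stored as comma-separated strings, possibly with version requirements. The sketch below uses a miniature hand-made matrix with hypothetical packages to show how such a field could be turned into package names; `parse_field()` is an illustrative helper, not a function from tools or pkgndep.

```r
# A miniature stand-in for the matrix returned by available.packages()
# (hypothetical packages; the real matrix has more columns).
db <- matrix(
  c("pkgA", "R (>= 3.5.0), methods", "pkgB (>= 1.2), pkgC",
    "pkgB", NA,                      "pkgC",
    "pkgC", NA,                      NA),
  ncol = 3, byrow = TRUE,
  dimnames = list(c("pkgA", "pkgB", "pkgC"),
                  c("Package", "Depends", "Imports"))
)

# Turn one field such as "pkgB (>= 1.2), pkgC" into a vector of package names.
parse_field <- function(x) {
  if (is.na(x)) return(character(0))
  x <- gsub("[[:space:]]*\\([^)]*\\)", "", x)  # drop version requirements
  x <- trimws(strsplit(x, ",", fixed = TRUE)[[1]])
  setdiff(x[nzchar(x)], "R")                   # "R" itself is not a package
}

parse_field(db["pkgA", "Depends"])
parse_field(db["pkgA", "Imports"])
```

Any recursive dependency lookup has to repeat this kind of parsing for every package it visits, unless the parsed fields are stored in advance.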
In the tools package, there is a package_dependencies() function that can be used to get the list of dependency packages.
In the following example code, we retrieve the dependency packages of the ggplot2 package.
chooseCRANmirror(ind = 1)  # choose the mirror from RStudio
db = available.packages()
system.time(p1 <- tools::package_dependencies("ggplot2", db = db, recursive = TRUE)[[1]])
In pkgndep, we implement a faster version of the package_dependencies() function. First the database needs to be reformatted by the reformat_db() function. The returned variable db2 is a reference class object, and its method db2$package_dependencies() can be used to retrieve dependency packages.
db2 = reformat_db(db)
db2
system.time(p2 <- db2$package_dependencies("ggplot2", recursive = TRUE, simplify = TRUE))
p1 and p2 are actually identical:
identical(sort(p1), sort(p2))
sessionInfo()