Extract summary statistics of R package structure and functionality. Not all statistics of course, but a good go at balancing insightful statistics while ensuring computational feasibility. pkgstats is a static code analysis tool, so is generally very fast (a few seconds at most for very large packages). Installation is described in a separate vignette.
Statistics are derived from these primary sources:
- The package DESCRIPTION file and related package meta-statistics.
- Objects created by package code in all directories containing source code (./R, ./src, and ./inst/include).
- A function call network derived from function definitions obtained from the code tagging library, ctags, and references ("calls") to those obtained from another tagging library, gtags. This network roughly connects every object making a call (as from) with every object being called (to).
The primary function, pkgstats(), returns a list of these various components, including full data.frame objects for the final three components described above. The statistical properties of this list can be aggregated by the pkgstats_summary() function, which returns a data.frame with a single row of summary statistics. This function is demonstrated below, including full details of all statistics extracted.
The following code demonstrates the output of the main function, pkgstats(), using an internally bundled .tar.gz "tarball" of this package. The system.time call demonstrates that the static code analyses of pkgstats are generally very fast.
library (pkgstats)
tarball <- system.file ("extdata", "pkgstats_9.9.tar.gz", package = "pkgstats")
system.time (
    p <- pkgstats (tarball)
)
names (p)
The result is a list of various data extracted from the code. All except for objects, network, and external_calls represent summary data:
p [!names (p) %in% c ("objects", "network", "external_calls")]
The various components of these results are described in further detail in the main package vignette.
The pkgstats_summary() function
A summary of the pkgstats data can be obtained by submitting the object returned from pkgstats() to the pkgstats_summary() function:
s <- pkgstats_summary (p)
This function reduces the result of the pkgstats() function to a single line, represented as a data.frame with one row and ncol (s) columns. This format is intended to enable summary statistics from multiple packages to be aggregated by simply binding rows together. While that might seem like a lot of statistics, the pkgstats_summary() function aims to return as many usable raw statistics as possible in order to flexibly allow higher-level statistics to be derived through combination and aggregation. These statistics can be roughly grouped into the following categories (not shown in the order in which they actually appear), with variable names in parentheses after each description. Some statistics are summarised as comma-delimited character strings, such as translations into human languages, or other packages listed under "depends", "imports", or "suggests". This enables subsequent analyses of their contents, for example of actual translated languages, or both aggregate numbers and individual details of all package dependencies, as demonstrated below.
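Binding rows from several packages might look like the following minimal sketch. The two one-row data frames only stand in for results of pkgstats_summary(): the package names and all values are hypothetical, and just four of the documented columns are included.

```r
# Hypothetical stand-ins for two pkgstats_summary() results. Real results
# have many more columns; all values here are invented for illustration.
s1 <- data.frame (
    package = "pkgA", version = "0.1.0", loc_R = 1200L, n_fns_r = 45L,
    stringsAsFactors = FALSE
)
s2 <- data.frame (
    package = "pkgB", version = "2.3.1", loc_R = 350L, n_fns_r = 12L,
    stringsAsFactors = FALSE
)

# Each summary is a single row with the same columns, so summaries from
# multiple packages can be aggregated by binding rows together.
all_pkgs <- do.call (rbind, list (s1, s2))

# Higher-level statistics can then be derived by combining raw columns,
# for example lines of R code per R function.
all_pkgs$loc_per_fn <- all_pkgs$loc_R / all_pkgs$n_fns_r
print (all_pkgs)
```

Keeping the raw columns and deriving ratios afterwards means the same bound table can serve several different analyses.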
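Package Summaries
- Package name (package)
- Package version (version)
- Package date, taken from the DESCRIPTION file where not explicitly stated (date)
- License (license)
- Languages used in the package (languages), excluding R itself.
- Translations into human languages, where present (translations).
Information from DESCRIPTION file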
- Package URL(s) (url)
- URL for bug reports (bugs)
- Number of contributors with the role of author (desc_n_aut), contributor (desc_n_ctb), funder (desc_n_fnd), reviewer (desc_n_rev), thesis advisor (ths), and translator (trl, relating to translation between computer and not spoken languages).
- Comma-delimited character entries for all depends, imports, suggests, and linking_to packages.
Numbers of entries in each of these comma-delimited items can be obtained by a simple strsplit call, like this:
deps <- strsplit (s$suggests, ", ") [[1]]
length (deps)
print (deps)
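The same idea extends to counting all dependency types at once. The sketch below uses a hypothetical one-row data frame in place of a real summary; the package names are invented, and representing an empty field as NA is an assumption rather than documented behaviour.

```r
# Hypothetical one-row summary with comma-delimited dependency fields.
# Values are invented; NA marking an empty field is an assumption.
s_demo <- data.frame (
    depends = NA_character_,
    imports = "cli, dplyr, igraph",
    suggests = "knitr, rmarkdown, testthat, visNetwork",
    linking_to = "cpp11",
    stringsAsFactors = FALSE
)

# Count the entries in one comma-delimited field, treating NA or empty
# strings as zero entries.
count_entries <- function (x) {
    if (is.na (x) || !nzchar (x)) {
        return (0L)
    }
    length (strsplit (x, ",\\s*") [[1]])
}

dep_types <- c ("depends", "imports", "suggests", "linking_to")
n_deps <- vapply (
    dep_types,
    function (i) count_entries (s_demo [[i]]),
    integer (1)
)
print (n_deps)
```

Splitting on a comma followed by optional white space makes the count robust to either ", " or "," as the delimiter.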
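Numbers of files and associated data
- Number of vignette files (num_vignettes)
- Number of demos (num_demos)
- Number of data files (num_data_files)
- Total size of all package data (data_size_total)
- Median size of package data files (data_size_median)
- Numbers of files in the main sub-directories (files_R, files_src, files_inst, files_vignettes, files_tests), where numbers are recursively counted in all sub-directories, and where inst only counts files in the inst/include sub-directory.
Statistics on lines of code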
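- Total lines of code in each sub-directory (loc_R, loc_src, loc_inst, loc_vignettes, loc_tests).
- Total numbers of blank lines in each sub-directory (blank_lines_R, blank_lines_src, blank_lines_inst, blank_lines_vignettes, blank_lines_tests).
- Total numbers of comment lines in each sub-directory (comment_lines_R, comment_lines_src, comment_lines_inst, comment_lines_vignettes, comment_lines_tests).
- Measures of relative white space in each sub-directory (rel_space_R, rel_space_src, rel_space_inst, rel_space_vignettes, rel_space_tests), as well as an overall measure for the R/, src/, and inst/ directories (rel_space).
- The number of spaces used to indent code (indentation), with values of -1 indicating indentation with tab characters.
- The median number of nested expressions per line of code (nexpr).
Statistics on individual objects (including functions)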
These statistics all refer to "functions", but actually represent more general "objects," such as global variables or class definitions (generally from languages other than R), as detailed below.
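- Number of functions in R (n_fns_r)
- Numbers of exported and non-exported R functions (n_fns_r_exported, n_fns_r_not_exported)
- Number of functions (or objects) in other computer languages (n_fns_src), including functions in both src and inst/include directories.
- Numbers of functions (or objects) per individual file in R and in all other (src) directories (n_fns_per_file_r, n_fns_per_file_src).
- Mean and median numbers of parameters per exported R function (npars_exported_mn, npars_exported_md).
- Mean and median lines of code per function in R and other languages, distinguishing exported from non-exported R functions (loc_per_fn_r_mn, loc_per_fn_r_md, loc_per_fn_r_exp_mn, loc_per_fn_r_exp_md, loc_per_fn_r_not_exp_mn, loc_per_fn_r_not_exp_md, loc_per_fn_src_mn, loc_per_fn_src_md).
- Equivalent mean and median amounts of documentation per function and per parameter (doclines_per_fn_exp_mn, doclines_per_fn_exp_md, doclines_per_fn_not_exp_mn, doclines_per_fn_not_exp_md, docchars_per_par_exp_mn, docchars_per_par_exp_md).
Network Statistics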
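The full structure of the network table is described below, with summary statistics including:
- Number of edges, including distinction between languages (n_edges, n_edges_r, n_edges_src).
- Number of distinct clusters in the package network (n_clusters).
- Mean and median centrality of all network edges, calculated from both directed and undirected representations of the network (centrality_dir_mn, centrality_dir_md, centrality_undir_mn, centrality_undir_md).
- Equivalent centrality values excluding edges with centrality of zero (centrality_dir_mn_no0, centrality_dir_md_no0, centrality_undir_mn_no0, centrality_undir_md_no0).
- Numbers of terminal edges (num_terminal_edges_dir, num_terminal_edges_undir).
- Summary statistics on node degree (node_degree_mn, node_degree_md, node_degree_max)
External Call Statistics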
The final column in the result of the pkgstats_summary() function summarises the external_calls object detailing all calls made to external packages (including to base and recommended packages). This summary is also represented as a single character string. Each package lists total numbers of function calls, and total numbers of unique function calls. Data for each package are separated by a comma, while data within each package are separated by a colon.
s$external_calls
This structure allows numbers of calls to all packages to be readily extracted with code like the following:
calls <- do.call (
    rbind,
    strsplit (strsplit (s$external_calls, ",") [[1]], ":")
)
calls <- data.frame (
    package = calls [, 1],
    n_total = as.integer (calls [, 2]),
    n_unique = as.integer (calls [, 3])
)
print (calls)
The two numeric columns respectively show the total number of calls made to each package, and the total number of unique functions used within those packages. These results provide detailed information on numbers of calls made to, and functions used from, other R packages, including base and recommended packages.
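To make the string format concrete without running pkgstats, the following sketch parses a hypothetical external_calls value and derives the share of calls made to each package. The string, including all package names and counts, is invented for illustration.

```r
# Hypothetical external_calls string in the "package:total:unique" format
# described above. The packages and counts are invented.
ext <- "base:447:78,stats:16:2,utils:10:7,igraph:3:3"

# Split on commas into packages, then on colons into the three fields.
parts <- do.call (rbind, strsplit (strsplit (ext, ",") [[1]], ":"))
calls_demo <- data.frame (
    package = parts [, 1],
    n_total = as.integer (parts [, 2]),
    n_unique = as.integer (parts [, 3]),
    stringsAsFactors = FALSE
)

# Proportion of all external calls made to each package, largest first.
calls_demo$prop_total <- calls_demo$n_total / sum (calls_demo$n_total)
calls_demo <- calls_demo [order (calls_demo$n_total, decreasing = TRUE), ]
print (calls_demo)
```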
Finally, the summary statistics conclude with two further statistics, afferent_pkg and efferent_pkg. These are package-internal measures of afferent and efferent couplings between the files of a package. The afferent couplings (ca) are numbers of incoming calls to each file of a package from functions defined elsewhere in the package, while the efferent couplings (ce) are numbers of outgoing calls from each file of a package to functions defined elsewhere in the package. These can be used to derive a measure of "internal package instability" as the ratio of efferent to total coupling (ce / (ce + ca)).
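The instability ratio can be illustrated with a small worked example. How afferent_pkg and efferent_pkg are encoded in the summary is not shown here, so the per-file vectors below are hypothetical inputs, and the package-level aggregate at the end is only one possible choice.

```r
# Hypothetical per-file coupling counts for a package with three files.
# The file names and numbers are invented for illustration.
ca <- c ("a.R" = 4, "b.R" = 0, "c.R" = 6) # afferent: incoming calls
ce <- c ("a.R" = 1, "b.R" = 5, "c.R" = 0) # efferent: outgoing calls

# Instability = ce / (ce + ca). A file with no couplings at all would give
# 0 / 0, so such files are set to NA rather than NaN.
total <- ce + ca
instability <- ifelse (total > 0, ce / total, NA_real_)
print (round (instability, 2))

# One possible package-level value uses the summed couplings (an
# assumption, not necessarily how the package aggregates them).
pkg_instability <- sum (ce) / sum (total)
print (pkg_instability)
```

Values near 1 indicate files that mostly call out to the rest of the package, while values near 0 indicate files that are mostly called by others.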
There are many other "raw" statistics returned by the main pkgstats() function which are not represented in pkgstats_summary(). The main package vignette provides further detail on the full results.
The following sub-sections provide further detail on the objects, network, and external_calls items, which could be used to extract additional statistics beyond those described here.
Please note that this package is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.
All contributions to this project are gratefully acknowledged using the allcontributors package following the all-contributors specification. Contributions of any kind are welcome!
mpadge, jeroen, Bisaloo, thomaszwagerman, helske, rpodcast, assignUser, GFabien, pawelru, stitam, willgearty, krlmlr, noamross, maelle, schneiderpy