The CodeDepends package provides a flexible framework for statically analyzing R code (i.e., without evaluating it). It also contains higher-level functionality for: detecting dependencies between R code blocks or expressions, "tree-shaking" (pruning a script down to only the expressions necessary to evaluate a given expression), plotting variable usage timelines, and more.
The primary functions for basic code analysis are readScript, which reads in R scripts of various forms (including .R and .Rmd files), and getInputs, which performs the low-level code analysis.
The readScript function returns a Script object (essentially a list of ScriptNodes representing the top-level expressions in the script). This can then be passed to getInputs, which in that case returns a ScriptInfo object. A ScriptInfo object can be thought of as a list of ScriptNodeInfo objects, each holding information about one of those top-level expressions.
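As a minimal sketch of this workflow, the example below writes a small made-up script to a temporary file, reads it with readScript, and analyzes it with getInputs. The script contents are hypothetical, and the inputs and outputs slot names are assumed to match the printed form of ScriptNodeInfo objects.

```r
library(CodeDepends)

# Hypothetical three-line script, written to a temporary .R file for illustration
tmp <- tempfile(fileext = ".R")
writeLines(c("x <- 1", "y <- x + 2", "z <- x * y"), tmp)

sc <- readScript(tmp)    # Script: one node per top-level expression
info <- getInputs(sc)    # ScriptInfo: one ScriptNodeInfo per node

length(info)             # number of top-level expressions analyzed

# Variable created by each expression (slot name assumed to be "outputs")
created <- vapply(seq_along(info),
                  function(i) info[[i]]@outputs,
                  character(1))
created

info[[3]]@inputs         # variables used by the third expression
```

Indexing by position works here because the result behaves like a list with one element per top-level expression.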
R expressions can also be passed directly to getInputs, which returns a single ScriptNodeInfo object in that case. While in practice users will generally call getInputs on entire scripts, passing expressions directly is useful for testing and illustration.
As stated above, a ScriptNodeInfo object is the unit of information about a single analyzed expression. It collects the various pieces of information extracted from examining the expression itself:
    library(CodeDepends)
    getInputs(quote(x <- y + rnorm(10, sd = z)))
The information includes any string literals used in the expression, split into file and non-file strings based on whether the string appears to point to an existing path at analysis time, resolved with respect to the basedir argument (which defaults to the current directory). It also contains any libraries loaded by the code (via library, require, or requireNamespace calls).
Next are the inputs and outputs of the expression, which are the variables used by the expression and those created by it (via assignment), respectively. By default, these lists do not include symbols that are non-standardly evaluated (e.g., within the construction of a ggplot2 plot object). These non-standard evaluation variables are collected separately (as nsevalVars).
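The individual components can also be pulled out programmatically. The sketch below assumes the result is an S4 object whose slots are named after the components described here (inputs, outputs, nsevalVars, functions). Only nsevalVars is named explicitly in this text, so treat the other slot names as assumptions.

```r
library(CodeDepends)

info <- getInputs(quote(x <- y + rnorm(10, sd = z)))

info@inputs            # variables read by the expression
info@outputs           # variables created by assignment
info@nsevalVars        # symbols treated as non-standardly evaluated
names(info@functions)  # names of the functions called
```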
Variables whose values are updated (i.e., those assigned new values that depend on their existing value) are collected separately. These updates can take a large number of forms, including:
    x = x + 5
    rownames(x) = 5
    x[1:3] = 5
    x = lapply(1:5, function(i) x[i]^2)
    x$y = 5
In all of the above cases, the variable x will be listed in both the updated and inputs categories, but NOT in the outputs category.
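A short check of this behavior on one of the forms above, written as a sketch: the slot name updates is an assumption about how the updated category is stored, and the assignment uses <- because = cannot be used as an assignment inside quote().

```r
library(CodeDepends)

upd <- getInputs(quote(x <- x + 5))

"x" %in% upd@updates   # is x recorded as updated?
"x" %in% upd@inputs    # is x recorded as an input?
"x" %in% upd@outputs   # is x recorded as a newly created output?
```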
Next are the functions called by the expression. These include functions invoked through functionals, e.g. via the apply family or the mutate_* and summarize_* families. Note that the functions list is actually a logical vector, indicating whether each function was locally defined within the script (TRUE), defined within a package (FALSE), or unknown (NA). The names of the vector are the names of the functions. Currently, functions will always be unknown if a single expression is analyzed directly; function provenance detection is only applied to full scripts.
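To look at this vector for a full script, something like the following sketch can be used. The two-line script and the function f are hypothetical, and the functions slot name is assumed. No expected values are shown because the result depends on how provenance is resolved in the current session.

```r
library(CodeDepends)

# Hypothetical script that defines a function and then calls it
tmp <- tempfile(fileext = ".R")
writeLines(c("f <- function(a) a + 1", "y <- f(2)"), tmp)

info <- getInputs(readScript(tmp))

funs <- info[[2]]@functions   # functions called by the second expression
funs                          # named logical vector: TRUE, FALSE, or NA
names(funs)                   # names of the called functions
```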
Finally, the list of removed variables, any side effects CodeDepends is able to detect, and a copy of the code complete the information extracted.
Symbols within formulas are treated specially when analyzing code, based on the formulaInputs argument to getInputs. If FALSE (the default), they are assumed to be evaluated non-standardly (e.g., in the context of a data.frame); if TRUE, they are counted as standard inputs. Currently there is no capacity for mixing these behaviors within a single call to getInputs.
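The difference can be seen by analyzing the same call twice. This is a sketch only: the lm call and the variable names y, x, and d are made up for illustration, and the inputs and nsevalVars slot names are assumed.

```r
library(CodeDepends)

e <- quote(lm(y ~ x, data = d))

nse <- getInputs(e)                        # default: formulaInputs = FALSE
std <- getInputs(e, formulaInputs = TRUE)  # formula symbols counted as inputs

nse@inputs       # standard inputs under the default
nse@nsevalVars   # symbols treated as non-standardly evaluated
std@inputs       # standard inputs when formula symbols are included
```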
The getInputs function accepts a collector argument, which essentially specifies a state tracker to be used when walking the code to collect inputs, functions called, etc.
For largely historical reasons, input collectors are roughly defined as the output from the inputCollector
constructor, rather than as a more formal class.
When creating an input collector, various behaviors can be customized, primarily in the form of "function handlers", which specify behavior when analyzing calls to specific functions. This is, for example, how CodeDepends knows that some arguments within certain functions are non-standardly evaluated. CodeDepends ships with a robust set of default handlers, but these can be overridden or supplemented with custom handlers by specifying them when constructing the collector, either via the ... arguments or as a list. In both cases, the names are the names of the functions the handlers should be used on.
    col = inputCollector(library = function(e, collector, ...) {
        print(paste("Hello", asVarName(e)))
        defaultFuncHandlers$library(e, collector, ...)
    })
    getInputs(quote(library(CodeDepends)), collector = col)
inputCollector also accepts arguments that control what is counted as an input when processing expressions. The inclPrevOutput argument specifies whether output variables should be included as inputs to subsequent expressions when processing multiple expressions as a single block (e.g., when they are wrapped in {}). The checkLibrarySymbols and funcsAsInputs arguments control how symbols that appear to be resolved within libraries, and functions that are called, are handled, respectively. The default behavior is for all of these to be FALSE.
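One way to explore the effect of inclPrevOutput is to analyze the same block with the default collector and with a customized one, then compare the results. This is a sketch: the two-line block is made up for illustration, the slot names are assumed, and no expected output is shown.

```r
library(CodeDepends)

# Hypothetical block in which the second expression uses the first's output
block <- quote({
    a <- 1
    b <- a + 1
})

default_info <- getInputs(block)
chained_info <- getInputs(block,
                          collector = inputCollector(inclPrevOutput = TRUE))

default_info@inputs   # inputs with inclPrevOutput = FALSE (the default)
chained_info@inputs   # inputs with inclPrevOutput = TRUE
```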
CodeDepends can visualize code in various ways. We can create the graph of dependencies between variables via the makeVariableGraph function:
    f = system.file("samples", "results-multi.R", package = "CodeDepends")
    sc = readScript(f)
    g = makeVariableGraph(info = getInputs(sc))
    if(require(Rgraphviz))
        plot(g)
We can also create call graphs for functions or entire packages:
    gg = makeCallGraph("package:CodeDepends")
    if(require(Rgraphviz)) {
        gg = layoutGraph(gg, layoutType = "circo")
        graph.par(list(nodes = list(fontsize = 55)))
        renderGraph(gg) ## could also call plot directly
    }
Finally, we can display timelines showing when variables are defined, redefined, and used:
    f = system.file("samples", "results-multi.R", package = "CodeDepends")
    sc = readScript(f)
    dtm = getDetailedTimelines(sc, getInputs(sc))
    plot(dtm)

    # A big/long function
    info = getInputs(arima0)
    dtm = getDetailedTimelines(info = info)
    plot(dtm, var.cex = .7, mar = 4, srt = 30)