This vignette is a reference on abstract syntax trees.
rstatic uses R6 classes to represent individual components of R
expressions. The parent class of all components is ASTNode
, since each
expression is an abstract syntax tree (AST) with the components as nodes.
Abstract syntax trees are structurally similar to the parse trees built into R,
but provide some simplifications to make them more amenable to analysis.
Note that in contrast to most R objects, R6 objects are references. This
means that assigning an R6 object to a new variable will not create a copy,
and changes to the new variable will affect the original object. Fields and
methods on an R6 object are accessed with the $
operator.
The ASTNode
class only has one field and one method. The parent
field holds
the node's in the abstract syntax tree or control flow graph. The copy()
method creates a copy of the node and its children.
Literal numbers, strings, logicals and nulls are represented by subclasses of
Literal
.
In the example below, the literal 6.28
is converted to a code object. The
quote_ast()
function is like to_ast()
, but automatically quotes its argument.
tau = quote_ast(6.28) tau
In this case, the result is a Numeric
object, which is a subclass of
Literal
.
class(tau)
The value
field stores the Literal
's value.
tau$value
Variable names or symbols are represented by the Symbol
class.
The code below creates a new symbol named x
.
x = Symbol$new("x") x
Symbol
s have fields basename
, ssa_number
, and name
. The basename
field is the name of the symbol as it appears in the original R program.
x$basename
The ssa_number
field is the SSA number assigned to this symbol or NA
if
none was assigned.
x$ssa_number
The name
field is read-only and contains the SSA name of the symbol. When
ssa_number
is NA
, name
is the same as the basename
.
x$name
x$ssa_number = 3 x$name
Function parameters are represented by the Parameter
class, a subclass of
Symbol
. In addition to inherited members, Parameter
s have a default
field
for storing the default argument to the parameter.
As an example, the code below creates a parameter called bins
with default
argument 8
.
Parameter$new("bins", Integer$new(8))
Note that the R representation of a lone parameter is a pairlist with one element, which is why the parameter is displayed as a pairlist.
Assignments with =
or <-
are represented by the Assign
class, which has
fields read
and write
. The write
field holds the left-hand side of the
assignment, and the read
field holds the right-hand side.
As an example, consider the code below, which assigns the complex number 3+2i
to the variable z
.
assign = quote_ast(z <- 3+2i)
Here, the write
field holds a symbol and the read
field holds a complex
value.
assign$write
assign$read
The setter methods should be used to set write
and read
, rather than
setting the fields directly. They ensure that children of the Assign
node
have the parent
field set appropriately.
Expressions with arguments are represented by the Application
class. Calls
and return statements are represented by Call
and Return
, which are
subclasses of Application
.
For instance, the snippet below creates a call to sum()
with no arguments.
call_sum = Call$new("sum")
All Application
s have an args
field that holds the list of arguments. For
call_sum
, this is an empty list.
call_sum$args
We can set the sum in the example to have two arguments, the integers 4
and
6
.
call_sum$args = list(Integer$new(4), Integer$new(6)) call_sum
In addition to the members inherited from Application
, Call
s also have a
fn
field that holds the function being called, either as a Symbol
or as a
Function
(for calls to anonymous functions).
In the sum example, the fn
field is a symbol named sum
.
call_sum$fn
Specialized calls such as replacement functions and .Internal()
are
represented by subclasses of Call
. This makes it easy to use S3 dispatch to
treat them like any other Call
or to handle them as special cases.
The Replacement
class represents a call to a replacement function. This class
does not add any new members, but requires that the function name fn
ends with "<-"
.
The Internal
class, which represents a call to .Internal()
, does not add
any new members or functionality. It exists only to distinguish these calls
from others.
Parentheses and curly braces are represented by the Brace
class. This class
has fields body
and is_paren
. The is_paren
field is TRUE
if the Brace
object represents parentheses and FALSE
if the object represents curly
braces.
As an example, the code shown below converts a parenthetical expression to a
Brace
.
paren = quote_ast( (2 + 2) ) paren
The body
field holds the list of expressions inside the Brace
. For
parentheses, this list must have length 0 or 1 to be valid. The list can be any
length for curly braces.
paren$body
Callable objects are represented by the Callable
class. The Callable
class
defines a params
field that holds a list of parameters (and default
arguments) for the callable object.
Function
and Primitive
, both subclasses of Callable
, represent functions
and primitives, respectively. The Function
class adds a body
field to the
members inherited from Callable
. The body
field holds the single expression
inside the function; for functions that contain more than one expresion, body
holds a Brace
object.
For example, the code below converts a function to a Function
object.
to_celsius = function(x) { (x - 32) * 5 / 9 } to_celsius = to_ast(to_celsius) to_celsius
The parameter x
is in the params
field.
to_celsius$params
The code inside the function definition is in the body
field.
to_celsius$body
Since primitives have names but do not contain any R code, the Primitive
class adds a fn
field to the members inherited from Callable
. The fn
field holds the name of the primitive, and is similar to the fn
field on the
Call
class.
Control structures are represented by the Next
, Break
, If
, For
, and
While
classes. Next
and Break
do not add any new functionality to
ASTNode
, and are merely meant for method method dispatch.
The If
class has fields condition
, true
, and false
. These fields hold
to the if-statement's condition, "if" branch, and "else" branch, respectively.
The false
field is NULL
for if-statements that don't have an "else" branch.
For instance, the code below shows the AST for an if-statement.
ast = quote_ast( if (x > 0) { x = x + sqrt(x) }) ast
The condition on the if-statement is x > 0
, and the condition
field
reflects this.
ast$condition
The body of the if-statement is held in the true
field. The false
field is
NULL
since the if-statement does not have an "else" branch.
ast$true
The For
class has fields ivar
, iter
, and body
. The ivar
field holds
the iteration variable, the iter
field holds the object being iterated over,
and the body
field holds the list of expressions that make up the body of the
loop.
The While
class has fields condition
, body
, and is_repeat
. Similar to
the If
class, the condition
field holds the condition for the loop. The
body
field holds the list of expressions that make up the body of the loop.
The is_repeat
field is TRUE
if the loop is a repeat
loop and FALSE
if
the loop is a while
loop. When is_repeat
is TRUE
, the condition
field
is ignored (and defaults to NULL
).
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.