This vignette is a reference on abstract syntax trees.

Introduction

rstatic uses R6 classes to represent individual components of R expressions. The parent class of all components is ASTNode, since each expression is an abstract syntax tree (AST) with the components as nodes. Abstract syntax trees are structurally similar to the parse trees built into R, but provide some simplifications to make them more amenable to analysis.
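For comparison, the built-in parse tree that an AST simplifies can be inspected with base R alone. The sketch below uses only quote() and standard accessors, not rstatic; the variable name expr is just for illustration.

```r
# Base R only: quote() returns R's built-in parse tree for an expression.
expr = quote(x + 1)

class(expr)       # "call"
as.list(expr)     # the function `+`, followed by the arguments x and 1

# Each component is itself a language object or a constant.
class(expr[[1]])  # "name"
class(expr[[3]])  # "numeric"
```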

Note that in contrast to most R objects, R6 objects are references. This means that assigning an R6 object to a new variable will not create a copy, and changes to the new variable will affect the original object. Fields and methods on an R6 object are accessed with the $ operator.
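R6 objects are built on environments, so a plain base R environment shows the same reference behaviour. The sketch below is not rstatic code; node and alias are hypothetical names used only to contrast copying with aliasing.

```r
# Ordinary R values are copied when modified through a new variable.
a = c(1, 2, 3)
b = a
b[1] = 99
a[1]        # 1: the original vector is unchanged

# Environments (and therefore R6 objects) are references.
node = new.env()
node$value = 1
alias = node      # no copy is made here
alias$value = 2
node$value  # 2: the change is visible through both variables
```

This is why ASTNode provides a copy() method: an explicit copy is the only way to get an independent node.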

The ASTNode class has only one field and one method. The parent field holds the node's parent in the abstract syntax tree or control flow graph. The copy() method creates a copy of the node and its children.

Literals

Literal numbers, strings, logicals and nulls are represented by subclasses of Literal.

In the example below, the literal 6.28 is converted to a code object. The quote_ast() function is like to_ast(), but automatically quotes its argument.

tau = quote_ast(6.28)

tau

In this case, the result is a Numeric object, which is a subclass of Literal.

class(tau)

The value field stores the Literal's value.

tau$value
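As a point of reference, the kinds of literal constants that can appear in R code can be checked with base R's quote(). This sketch does not use rstatic, and it makes no claim about which Literal subclass each constant maps to.

```r
# Base R only: literal constants are stored directly in the parse tree.
class(quote(6.28))    # "numeric"
class(quote(8L))      # "integer"
class(quote("tau"))   # "character"
class(quote(TRUE))    # "logical"
class(quote(2i))      # "complex"
is.null(quote(NULL))  # TRUE
```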

Symbols & Parameters

Variable names or symbols are represented by the Symbol class.

The code below creates a new symbol named x.

x = Symbol$new("x")

x

Symbols have fields basename, ssa_number, and name. The basename field is the name of the symbol as it appears in the original R program.

x$basename

The ssa_number field is the SSA number assigned to this symbol or NA if none was assigned.

x$ssa_number

The name field is read-only and contains the SSA name of the symbol. When ssa_number is NA, name is the same as the basename.

x$name
x$ssa_number = 3

x$name

Function parameters are represented by the Parameter class, a subclass of Symbol. In addition to inherited members, Parameters have a default field for storing the default argument to the parameter.

As an example, the code below creates a parameter called bins with default argument 8.

Parameter$new("bins", Integer$new(8))

Note that the R representation of a lone parameter is a pairlist with one element, which is why the parameter is displayed as a pairlist.

Assignments

Assignments with = or <- are represented by the Assign class, which has fields read and write. The write field holds the left-hand side of the assignment, and the read field holds the right-hand side.

As an example, consider the code below, which assigns the complex number 3+2i to the variable z.

assign = quote_ast(z <- 3+2i)

Here, the write field holds a symbol and the read field holds a complex value.

assign$write
assign$read

The setter methods should be used to set write and read, rather than setting the fields directly. They ensure that children of the Assign node have the parent field set appropriately.

Function Calls

Expressions with arguments are represented by the Application class. Calls and return statements are represented by Call and Return, which are subclasses of Application.

For instance, the snippet below creates a call to sum() with no arguments.

call_sum = Call$new("sum")

All Applications have an args field that holds the list of arguments. For call_sum, this is an empty list.

call_sum$args

We can give the call to sum() two arguments, the integers 4 and 6.

call_sum$args = list(Integer$new(4), Integer$new(6))

call_sum

In addition to the members inherited from Application, Calls also have a fn field that holds the function being called, either as a Symbol or as a Function (for calls to anonymous functions).

In the sum example, the fn field is a symbol named sum.

call_sum$fn
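The fn and args fields have rough counterparts in R's built-in call objects, where the function comes first and the arguments follow. The sketch below uses base R only; call_expr is a hypothetical name.

```r
# Base R only: a call object stores the function first, then the arguments.
call_expr = quote(sum(4L, 6L))

call_expr[[1]]          # sum (the function being called)
class(call_expr[[1]])   # "name"
as.list(call_expr)[-1]  # the two arguments, 4L and 6L
```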

Specialized calls such as replacement functions and .Internal() are represented by subclasses of Call. This makes it easy to use S3 dispatch to treat them like any other Call or to handle them as special cases.

The Replacement class represents a call to a replacement function. This class does not add any new members, but requires that the function name fn end with "<-".

The Internal class, which represents a call to .Internal(), does not add any new members or functionality. It exists only to distinguish these calls from others.
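The dispatch pattern described above can be sketched with base R alone. The generic describe() and the stand-in objects below are hypothetical and not part of rstatic; the class vectors only mimic the inheritance chain named in the text, and real rstatic objects may carry additional classes.

```r
# Hypothetical generic, used only to illustrate S3 dispatch on node classes.
describe = function(node) UseMethod("describe")
describe.Call = function(node) "generic call"
describe.Internal = function(node) "call to .Internal()"

# Stand-ins that carry only a class vector, most specific class first.
replacement = structure(list(),
  class = c("Replacement", "Call", "Application", "ASTNode"))
internal = structure(list(),
  class = c("Internal", "Call", "Application", "ASTNode"))

describe(replacement)  # no Replacement method, so describe.Call is used
describe(internal)     # the Internal method handles the special case
```

Defining a method only for Call treats every specialized call uniformly, while adding a method for a subclass singles it out.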

Braces

Parentheses and curly braces are represented by the Brace class. This class has fields body and is_paren. The is_paren field is TRUE if the Brace object represents parentheses and FALSE if the object represents curly braces.

As an example, the code shown below converts a parenthetical expression to a Brace.

paren = quote_ast( (2 + 2) )

paren

The body field holds the list of expressions inside the Brace. For parentheses, this list must have length 0 or 1 to be valid. The list can be any length for curly braces.

paren$body
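In R's built-in parse trees, parentheses and curly braces are both ordinary calls, to `(` and `{` respectively, which is what makes a shared representation natural. The base R sketch below shows this; paren_expr and brace_expr are hypothetical names.

```r
# Base R only: parentheses and curly braces are calls in the parse tree.
paren_expr = quote((2 + 2))
brace_expr = quote({
  a = 1
  a + 1
})

as.character(paren_expr[[1]])  # "("
as.character(brace_expr[[1]])  # "{"

# Everything after the first element is the enclosed code.
length(paren_expr) - 1  # 1 expression inside the parentheses
length(brace_expr) - 1  # 2 expressions inside the curly braces
```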

Functions & Primitives

Callable objects are represented by the Callable class. The Callable class defines a params field that holds a list of parameters (and default arguments) for the callable object.

Function and Primitive, both subclasses of Callable, represent functions and primitives, respectively. The Function class adds a body field to the members inherited from Callable. The body field holds the single expression inside the function; for functions that contain more than one expression, body holds a Brace object.

For example, the code below converts a function to a Function object.

to_celsius = function(x) {
  (x - 32) * 5 / 9
}

to_celsius = to_ast(to_celsius)

to_celsius

The parameter x is in the params field.

to_celsius$params

The code inside the function definition is in the body field.

to_celsius$body

Since primitives have names but do not contain any R code, the Primitive class adds a fn field to the members inherited from Callable. The fn field holds the name of the primitive, and is similar to the fn field on the Call class.
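The distinction between functions and primitives is visible in base R itself, where formals() and body() play roles similar to the params and body fields. The sketch below uses only base R; to_fahrenheit is a hypothetical function defined for the example.

```r
# Base R only: an ordinary function has parameters and a body.
to_fahrenheit = function(x, offset = 32) {
  x * 9 / 5 + offset
}

names(formals(to_fahrenheit))  # "x" "offset"
formals(to_fahrenheit)$offset  # 32, the default argument
class(formals(to_fahrenheit))  # "pairlist"
class(body(to_fahrenheit))     # "{"

# A primitive such as sum() has a name but no R-level parameters or body.
is.primitive(sum)  # TRUE
formals(sum)       # NULL
body(sum)          # NULL
```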

Control Structures

Control structures are represented by the Next, Break, If, For, and While classes. Next and Break do not add any new functionality to ASTNode, and are merely meant for method dispatch.

The If class has fields condition, true, and false. These fields hold the if-statement's condition, "if" branch, and "else" branch, respectively. The false field is NULL for if-statements that don't have an "else" branch.

For instance, the code below shows the AST for an if-statement.

ast = quote_ast(
  if (x > 0) {
    x = x + sqrt(x)
  })

ast

The condition on the if-statement is x > 0, and the condition field reflects this.

ast$condition

The body of the if-statement is held in the true field. The false field is NULL since the if-statement does not have an "else" branch.

ast$true
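For comparison, R's built-in parse tree stores an if-statement as a call whose length depends on whether an "else" branch is present. The base R sketch below illustrates this; if_expr and if_else_expr are hypothetical names.

```r
# Base R only: an if-statement without an "else" branch has three elements.
if_expr = quote(
  if (x > 0) {
    x = x + sqrt(x)
  })

length(if_expr)  # 3: `if`, the condition, and the "if" branch
if_expr[[2]]     # x > 0

# With an "else" branch there is a fourth element.
if_else_expr = quote(if (x > 0) "positive" else "non-positive")
length(if_else_expr)  # 4
if_else_expr[[4]]     # "non-positive"
```

Named fields avoid this positional bookkeeping: a missing branch is simply NULL rather than an absent element.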

The For class has fields ivar, iter, and body. The ivar field holds the iteration variable, the iter field holds the object being iterated over, and the body field holds the list of expressions that make up the body of the loop.

The While class has fields condition, body, and is_repeat. Similar to the If class, the condition field holds the condition for the loop. The body field holds the list of expressions that make up the body of the loop. The is_repeat field is TRUE if the loop is a repeat loop and FALSE if the loop is a while loop. When is_repeat is TRUE, the condition field is ignored (and defaults to NULL).
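The loop fields described above line up with positions in R's built-in parse trees, which the base R sketch below makes explicit. It does not use rstatic, and the variable names are hypothetical.

```r
# Base R only: a for loop stores the variable, the iterated object, and the body.
for_expr = quote(for (i in 1:3) print(i))
for_expr[[2]]  # i         (compare with ivar)
for_expr[[3]]  # 1:3       (compare with iter)
for_expr[[4]]  # print(i)  (compare with body)

# A while loop has a condition; a repeat loop does not.
while_expr = quote(while (n > 0) { n = n - 1 })
repeat_expr = quote(repeat break)

length(while_expr)   # 3: `while`, the condition, and the body
length(repeat_expr)  # 2: `repeat` and the body
```

Because a repeat loop has no condition in the parse tree, there is nothing to store in the condition field when is_repeat is TRUE.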



nick-ulle/rstatic documentation built on Oct. 18, 2019, 4:38 a.m.