An Introduction to the Pegr Package

Introduction

The pegr package contains a parsing expression grammar (PEG) generator, together with some tools for managing and debugging sets of rules. Pegr is a recursive descent parser that uses memoization to improve efficiency. The pegr grammar is the creation of Bryan Ford. This document is divided into two major sections: the first describes the grammar, and the second gives a walkthrough of usage within pegr.

The Grammar

The grammar consists of a set of rules (a rule is also called a definition).

Rules

A rule consists of a ruleid, an arrow, and a parsing expression. The ruleid is also called a rule name or a non-terminal. A rule has the form $$ RuleId \leftarrow ParsingExpression $$ Parsing expressions are built from atoms (also known as literals or terminals), rules, and operators. In pegr, each parsing expression returns a result consisting of three items:

1. A status (TRUE for a match, i.e. success; FALSE for no match, i.e. failure).
2. A position recording how much of the original text was consumed.
3. A value, actually a list of one or more values associated with that expression.
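
The three-item result described above can be illustrated with a small, self-contained sketch in base R. It does not use pegr and makes no claim about how pegr is implemented; the names `lit`, `seq_of`, `choice_of`, and the fields `ok`, `pos`, and `val` are hypothetical, chosen only to mirror the status, position, and value.

```r
# Sketch only: hand-rolled parsing expressions in base R (not pegr's internals).
# Each expression is a function(text, pos) returning list(ok, pos, val).

# Atom: match the literal string s starting at position pos (1-based).
lit <- function(s) function(text, pos) {
  end <- pos + nchar(s) - 1
  if (end <= nchar(text) && substr(text, pos, end) == s) {
    list(ok = TRUE, pos = end + 1, val = list(s))
  } else {
    list(ok = FALSE, pos = pos, val = list())
  }
}

# Sequence: every sub-expression must match, one after another.
seq_of <- function(...) {
  exprs <- list(...)
  function(text, pos) {
    start <- pos
    vals <- list()
    for (e in exprs) {
      r <- e(text, pos)
      if (!r$ok) return(list(ok = FALSE, pos = start, val = list()))
      pos <- r$pos
      vals <- c(vals, r$val)
    }
    list(ok = TRUE, pos = pos, val = vals)
  }
}

# Ordered choice: return the result of the first alternative that matches.
choice_of <- function(...) {
  exprs <- list(...)
  function(text, pos) {
    for (e in exprs) {
      r <- e(text, pos)
      if (r$ok) return(r)
    }
    list(ok = FALSE, pos = pos, val = list())
  }
}

# 'a' ('b' / 'c'): an 'a' followed by either a 'b' or a 'c'
expr <- seq_of(lit("a"), choice_of(lit("b"), lit("c")))
res_ac <- expr("ac", 1)  # succeeds, collecting the values "a" and "c"
res_ad <- expr("ad", 1)  # fails, leaving the position at the start
```

In this sketch a failed expression reports the position it started from, so an enclosing choice can try its next alternative on the same input.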

The following describes the components of a parsing expression:

Usage Specifics

Parser

A pegr parser is created by a call to new.parser():

library(pegr)
parser <- new.parser()

Rules

parser <- add_rule(parser, "Any<-.")
parser <- add_rule(parser, "A<-'a'")
parser <- add_rule(parser, "B<-'b'")
parser <- add_rule(parser, "C<- A (B / C) ")
parser <- add_rule(parser, "D<- D")  #bad rule: will produce infinite recursion   
rule_ids(parser)
## [1] "A"   "Any" "B"   "C"   "D"
parser <- set_description(parser, "Any", "Accepts any character")
parser <- set_description(parser, "A", "Accepts a")
parser <- set_description(parser, "B", "Accepts b")
parser <- set_description(parser, "C", "Accepts string of a's terminated by a b")
parser <- set_description(parser, "D", "A very bad rule")
parser <- delete_rule(parser, "D")
rule_ids(parser)
## [1] "A"   "Any" "B"   "C"
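
The rule D above refers to itself before consuming any input, so a recursive descent parser would re-enter it at the same position indefinitely. The following base R sketch illustrates the problem and one way a parser could detect it. It does not use pegr, nothing here is claimed about how pegr behaves, and `make_guard`, `guard`, and `D` are hypothetical names.

```r
# Sketch only (base R, not pegr): detect a rule re-entered at the same position.
make_guard <- function() {
  active <- character(0)  # keys "rule@pos" for rule calls still in progress
  function(rule_id, pos, body) {
    key <- paste0(rule_id, "@", pos)
    if (key %in% active) stop("left recursion in rule ", rule_id)
    active <<- c(active, key)
    on.exit(active <<- setdiff(active, key))  # clear the key when the call ends
    body()
  }
}

guard <- make_guard()

# Hypothetical stand-in for the rule D <- D: it calls itself without
# consuming any input, so the position never advances.
D <- function(pos) guard("D", pos, function() D(pos))

msg <- tryCatch(D(1), error = function(e) conditionMessage(e))
print(msg)
```

Without such a guard the call to `D(1)` would recurse until R's expression nesting limit is reached, which is why rules of this shape are best removed from the grammar.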

Actions

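Actions may be attached to an existing rule in a pegr parser by a call to set_action(parser, ruleid, action). An action may take one of two forms, one of which is an R function. The example below instead supplies each action as a string of R code, in which v holds the values returned by the rule's components: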

rule_ids(parser)
## [1] "A"   "Any" "B"   "C"
parser <- set_action(parser, "A", "list('A')")  #turn a lower case a to an upper case A
parser <- set_action(parser, "C", "list(paste(v,collapse=''))")  #paste all the characters together
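
To make the role of v in the action strings above concrete, here is a minimal base R sketch of evaluating such a string with v bound to a list of values. It is an assumption about the general idea only, not pegr's actual implementation, and `run_action` is a hypothetical helper.

```r
# Sketch only (base R, not pegr): evaluate an action given as a string,
# with the values matched by the rule bound to the name v.
run_action <- function(action, v) {
  eval(parse(text = action), envir = list(v = v))
}

# The action attached to A ignores v and returns an upper case "A".
a_val <- run_action("list('A')", list("a"))

# The action attached to C pastes all of its values into one string.
c_val <- run_action("list(paste(v, collapse=''))", list("A", "A", "b"))
```

Both action strings return a list, which matches the convention that the value of a parsing expression is a list.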

Parsing

Parsing is the act of applying a rule to a string to be parsed and returning a result. The rule is specified by its ruleid, a character string giving the rule's name. That rule becomes the root of the ensuing parse.

* Parsing is accomplished by apply_rule, called with the parser, the ruleid, and the input string, together with the optional flags exe=FALSE and record=FALSE.

res.A <- apply_rule(parser, "A", "a")
res.AAB <- apply_rule(parser, "C", "aab")
res.BAA <- apply_rule(parser, "C", "baab")
c(status(res.A), status(res.AAB), status(res.BAA))
## [1]  TRUE  TRUE FALSE
c(consumed(res.A), consumed(res.AAB), consumed(res.BAA))
## [1] "a"   "aab" ""
# here exe is false, so no action is taken
res.AAB <- apply_rule(parser, "C", "aab")
# so the return is a list of the returns of the component atoms
value(res.AAB)
## $atom
## [1] "a"
## 
## $atom
## [1] "a"
## 
## $atom
## [1] "b"

To execute the actions, supply exe=TRUE as a parameter.

* Return values of a parse are lists, which can be retrieved with value(res).

# here exe is true, so action is taken
res.AAB <- apply_rule(parser, "C", "aab", exe = TRUE)
# action A capitalizes, action C pastes together
value(res.AAB)
## $C
## [1] "AAb"

Debugging

Debugging the logic of a grammar can be accomplished by testing the component rules, starting with the leaves and building up. It is also probably better to start with actions disabled (exe=FALSE), to see first whether the general flow is correct. To aid in the analysis of a given parse, there are two visualization tools:

* tree
* plot

These visualization tools are discussed in the next section.

In addition to the visualization tools, we have:

* a rule stack, activated by setting the depth via set_rule_stack_limit and inspected using get_rule_stack
* a rule debugger, activated by debug.pegR, which allows one to step through and inspect the rules as they are encountered during parsing. Additionally, one may set breakpoints at the entry or exit of rules, and skip those rules with no breakpoints set. The rule debugger debugs rule logic with minimal involvement of the low-level programming language.

Visualization

Visualization consists of showing the nodes visited (and their values) during a parse of an input. This requires that the record flag be set to TRUE in the invocation of apply_rule.

# here record is set to TRUE
peg <- new.parser()
peg <- add_rule(peg, "A<-'a'")
peg <- add_rule(peg, "B<-'b' A")
peg <- add_rule(peg, "C<- A B")
peg <- set_action(peg, "A", "list('X')")
res.ABA <- apply_rule(peg, "C", "aba", exe = TRUE, record = TRUE)
# also, since exe is TRUE, the action is taken: action A replaces each 'a'
# with 'X'; C has no action here, so the values are not pasted together
value(res.ABA)
## $A
## [1] "X"
## 
## $atom
## [1] "b"
## 
## $A
## [1] "X"

Ways to Visualize

There are two methods of visualization:

* Tree: prints the tree to the console
* Plot: draws the tree as a plot

tree(res.ABA)
## ____C(aba) = list(X, b, X )
##     |____A(a) = list(X )
##     |____B(ba) = list(b, X )
##          |____A(a) = list(X )
# plot only the ruleids (names)
plot(res.ABA)

# plot the ruleids together with their inputs (names and args)
plot(res.ABA, show = c("names", "args"))

# plot only the values
plot(res.ABA, show = "vals")

# plot all
plot(res.ABA, show = "all", cex = 0.8)
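
The idea of recording the nodes visited during a parse and showing them as a tree can be sketched in base R as follows. This is an illustration only: it does not use pegr, the record is built by hand rather than by a parser, and `node` and `tree_lines` are hypothetical names.

```r
# Sketch only (base R, not pegr): a recorded parse as nested nodes,
# printed as an indented tree.
node <- function(id, arg, kids = list()) list(id = id, arg = arg, kids = kids)

# Return one line per node, indented by depth in the parse.
tree_lines <- function(n, depth = 0) {
  line <- paste0(strrep("  ", depth), n$id, "(", n$arg, ")")
  c(line, unlist(lapply(n$kids, tree_lines, depth = depth + 1)))
}

# Hand-built record for rule C applied to "aba", with C <- A B and B <- 'b' A
rec <- node("C", "aba", list(
  node("A", "a"),
  node("B", "ba", list(node("A", "a")))
))

lines <- tree_lines(rec)
cat(lines, sep = "\n")
```

Each node keeps the ruleid and the text it consumed, which is the same information shown by the names and args options of plot above.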



mslegrand/pegr documentation built on May 23, 2019, 7:53 a.m.