A parser is a function that accepts a string as input and returns
NULL on failure or, if the string satisfies the grammar recognized by
the parser, a list with elements
input, the unconsumed input, and
result, an object representing the result of the parse.
A parser combinator is a higher order function that combines one or more simpler parsers to create a parser that recognizes a more complex grammar.
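The two definitions above can be sketched in base R. This is a minimal illustration of the stated contract (NULL on failure, otherwise a list with input and result), not the Combin8R implementation; the names lit_parser and seq2 are hypothetical.

```r
# Hypothetical helper: build a parser that recognizes a fixed prefix.
lit_parser <- function(literal) {
  function(input) {
    if (startsWith(input, literal)) {
      list(input = substring(input, nchar(literal) + 1),  # unconsumed remainder
           result = literal)
    } else {
      NULL                                                # failure
    }
  }
}

# Hypothetical combinator: run p1, then run p2 on whatever p1 left over.
seq2 <- function(p1, p2) {
  function(input) {
    r1 <- p1(input)
    if (is.null(r1)) return(NULL)
    r2 <- p2(r1$input)
    if (is.null(r2)) return(NULL)
    list(input = r2$input, result = list(r1$result, r2$result))
  }
}

p_hot <- lit_parser("hot")
p_hotdog <- seq2(p_hot, lit_parser("dog"))

p_hot("hotdog")$input       # "dog"
is.null(p_hot("cold"))      # TRUE
p_hotdog("hotdogs")$input   # "s"
```

Because a combinator returns a function with the same contract as its arguments, the result can itself be passed to another combinator.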
The pLiteral function generates a primitive parser that recognizes a
literal string.
library(Combin8R)
pDog <- pLiteral("Tag","dog")
pDog("monkey")
pDog("dog")
The matched literal is returned as the value of the parse, as an S3
object with a class defined by the tag argument and inheriting from
pLiteral,
unclass(pDog("dog")$result)
and print methods are defined for the major classes to simplify the display of the result
pDog("dog")$result
The pRegex function generates a primitive parser that accepts a
grammar defined by a regular expression. The matched string and any
captured groups from the regular expression are returned as the value
of the parse
pLabel <- pRegex("Label","label (\\d+)")
pLabel("label 7")
The tag argument can be a function that is used to construct the
object representing the value of the parse
pLabel <- pRegex(function(value)
                   structure(list(value=as.numeric(value[[2]])),
                             class=c("Label","pRegex")),
                 "label (\\d+)")
pLabel("label 7")
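To see why the constructor above reads value[[2]], it may help to look at how base R reports a match with captured groups. The sketch below assumes the value handed to the tag function is laid out like the output of regmatches with regexec, with the full match first and the groups after it; that layout is an assumption consistent with the example, not a documented detail of pRegex.

```r
x <- "label 7 and more"

# Full match first, then one entry per captured group.
m <- regmatches(x, regexec("^label (\\d+)", x))[[1]]
m                      # "label 7" "7"

# Build the result object the same way the constructor in the text does.
obj <- structure(list(value = as.numeric(m[[2]])),
                 class = c("Label", "pRegex"))
obj$value              # 7
inherits(obj, "Label") # TRUE
```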
There are four main combinators:
pAlt creates a parser that accepts input accepted by any one of a
number of simpler parsers
pSeq creates a parser that accepts input accepted by a number of
simpler parsers applied in sequence
pMany creates a parser that accepts input for which a simpler
parser succeeds zero or more times in succession
pSome creates a parser that accepts input for which a simpler
parser succeeds one or more times in succession
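The behaviour of alternation and repetition can be sketched in base R as follows. The names rx_parser, alt, many and some are hypothetical stand-ins, and two details are assumptions rather than facts about Combin8R: alt returns the first alternative that succeeds, and many stops when a parser succeeds without consuming input.

```r
# Hypothetical primitive: match a regular expression at the start of the input.
rx_parser <- function(pattern) {
  function(input) {
    m <- regmatches(input,
                    regexpr(paste0("^(?:", pattern, ")"), input, perl = TRUE))
    if (length(m) == 0) return(NULL)
    list(input = substring(input, nchar(m) + 1), result = m)
  }
}

# Try each parser in turn; the first success wins (an assumption).
alt <- function(...) {
  parsers <- list(...)
  function(input) {
    for (p in parsers) {
      r <- p(input)
      if (!is.null(r)) return(r)
    }
    NULL
  }
}

# Zero or more repetitions: never fails, may return an empty result list.
many <- function(p) {
  function(input) {
    results <- list()
    repeat {
      r <- p(input)
      # Stop on failure, or when nothing was consumed (avoids an endless loop).
      if (is.null(r) || identical(r$input, input)) break
      results[[length(results) + 1]] <- r$result
      input <- r$input
    }
    list(input = input, result = results)
  }
}

# One or more repetitions: fails if there was no match at all.
some <- function(p) {
  pm <- many(p)
  function(input) {
    r <- pm(input)
    if (length(r$result) == 0) NULL else r
  }
}

p_digit <- rx_parser("[0-9]")
many(p_digit)("123abc")$input                   # "abc"
length(many(p_digit)("abc")$result)             # 0, but the parse succeeds
is.null(some(p_digit)("abc"))                   # TRUE
alt(rx_parser("cat"), rx_parser("dog"))("dogs")$result   # "dog"
```

The only difference between the two repetition combinators in this sketch is how an empty result is treated: many reports success with nothing consumed, while some reports failure.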
Consider the subset of the Logo language consisting of the constructs repeat, forward, left and right, so that a possible program is given by
program <- "repeat 10 [right 36 repeat 5 [forward 54 right 72]]"
We can write a parser for the Logo subset as follows
pInteger <- pRegex("Integer","\\d+")
pSpaces <- pRegex("Spaces","\\s*")
pSpaces1 <- pRegex("Spaces1","\\s+")

pCommands <-
  pSome("Commands",
        pSeq("CommandWhite",
             pAlt("Command",
                  pSeq("forwardCmd",pLiteral("forward"),pSpaces1,pInteger),
                  pSeq("rightCmd",pLiteral("right"),pSpaces1,pInteger),
                  pSeq("leftCmd",pLiteral("left"),pSpaces1,pInteger),
                  pSeq("repeatCmd",pLiteral("repeat"),pSpaces1,pInteger,pSpaces1,pBlock)
                  ),
             pSpaces))

pBlock <-
  pSeq("Block",pLiteral("["),pSpaces,pCommands,pLiteral("]"))
This parses the program text and produces an abstract syntax tree (AST)
p <- pCommands(program)
p
The formatAST generic creates a string representation of the AST and
is used to implement the print methods for the AST. The printed
representation of the AST can be simplified by writing specific
methods for the result classes
formatAST.Integer <- function(x,indent,...)
  as.character(x$value)
formatAST.forwardCmd <- function(x,indent,...)
  paste("forward",formatAST(x$value[[3]]))
formatAST.rightCmd <- function(x,indent,...)
  paste("right",formatAST(x$value[[3]]))
formatAST.leftCmd <- function(x,indent,...)
  paste("left",formatAST(x$value[[3]]))
formatAST.repeatCmd <- function(x,indent,...)
  paste("repeat",formatAST(x$value[[3]]),formatAST(x$value[[5]]))
formatAST.Command <- function(x,indent,...)
  formatAST(x$value)
formatAST.CommandWhite <- function(x,indent,...)
  formatAST(x$value[[1]])
formatAST.Commands <- function(x,indent,...)
  paste(sapply(x$value,formatAST),collapse=" ")
formatAST.Block <- function(x,indent,...)
  paste("[",formatAST(x$value[[3]]),"]",sep="")
The AST resulting from the parse does not change, only its printed representation
p
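The reason only the printed representation changes is ordinary S3 dispatch: adding a method changes which function the generic calls, not the object it is called on. The sketch below uses a hypothetical generic fmt in place of formatAST and a hand-built node whose layout is assumed for illustration only.

```r
# Hypothetical generic standing in for formatAST.
fmt <- function(x, ...) UseMethod("fmt")
fmt.default <- function(x, ...) paste0("<", class(x)[1], ">")

# A hand-built node; the layout of $value is an assumption for illustration.
arg <- structure(list(value = "54"), class = "Integer")
node <- structure(list(value = list("forward", " ", arg)),
                  class = "forwardCmd")
snapshot <- node

before <- fmt(node)    # no specific method yet, so fmt.default is used

fmt.Integer <- function(x, ...) as.character(x$value)
fmt.forwardCmd <- function(x, ...) paste("forward", fmt(x$value[[3]]))

after <- fmt(node)     # now dispatches to fmt.forwardCmd

before                     # "<forwardCmd>"
after                      # "forward 54"
identical(node, snapshot)  # TRUE: the node itself is untouched
```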
The AST constructed by the simple parser is more complicated than strictly necessary because it contains nodes for purely syntactic elements of the language such as whitespace. These elements can be removed if we construct the AST ourselves
mkprm <- function(value)
  structure(list(value[[1]]$value,value[[3]]),class="prm")
mkrpt <- function(value)
  structure(list(value[[3]],value[[5]]),class="rpt")
mkblk <- function(value)
  structure(value[[3]],class="blk")

pInteger <- pRegex(as.numeric,"\\d+")
pSpaces <- pRegex("Spaces","\\s*")
pSpaces1 <- pRegex("Spaces1","\\s+")

pCommands <-
  pSome(function(value) value,
        pSeq(function(value) value[[1]],
             pAlt(function(value) value,
                  pSeq(mkprm,pLiteral("forward"),pSpaces1,pInteger),
                  pSeq(mkprm,pLiteral("right"),pSpaces1,pInteger),
                  pSeq(mkprm,pLiteral("left"),pSpaces1,pInteger),
                  pSeq(mkrpt,pLiteral("repeat"),pSpaces1,pInteger,pSpaces1,pBlock)
                  ),
             pSpaces))

pBlock <-
  pSeq(mkblk,pLiteral("["),pSpaces,pCommands,pLiteral("]"))
The AST now takes the form
p <- pCommands(program)
p
As before, the printed representation can be simplified by defining
appropriate methods for formatAST
formatAST.prm <- function(x,indent,...)
  paste(x[[1]],x[[2]],collapse=" ")
print.prm <- function(x,indent=0,...)
  cat(formatAST(x,indent))

formatAST.rpt <- function(x,indent,...)
  paste("repeat",x[[1]],formatAST(x[[2]]),collapse=" ")
print.rpt <- function(x,indent=0,...)
  cat(formatAST(x,indent))

formatAST.blk <- function(x,indent,...)
  paste("[",paste(sapply(x,formatAST),collapse=" "),"]",sep="")
print.blk <- function(x,indent=0,...)
  cat(formatAST(x,indent))
so now
p
Alternatively, a compiler for the Logo subset can be written by translating Logo constructs directly to R expressions
mkiter <- local({
  n <- 0
  function() {
    n <<- n+1
    as.name(paste("k",n,sep=""))
  }
})

cmpfwd <- function(value)
  substitute({
    x0 <- x1
    x1 <- x1+r*c(cos(theta),sin(theta))
    segments(x0[1],x0[2],x1[1],x1[2])
  },list(r=value[[3]]))
cmprgt <- function(value)
  substitute(theta <- theta + pi/180*a,list(a=value[[3]]))
cmplft <- function(value)
  substitute(theta <- theta - pi/180*a,list(a=value[[3]]))
cmprpt <- function(value)
  substitute(for(k in 1:n) b,
             list(k=mkiter(),n=value[[3]],b=value[[5]]))
cmpblk <- function(value)
  if(length(value)>1) as.call(c(as.name("{"),value[[3]])) else value

pInteger <- pRegex(as.numeric,"\\d+")
pSpaces <- pRegex("Spaces","\\s*")
pSpaces1 <- pRegex("Spaces1","\\s+")

pCommands <-
  pSome(function(value) value,
        pSeq(function(value) value[[1]],
             pAlt(function(value) value,
                  pSeq(cmpfwd,pLiteral("forward"),pSpaces1,pInteger),
                  pSeq(cmprgt,pLiteral("right"),pSpaces1,pInteger),
                  pSeq(cmplft,pLiteral("left"),pSpaces1,pInteger),
                  pSeq(cmprpt,pLiteral("repeat"),pSpaces1,pInteger,pSpaces1,pBlock)
                  ),
             pSpaces))

pBlock <-
  pSeq(cmpblk,pLiteral("["),pSpaces,pCommands,pLiteral("]"))

pLogo <- function(input,xlim=c(-100,100),ylim=c(-100,100)) {
  p <- pCommands(input)
  if(!is.null(p)) {
    body <- as.call(c(as.name("{"),
                      quote(plot.new()),
                      substitute(plot.window(xlim,ylim)),
                      quote(x1 <- c(0,0)),
                      quote(theta <- 0),
                      p$result))
    list(input=p$input,
         result=substitute(local(body),list(body=body)))
  }
}
The result of the parser is now R code
p <- pLogo(program)
p
and evaluating this code executes the Logo program
eval(p$result)
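The compiler relies on substitute to fill placeholders in an expression template, and on eval to run the resulting call. The base R sketch below mirrors the pattern used in cmprpt on a toy loop body; the variable names total and k1 are chosen for illustration and are not produced by the parser.

```r
# An unevaluated loop body, standing in for a compiled Logo block.
body <- quote(total <- total + 2)

# Fill the placeholders k, n and b in the template, as cmprpt does.
loop <- substitute(for (k in 1:n) b,
                   list(k = as.name("k1"), n = 3, b = body))

is.call(loop)   # TRUE: loop is code, not a value
loop[[2]]       # k1, the substituted loop variable

# Nothing has run yet; evaluation happens only on request.
total <- 0
eval(loop)
total           # 6
```

Supplying a fresh loop variable for each repeat, as mkiter does, keeps nested loops from sharing a counter name.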