`minilexer` provides a tool for simple tokenising/lexing of text files. It aims to help get unsupported text data formats into R quickly. For complicated parsing (especially of computer programs) you'll want to use the more formally correct lexing/parsing provided by the `rly` package or the `dparser` package.

Note: as of version 0.1.6, the `TokenStream` handler has been removed.
You can install `minilexer` from GitHub with:

```r
remotes::install_github('coolbutuseless/minilexer')
```
Currently the package provides one function: `minilexer::lex(text, patterns)`, which uses the regular expressions in `patterns` to split `text` into a character vector of tokens. The `patterns` argument is a named vector of character strings representing the regular expressions for the elements to match within the text.
I will emphasise the *mini* in `minilexer`: this is not a rigorous or formally complete lexer, but it suits 90% of my needs for turning text data formats into tokens.
Use `lex()` to split a sentence into tokens:

```r
library(minilexer)

sentence_patterns <- c(
  word       = "\\w+",
  whitespace = "\\s+",
  fullstop   = "\\.",
  comma      = ","
)

sentence <- "Hello there, Rstats."

lex(sentence, sentence_patterns)
```
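If the whitespace tokens are just noise for your format, they can be dropped after lexing. The sketch below assumes `lex()` returns a character vector whose names are the names of the patterns that matched; inspect the returned object in your own session before relying on that structure.

```r
# A minimal sketch: lex() is assumed (not verified here) to return a
# character vector whose names are the matching pattern names.
tokens <- lex(sentence, sentence_patterns)

# Drop the whitespace tokens, keeping everything else
tokens[names(tokens) != "whitespace"]
```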
Use `lex()` to split some simplified R code into tokens:

```r
R_patterns <- c(
  number     = "-?\\d*\\.?\\d+",
  name       = "\\w+",
  equals     = "==",
  assign     = "<-|=",
  plus       = "\\+",
  lbracket   = "\\(",
  rbracket   = "\\)",
  newline    = "\n",
  whitespace = "\\s+"
)

R_code <- "x <- 3 + 4.2 + rnorm(1)"

R_tokens <- lex(R_code, R_patterns)
R_tokens
```
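Once the text has been tokenised, downstream processing is often easier with the tokens in tabular form. The snippet below is a minimal sketch, again assuming the result of `lex()` is a character vector named by token type; the reshaping itself is plain base R.

```r
# A minimal sketch, assuming R_tokens is a character vector named by token
# type: reshape it into a data.frame of type/value pairs and drop whitespace.
token_df <- data.frame(
  type  = names(R_tokens),
  value = unname(R_tokens),
  stringsAsFactors = FALSE
)

token_df[token_df$type != "whitespace", ]
```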