crack: Parse R Markdown or R scripts


crack    R Documentation

Parse R Markdown or R scripts

Description

Parse input into code chunks, inline code expressions, and text fragments: crack() is for parsing R Markdown, and sieve() is for R scripts.

Usage

crack(input, text = NULL)

sieve(input, text = NULL)

Arguments

input

A character vector giving either the input file path or the input text. If it is not provided, the text argument must be provided instead. The vector is treated as a file path if it is a single string that either points to an existing file or has a filename extension. Otherwise, it is treated as the text input. To avoid ambiguity when a string should be treated as text even though it happens to be an existing file path or has an extension, wrap it in I(), or simply use the text argument instead.

text

A character vector as the text input. By default, it is read from the input file if provided.
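The rule for telling a file path from text input can be sketched in base R. This is only an illustration of the documented behavior: is_path_input() is a hypothetical helper and not a function in litedown, and the package's actual implementation may differ.

```r
# Hypothetical helper (not part of litedown): mirrors the documented rule only.
is_path_input = function(input) {
  # I() adds the class "AsIs", which signals "always treat this as text"
  if (inherits(input, "AsIs")) return(FALSE)
  # only a single string can be a file path
  if (!is.character(input) || length(input) != 1) return(FALSE)
  # an existing file, or anything that has a filename extension
  file.exists(input) || tools::file_ext(input) != ""
}

is_path_input("report.Rmd")           # has an extension: file path
is_path_input(I("report.Rmd"))        # wrapped in I(): text
is_path_input(c("# Title", "Text"))   # more than one element: text
is_path_input("Hello world")          # no extension, no such file: text
```

Passing the content through the text argument, e.g., crack(text = "report.Rmd"), sidesteps the question entirely.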

Details

For R Markdown, a code chunk must start with a fence of the form ```{lang}, where lang is the language name, e.g., r or python. The body of a code chunk can start with chunk options written in "pipe comments", e.g., #| eval = TRUE, echo = FALSE (the CSV syntax) or #| eval: true (the YAML syntax). An inline code fragment is of the form `{lang} source` embedded in Markdown text.
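As a rough illustration of the syntax described above, the following sketch passes a small R Markdown document through the text argument, with one vector element per line, and inspects the result. It assumes the litedown package is installed. The exact shape of the parsed options is not shown here, so use str() to see what your version returns.

```r
library(litedown)

# a small R Markdown document, one element per line
rmd = c(
  "Some text with inline code: `{r} 1 + 1`.",
  "",
  "```{r}",
  "#| echo = FALSE",
  "x = 1:10",
  "mean(x)",
  "```"
)
res = crack(text = rmd)

# the type of each element (text blocks and code chunks)
sapply(res, `[[`, "type")

# source, parsed options, and pipe comments of the code chunk
chunks = Filter(function(b) b$type == "code_chunk", res)
str(chunks[[1]][c("source", "options", "comments")])
```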

For R scripts, text blocks are extracted by removing the leading #' tokens. All other lines are treated as R code, which can optionally be separated into chunks by consecutive lines of #| comments (chunk options are written in these comments). If no #' or #| tokens are found in the script, the script will be divided into chunks that each contain the smallest possible complete R expression.
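The two ideas in this paragraph, stripping #' tokens and dividing code into complete expressions, can be sketched in base R. This is not how sieve() is implemented: split_expressions() is a hypothetical helper, and both the optional space after #' and the use of srcref information from parse() are assumptions made for illustration.

```r
script = c(
  "#' A _text_ line.",
  "x = 1",
  "f = function(a) {",
  "  a + 1",
  "}",
  "f(x)"
)

# text lines start with #'; strip the token (and one optional space)
is_text = grepl("^#'", script)
text = sub("^#' ?", "", script[is_text])
code = script[!is_text]

# Hypothetical helper: split code lines into complete top-level expressions,
# using the first and last line numbers recorded in each srcref
split_expressions = function(lines) {
  exprs = parse(text = lines, keep.source = TRUE)
  lapply(attr(exprs, "srcref"), function(ref) lines[ref[1]:ref[3]])
}

pieces = split_expressions(code)
length(pieces)  # three expressions: x = 1, the function definition, f(x)
pieces[[2]]     # the three lines of the function definition
```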

Value

A list of code chunks and text blocks:

  • Code chunks are of the form list(source, type = "code_chunk", options, comments, ...): source is a character vector of the source code of a code chunk, options is a list of chunk options, and comments is a vector of pipe comments.

  • Text blocks are of the form list(source, type = "text_block", ...). If the text block does not contain any inline code, source will be a character string (lines of text concatenated by line breaks), otherwise it will be a list with members that are either character strings (normal text fragments) or lists of the form list(source, options, ...) (source is the inline code, and options contains its options specified inside `{lang, ...}`).

Both code chunks and text blocks have a list member named lines that stores their starting and ending line numbers in the input.
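A short sketch of walking the returned list may help, assuming litedown is installed. It relies only on the type and lines members described above, and prints one summary line per element without assuming particular line numbers.

```r
library(litedown)

res = crack(text = c("# Title", "", "```{r}", "1 + 1", "```", "", "Done."))

# print the type and the line range of each element
for (b in res) {
  cat(b$type, ": lines ", paste(unlist(b$lines), collapse = "-"), "\n", sep = "")
}

# keep only the code chunks, e.g., to evaluate or check them separately
code_chunks = Filter(function(b) b$type == "code_chunk", res)
```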

Note

For simplicity, sieve() does not support inline code expressions. Text after #' is treated as pure Markdown.

It is a pure coincidence that the function names crack() and sieve() weakly resemble Carson Sievert's name, but I will consider adding a class name sievert to the returned value of sieve() if Carson becomes the president of the United States someday, which may make the value radioactive and introduce a new programming paradigm named Radioactive Programming (in case Reactive Programming is no longer fun or cool).

Examples

library(litedown)
# parse R Markdown
res = crack(c("```{r}\n1+1\n```", "Hello, `pi` = `{r} pi` and `e` = `{r} exp(1)`!"))
str(res)
# evaluate inline code and combine results with text fragments
txt = lapply(res[[2]]$source, function(x) {
  # plain text fragments are strings; inline code fragments are lists
  if (is.character(x)) x else eval(parse(text = x$source))
})
paste(unlist(txt), collapse = "")

# parse R code
res = sieve(c("#' This is _doc_.", "", "#| eval=TRUE", "# this is code", "1 + 1"))
str(res)

litedown documentation built on Oct. 17, 2024, 1:06 a.m.