md_to_xml: Parse (R) Markdown as CommonMark XML tree

md_to_xml    R Documentation

Description

Parses (R) Markdown file content according to the CommonMark specification and returns it as an XML parse tree.

Usage

md_to_xml(
  md,
  smart_punctuation = FALSE,
  hardbreaks = FALSE,
  normalize = TRUE,
  sourcepos = FALSE,
  footnotes = TRUE,
  extensions = c("strikethrough", "table", "tasklist"),
  eol = c("LF", "CRLF", "CR", "LFCR"),
  strip_xml_ns = TRUE
)

Arguments

md

(R) Markdown file content as a character scalar.

smart_punctuation

Whether or not to enable Pandoc's smart extension, which converts straight quotes to curly quotes, --- to an em dash (—), -- to an en dash (–), and ... to an ellipsis (…). It also replaces regular spaces after certain abbreviations, such as Mr., with non-breaking spaces.

hardbreaks

Whether or not to treat newlines as hard line breaks.

normalize

Whether or not to consolidate adjacent text nodes.

sourcepos

Whether or not to include source position attributes in the output.

footnotes

Whether or not to parse footnotes.

extensions

Enables GitHub extensions. Can be TRUE (all), FALSE (none), or a character vector with a subset of the available extensions.

eol

End of line (EOL) control character sequence. One of

  • "LF" for the line feed (LF) character ("\n"). The standard on Unix and Unix-like systems (Linux, macOS, *BSD, etc.) and the default.

  • "CRLF" for the carriage return + line feed (CR+LF) character sequence ("\r\n"). The standard on Microsoft Windows, DOS and some other systems.

  • "CR" for the carriage return (CR) character ("\r"). The standard on classic Mac OS and some other antiquated systems.

  • "LFCR" for the line feed + carriage return (LF+CR) character sequence ("\n\r"). The standard on RISC OS and some other exotic systems.

strip_xml_ns

Whether or not to remove the default XML namespace (d1) assigned by commonmark::markdown_xml().
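
To illustrate what the default namespace changes in practice, here is a minimal sketch that assumes the commonmark and xml2 packages are installed. It calls commonmark::markdown_xml() directly rather than md_to_xml(), and uses xml2::xml_ns_strip() as one plausible way to drop the namespace. Whether md_to_xml() does exactly this internally is an assumption.

```r
library(xml2)

# Raw CommonMark XML as produced by the commonmark package
xml_raw <- commonmark::markdown_xml("# A title\n\nSome prose.")
doc <- read_xml(xml_raw)

# The root element declares a default namespace, which xml2 exposes as `d1`
xml_ns(doc)

# Unprefixed XPath names do not match namespaced elements ...
n_unprefixed <- length(xml_find_all(doc, "//heading"))
# ... so the `d1` prefix is needed while the namespace is in place
n_prefixed <- length(xml_find_all(doc, "//d1:heading"))

# Dropping the namespace (modifies `doc` in place) allows plain XPath names
xml_ns_strip(doc)
n_stripped <- length(xml_find_all(doc, "//heading"))

c(unprefixed = n_unprefixed, prefixed = n_prefixed, stripped = n_stripped)
```

With the namespace stripped, XPath queries on the result can use plain element names such as //heading, which is presumably why strip_xml_ns defaults to TRUE.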

Value

An xml_document.
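
As a rough sketch of what the return value looks like, the following assumes the pal and xml2 packages are installed and that the xml_document class is the one defined by xml2. The input string is made up for illustration.

```r
# Parse a tiny document with the default arguments
doc <- pal::md_to_xml("# A title\n\nSome prose.")

# The result should carry the xml_document class
inherits(doc, "xml_document")

# Element names of the top-level nodes, in document order
xml2::xml_name(xml2::xml_children(doc))
```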

See Also

Other CommonMark parsing functions: md_xml_subnode_ix(), xml_to_md()

Examples

"# A title

Some prose.

## A subtitle

More prose.

## Another subtitle

Out of prose here.

### A sub-subtitle

I'm dug in.

# Another title

A last word." |> pal::md_to_xml()
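
Once parsed, the tree can be queried with xml2. The sketch below assumes xml2 is installed and that strip_xml_ns is left at its default of TRUE, so that the unprefixed XPath //heading matches. It builds a small outline of heading levels and titles from a shortened version of the input above.

```r
md <- "# A title\n\nSome prose.\n\n## A subtitle\n\nMore prose.\n\n# Another title\n\nA last word."
doc <- pal::md_to_xml(md)

# All heading nodes in document order
headings <- xml2::xml_find_all(doc, "//heading")

# CommonMark XML stores the heading depth in the `level` attribute
outline <- data.frame(
  level = as.integer(xml2::xml_attr(headings, "level")),
  title = xml2::xml_text(headings)
)
outline
```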

salim-b/pal documentation built on Feb. 28, 2025, 6:51 p.m.