read_rtf: Extract Text from RTF (Rich Text Format) File

View source: R/striprtf.R

read_rtf    R Documentation

Extract Text from RTF (Rich Text Format) File

Description

Parses an RTF file and extracts its plain text as a character vector.

Usage

read_rtf(
  file,
  verbose = FALSE,
  row_start = "*| ",
  row_end = "",
  cell_end = " | ",
  ignore_tables = FALSE,
  check_file = TRUE,
  ...
)

strip_rtf(
  text,
  verbose = FALSE,
  row_start = "*| ",
  row_end = "",
  cell_end = " | ",
  ignore_tables = FALSE
)

Arguments

file

Path to an RTF file. Must be a character vector of length 1.

verbose

Logical. If TRUE, a progress report is printed to the console. This can be informative when parsing a large file, but the option itself slows the process down.

row_start, row_end

Strings to be added at the beginning and end of each table row.

cell_end

String to be added at the end of each table cell.

ignore_tables

Logical. If TRUE, tables receive no special treatment.

check_file

Logical. If TRUE, runs a quick check on whether the file is an RTF file. If the file fails the check, the function returns NULL without parsing it.

...

Additional arguments passed to readLines.

text

Character vector of length 1, expected to hold the contents of an RTF file.
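The arguments above can be combined as in the following minimal sketch. It assumes the striprtf package is installed and reuses the bundled king.rtf file from the Examples section; the delimiter values chosen here (an empty row_start and a tab as cell_end) are illustrative only, not recommended defaults.

```r
# Minimal sketch: read an RTF file with custom table delimiters and
# handle the NULL that read_rtf is documented to return when the
# check_file test fails.
txt <- NULL

if (requireNamespace("striprtf", quietly = TRUE)) {
  # Sample file shipped with the package (see the Examples section).
  path <- system.file("extdata/king.rtf", package = "striprtf")

  txt <- striprtf::read_rtf(
    path,
    row_start  = "",    # illustrative: no marker at the start of a table row
    row_end    = "",
    cell_end   = "\t",  # illustrative: separate table cells with a tab
    check_file = TRUE   # return NULL if the file does not look like RTF
  )
}

if (is.null(txt)) {
  message("No text extracted (package missing or file failed the RTF check).")
} else {
  print(head(txt))
}
```

Checking for NULL before using the result keeps a script from failing later when a non-RTF file slips into a batch.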

Details

Rich Text Format (RTF) files are text files consisting of ASCII characters. The specification was developed by Microsoft. This function interprets the character strings and extracts the plain text of the file. The major part of the algorithm comes from a Stack Overflow answer (https://stackoverflow.com/a/188877) and the references therein. This function is a translation of that code into R, supplemented with C++ code as an enhancement.

An advance over the preceding implementation is that the function accommodates various ANSI code pages. For example, RTF files created by the Japanese version of Microsoft Word are marked with \ansicpg932, which indicates that code page 932 is used for character encoding. The function detects the code page declaration and converts the characters to UTF-8 where possible. Conversion tables are retrieved from https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/.
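The idea of code page detection can be illustrated with a rough base-R sketch. This is not the package's implementation: the RTF fragment is hypothetical, and the conversion step uses iconv as a stand-in for the package's own conversion tables. Whether iconv recognizes the encoding name "CP932" depends on the platform, so the decoded result may be NA on some systems.

```r
# Hypothetical RTF fragment. "\\'82\\'a0" is the RTF hex escape for the
# two bytes 0x82 0xA0.
rtf <- "{\\rtf1\\ansi\\ansicpg932\\deff0 \\'82\\'a0}"

# 1. Detect the code page declaration (\ansicpgN).
m <- regmatches(rtf, regexpr("\\\\ansicpg[0-9]+", rtf))
code_page <- as.integer(sub("^\\\\ansicpg", "", m))

# 2. Collect the hex escapes (\'hh) and turn them into raw bytes.
hex <- regmatches(rtf, gregexpr("\\\\'[0-9a-fA-F]{2}", rtf))[[1]]
bytes <- as.raw(strtoi(substring(hex, 3), base = 16L))

# 3. Convert the bytes from the detected code page to UTF-8.
#    Assumption: iconv knows the name "CP<N>" on this platform;
#    if it does not, the result is NA.
decoded <- iconv(list(bytes), from = paste0("CP", code_page), to = "UTF-8")

print(code_page)
print(bytes)
print(decoded)
```

Using iconv keeps the sketch short. Bundled conversion tables, as described above, avoid depending on which encodings the local iconv build supports.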

Value

Character vector of extracted text. read_rtf returns NULL if check_file is TRUE and the file fails the check.

Examples

read_rtf(system.file("extdata/king.rtf", package = "striprtf"))

striprtf documentation built on Aug. 10, 2023, 5:09 p.m.