Read In a Text Document

Description

Return a function which reads in a text document from a tabular data structure (like a data frame or a list matrix) with knowledge about its internal structure and possible available metadata as specified by a so-called mapping.

Usage

1
readTabular(mapping)

Arguments

mapping

A named list of characters. The constructed reader will map each character entry to the content or metadatum of the text document as specified by the named list entry. Valid names include content to access the document's content, and character strings which are mapped to metadata entries.

Details

Formally this function is a function generator, i.e., it returns a function (which reads in a text document) with a well-defined signature, but can access passed over arguments (e.g., the mapping) via lexical scoping.

Value

A function with the following formals:

elem

a named list with the component content which must hold the document to be read in.

language

a string giving the language.

id

a character giving a unique identifier for the created text document.

The function returns a PlainTextDocument representing the text and metadata extracted from elem$content. The arguments language and id are used as fallback if no corresponding metadata entries are found in elem$content.

See Also

Reader for basic information on the reader infrastructure employed by package tm.

Vignette 'Extensions: How to Handle Custom File Formats'.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
df <- data.frame(contents = c("content 1", "content 2", "content 3"),
                 title    = c("title 1"  , "title 2"  , "title 3"  ),
                 authors  = c("author 1" , "author 2" , "author 3" ),
                 topics   = c("topic 1"  , "topic 2"  , "topic 3"  ),
                 stringsAsFactors = FALSE)
m <- list(content = "contents", heading = "title",
          author = "authors", topic = "topics")
myReader <- readTabular(mapping = m)
ds <- DataframeSource(df)
elem <- getElem(stepNext(ds))
(result <- myReader(elem, language = "en", id = "id1"))
meta(result)

Want to suggest features or report bugs for rdrr.io? Use the GitHub issue tracker.